Using speaker diarization for streaming transcription with Amazon Transcribe and Amazon Transcribe Medical

Conversational audio data that requires transcription, such as phone calls, doctor visits, and online meetings, often has multiple speakers. In these use cases, it’s important to accurately label each speaker and associate them with what they said. For example, you can distinguish between a doctor’s questions and a patient’s responses in the transcription of a live medical consultation.

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to applications. With the launch of speaker diarization for streaming transcriptions, you can use Amazon Transcribe and Amazon Transcribe Medical to label the different speakers in real-time customer service calls, conference calls, live broadcasts, or clinical visits. Speaker diarization, or speaker labeling, is critical to creating accurate transcriptions because it distinguishes what each speaker said. The speakers are typically labeled generically, for example as speaker A and speaker B. Speaker identification, by contrast, usually refers to identifying the speakers by name, such as Sally or Alfonso.

With speaker diarization, you can request Amazon Transcribe and Amazon Transcribe Medical to accurately label up to five speakers in an audio stream. Although Amazon Transcribe can label more than five speakers in a stream, the accuracy of speaker diarization decreases if you exceed that number. In some cases, such as call centers, the different speakers may be on different channels. In those cases, you can use Amazon Transcribe channel identification to separate multiple channels within a live audio stream and generate transcripts that label each audio channel.

This post uses an example application to show you how to use the AWS SDK for Java to stream conversational audio from your microphone to Amazon Transcribe and receive transcripts in real time with speaker labeling. The solution is a Java application that transcribes streaming audio from multiple speakers in real time. The application labels each speaker in the transcription results, which can be exported.

You can find the application in the GitHub repo. We include detailed steps to set up and run the application in this post.

Prerequisites

You need an AWS account to proceed with the solution. Additionally, the AmazonTranscribeFullAccess policy must be attached to the AWS Identity and Access Management (IAM) role you use for this demo. To create an IAM role with the necessary permissions, complete the following steps:

  1. Sign in to the AWS Management Console and open the IAM console.
  2. On the navigation pane, under Access management, choose Roles.
  3. You can use an existing IAM role to create and run transcription jobs, or choose Create role.
  4. Under Common use cases, choose EC2. You can select any use case, but EC2 is one of the most straightforward ones.
  5. Choose Next: Permissions.
  6. For the policy name, enter AmazonTranscribeFullAccess.
  7. Choose Next: Tags.
  8. Choose Next: Review.
  9. For Role name, enter a role name.
  10. Remove the text under Role description.
  11. Choose Create role.
  12. Choose the role you created.
  13. Choose Trust relationships.
  14. Choose Edit trust relationship.
  15. Replace the trust policy text in your role with the following code:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "transcribe.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Solution overview

Amazon Transcribe streaming transcription enables you to send a live audio stream to Amazon Transcribe and receive a stream of text in real time. You can label different speakers in either HTTP/2 or WebSocket streams. Speaker diarization works best for labeling between two and five speakers. Although Amazon Transcribe can label more than five speakers in a stream, the accuracy of speaker separation decreases if you exceed five speakers.

To start an HTTP/2 stream, we specify the ShowSpeakerLabel request parameter of the StartStreamTranscription operation in our demo solution. See the following code:

private StartStreamTranscriptionRequest getRequest(Integer mediaSampleRateHertz) {
    return StartStreamTranscriptionRequest.builder()
            .languageCode(LanguageCode.EN_US.toString())
            .mediaEncoding(MediaEncoding.PCM)
            .mediaSampleRateHertz(mediaSampleRateHertz)
            .showSpeakerLabel(true)
            .build();
}

Amazon Transcribe streaming returns a “result” object as part of the transcription response element that can be used to label the speakers in the transcript. To learn more about the parameters in this result object, see Response Syntax.

"TranscriptEvent": {
    "Transcript": {
        "Results": [
            {
                "Alternatives": [
                    {
                        "Items": [
                            {
                                "Content": "string",
                                "EndTime": number,
                                "Speaker": "string",
                                "StartTime": number,
                                "Type": "string",
                                "VocabularyFilterMatch": boolean
                            }
                        ],
                        "Transcript": "string"
                    }
                ],
                "EndTime": number,
                "IsPartial": boolean,
                "ResultId": "string",
                "StartTime": number
            }
        ]
    }
}

Our solution demonstrates speaker diarization during transcription for real-time audio captured via the microphone. Amazon Transcribe breaks your incoming audio stream based on natural speech segments, such as a change in speaker or a pause in the audio. The transcription is returned progressively to your application, with each response containing more transcribed speech until the entire segment is transcribed. For more information, see Identifying Speakers.
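The progressive delivery described above can be handled with a small amount of state. The following is a minimal sketch, not the sample application’s code: the TranscriptAccumulator class is hypothetical, and it assumes you pass in the Transcript text and IsPartial flag of each result as plain values. A partial result replaces the previous in-progress text, and a final result is appended to the finished transcript.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: tracks in-progress text and completed segments. */
final class TranscriptAccumulator {
    private final List<String> finalSegments = new ArrayList<>();
    private String inProgress = "";

    /**
     * Applies one result. The parameters mirror the "Transcript" and "IsPartial"
     * fields of the response syntax; passing them as plain values is an
     * assumption of this sketch.
     */
    void onResult(String transcript, boolean isPartial) {
        if (isPartial) {
            // A newer partial result supersedes the previous one for the same segment.
            inProgress = transcript;
        } else {
            // The segment is complete: store it and clear the in-progress text.
            finalSegments.add(transcript);
            inProgress = "";
        }
    }

    /** Text of the segment that is still being transcribed. */
    String inProgressText() {
        return inProgress;
    }

    /** All completed segments joined by single spaces. */
    String finalText() {
        return String.join(" ", finalSegments);
    }
}
```

Keeping the two pieces of text separate means a UI can redraw the in-progress segment on every response without rewriting the finished transcript.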

Launching the application

Complete the following prerequisites to launch the Java application. If you already have Java, JavaFX, and Maven installed, you can skip the corresponding installation sections. For all environment variables mentioned in the following steps, a good option is to add them to the ~/.bashrc file and apply them as required by entering “source ~/.bashrc” after you open a shell.

Installing JDK

As your first step, download and install Java SE. When the installation is complete, set the JAVA_HOME variable (see the following code). Make sure to select the path to the correct Java version and confirm the path is valid.

export JAVA_HOME=path-to-your-install-dir/jdk-14.0.2.jdk/Contents/Home

Installing JavaFX

For instructions on downloading and installing JavaFX, see Getting Started with JavaFX. Set up the environment variable as described in the instructions or by entering the following code (replace path/to with the directory where you installed JavaFX):

export PATH_TO_FX='path/to/javafx-sdk-14/lib'

Test your JavaFX installation as shown in the sample application on GitHub.

Installing Maven

Download the latest version of Apache Maven. For installation instructions, see Installing Apache Maven.

Installing the AWS CLI (Optional)

As an optional step, you can install the AWS Command Line Interface (AWS CLI). For instructions, see Installing, updating, and uninstalling the AWS CLI version 2. You can use the AWS CLI to validate and troubleshoot the solution as needed.

Setting up AWS access

Lastly, set up the access key and secret access key required for programmatic access to AWS. For instructions, see Programmatic access. Choose the Region closest to your location. For more information, see the Amazon Transcribe Streaming section in Service Endpoints.

When you know the Region and access keys, open a terminal window in your computer and assign them to environment variables for access within our solution:

  • export AWS_ACCESS_KEY_ID=<access-key>
  • export AWS_SECRET_ACCESS_KEY=<secret-access-key>
  • export AWS_REGION=<aws region>
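Before launching the application, it can help to confirm that these variables are visible to a Java process. The following is a minimal sketch; the EnvCheck class is hypothetical and is not part of the sample repo. It only checks that each variable is present and non-blank, not that the credentials are valid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: reports which required AWS variables are unset. */
final class EnvCheck {
    // The three variables exported in the preceding steps.
    static final String[] REQUIRED = {
        "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"
    };

    /** Returns the names of required variables that are absent or blank in env. */
    static List<String> missing(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missing(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("All required variables are set.");
        } else {
            System.out.println("Missing: " + missing);
        }
    }
}
```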

Solution demonstration

The following video demonstrates how you can compile and run the Java application presented in this post. Use the following sections to walk through these steps yourself.

The quality of the transcription results depends on many factors. For example, the quality can be affected by background noise, speakers talking over each other, complex technical jargon, volume disparity between speakers, and the audio recording devices you use. You can use a variety of capabilities provided by Amazon Transcribe to improve transcription quality. For example, you can use custom vocabularies to recognize out-of-lexicon terms. You can even use custom language models, which enable you to use your own data to build domain-specific models. For more information, see Improving Domain-Specific Transcription Accuracy with Custom Language Models.

Setting up the solution

To implement the solution, complete the following steps:

  1. Clone the solution’s GitHub repo to your local computer using the following command:
git clone https://github.com/aws-samples/aws-transcribe-speaker-identification-java

  2. Navigate to the main directory of the solution, aws-transcribe-streaming-example-java, with the following code:
cd aws-transcribe-streaming-example-java

  3. Compile the source code and build a package for running our solution:
    1. Enter mvn compile. If the compile is successful, you should see a BUILD SUCCESS message. If there are compilation errors, they’re most likely related to JavaFX path issues. Fix the issues based on the instructions in the Installing JavaFX section in this post.
    2. Enter mvn clean package. You should see a BUILD SUCCESS message if everything went well. This command compiles the source files and creates a packaged JAR file that we use to run our solution. If you’re repeating the build exercise, you don’t need to enter mvn compile every time.
  4. Run the solution by entering the following code:
java --module-path $PATH_TO_FX --add-modules javafx.controls -jar target/aws-transcribe-sample-application-1.0-SNAPSHOT-jar-with-dependencies.jar

If you receive an error, it’s likely because you already had a version of Java, JavaFX, and Maven installed and skipped the steps to install the JDK and JavaFX in this post. If so, enter the following code:

java -jar target/aws-transcribe-sample-application-1.0-SNAPSHOT-jar-with-dependencies.jar

You should see a Java UI window open.

Running the demo solution

Follow the steps in this section to run the demo yourself. You need two to five speakers present to try out the speaker diarization functionality. This application requires that all speakers use the same audio input when speaking.

  1. Choose Start Microphone Transcription in the Java UI application.
  2. Use your computer’s microphone to stream audio of two to five people conversing. As of this writing, Amazon Transcribe speaker labeling supports real-time streams that are in US English.

You should see the speaker designations and the corresponding transcript appearing in the In-Progress Transcriptions window as the conversation progresses. When the transcript is complete, it should appear in the Final Transcription window.

  3. Choose Save Full Transcript to store the transcript locally on your computer.
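To illustrate how speaker designations like those in the transcription windows could be derived from the Items list of a result, the following minimal sketch groups consecutive items by their Speaker label. It is not the sample application’s code. The TranscriptItem class is a hypothetical stand-in for the SDK’s model, the “Speaker N:” line format is an arbitrary choice, and the sketch assumes that Type is either “pronunciation” or “punctuation” and that punctuation items may carry no speaker label.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one entry of "Items" (not the SDK class). */
final class TranscriptItem {
    final String content; // "Content"
    final String speaker; // "Speaker"; assumed to be null on punctuation items
    final String type;    // "Type"; assumed "pronunciation" or "punctuation"

    TranscriptItem(String content, String speaker, String type) {
        this.content = content;
        this.speaker = speaker;
        this.type = type;
    }
}

final class SpeakerLineFormatter {
    private SpeakerLineFormatter() {}

    /** Groups consecutive items with the same speaker into "Speaker N: text" lines. */
    static List<String> toSpeakerLines(List<TranscriptItem> items) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentSpeaker = null;

        for (TranscriptItem item : items) {
            if ("punctuation".equals(item.type)) {
                // Punctuation attaches to the preceding word and never starts a line.
                current.append(item.content);
                continue;
            }
            if (item.speaker != null && !item.speaker.equals(currentSpeaker)) {
                // Speaker changed: flush the line built so far and start a new one.
                if (current.length() > 0) {
                    lines.add(current.toString());
                }
                current = new StringBuilder("Speaker " + item.speaker + ":");
                currentSpeaker = item.speaker;
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(item.content);
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }
}
```

Starting a new line only when the label changes keeps one speaker’s turn on one line, which suits a doctor and patient exchange or a customer service call.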

Conclusion

This post demonstrated how you can easily infuse your applications with real-time ASR capabilities using Amazon Transcribe streaming and showcased an important new feature that enables speaker diarization in real-time audio streams.

With Amazon Transcribe and Amazon Transcribe Medical, you can use speaker separation to generate real-time insights from conversations such as in-clinic visits or customer service calls. You can send these insights to downstream applications for natural language processing, or send them to human loops for review using Amazon Augmented AI (Amazon A2I). For more information, see Improving speech-to-text transcripts from Amazon Transcribe using custom vocabularies and Amazon Augmented AI.


About the Authors

Prem Ranga is an Enterprise Solutions Architect based out of Houston, Texas. He is part of the Machine Learning Technical Field Community and loves working with customers on their ML and AI journey. Prem is passionate about robotics, is an Autonomous Vehicles researcher, and also built the Alexa-controlled Beer Pours in Houston and other locations.

Talia Chopra is a Technical Writer in AWS specializing in machine learning and artificial intelligence. She works with multiple teams in AWS to create technical documentation and tutorials for customers using Amazon SageMaker, MxNet, and AutoGluon. In her free time, she enjoys meditating, studying machine learning, and taking walks in nature.

Parsa Shahbodaghi is a Technical Writer in AWS specializing in machine learning and artificial intelligence. He writes the technical documentation for Amazon Transcribe and Amazon Transcribe Medical. In his free time, he enjoys meditating, listening to audiobooks, weightlifting, and watching stand-up comedy. He will never be a stand-up comedian, but at least his mom thinks he’s funny.

Mahendar Gajula is a Sr. Data Architect at AWS. He works with AWS customers in their journey to the cloud with a focus on data lake, data warehouse, and AI/ML projects. In his spare time, he enjoys playing tennis and spending time with his family.

Source: https://aws.amazon.com/blogs/machine-learning/using-speaker-diarization-for-streaming-transcription-with-amazon-transcribe-and-amazon-transcribe-medical/

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

Arcanum specializes in digitizing Hungarian language content, including newspapers, books, maps, and art. With over 30 years of experience, Arcanum serves more than 30,000 global subscribers with access to Hungarian culture, history, and heritage.

Amazon Rekognition Solutions Architects worked with Arcanum to add highly scalable image analysis to Hungaricana, a free service provided by Arcanum, which enables you to search and explore Hungarian cultural heritage, including 600,000 faces across 500,000 images. For example, you can find historical works by author Mór Jókai or photos on topics like weddings. The Arcanum team chose Amazon Rekognition to free valuable staff from time- and cost-intensive manual labeling, and improved label accuracy to make 200,000 previously unsearchable images (approximately 40% of image inventory) available to users.

Amazon Rekognition makes it easy to add image and video analysis to your applications using highly scalable machine learning (ML) technology that requires no previous ML expertise to use. Amazon Rekognition also provides highly accurate facial recognition and facial search capabilities to detect, analyze, and compare faces.

Arcanum uses this facial recognition feature in their image database services to help you find particular people in Arcanum’s articles. This post discusses their challenges and why they chose Amazon Rekognition as their solution.

Automated image labeling challenges

Arcanum dedicated a team of three people to start tagging and labeling content for Hungaricana. The team quickly learned that they would need to invest more than 3 months of time-consuming and repetitive human labor to provide accurate search capabilities to their customers. Considering the size of the team and scope of the existing project, Arcanum needed a better solution that would automate image and object labeling at scale.

Automated image labeling solutions

To speed up and automate image labeling, Arcanum turned to Amazon Rekognition to enable users to search photos by keywords (for example, type of historic event, place name, or a person relevant to Hungarian history).

For the Hungaricana project, preprocessing all the images was challenging. Arcanum ran a TensorFlow face search across all 28 million pages on a machine with 8 GPUs in their own offices to extract only faces from images.

The following screenshot shows what an extract looks like (image provided by Arcanum Database Ltd).

The images containing only faces are sent to Amazon Rekognition, invoking the IndexFaces operation to add a face to the collection. For each face that is detected in the specified face collection, Amazon Rekognition extracts facial features into a feature vector and stores it in an Amazon Aurora database. Amazon Rekognition uses feature vectors when it performs face match and search operations using the SearchFaces and SearchFacesByImage operations.

The image preprocessing helped create a very efficient and cost-effective way to index faces. The following diagram summarizes the preprocessing workflow.

As for the web application, the workflow starts with a Hungaricana user making a face search request. The following diagram illustrates the application workflow.

The workflow includes the following steps:

  1. The user requests a facial match by uploading the image. The web request is automatically distributed by the Elastic Load Balancer to the webserver fleet.
  2. Amazon Elastic Compute Cloud (Amazon EC2) powers application servers that handle the user request.
  3. The uploaded image is stored in Amazon Simple Storage Service (Amazon S3).
  4. Amazon Rekognition indexes the face and runs SearchFaces to look for a face similar to the new face ID.
  5. The output of the search face by image operation is stored in Amazon ElastiCache, a fully managed in-memory data store.
  6. The metadata of the indexed faces are stored in an Aurora relational database built for the cloud.
  7. The resulting face thumbnails are served to the customer via the fast content-delivery network (CDN) service Amazon CloudFront.

Experimenting and live testing Hungaricana

During our test of Hungaricana, the application performed extremely well. The searches not only correctly identified people, but also provided links to all publications and sources in Arcanum’s privately owned database where the found faces are present. For example, the following screenshot shows the result of a search for the famous composer and pianist Franz Liszt.

The application provided 42 pages of 6×4 results. The results are capped at 1,000. The 100% scores are the confidence scores returned by Amazon Rekognition and are rounded up to whole numbers.
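The page count follows from the cap and the grid size. As a small worked sketch (the ResultPages class is hypothetical and not part of Hungaricana), a 6×4 grid holds 24 results per page, and 1,000 results divided by 24, rounded up, gives 42 pages:

```java
/** Hypothetical helper: number of pages needed to show a capped result list. */
final class ResultPages {
    private ResultPages() {}

    /** Returns how many pages of columns x rows results are needed for totalResults. */
    static int pageCount(int totalResults, int columns, int rows) {
        int perPage = columns * rows;
        // Integer ceiling division: a partially filled last page still counts.
        return (totalResults + perPage - 1) / perPage;
    }
}
```

Here pageCount(1000, 6, 4) evaluates to 42, and the last page would hold the remaining 1,000 − 41 × 24 = 16 results.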

In our tests, the Hungaricana application consistently returned results promptly and with a high degree of certainty, along with links to all corresponding publications.

Business results

By introducing Amazon Rekognition into their workflow, Arcanum enabled a better customer experience, including building family trees, searching for historical figures, and researching historical places and events.

The concept of face searching using artificial intelligence certainly isn’t new. But Hungaricana uses it in a very creative, unique way.

Amazon Rekognition allowed Arcanum to realize three distinct advantages:

  • Time savings – Time to market improved dramatically. Now, instead of spending several months of intense manual labor to label all the images, the company can do this job in a few days. Before, basic labeling on 150,000 images took months for three people to complete.
  • Cost savings – Arcanum saved around $15,000 on the Hungaricana project. Before using Amazon Rekognition, there was no automation, so a human workforce had to scan all the images. Now, employees can shift their focus to other high-value tasks.
  • Improved accuracy – Users now have a much better experience regarding hit rates. Since Arcanum started using Amazon Rekognition, the number of hits has doubled. Before, out of 500,000 images, about 200,000 weren’t searchable. But with Amazon Rekognition, search is now possible for all 500,000 images.

 “Amazon Rekognition made Hungarian culture, history, and heritage more accessible to the world,” says Előd Biszak, Arcanum CEO. “It has made research a lot easier for customers building family trees, searching for historical figures, and researching historical places and events. We cannot wait to see what the future of artificial intelligence has to offer to enrich our content further.”

Conclusion

In this post, you learned how to add highly scalable face and image analysis to an enterprise-level image gallery to improve label accuracy, reduce costs, and save time.

You can test Amazon Rekognition features such as facial analysis, face comparison, or celebrity recognition on images specific to your use case on the Amazon Rekognition console.

For video presentations and tutorials, see Getting Started with Amazon Rekognition. For more information about Amazon Rekognition, see Amazon Rekognition Documentation.


About the Authors

Siniša Mikašinović is a Senior Solutions Architect at AWS Luxembourg, covering Central and Eastern Europe—a region full of opportunities, talented and innovative developers, ISVs, and startups. He helps customers adopt AWS services as well as acquire new skills, learn best practices, and succeed globally with the power of AWS. His areas of expertise are Game Tech and Microsoft on AWS. Siniša is a PowerShell enthusiast, a gamer, and a father of a small and very loud boy. He flies under the flags of Croatia and Serbia.

Cameron Peron is Senior Marketing Manager for AWS Amazon Rekognition and the AWS AI/ML community. He evangelizes how AI/ML innovation solves complex challenges facing community, enterprise, and startups alike. Out of the office, he enjoys staying active with kettlebell-sport, spending time with his family and friends, and is an avid fan of Euro-league basketball.

Source: https://aws.amazon.com/blogs/machine-learning/arcanum-makes-hungarian-heritage-accessible-with-amazon-rekognition/

Continue Reading

AI

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

Arcanum specializes in digitizing Hungarian language content, including newspapers, books, maps, and art. With over 30 years of experience, Arcanum serves more than 30,000 global subscribers with access to Hungarian culture, history, and heritage. Amazon Rekognition Solutions Architects worked with Arcanum to add highly scalable image analysis to Hungaricana, a free service provided by Arcanum, […]

Published

on

Arcanum specializes in digitizing Hungarian language content, including newspapers, books, maps, and art. With over 30 years of experience, Arcanum serves more than 30,000 global subscribers with access to Hungarian culture, history, and heritage.

Amazon Rekognition Solutions Architects worked with Arcanum to add highly scalable image analysis to Hungaricana, a free service provided by Arcanum, which enables you to search and explore Hungarian cultural heritage, including 600,000 faces over 500,000 images. For example, you can find historical works by author Mór Jókai or photos on topics like weddings. The Arcanum team chose Amazon Rekognition to free valuable staff from time and cost-intensive manual labeling, and improved label accuracy to make 200,000 previously unsearchable images (approximately 40% of image inventory), available to users.

Amazon Rekognition makes it easy to add image and video analysis to your applications using highly scalable machine learning (ML) technology that requires no previous ML expertise to use. Amazon Rekognition also provides highly accurate facial recognition and facial search capabilities to detect, analyze, and compare faces.

Arcanum uses this facial recognition feature in their image database services to help you find particular people in Arcanum’s articles. This post discusses their challenges and why they chose Amazon Rekognition as their solution.

Automated image labeling challenges

Arcanum dedicated a team of three people to start tagging and labeling content for Hungaricana. The team quickly learned that they would need to invest more than 3 months of time-consuming and repetitive human labor to provide accurate search capabilities to their customers. Considering the size of the team and scope of the existing project, Arcanum needed a better solution that would automate image and object labelling at scale.

Automated image labeling solutions

To speed up and automate image labeling, Arcanum turned to Amazon Rekognition to enable users to search photos by keywords (for example, type of historic event, place name, or a person relevant to Hungarian history).

For the Hungaricana project, preprocessing all the images was challenging. Arcanum ran a TensorFlow face search across all 28 million pages on a machine with 8 GPUs in their own offices to extract only faces from images.

The following screenshot shows what an extract looks like (image provided by Arcanum Database Ltd).

The images containing only faces are sent to Amazon Rekognition, invoking the IndexFaces operation to add a face to the collection. For each face that is detected in the specified face collection, Amazon Rekognition extracts facial features into a feature vector and stores it in an Amazon Aurora database. Amazon Rekognition uses feature vectors when it performs face match and search operations using the SearchFaces and SearchFacesByImage operations.

The image preprocessing helped create a very efficient and cost-effective way to index faces. The following diagram summarizes the preprocessing workflow.

As for the web application, the workflow starts with a Hungaricana user making a face search request. The following diagram illustrates the application workflow.

The workflow includes the following steps:

  1. The user requests a facial match by uploading the image. The web request is automatically distributed by the Elastic Load Balancer to the webserver fleet.
  2. Amazon Elastic Compute Cloud (Amazon EC2) powers application servers that handle the user request.
  3. The uploaded image is stored in Amazon Simple Storage Service (Amazon S3).
  4. Amazon Rekognition indexes the face and runs SearchFaces to look for a face similar to the new face ID.
  5. The output of the search face by image operation is stored in Amazon ElastiCache, a fully managed in-memory data store.
  6. The metadata of the indexed faces are stored in an Aurora relational database built for the cloud.
  7. The resulting face thumbnails are served to the customer via the fast content-delivery network (CDN) service Amazon CloudFront.

Experimenting and live testing Hungaricana

During our test of Hungaricana, the application performed extremely well. The searches not only correctly identified people, but also provided links to all publications and sources in Arcanum’s privately owned database where found faces are present. For example, the following screenshot shows the result of the famous composer and pianist Franz Liszt.

The application provided 42 pages of 6×4 results. The results are capped to 1,000. The 100% scores are the confidence scores returned by Amazon Rekognition and are rounded up to whole numbers.

The application of Hungaricana has always promptly, and with a high degree of certainty, presented results and links to all corresponding publications.

Business results

By introducing Amazon Rekognition into their workflow, Arcanum enabled a better customer experience, including building family trees, searching for historical figures, and researching historical places and events.

The concept of face searching using artificial intelligence certainly isn’t new. But Hungaricana uses it in a very creative, unique way.

Amazon Rekognition allowed Arcanum to realize three distinct advantages:

  • Time savings – The time to market speed increased dramatically. Now, instead of spending several months of intense manual labor to label all the images, the company can do this job in a few days. Before, basic labeling on 150,000 images took months for three people to complete.
  • Cost savings – Arcanum saved around $15,000 on the Hungaricana project. Before using Amazon Rekognition, there was no automation, so a human workforce had to scan all the images. Now, employees can shift their focus to other high-value tasks.
  • Improved accuracy – Users now have a much better experience regarding hit rates. Since Arcanum started using Amazon Rekognition, the number of hits has doubled. Before, out of 500,000 images, about 200,000 weren’t searchable. But with Amazon Rekognition, search is now possible for all 500,000 images.

 “Amazon Rekognition made Hungarian culture, history, and heritage more accessible to the world,” says Előd Biszak, Arcanum CEO. “It has made research a lot easier for customers building family trees, searching for historical figures, and researching historical places and events. We cannot wait to see what the future of artificial intelligence has to offer to enrich our content further.”

Conclusion

In this post, you learned how to add highly scalable face and image analysis to an enterprise-level image gallery to improve label accuracy, reduce costs, and save time.

You can test Amazon Rekognition features such as facial analysis, face comparison, or celebrity recognition on images specific to your use case on the Amazon Rekognition console.

For video presentations and tutorials, see Getting Started with Amazon Rekognition. For more information about Amazon Rekognition, see Amazon Rekognition Documentation.


About the Authors

Siniša Mikašinović is a Senior Solutions Architect at AWS Luxembourg, covering Central and Eastern Europe—a region full of opportunities, talented and innovative developers, ISVs, and startups. He helps customers adopt AWS services as well as acquire new skills, learn best practices, and succeed globally with the power of AWS. His areas of expertise are Game Tech and Microsoft on AWS. Siniša is a PowerShell enthusiast, a gamer, and a father of a small and very loud boy. He flies under the flags of Croatia and Serbia.

Cameron Peron is Senior Marketing Manager for AWS Amazon Rekognition and the AWS AI/ML community. He evangelizes how AI/ML innovation solves complex challenges facing community, enterprise, and startups alike. Out of the office, he enjoys staying active with kettlebell-sport, spending time with his family and friends, and is an avid fan of Euro-league basketball.

Source: https://aws.amazon.com/blogs/machine-learning/arcanum-makes-hungarian-heritage-accessible-with-amazon-rekognition/

Continue Reading

AI

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

Arcanum specializes in digitizing Hungarian language content, including newspapers, books, maps, and art. With over 30 years of experience, Arcanum serves more than 30,000 global subscribers with access to Hungarian culture, history, and heritage. Amazon Rekognition Solutions Architects worked with Arcanum to add highly scalable image analysis to Hungaricana, a free service provided by Arcanum, […]

Published

on

Arcanum specializes in digitizing Hungarian language content, including newspapers, books, maps, and art. With over 30 years of experience, Arcanum serves more than 30,000 global subscribers with access to Hungarian culture, history, and heritage.

Amazon Rekognition Solutions Architects worked with Arcanum to add highly scalable image analysis to Hungaricana, a free service provided by Arcanum, which enables you to search and explore Hungarian cultural heritage, including 600,000 faces over 500,000 images. For example, you can find historical works by author Mór Jókai or photos on topics like weddings. The Arcanum team chose Amazon Rekognition to free valuable staff from time and cost-intensive manual labeling, and improved label accuracy to make 200,000 previously unsearchable images (approximately 40% of image inventory), available to users.

Amazon Rekognition makes it easy to add image and video analysis to your applications using highly scalable machine learning (ML) technology that requires no previous ML expertise to use. Amazon Rekognition also provides highly accurate facial recognition and facial search capabilities to detect, analyze, and compare faces.

Arcanum uses this facial recognition feature in their image database services to help you find particular people in Arcanum’s articles. This post discusses their challenges and why they chose Amazon Rekognition as their solution.

Automated image labeling challenges

Arcanum dedicated a team of three people to start tagging and labeling content for Hungaricana. The team quickly learned that they would need to invest more than 3 months of time-consuming and repetitive human labor to provide accurate search capabilities to their customers. Considering the size of the team and the scope of the existing project, Arcanum needed a better solution that would automate image and object labeling at scale.

Automated image labeling solutions

To speed up and automate image labeling, Arcanum turned to Amazon Rekognition to enable users to search photos by keywords (for example, type of historic event, place name, or a person relevant to Hungarian history).

For the Hungaricana project, preprocessing all the images was challenging. Arcanum ran a TensorFlow face search across all 28 million pages on a machine with 8 GPUs in their own offices to extract only faces from images.

The following screenshot shows what an extract looks like (image provided by Arcanum Database Ltd).

The images containing only faces are sent to Amazon Rekognition through the IndexFaces operation, which adds each face to a face collection. For each face detected in an input image, Amazon Rekognition extracts facial features into a feature vector and stores it in the specified collection; the metadata of the indexed faces is kept in an Amazon Aurora database. Amazon Rekognition uses the feature vectors when it performs face match and search operations with SearchFaces and SearchFacesByImage.
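To make the indexing and search calls concrete, here is a minimal sketch using the boto3 Rekognition client methods index_faces and search_faces. The helper names, the external ID scheme, and the 90% threshold are illustrative assumptions, not Arcanum's actual code; the client is passed in so the helpers can be exercised without AWS credentials.

```python
from typing import Any, Dict, List, Optional


def index_face(client: Any, collection_id: str, image_bytes: bytes,
               external_id: str) -> Optional[str]:
    """Add the face in a pre-cropped image to a collection; return its FaceId."""
    response = client.index_faces(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        ExternalImageId=external_id,  # hypothetical key pointing back to the source page
        MaxFaces=1,                   # each crop is assumed to contain a single face
        QualityFilter="AUTO",
    )
    records = response.get("FaceRecords", [])
    # No record means Rekognition found no usable face in the crop.
    return records[0]["Face"]["FaceId"] if records else None


def find_similar(client: Any, collection_id: str, face_id: str,
                 threshold: float = 90.0, limit: int = 10) -> List[Dict[str, Any]]:
    """Return faces in the collection that resemble an already indexed face."""
    response = client.search_faces(
        CollectionId=collection_id,
        FaceId=face_id,
        FaceMatchThreshold=threshold,  # assumed cutoff; tune for your data
        MaxFaces=limit,
    )
    return [
        {"external_id": m["Face"].get("ExternalImageId"),
         "similarity": m["Similarity"]}
        for m in response.get("FaceMatches", [])
    ]
```

With real credentials, the client would come from boto3.client("rekognition"), and the collection would first need to be created with create_collection.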

The image preprocessing helped create a very efficient and cost-effective way to index faces. The following diagram summarizes the preprocessing workflow.

As for the web application, the workflow starts with a Hungaricana user making a face search request. The following diagram illustrates the application workflow.

The workflow includes the following steps:

  1. The user requests a facial match by uploading an image. Elastic Load Balancing automatically distributes the web request across the web server fleet.
  2. Amazon Elastic Compute Cloud (Amazon EC2) powers the application servers that handle the user request.
  3. The uploaded image is stored in Amazon Simple Storage Service (Amazon S3).
  4. Amazon Rekognition indexes the face and runs SearchFaces to look for faces similar to the new face ID.
  5. The output of the face search operation is stored in Amazon ElastiCache, a fully managed in-memory data store.
  6. The metadata of the indexed faces is stored in Aurora, a relational database built for the cloud.
  7. The resulting face thumbnails are served to the customer via Amazon CloudFront, a fast content delivery network (CDN) service.
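Steps 3 through 5 can be sketched as a single request handler. This is an illustration under stated assumptions rather than Hungaricana's implementation: the function name, the uploads/ key prefix, the SHA-256 cache key, and the default result limit are all hypothetical, and a plain dictionary stands in for ElastiCache. The s3 and rekognition arguments are expected to behave like boto3 clients (put_object, index_faces, search_faces). Steps 6 and 7 (Aurora metadata and CloudFront delivery) are left out.

```python
import hashlib
from typing import Any, Dict, List, MutableMapping


def handle_face_search(
    image_bytes: bytes,
    *,
    s3: Any,
    rekognition: Any,
    cache: MutableMapping[str, List[Dict[str, Any]]],
    bucket: str,
    collection_id: str,
    max_results: int = 100,
) -> List[Dict[str, Any]]:
    """Store an uploaded image, search for similar faces, and cache the result."""
    # Identical uploads hash to the same key, so repeat searches skip Rekognition.
    key = hashlib.sha256(image_bytes).hexdigest()
    if key in cache:
        return cache[key]

    # Step 3: keep the uploaded image in Amazon S3.
    object_name = f"uploads/{key}"
    s3.put_object(Bucket=bucket, Key=object_name, Body=image_bytes)

    # Step 4: index the uploaded face, then search by the new face ID.
    indexed = rekognition.index_faces(
        CollectionId=collection_id,
        Image={"S3Object": {"Bucket": bucket, "Name": object_name}},
        MaxFaces=1,
    )
    records = indexed.get("FaceRecords", [])
    matches: List[Dict[str, Any]] = []
    if records:
        found = rekognition.search_faces(
            CollectionId=collection_id,
            FaceId=records[0]["Face"]["FaceId"],
            MaxFaces=max_results,
        )
        matches = [
            {"face_id": m["Face"]["FaceId"], "similarity": m["Similarity"]}
            for m in found.get("FaceMatches", [])
        ]

    # Step 5: cache the search output (an empty list if no face was found).
    cache[key] = matches
    return matches
```

Keying the cache on a hash of the image content is one simple design choice; a production system would also need an expiry policy, because the collection keeps growing as new faces are indexed.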

Experimenting and live testing Hungaricana

During our test of Hungaricana, the application performed extremely well. The searches not only correctly identified people, but also provided links to all publications and sources in Arcanum’s privately owned database where the matched faces appear. For example, the following screenshot shows the results for the famous composer and pianist Franz Liszt.

The application provided 42 pages of results, each laid out as a 6×4 grid. The results are capped at 1,000. The 100% scores shown are the confidence scores returned by Amazon Rekognition, rounded up to whole numbers.
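The page count is consistent with the cap. As a quick check, assuming every page except possibly the last is a full 6×4 grid (the function name below is illustrative only):

```python
from typing import Tuple


def page_layout(total_results: int, per_page: int) -> Tuple[int, int]:
    """Return (number of pages, results on the last page)."""
    if total_results == 0:
        return 0, 0
    pages = (total_results + per_page - 1) // per_page  # ceiling division
    last_page = total_results - (pages - 1) * per_page
    return pages, last_page


print(page_layout(1000, 6 * 4))  # (42, 16)
```

In other words, 41 full pages hold 984 results, and the remaining 16 results spill onto a 42nd page.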

Throughout our testing, Hungaricana returned results promptly and with a high degree of certainty, along with links to all corresponding publications.

Business results

By introducing Amazon Rekognition into their workflow, Arcanum enabled a better customer experience, including building family trees, searching for historical figures, and researching historical places and events.

The concept of face searching using artificial intelligence certainly isn’t new. But Hungaricana uses it in a very creative, unique way.

Amazon Rekognition allowed Arcanum to realize three distinct advantages:

  • Time savings – Time to market improved dramatically. Previously, basic labeling of 150,000 images took three people months to complete. Now, instead of spending several months on intense manual labor to label all the images, the company can do this job in a few days.
  • Cost savings – Arcanum saved around $15,000 on the Hungaricana project. Before using Amazon Rekognition, there was no automation, so a human workforce had to scan all the images. Now, employees can shift their focus to other high-value tasks.
  • Improved accuracy – Users now get a much better hit rate. Since Arcanum started using Amazon Rekognition, the number of hits has doubled. Before, about 200,000 of the 500,000 images weren’t searchable. With Amazon Rekognition, search is now possible for all 500,000 images.

 “Amazon Rekognition made Hungarian culture, history, and heritage more accessible to the world,” says Előd Biszak, Arcanum CEO. “It has made research a lot easier for customers building family trees, searching for historical figures, and researching historical places and events. We cannot wait to see what the future of artificial intelligence has to offer to enrich our content further.”

Conclusion

In this post, you learned how to add highly scalable face and image analysis to an enterprise-level image gallery to improve label accuracy, reduce costs, and save time.

You can test Amazon Rekognition features such as facial analysis, face comparison, or celebrity recognition on images specific to your use case on the Amazon Rekognition console.

For video presentations and tutorials, see Getting Started with Amazon Rekognition. For more information about Amazon Rekognition, see Amazon Rekognition Documentation.


About the Authors

Siniša Mikašinović is a Senior Solutions Architect at AWS Luxembourg, covering Central and Eastern Europe—a region full of opportunities, talented and innovative developers, ISVs, and startups. He helps customers adopt AWS services as well as acquire new skills, learn best practices, and succeed globally with the power of AWS. His areas of expertise are Game Tech and Microsoft on AWS. Siniša is a PowerShell enthusiast, a gamer, and a father of a small and very loud boy. He flies under the flags of Croatia and Serbia.

Cameron Peron is Senior Marketing Manager for AWS Amazon Rekognition and the AWS AI/ML community. He evangelizes how AI/ML innovation solves complex challenges facing community, enterprise, and startups alike. Out of the office, he enjoys staying active with kettlebell sport and spending time with his family and friends, and he is an avid fan of Euro-league basketball.

Source: https://aws.amazon.com/blogs/machine-learning/arcanum-makes-hungarian-heritage-accessible-with-amazon-rekognition/
