Deploying your own data processing code in an Amazon SageMaker Autopilot inference pipeline

The machine learning (ML) model-building process requires data scientists to manually prepare data features, select an appropriate algorithm, and optimize its model parameters. It involves a lot of effort and expertise. Amazon SageMaker Autopilot removes the heavy lifting required by this ML process. It inspects your dataset, generates several ML pipelines, and compares their performance to produce a leaderboard of candidate pipelines. Each candidate pipeline is a combination of data preprocessing steps, an ML algorithm, and its optimized hyperparameters. You can easily deploy any of these candidate pipelines to use for real-time prediction or batch prediction.

But what if you want to preprocess the data before invoking Amazon SageMaker Autopilot? For example, you might have a dataset with several features and need customized feature selection to remove irrelevant variables before using it to train a model in an Autopilot job. Then you need to incorporate your custom processing code into the pipeline when deploying it to a real-time endpoint or for batch processing. This post shows you how to customize an Autopilot inference pipeline with your own data processing code. The code from this post is available in the GitHub repo.

Solution overview

The solution combines custom feature selection with Autopilot models and includes the following steps:

  1. Prepare a dataset with 100 features as the example dataset for this post and upload it to Amazon Simple Storage Service (Amazon S3).
  2. Train the feature selection model and prepare the dataset using the sagemaker-scikit-learn-container to feed to Autopilot.
  3. Configure and launch the Autopilot job.
  4. Create an inference pipeline that combines feature selection with the Autopilot models.
  5. Make predictions with the inference pipeline.

The following diagram outlines the architecture of this workflow.

Preparing and uploading the dataset

First, we generate a regression dataset using sklearn.datasets.make_regression with 100 features, 5 of which are informative. The feature columns are named x_0 through x_99 and the target variable is named y:

from sklearn.datasets import make_regression
import pandas as pd

X, y = make_regression(n_features=100, n_samples=1500, n_informative=5, random_state=0)
df_X = pd.DataFrame(X).rename(columns=lambda x: 'x_' + str(x))
df_y = pd.DataFrame(y).rename(columns=lambda x: 'y')
df = pd.concat([df_X, df_y], axis=1)

The following screenshot shows the data generated. You upload this dataset to Amazon S3 to use in later steps.
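
You can perform the upload with the SageMaker Python SDK's session helper. The following is a minimal sketch; the local file name and S3 prefix are assumptions for illustration rather than values from the notebook:

import sagemaker

# Save the generated dataset locally, then upload it to S3
sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()
prefix = 'autopilot-feature-selection'  # assumed prefix for illustration

df.to_csv('train_data.csv', index=False)
train_input = sagemaker_session.upload_data(
    path='train_data.csv',
    bucket=bucket,
    key_prefix=prefix + '/training_data')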

Training the feature selection model and preparing the dataset

Feature selection is the process of selecting a subset of the most relevant features on which to train an ML model. This simplification shortens training time and reduces the chance of overfitting. The sklearn.feature_selection module contains several feature selection algorithms. For this post, we use the following:

  • feature_selection.RFE – The recursive feature elimination (RFE) algorithm selects features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and the importance of each feature is obtained. Then, the least important features are pruned from the current set of features. We use Epsilon-Support Vector Regression (sklearn.svm.SVR) as our learning estimator for RFE.
  • feature_selection.SelectKBest – The SelectKBest algorithm selects the k features with the highest scores for a specified scoring function. We use mutual information and f_regression as the score functions; both measure the dependency between variables. For more information about f_regression and mutual information, see Feature Selection.

We stack these three feature selection steps into one sklearn.pipeline.Pipeline. RFE by default eliminates 50% of the total features. We then use SelectKBest to select the top 30 features using the f_regression method and reduce the number of features to 10 using the mutual_info_regression method. Note that the feature selection algorithms used here are for demonstration purposes only; you can update the script to incorporate the feature selection algorithm of your choice.

We also create a Python script for feature selection. In the following code example, we build a sklearn Pipeline object that implements the method we described:

'''Feature selection pipeline'''
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import RFE, SelectKBest, f_regression, mutual_info_regression
from sklearn.svm import SVR

feature_selection_pipe = pipe = Pipeline([
    ('svr', RFE(SVR(kernel="linear"))),
    ('f_reg', SelectKBest(f_regression, k=30)),
    ('mut_info', SelectKBest(mutual_info_regression, k=10))])
feature_selection_pipe.fit(X_train, y_train)

To provide visibility on which features are selected, we use the following script to generate and save the names of selected features as a list:

'''Save selected feature names'''
feature_names = concat_data.columns[:-1]
feature_names = feature_names[pipe.named_steps['svr'].get_support()]
feature_names = feature_names[pipe.named_steps['f_reg'].get_support()]
feature_names = feature_names[pipe.named_steps['mut_info'].get_support()]

We use the Amazon SageMaker SKLearn Estimator with the feature selection script as an entry point. The script is very similar to a training script you might run outside of Amazon SageMaker, but you can access useful properties about the training environment through environment variables such as SM_MODEL_DIR, which specifies the directory inside the container where model artifacts are written. These artifacts are uploaded to the Amazon S3 output path by the Amazon SageMaker training job. After training is complete, we save the model artifacts and the selected column names to SM_MODEL_DIR for use during inference. See the following code:

joblib.dump(feature_selection_pipe, os.path.join(args.model_dir, "model.joblib"))
...
joblib.dump(feature_names, os.path.join(args.model_dir, "selected_feature_names.joblib"))

Although we use feature selection algorithms in this post, you can customize and add additional data preprocessing code, such as code for data imputation or other forms of data cleaning, to this entry point script.
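
For reference, the sklearn_preprocessor estimator used in the next step is created from this entry point script roughly as follows. This is a sketch following the SageMaker Python SDK v1 conventions used in this post; the framework version and instance type are assumptions:

from sagemaker.sklearn.estimator import SKLearn

# Script-mode estimator that runs the feature selection entry point
sklearn_preprocessor = SKLearn(
    entry_point='sklearn_feature_selection.py',
    role=role,
    train_instance_type='ml.m4.xlarge',   # assumed instance type
    framework_version='0.20.0',           # assumed framework version
    sagemaker_session=sagemaker_session)

# Launch the training job on the raw training data in S3
sklearn_preprocessor.fit({'train': train_input})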

Now that the feature selection model is fitted, we transform the raw input data into a training dataset that contains only the selected features. To use Amazon SageMaker batch transform to process the raw data and store the results back in Amazon S3, enter the following code:

# Define a SKLearn Transformer from the trained SKLearn Estimator
transformer_output = os.path.join('s3://', bucket, prefix, 'Feature_selection_output/')
transformer = sklearn_preprocessor.transformer(
    instance_count=1,
    instance_type='ml.m4.xlarge',
    output_path=transformer_output,
    assemble_with='Line',
    accept='text/csv')
transformer.transform(train_input, content_type='text/csv')

The notebook contains an additional step that adds the selected column names as headers to the generated CSV data files.
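
A sketch of that step might look like the following, assuming the transform output has been downloaded locally, that feature_names holds the selected column names (for example, loaded from selected_feature_names.joblib in the model artifacts), and that rows remain in the same order as the original training data; the local file names are illustrative:

import pandas as pd

# The batch transform output has no header row; attach the selected feature names
df_new = pd.read_csv('train_data.csv.out', header=None)
df_new.columns = feature_names
# Re-attach the target column expected by Autopilot (assumes row order is preserved)
df_new['y'] = df_y['y'].values
df_new.to_csv('training_data_new.csv', index=False)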

Configuring and launching the Autopilot job

The output from batch transform is the new training dataset for Autopilot. The new dataset has 10 features. To use Autopilot, we simply provide our new dataset and choose the target column to be y. Autopilot automatically inspects our dataset and runs several candidates to determine the optimal combination of data preprocessing steps, ML algorithms, and hyperparameters. Before launching the Autopilot job, we define the job input configuration, output configuration, and stopping criteria:

input_data_config = [{
    'DataSource': {'S3DataSource': {
        'S3DataType': 'S3Prefix',
        'S3Uri': 's3://{}/{}/training_data_new'.format(bucket, prefix)}},
    'TargetAttributeName': 'y'}]

output_data_config = {'S3OutputPath': 's3://{}/{}/autopilot_job_output'.format(bucket, prefix)}

AutoML_Job_Config = {'CompletionCriteria': {
    'MaxCandidates': 50,
    'MaxAutoMLJobRuntimeInSeconds': 1800}}

Then we call the create_auto_ml_job API to launch the Autopilot job:

import boto3
from time import gmtime, strftime

sm = boto3.Session().client(service_name='sagemaker', region_name=region)

timestamp_suffix = strftime('%d-%H-%M-%S', gmtime())
auto_ml_job_name = 'automl-blog' + timestamp_suffix
print('AutoMLJobName: ' + auto_ml_job_name)

sm.create_auto_ml_job(
    AutoMLJobName=auto_ml_job_name,
    InputDataConfig=input_data_config,
    OutputDataConfig=output_data_config,
    AutoMLJobConfig=AutoML_Job_Config,
    RoleArn=role)
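
The Autopilot job runs for a while; the notebook waits for it to finish and then retrieves the best candidate, which appears later as best_candidate. A minimal sketch of that step with the boto3 SageMaker client:

import time

# Poll until the Autopilot job reaches a terminal state
while True:
    describe_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)
    job_status = describe_response['AutoMLJobStatus']
    print(job_status)
    if job_status in ('Completed', 'Failed', 'Stopped'):
        break
    time.sleep(60)

# The best candidate carries the inference containers used to build the pipeline model
best_candidate = describe_response['BestCandidate']
print(best_candidate['CandidateName'])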

Creating an inference pipeline that combines feature selection with Autopilot models

So far, we have created a model that takes raw data with 100 features and selects the 10 most relevant features. We also used Autopilot to create data processing and ML models to predict y. We now combine the feature selection model with Autopilot models to create an inference pipeline. After defining the models and assigning names, we create a PipelineModel that points to our preprocessing and prediction models. The pipeline.py file is available on GitHub. See the following code:

sklearn_image = sklearn_preprocessor.image_name
container_1_source = os.path.join(
    "s3://",
    sagemaker_session.default_bucket(),
    sklearn_preprocessor.latest_training_job.job_name,
    "sourcedir.tar.gz")

inference_containers = [{
    'Image': sklearn_image,
    'ModelDataUrl': sklearn_preprocessor.model_data,
    'Environment': {
        'SAGEMAKER_SUBMIT_DIRECTORY': container_1_source,
        'SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT': "text/csv",
        'SAGEMAKER_PROGRAM': 'sklearn_feature_selection.py'}}]

inference_containers.extend(best_candidate['InferenceContainers'])

response = sagemaker.create_model(
    ModelName=pipeline_name,
    Containers=inference_containers,
    ExecutionRoleArn=role)
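
The create_endpoint call in the next step requires an endpoint configuration that points to the pipeline model. A minimal sketch, where the configuration name, variant name, and instance type are assumptions:

pipeline_endpoint_config_name = 'automl-blog-pipeline-endpoint-config'  # assumed name
response = sagemaker.create_endpoint_config(
    EndpointConfigName=pipeline_endpoint_config_name,
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': pipeline_name,
        'InstanceType': 'ml.m4.xlarge',
        'InitialInstanceCount': 1}])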

We then deploy the pipeline model to a single endpoint:

response = sagemaker.create_endpoint(
    EndpointName=pipeline_endpoint_name,
    EndpointConfigName=pipeline_endpoint_config_name)

Making predictions with the inference pipeline

We can test our pipeline by sending data for prediction. The pipeline accepts raw data, transforms it using the feature selection model, and creates a prediction using the models Autopilot generated.

First, we define a payload variable that contains the data we want to send through the pipeline. We use the first five rows of the training data as our payload. Then we define a predictor using our pipeline endpoint, send the payload to the predictor, and print the model prediction:

from sagemaker.predictor import RealTimePredictor, csv_serializer
from sagemaker.content_types import CONTENT_TYPE_CSV

predictor = RealTimePredictor(
    endpoint=pipeline_endpoint_name,
    serializer=csv_serializer,
    sagemaker_session=sagemaker_session,
    content_type=CONTENT_TYPE_CSV,
    accept=CONTENT_TYPE_CSV)
predictor.content_type = 'text/csv'
predictor.predict(test_data.to_csv(sep=',', header=True, index=False)).decode('utf-8')

Our Amazon SageMaker endpoint returns one prediction for each row of the data sent. See the following output:

'-102.248855591\n-165.823532104\n115.50453186\n111.306632996\n5.91651535034'

Deleting the endpoint

When we are finished with the endpoint, we delete it to save cost:

sm_client = sagemaker_session.boto_session.client('sagemaker')
sm_client.delete_endpoint(EndpointName=pipeline_endpoint_name)
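
If you also created the endpoint configuration and pipeline model shown earlier, you can remove those resources as well. A minimal sketch using the same boto3 client:

# Clean up the endpoint configuration and the pipeline model
sm_client.delete_endpoint_config(EndpointConfigName=pipeline_endpoint_config_name)
sm_client.delete_model(ModelName=pipeline_name)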

Conclusions

In this post, we demonstrated how to customize an Autopilot inference pipeline with your own data processing code. We first trained a feature selection model and converted our raw data using that model. Then we launched an Amazon SageMaker Autopilot job that automatically trained and tuned the best ML models for our regression problem. We also built an inference pipeline that combined feature selection with the Autopilot models. Lastly, we made predictions with the inference pipeline. For more information, see Amazon SageMaker Autopilot.


About the Authors

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

Piali Das is a Senior Software Engineer in the AWS SageMaker Autopilot team. She previously contributed to building SageMaker Algorithms. She enjoys scientific programming in general and has developed an interest in machine learning and distributed systems.

Source: https://aws.amazon.com/blogs/machine-learning/deploying-your-own-data-processing-code-in-an-amazon-sagemaker-autopilot-inference-pipeline/

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

Arcanum specializes in digitizing Hungarian language content, including newspapers, books, maps, and art. With over 30 years of experience, Arcanum serves more than 30,000 global subscribers with access to Hungarian culture, history, and heritage.

Amazon Rekognition Solutions Architects worked with Arcanum to add highly scalable image analysis to Hungaricana, a free service provided by Arcanum that enables you to search and explore Hungarian cultural heritage, including 600,000 faces across 500,000 images. For example, you can find historical works by author Mór Jókai or photos on topics like weddings. The Arcanum team chose Amazon Rekognition to free valuable staff from time- and cost-intensive manual labeling, and improved label accuracy to make 200,000 previously unsearchable images (approximately 40% of the image inventory) available to users.

Amazon Rekognition makes it easy to add image and video analysis to your applications using highly scalable machine learning (ML) technology that requires no previous ML expertise to use. Amazon Rekognition also provides highly accurate facial recognition and facial search capabilities to detect, analyze, and compare faces.

Arcanum uses this facial recognition feature in their image database services to help you find particular people in Arcanum’s articles. This post discusses their challenges and why they chose Amazon Rekognition as their solution.

Automated image labeling challenges

Arcanum dedicated a team of three people to start tagging and labeling content for Hungaricana. The team quickly learned that they would need to invest more than 3 months of time-consuming and repetitive human labor to provide accurate search capabilities to their customers. Considering the size of the team and the scope of the existing project, Arcanum needed a better solution that would automate image and object labeling at scale.

Automated image labeling solutions

To speed up and automate image labeling, Arcanum turned to Amazon Rekognition to enable users to search photos by keywords (for example, type of historic event, place name, or a person relevant to Hungarian history).

For the Hungaricana project, preprocessing all the images was challenging. Arcanum ran a TensorFlow face search across all 28 million pages on a machine with 8 GPUs in their own offices to extract only faces from images.

The following screenshot shows what an extract looks like (image provided by Arcanum Database Ltd).

The images containing only faces are sent to Amazon Rekognition via the IndexFaces operation, which adds each face to the specified face collection. For each face detected, Amazon Rekognition extracts facial features into a feature vector, which is stored in an Amazon Aurora database. Amazon Rekognition uses feature vectors when it performs face match and search operations with the SearchFaces and SearchFacesByImage operations.
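
At the API level, indexing a face and later searching for similar faces might look like the following boto3 sketch. This is illustrative rather than Arcanum's production code; the collection ID, bucket, and object keys are assumptions:

import boto3

rekognition = boto3.client('rekognition')
collection_id = 'hungaricana-faces'  # assumed collection name

# Add a cropped face image to the face collection
rekognition.index_faces(
    CollectionId=collection_id,
    Image={'S3Object': {'Bucket': 'example-bucket', 'Name': 'faces/face-0001.jpg'}},
    ExternalImageId='face-0001')

# Search the collection for faces similar to the one in an uploaded image
matches = rekognition.search_faces_by_image(
    CollectionId=collection_id,
    Image={'S3Object': {'Bucket': 'example-bucket', 'Name': 'uploads/query.jpg'}},
    FaceMatchThreshold=90,
    MaxFaces=24)
for match in matches['FaceMatches']:
    print(match['Face']['ExternalImageId'], match['Similarity'])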

The image preprocessing helped create a very efficient and cost-effective way to index faces. The following diagram summarizes the preprocessing workflow.

As for the web application, the workflow starts with a Hungaricana user making a face search request. The following diagram illustrates the application workflow.

The workflow includes the following steps:

  1. The user requests a facial match by uploading the image. The web request is automatically distributed by the Elastic Load Balancer to the webserver fleet.
  2. Amazon Elastic Compute Cloud (Amazon EC2) powers application servers that handle the user request.
  3. The uploaded image is stored in Amazon Simple Storage Service (Amazon S3).
  4. Amazon Rekognition indexes the face and runs SearchFaces to look for a face similar to the new face ID.
  5. The output of the search face by image operation is stored in Amazon ElastiCache, a fully managed in-memory data store.
  6. The metadata of the indexed faces is stored in an Aurora relational database built for the cloud.
  7. The resulting face thumbnails are served to the customer via the fast content-delivery network (CDN) service Amazon CloudFront.

Experimenting and live testing Hungaricana

During our test of Hungaricana, the application performed extremely well. The searches not only correctly identified people, but also provided links to all publications and sources in Arcanum’s privately owned database where the found faces appear. For example, the following screenshot shows the search results for the famous composer and pianist Franz Liszt.

The application returned 42 pages of 6×4 results; results are capped at 1,000. The 100% scores are the confidence scores returned by Amazon Rekognition, rounded up to whole numbers.

The Hungaricana application has consistently presented results and links to all corresponding publications promptly and with a high degree of certainty.

Business results

By introducing Amazon Rekognition into their workflow, Arcanum enabled a better customer experience, including building family trees, searching for historical figures, and researching historical places and events.

The concept of face searching using artificial intelligence certainly isn’t new. But Hungaricana uses it in a very creative, unique way.

Amazon Rekognition allowed Arcanum to realize three distinct advantages:

  • Time savings – Time to market improved dramatically. Instead of spending several months of intense manual labor to label all the images, the company can now do this job in a few days. Before, basic labeling of 150,000 images took three people months to complete.
  • Cost savings – Arcanum saved around $15,000 on the Hungaricana project. Before using Amazon Rekognition, there was no automation, so a human workforce had to scan all the images. Now, employees can shift their focus to other high-value tasks.
  • Improved accuracy – Users now have a much better experience regarding hit rates. Since Arcanum started using Amazon Rekognition, the number of hits has doubled. Before, out of 500,000 images, about 200,000 weren’t searchable. But with Amazon Rekognition, search is now possible for all 500,000 images.

 “Amazon Rekognition made Hungarian culture, history, and heritage more accessible to the world,” says Előd Biszak, Arcanum CEO. “It has made research a lot easier for customers building family trees, searching for historical figures, and researching historical places and events. We cannot wait to see what the future of artificial intelligence has to offer to enrich our content further.”

Conclusion

In this post, you learned how to add highly scalable face and image analysis to an enterprise-level image gallery to improve label accuracy, reduce costs, and save time.

You can test Amazon Rekognition features such as facial analysis, face comparison, or celebrity recognition on images specific to your use case on the Amazon Rekognition console.

For video presentations and tutorials, see Getting Started with Amazon Rekognition. For more information about Amazon Rekognition, see Amazon Rekognition Documentation.


About the Authors

Siniša Mikašinović is a Senior Solutions Architect at AWS Luxembourg, covering Central and Eastern Europe—a region full of opportunities, talented and innovative developers, ISVs, and startups. He helps customers adopt AWS services as well as acquire new skills, learn best practices, and succeed globally with the power of AWS. His areas of expertise are Game Tech and Microsoft on AWS. Siniša is a PowerShell enthusiast, a gamer, and a father of a small and very loud boy. He flies under the flags of Croatia and Serbia.

Cameron Peron is Senior Marketing Manager for AWS Amazon Rekognition and the AWS AI/ML community. He evangelizes how AI/ML innovation solves complex challenges facing community, enterprise, and startups alike. Out of the office, he enjoys staying active with kettlebell-sport, spending time with his family and friends, and is an avid fan of Euro-league basketball.

Source: https://aws.amazon.com/blogs/machine-learning/arcanum-makes-hungarian-heritage-accessible-with-amazon-rekognition/
