

Getting a batch job completion message from Amazon Translate




Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. Neural machine translation is a form of language translation automation that uses deep learning models to deliver more accurate and natural-sounding translation than traditional statistical and rule-based translation algorithms. The translation service is trained on a wide variety of content across different use cases and domains to perform well on many kinds of content.

The Amazon Translate asynchronous batch processing capability enables organizations to translate a large collection of text or HTML documents from one language to another with a single API call. The ability to process data at scale is becoming important to organizations across all industries. In this blog post, we demonstrate how you can build a notification mechanism that messages you when a batch translation job is complete. This enables end-to-end automation: you can trigger other AWS Lambda functions or integrate with Amazon SQS for any postprocessing steps.

Solution overview

The following diagram illustrates the high-level architecture of the solution.

Architecture diagram depicting polling mechanism for batch translation job

The solution contains the following steps:

  1. A user starts a batch translation job.
  2. An Amazon CloudWatch Events rule picks up the event and triggers the AWS Step Functions state machine.
  3. The Job Poller AWS Lambda function polls the job status at a fixed interval (the state machine in this post waits 60 seconds between polls).
  4. When the Amazon Translate batch job is complete, an email notification is sent via an Amazon Simple Notification Service (Amazon SNS) topic.

To implement this solution, you must create the following:

  1. An SNS topic
  2. An AWS Identity and Access Management (IAM) role
  3. A Lambda function
  4. A Step Functions state machine
  5. A CloudWatch Events rule

Creating an SNS topic

To create an SNS topic, complete the following steps:

  1. On the Amazon SNS console, create a new topic.
  2. For Topic name, enter a name (for example, TranslateJobNotificationTopic).
  3. Choose Create topic.

You can now see the TranslateJobNotificationTopic page. The Details section displays the topic’s name, ARN, display name (optional), and the AWS account ID of the Topic owner.

  1. In the Details section, copy the topic ARN to the clipboard (arn:aws:sns:us-east-1:123456789012:TranslateJobNotificationTopic).
  2. On the left navigation pane, choose Subscriptions.
  3. Choose Create subscription.
  4. On the Create subscription page, enter the topic ARN of the topic you created earlier (arn:aws:sns:us-east-1:123456789012:TranslateJobNotificationTopic).
  5. For Protocol, select Email.
  6. For Endpoint, enter an email address that can receive notifications.
  7. Choose Create subscription.

For email subscriptions, you have to first confirm the subscription by choosing the confirm subscription link in the email you received.
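If you prefer to script this step, the following is a minimal sketch using the AWS SDK for Python (Boto3). The helper name `create_notification_topic` is hypothetical; it calls the SNS `create_topic` and `subscribe` operations on a client you pass in, so you can hand it `boto3.client('sns')` or a test double.

```python
def create_notification_topic(sns_client, topic_name, email_address):
    """Create an SNS topic and subscribe an email endpoint to it (hypothetical helper).

    sns_client is expected to be a Boto3 SNS client, e.g. boto3.client('sns').
    Returns (topic_arn, subscription_arn). For email endpoints the subscription
    ARN is typically 'pending confirmation' until the recipient confirms it.
    """
    # create_topic returns the ARN of the new (or already existing) topic
    topic_arn = sns_client.create_topic(Name=topic_name)['TopicArn']
    # Subscribe the email address; SNS then sends the confirmation email
    subscription = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol='email',
        Endpoint=email_address,
    )
    return topic_arn, subscription['SubscriptionArn']
```

For example, `create_notification_topic(boto3.client('sns'), 'TranslateJobNotificationTopic', 'you@example.com')` returns the topic ARN to use later in the state machine definition. The email address is a placeholder, and you still need to confirm the subscription from your inbox.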

Creating an IAM role for the Lambda function

To create an IAM role, complete the following steps. For more information, see Creating an IAM Role.

  1. On the IAM console, choose Policies.
  2. Choose Create Policy.
  3. On the JSON tab, enter the following IAM policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "translate:DescribeTextTranslationJob",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/aws/lambda/TranslateJobStatusPoller:*"
    }
  ]
}

Update the Resource element of the CloudWatch Logs statement to reflect your Region, AWS account ID, and Lambda function name.

  4. Choose Review policy.
  5. Enter a name (MyLambdaPolicy) for this policy and choose Create policy.
  6. Record the name of this policy for later steps.
  7. On the left navigation pane, choose Roles.
  8. Choose Create role.
  9. On the Select role type page, choose Lambda and the Lambda use case.
  10. Choose Next: Permissions.
  11. Filter policies by the policy name that you just created, and select its check box.
  12. Choose Next: Tags.
  13. Add an appropriate tag.
  14. Choose Next: Review.
  15. Give this IAM role an appropriate name, and note it for future use.
  16. Choose Create role.
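The execution policy can also be generated rather than edited by hand. The following sketch (the `build_poller_policy` helper is hypothetical) rebuilds the same two permission statements, omitting the optional `Sid`, and fills in the log-group ARN from your Region, account ID, and function name. You could paste the printed JSON into the JSON tab, or pass it as `PolicyDocument` to the IAM `create_policy` API.

```python
import json


def build_poller_policy(region, account_id, function_name):
    """Return the poller's Lambda execution policy as a JSON string (hypothetical helper).

    The CloudWatch Logs statement is scoped to the function's own log group,
    /aws/lambda/<function_name>, in the given Region and account.
    """
    log_group_arn = (
        f'arn:aws:logs:{region}:{account_id}:'
        f'log-group:/aws/lambda/{function_name}:*'
    )
    policy = {
        'Version': '2012-10-17',
        'Statement': [
            {
                # Lets the function read the batch job status
                'Effect': 'Allow',
                'Action': 'translate:DescribeTextTranslationJob',
                'Resource': '*',
            },
            {
                # Lets the function write its own logs
                'Effect': 'Allow',
                'Action': [
                    'logs:CreateLogGroup',
                    'logs:CreateLogStream',
                    'logs:PutLogEvents',
                ],
                'Resource': log_group_arn,
            },
        ],
    }
    return json.dumps(policy, indent=2)


print(build_poller_policy('us-east-1', '123456789012', 'TranslateJobStatusPoller'))
```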

Creating a Lambda function

To create a Lambda function, complete the following steps. For more information, see Create a Lambda Function with the Console.

  1. On the Lambda console, choose Author from scratch.
  2. For Function Name, enter the name of your function (for example, TranslateJobStatusPoller).
  3. For Runtime, choose Python 3.8.
  4. For Execution role, select Use an existing role.
  5. Choose the IAM role you created in the previous step.
  6. Choose Create Function.
  7. Remove the default function and enter the following code into the Function Code window:
# Copyright 2020, Inc. or its affiliates. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at##
# or in the "license" file accompanying this file.
# either express or implied. See the License for the specific language governing permissions
# and limitations under the License.
# Description: This Lambda function is part of a Step Functions state machine that checks the status of an Amazon Translate batch job.
# Author: Sudhanshu Malhotra
import logging

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

# Create the client once so warm invocations reuse it
translate_client = boto3.client('translate')


def msgpublish(jobid):
    """Return the current JobStatus of the given Amazon Translate batch job."""
    try:
        response = translate_client.describe_text_translation_job(JobId=jobid)
        job_status = response['TextTranslationJobProperties']['JobStatus']
        logger.debug('Job Status is: %s', job_status)
        return job_status
    except ClientError as e:
        logger.error('An error occurred: %s', e)
        # Re-raise so the Retry policy on the LambdaPoll state handles the failure
        raise


def lambda_handler(event, context):
    # The state machine's InputPath passes only the job ID string as the event
    logger.debug('Job ID is: %s', event)
    return msgpublish(event)
  8. Choose Save.

Creating a state machine

To create a state machine, complete the following steps. For more information, see Create a State Machine.

  1. On the Step Functions console, on the Define state machine page, choose Start with a template.
  2. Choose Hello world.
  3. Under Type, choose Standard.
  4. Under Definition, enter the following Amazon States Language. Make sure to replace the placeholder ARNs for the Lambda function and the SNS topic.
{
  "Comment": "Polling step function for translate job complete",
  "StartAt": "LambdaPoll",
  "States": {
    "LambdaPoll": {
      "Type": "Task",
      "Resource": "<ARN of the Lambda Function created in step 3>",
      "InputPath": "$.detail.responseElements.jobId",
      "ResultPath": "$.detail.responseElements.jobStatus",
      "Next": "Job Complete?",
      "Retry": [
        { "ErrorEquals": [ "States.ALL" ], "IntervalSeconds": 1, "MaxAttempts": 3, "BackoffRate": 2 }
      ]
    },
    "Job Complete?": {
      "Type": "Choice",
      "Choices": [
        { "Variable": "$.detail.responseElements.jobStatus", "StringEquals": "IN_PROGRESS", "Next": "Wait X Seconds" },
        { "Variable": "$.detail.responseElements.jobStatus", "StringEquals": "SUBMITTED", "Next": "Wait X Seconds" },
        { "Variable": "$.detail.responseElements.jobStatus", "StringEquals": "COMPLETED", "Next": "Notify" },
        { "Variable": "$.detail.responseElements.jobStatus", "StringEquals": "COMPLETED_WITH_ERROR", "Next": "Notify" },
        { "Variable": "$.detail.responseElements.jobStatus", "StringEquals": "FAILED", "Next": "Notify" },
        { "Variable": "$.detail.responseElements.jobStatus", "StringEquals": "STOPPED", "Next": "Notify" }
      ],
      "Default": "Wait X Seconds"
    },
    "Wait X Seconds": {
      "Type": "Wait",
      "Seconds": 60,
      "Next": "LambdaPoll"
    },
    "Notify": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "Subject": "Translate Batch Job Notification",
        "Message": {
          "JobId.$": "$.detail.responseElements.jobId",
          "S3OutputLocation.$": "$.detail.requestParameters.outputDataConfig.s3Uri",
          "JobStatus.$": "$.detail.responseElements.jobStatus"
        },
        "MessageAttributes": {
          "JobId": { "DataType": "String", "StringValue.$": "$.detail.responseElements.jobId" },
          "S3OutputLocation": { "DataType": "String", "StringValue.$": "$.detail.requestParameters.outputDataConfig.s3Uri" }
        },
        "TopicArn": "<ARN of the SNS topic created in step 1>"
      },
      "End": true
    }
  }
}
  5. Use the graph in the Visual Workflow pane to check that your Amazon States Language code describes your state machine correctly. You should see something like the following screenshot.
    Amazon State machine depicting various states of batch translation job
  6. Choose Next.
  7. For Name, enter a name for the state machine.
  8. Under Permissions, select Create new role.

You now see an info block with the details of the role and the associated permissions.

IAM policy screenshot for State machine

  9. Choose Create state machine.
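To see how the Choice state drives the polling loop without deploying anything, the following sketch replays the LambdaPoll, Job Complete?, and Wait X Seconds states over a scripted sequence of job statuses. It is a local illustration, not part of the solution, and the function names are hypothetical. It also routes COMPLETED_WITH_ERROR, a terminal status that DescribeTextTranslationJob can return, to Notify: a status that matches no Choice rule falls through to the Default wait state, so any terminal status missing from the definition would keep the execution polling until it is stopped or times out.

```python
# Statuses the "Job Complete?" Choice state sends to the Notify state.
# COMPLETED_WITH_ERROR is included because Amazon Translate can also end a job
# with that status.
TERMINAL_STATUSES = {'COMPLETED', 'COMPLETED_WITH_ERROR', 'FAILED', 'STOPPED'}


def next_state(job_status):
    """Return the state the Choice state transitions to for a given job status."""
    if job_status in TERMINAL_STATUSES:
        return 'Notify'
    # SUBMITTED, IN_PROGRESS, and any status without a Choice rule (the Default)
    # go to the wait state, which then loops back to LambdaPoll
    return 'Wait X Seconds'


def simulate_execution(statuses):
    """Replay the polling loop over a scripted sequence of job statuses.

    Each entry in `statuses` stands in for one LambdaPoll result. Returns the
    names of the states visited, ending with Notify.
    """
    visited = []
    for status in statuses:
        visited += ['LambdaPoll', 'Job Complete?']
        state = next_state(status)
        visited.append(state)
        if state == 'Notify':
            return visited
    raise RuntimeError('statuses ran out before the job reached a terminal status')


print(simulate_execution(['SUBMITTED', 'IN_PROGRESS', 'COMPLETED']))
```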

Creating a CloudWatch Events rule

To create a CloudWatch Events rule, complete the following steps. This rule matches StartTextTranslationJob API calls and triggers the state machine, which you set as the rule's target.

  1. On the CloudWatch console, choose Rules.
  2. Choose Create rule.
  3. On the Step 1: Create rule page, under Event Source, select Event Pattern.
  4. Choose Build custom event pattern from the drop-down menu.
  5. Enter the following code into the preview pane:
{
  "source": [ "aws.translate" ],
  "detail-type": [ "AWS API Call via CloudTrail" ],
  "detail": {
    "eventName": [ "StartTextTranslationJob" ]
  }
}
  6. For Targets, select Step Functions state machine.
  7. Select the state machine you created earlier.
  8. For permission to send events to Step Functions, select Create a new role for this specific resource.
  9. Choose Configure details.
  10. On the Step 2: Configure rule details page, enter a name and description for the rule.
  11. For State, select Enabled.
  12. Choose Create rule.
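To check which events the rule will match, the following sketch applies a simplified version of event-pattern matching to a trimmed, hypothetical CloudTrail event. It only handles lists of exact values and nested objects (EventBridge also supports prefix, numeric, and other content filters), and it uses only the source, detail-type, and eventName filters; the `matches` function is a hypothetical helper, not an AWS API.

```python
def matches(pattern, event):
    """Return True if `event` matches `pattern` under simplified exact-value rules.

    Each list in the pattern holds the allowed values for that field; nested
    dicts are matched recursively; a field missing from the event never matches.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


RULE_PATTERN = {
    'source': ['aws.translate'],
    'detail-type': ['AWS API Call via CloudTrail'],
    'detail': {'eventName': ['StartTextTranslationJob']},
}

# Trimmed, hypothetical example of the event delivered for a StartTextTranslationJob call
sample_event = {
    'source': 'aws.translate',
    'detail-type': 'AWS API Call via CloudTrail',
    'detail': {
        'eventName': 'StartTextTranslationJob',
        'responseElements': {'jobId': 'example-job-id', 'jobStatus': 'SUBMITTED'},
    },
}

print(matches(RULE_PATTERN, sample_event))
```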

Validating the solution

To test this solution, I first create an Amazon Translate batch job and provide the Amazon Simple Storage Service (Amazon S3) location of the input text, the output Amazon S3 location, the target language, and the ARN of the data access service role. For instructions on creating a batch translation job, see Asynchronous Batch Processing or Translating documents with Amazon Translate, AWS Lambda, and the new Batch Translate API.
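You can also start the batch job programmatically. The following minimal sketch wraps the Boto3 `start_text_translation_job` call; the helper name `start_batch_translation` and its parameters are hypothetical. Note that the API also requires a source language code alongside the target languages.

```python
def start_batch_translation(translate_client, job_name, input_s3_uri, output_s3_uri,
                            data_access_role_arn, source_language, target_language,
                            content_type='text/plain'):
    """Start an Amazon Translate batch job and return its job ID (hypothetical helper).

    translate_client is expected to be a Boto3 Translate client, e.g.
    boto3.client('translate'). content_type is 'text/plain' or 'text/html'.
    """
    response = translate_client.start_text_translation_job(
        JobName=job_name,
        InputDataConfig={'S3Uri': input_s3_uri, 'ContentType': content_type},
        OutputDataConfig={'S3Uri': output_s3_uri},
        DataAccessRoleArn=data_access_role_arn,
        SourceLanguageCode=source_language,
        TargetLanguageCodes=[target_language],
    )
    # This is the jobId the CloudWatch Events rule later passes to the state machine
    return response['JobId']
```

For example, `start_batch_translation(boto3.client('translate'), 'my-batch-job', 's3://input-bucket/docs/', 's3://output-bucket/', role_arn, 'en', 'es')` makes the StartTextTranslationJob call that the CloudWatch Events rule picks up. The bucket names, job name, and role ARN are placeholders.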

The following screenshot shows my batch job on the Translation jobs page.

Amazon translate job start screenshot of Translate console

The CloudWatch Events rule picks up the StartTextTranslationJob API call and triggers the state machine. When the job is complete, I get an email notification via Amazon SNS.

Translate job complete notification email screenshot showing job status, job name and output location of the translated job


In this post, we demonstrated how you can use Step Functions to poll for the status of an Amazon Translate batch job. For this use case, we configured an email notification to send when a job is complete; however, you can use this framework to trigger other Lambda functions or integrate with Amazon Simple Queue Service (Amazon SQS) for automated postprocessing steps, enabling you to build an end-to-end automated workflow.

About the Authors

Sudhanshu Malhotra is a Boston-based Enterprise Solutions Architect for AWS. He is a technology enthusiast who enjoys helping customers find innovative solutions to complex business challenges. His core areas of focus are DevOps, Machine Learning, and Security. When he’s not working with customers on their journey to the cloud, he enjoys reading, hiking, and exploring new cuisines.

Siva Rajamani is a Boston-based Enterprise Solutions Architect for AWS. He enjoys working closely with customers, supporting their digital transformation and AWS adoption journey. His core areas of focus are Serverless, Application Integration, and Security. Outside of work, he enjoys outdoor activities and watching documentaries.



5 Work From Home Office Essentials


The post 5 Work From Home Office Essentials appeared first on Aiiot Talk – Artificial Intelligence | Internet of Things | Technology.



Working remotely from home had been increasing in popularity, but it’s now become a necessity for many professionals due to the pandemic.

“Some companies are eager to reopen their doors and return to the office, but a large number of employers and employees are making the transitional work environment a permanent change.”

Employers can’t guarantee their staff’s health and safety in a crowded space, and companies can save tons of money they would otherwise spend on commercial lease or mortgage payments.

That’s not to say that working from home doesn’t come with its own costs. Without the right equipment in place, it can lead to a huge hit in productivity. To maximize your performance and efficiency in a remote setting, be sure to purchase these five office essentials.

1. Powerful PC

This one probably feels like an obvious pointer, but let’s knock it off our list. You won’t be able to get by with a makeshift workstation, and in today’s digital domain, your computer will be at the core of everything you do.

Never-ending loading wheels, delayed downloads, and slow rendering add seconds to every task, so if your company didn’t provide you with a workhorse computer tower, consider investing in one yourself; if you’re eligible, you may be able to deduct the cost on your tax return.

Depending on your line of work, it might make more sense to go for a laptop than a desktop computer. Unless your tasks demand sophisticated software and large amounts of storage, you can probably get by with a portable PC. That way, when coffee shops reopen and allow patrons to sit inside, you can work on the go without feeling tethered to your desk.

2. Ergonomic Office Chair

If you’re looking at a long-term remote situation, it’s worth spending the big bucks on an ergonomic office chair. You should feel comfortably locked into your seat for eight hours a day—at least if you want to concentrate on your workflow, rather than the cramp in your back.

Shop around for an office chair that’s well designed and specifically built to support the human body. Some standout features to look for include:

  • Targeted support around the lumbar spine
  • Adjustable height, so you can set the seat at a level where your arms rest naturally on the keyboard
  • Swivel base to effortlessly turn your body, preventing neck strain
  • Cushioned seat to comfort your tailbone
  • Ventilated fabric that promotes airflow so you don’t feel overheated when sitting in the chair for several hours

You might have to pay a couple of hundred dollars for top-of-the-line features, but this is another item that might qualify as a tax deduction. Just be sure to keep all your receipts organized with a document scanner in case the IRS raises its eyebrows and issues an audit.

3. Wireless Keyboard

If you want to type faster and feel better while you’re at it, then a wireless keyboard is clutch. It lets you bring the keys closer, reducing how far you have to extend your arms and the shoulder strain that comes with it.

“It also helps reduce the strain on your eyes by moving the bright screen farther away from your direct line of sight.” 

And, last but not least, the keys are placed in an ergonomic position for a more natural finger splay, with ample wrist cushioning that helps prevent overuse injuries such as carpal tunnel syndrome.

4. Noise-cancelling Headphones

To truly get in the zone, block out distractions with headphones that cancel the noise in your environment, especially if your workstation is set up in a common area. Other tips to stay focused include installing a website blocker and leaving your cellphone on the other side of the room.

5. House plant or flowers

Research suggests that people are more productive when working near fresh flowers or lush greenery. The good news is that you don’t need a green thumb or natural lighting to achieve this effect: even artificial foliage can brighten your mood and improve your performance.

Working from home can sometimes feel like being locked inside all day, so bringing the outside world into your space can help ward off burnout.

Take these tips with you into 2021 and set yourself up for success in your new home office setting.




zomato digitizes menus using Amazon Textract and Amazon SageMaker




This post is co-written by Chiranjeev Ghai, ML Engineer at zomato. zomato is a global food-tech company based in India.

Are you the kind of person who has very specific cravings? Maybe when the mood hits, you don’t want just any kind of Indian food—you want Chicken Chettinad with a side of paratha, and nothing else will hit the spot! To help picky eaters satisfy their cravings, we at zomato have recently added enhanced search engine capabilities to our restaurant aggregation and food delivery platform. These capabilities enable us to recommend restaurants to zomato users based on searches for specific dishes.

We power this functionality with machine learning (ML), using it to extract and structure text data from menu images. To develop this menu digitization technology, we partnered with Amazon ML Solutions Lab to explore the capabilities of the AWS ML Stack. This post summarizes how we used Amazon Textract and Amazon SageMaker to develop a customized menu digitization solution.

Extracting raw text from menus with Amazon Textract

The first component of this solution was to accurately extract all the text in the menu image. This process is known as optical character recognition (OCR). For our use case, we experimented with both in-house and commercial OCR solutions.

We first created an in-house OCR solution by stacking a pre-trained text detection model and a pre-trained text recognition model. The challenge with these models was that they were trained on a standard text dataset that didn’t match the eclectic fonts found in restaurant menus. To improve system performance, we fine-tuned these models by generating a dataset of 1.5 million synthetic text images that were more representative of text in menus.

After evaluating our in-house solution and several commercial OCR solutions, we found that Amazon Textract offers the best text recognition precision and recall. Restaurants often get creative when designing their menus, so OCR robustness was crucial for this use case. Amazon Textract particularly differentiated itself when processing menus with unique fonts, background images, and low image resolutions. Using it is as simple as making an API call:

#Python 3.6
import boto3

textract_client = boto3.client(
    'textract',
    region_name = '' #insert the AWS region you're working in
)

textract_response = textract_client.detect_document_text(
    Document = {
        'S3Object': {
            'Bucket': '', #insert the name of the S3 bucket containing your image
            'Name': '' #insert the S3 key of your image
        }
    }
)
print(textract_response)

The following is the (truncated) Amazon Textract output for a sample image:

{'DocumentMetadata': {'Pages': 1}, 'Blocks': [{'BlockType': 'PAGE', 'Geometry': {'BoundingBox': {'Width': 1.0, 'Height': 1.0, 'Left': 0.0, 'Top': 0.0}, ... {'BlockType': 'WORD', 'Text': 'Dim', 'Geometry': {'BoundingBox': {'Width': 0.10242128372192383, 'Height': 0.048968635499477386, 'Left': 0.24052166938781738, 'Top': 0.02556285448372364}, ...
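Before drawing overlays like the ones described next, the WORD blocks and their bounding boxes have to be pulled out of the response. The following sketch does this and converts Textract's normalized coordinates (fractions of the page width and height) to pixels. The function names are hypothetical, and this is an illustration rather than zomato's code.

```python
def extract_words(textract_response):
    """Return (text, bounding_box) pairs for every WORD block in a Textract response.

    Bounding boxes use Textract's normalized coordinates: Left, Top, Width, and
    Height are fractions of the page width or height.
    """
    words = []
    for block in textract_response.get('Blocks', []):
        if block['BlockType'] == 'WORD':
            words.append((block['Text'], block['Geometry']['BoundingBox']))
    return words


def to_pixels(box, image_width, image_height):
    """Convert a normalized box to pixel (left, top, right, bottom), rounded to integers."""
    left = box['Left'] * image_width
    top = box['Top'] * image_height
    right = left + box['Width'] * image_width
    bottom = top + box['Height'] * image_height
    return (round(left), round(top), round(right), round(bottom))
```

For the "Dim" word in the sample output, a 1000 x 1000 pixel image gives the rectangle (241, 26, 343, 75).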

The raw outputs are visualized by overlaying them on top of the image. The following image visualizes the preceding raw output. The black boxes are the text-detection bounding boxes provided by Amazon Textract. Extracted text is displayed on the right. Note the unconventional fonts, colors, and images on this menu.

The following image visualizes Amazon Textract outputs for a menu with a different design. Black boxes are the text-detection bounding boxes provided by Amazon Textract. Extracted text is displayed on the right. Again, this menu has unconventional fonts, colors, and images.

Using Amazon SageMaker to build a menu structure detector

The next component of this solution was to group the detections from Amazon Textract by menu section. This enabled our search engine to distinguish between entrees, desserts, beverages, and so on. We framed this as a computer vision problem—object detection, to be precise—and used Amazon SageMaker Ground Truth to collect training data. Ground Truth accelerated this process by providing a fully managed annotation tool that we customized to ask human annotators to draw bounding boxes around every menu section in the image. We used an annotation workforce from AWS Marketplace because this was a niche labeling task, and public labelers from Amazon Mechanical Turk didn’t perform well. With Ground Truth, it took just a few days and approximately $1,400 to label 4,086 images with triplicate redundancy.

With labeled data in hand, we faced a paradox of choice when selecting model-building approaches because object detection is such a thoroughly studied problem. Our choices included:

  • Removing low-confidence labels from the labeled dataset – Because even human annotators can make mistakes, Ground Truth calculates confidence scores for labels by having multiple annotators (for this use case, three) label the same image. Setting a higher confidence threshold for labels can decrease the noise in the training data at the expense of having less training data.
  • Data augmentation – Techniques for image data augmentation include horizontal flipping, cropping, shearing, and rotation. Data augmentation can make models more robust by increasing the amount of training data. However, excessive data augmentation may result in poor model convergence.
  • Feature engineering – From our experience in applying computer vision to processing menus, we had a variety of techniques in mind to emphasize or de-emphasize various aspects of the input images. For example, see the following images.

The following is the original image of a menu.

The following image shows the redacted version (white boxes overlaid on a black background where text detections were found).

The following is a text-cropped image: crops from the original image, taken where text detections were found, are overlaid on a black background.

The following is a single-channel, text-cropped image. The image is encoded as a single RGB channel (for this image, green). You can combine this with other transformations, in this case text cropping.
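The redaction and text-cropping transforms described above can be sketched in a few lines. The following minimal version uses plain nested lists for a single-channel image so it runs without an imaging library; in practice you would more likely use NumPy or Pillow. The function names are hypothetical, the boxes are assumed to already be in pixel coordinates, and this is not zomato's implementation.

```python
def redact(width, height, boxes):
    """Return a single-channel image that is white (255) inside text boxes, black (0) elsewhere.

    `boxes` holds pixel rectangles (left, top, right, bottom); right and bottom
    are exclusive. Boxes are clipped to the image bounds.
    """
    image = [[0] * width for _ in range(height)]
    for left, top, right, bottom in boxes:
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                image[y][x] = 255
    return image


def text_crop(original, boxes):
    """Keep the original pixels inside text boxes and paint everything else black (0)."""
    height, width = len(original), len(original[0])
    mask = redact(width, height, boxes)
    return [[original[y][x] if mask[y][x] else 0 for x in range(width)]
            for y in range(height)]
```

For the single-channel variant, you would apply `text_crop` to one color channel (for example, green) of the original image.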


We also had the following additional model-building methods to choose from:

  • Model architectures like YOLO, SSD, and RCNN, with VGG or ResNet backbones – Each architecture has different trade-offs of model accuracy, inference time, model size, and more. For this use case, model accuracy was the most important metric because menu images were batch processed.
  • Using a model pre-trained on a general object detection task or starting from scratch – Transfer learning can be helpful when training complex models on small datasets. However, the task of detecting menu sections is very different from a general object detection task (for example, PASCAL VOC), so the pre-training may not be relevant.
  • Optimizer parameters – These include learning rate, momentum, regularization coefficients, and early stopping configuration.

With so many hyperparameters to consider, we turned to the automatic tuning feature of Amazon SageMaker to coordinate a massive tuning job across all these variables. The following code is an example of tuning a single model architecture and input data configuration:

import sagemaker
import boto3
from import get_image_uri
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, IntegerParameter, CategoricalParameter, ContinuousParameter
import itertools
from time import sleep

#set to the region you're working in
REGION_NAME = ''
#set a S3 path for SageMaker to store the outputs of the training jobs
S3_OUTPUT_PATH = ''
#set a S3 location for your training dataset, assumed to be an augmented manifest file
TRAIN_DATA_LOCATION = ''
#set a S3 location for your validation data, assumed to be an augmented manifest file
VAL_DATA_LOCATION = ''
#specify which fields in the augmented manifest file are relevant for training
DATA_ATTRIBUTE_NAMES = []
#specify image shape (replace the None placeholder)
IMAGE_SHAPE = None
#specify label width (replace the None placeholder)
LABEL_WIDTH = None
#specify number of samples in the training dataset (replace the None placeholder)
NUM_TRAINING_SAMPLES = None

sgm_role = sagemaker.get_execution_role()
boto_session = boto3.session.Session(region_name = REGION_NAME)
sgm_session = sagemaker.Session(boto_session = boto_session)
training_image = get_image_uri(region_name = REGION_NAME, repo_name = 'object-detection', repo_version = 'latest')

#set training job configuration
object_detection_estimator = Estimator(
    image_name = training_image, role = sgm_role,
    train_instance_count = 1, train_instance_type = 'ml.p3.2xlarge',
    train_volume_size = 50, train_max_run = 360000,
    input_mode = 'Pipe', output_path = S3_OUTPUT_PATH,
    sagemaker_session = sgm_session
)

#set input data configuration
train_data = sagemaker.session.s3_input(
    s3_data = TRAIN_DATA_LOCATION, distribution = 'FullyReplicated', record_wrapping = 'RecordIO',
    s3_data_type = 'AugmentedManifestFile', attribute_names = DATA_ATTRIBUTE_NAMES
)
val_data = sagemaker.session.s3_input(
    s3_data = VAL_DATA_LOCATION, distribution = 'FullyReplicated', record_wrapping = 'RecordIO',
    s3_data_type = 'AugmentedManifestFile', attribute_names = DATA_ATTRIBUTE_NAMES
)
data_channels = {'train': train_data, 'validation': val_data}

#set static hyperparameters
static_hyperparameters = {
    'num_classes': 1, 'epochs': 100,
    'lr_scheduler_step': '15,30', 'lr_scheduler_factor': 0.1,
    'overlap_threshold': 0.5, 'nms_threshold': 0.45,
    'image_shape': IMAGE_SHAPE, 'label_width': LABEL_WIDTH,
    'num_training_samples': NUM_TRAINING_SAMPLES,
    'early_stopping': True, 'early_stopping_min_epochs': 5,
    'early_stopping_patience': 1, 'early_stopping_tolerance': 0.05,
}

#set ranges for tunable hyperparameters
hyperparameter_ranges = {
    'learning_rate': ContinuousParameter(min_value = 1e-5, max_value = 1e-2, scaling_type = 'Auto'),
    'mini_batch_size': IntegerParameter(min_value = 8, max_value = 64, scaling_type = 'Auto')
}

#Not all hyperparameters are feasible to tune directly
#For these we run model tuning jobs in parallel using a for loop
#We take this approach for tuning over different model architectures
#and different feature engineering configurations
use_pretrained_options = [0, 1]
base_network_options = ['resnet-50', 'vgg-16']

for use_pretrained, base_network in itertools.product(use_pretrained_options, base_network_options):
    static_hyperparameter_configuration = {
        **static_hyperparameters,
        'use_pretrained_model': use_pretrained,
        'base_network': base_network
    }
    object_detection_estimator.set_hyperparameters(**static_hyperparameter_configuration)
    tuner = HyperparameterTuner(
        estimator = object_detection_estimator,
        objective_metric_name = 'validation:mAP',
        strategy = 'Bayesian',
        hyperparameter_ranges = hyperparameter_ranges,
        max_jobs = 24,
        max_parallel_jobs = 2,
        early_stopping_type = 'Auto',
    )
    #start the tuning job without waiting for it to finish
    tuner.fit(inputs = data_channels)
    print(f'Started tuning job: {tuner.latest_tuning_job.name}')
    #wait a bit before starting next job so auto generated names don't conflict
    sleep(60)

This code uses version 1.72.0 of the Amazon SageMaker Python SDK, which was the default version installed in Amazon SageMaker notebook instances at the time of writing. Version 2.x introduces breaking changes. For more information, see Use Version 2.x of the SageMaker Python SDK.

We used powerful GPU hardware (p3.2xlarge instances), and it took us just 1 week and approximately $1,500 to explore 455 unique parameter configurations. Of these configurations, Amazon SageMaker found that a fine-tuned Faster R-CNN model with text cropping performed the best, with a mean average precision score of 0.93. This aligned with results from our prior work in this space, which found that two-stage detectors generally outperform single-stage detectors in processing menus.

The following is an example of how the object detection model processed a menu. In this image, the purple boxes are the predicted bounding boxes from the menu section detection model. Black boxes are the text detection bounding boxes provided by Amazon Textract.

Using Amazon SageMaker to build rule- and ML-based text classifiers

The final component in the solution was a layer of text classification. To enable our enhanced search functionality, we had to know if each detection within a menu section was the menu section title, name of a dish, price of a dish, or something else (such as a description of a dish or the name of the restaurant). To this end, we developed a hybrid rule- and ML-based text classification system.

The first step of the classification was to use a rule to determine if a detection was a price or not. This rule simply calculated the proportion of numeric characters in the detection. If the proportion was greater than 40%, the detection was classified as a price. Although simple, this classifier worked well in practice. We used Amazon SageMaker notebook instances as a convenient interactive environment to develop this and other rules.
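The price rule is small enough to sketch directly. The following minimal version assumes that whitespace is ignored when computing the proportion; the post does not say how spaces, currency symbols, or punctuation were counted, so treat those details as assumptions. The function names are hypothetical.

```python
def numeric_fraction(text):
    """Fraction of the non-whitespace characters in `text` that are digits."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isdigit() for c in chars) / len(chars)


def is_price(text, threshold=0.4):
    """Classify a detection as a price if more than 40% of its characters are numeric."""
    return numeric_fraction(text) > threshold
```

Under these assumptions, "₹ 250" (three digits out of four characters) counts as a price, while "Dim Sum 2 pcs" (one digit out of ten) does not.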

After the prices were filtered out, the remaining detections were classified as dish or not dish. From our experience in processing menus, we intuitively knew that in many cases, the location of prices was sufficient to do this classification. For these menus, dishes and prices are listed side by side, so simply classifying detections located to the left of prices as dishes worked well.
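The post does not spell out how "located to the left of prices" was measured. The following sketch assumes a same-row test (vertical overlap of at least half the shorter box height) plus a check that the price starts to the right of the detection's right edge; both thresholds and all names are hypothetical. Boxes use Textract-style normalized Left/Top/Width/Height, and the price flag is assumed to come from an earlier price classification step.

```python
def vertical_overlap(a, b):
    """Overlap of two boxes along the y axis, as a fraction of the shorter box's height."""
    top = max(a['Top'], b['Top'])
    bottom = min(a['Top'] + a['Height'], b['Top'] + b['Height'])
    shorter = min(a['Height'], b['Height'])
    return max(0.0, bottom - top) / shorter if shorter > 0 else 0.0


def classify_by_price_location(detections, min_overlap=0.5):
    """Label each non-price detection 'dish' if a price sits to its right on the same row.

    `detections` is a list of dicts with 'text', 'box', and a boolean 'is_price'.
    Returns 'price', 'dish', or 'not dish' for each detection, in order.
    """
    prices = [d['box'] for d in detections if d['is_price']]
    labels = []
    for d in detections:
        if d['is_price']:
            labels.append('price')
            continue
        box = d['box']
        right_edge = box['Left'] + box['Width']
        has_price_to_right = any(
            p['Left'] >= right_edge and vertical_overlap(box, p) >= min_overlap
            for p in prices
        )
        labels.append('dish' if has_price_to_right else 'not dish')
    return labels
```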

The following example shows how the rules-based text classification system processed a menu. Green boxes are detections classified as dishes (by the price location rule). Red boxes are detections classified as not dishes (by the price location rule). Blue boxes are detections classified as prices. Final dish detections are on the right.

Some menus might include lengthy dish descriptions or may not list prices next to individual dishes. These menus violate the assumptions of the price location rules, so we turned to model-based text classification. We used Amazon SageMaker training jobs to experiment with many modeling approaches in parallel, including an XGBoost model trained on hashed word count vectors. In the end, we found that a fine-tuned BERT model from GluonNLP achieved the best performance with an AUROC score of 0.86.
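The post doesn't describe how the hashed word count vectors were built. The following sketch shows the general hashing trick that such features rely on, with a regex tokenizer and CRC32 as the hash function; both are assumptions, and the function name is hypothetical. Library implementations such as scikit-learn's `HashingVectorizer` cover the same idea with more options.

```python
import re
import zlib


def hashed_word_counts(text, num_features=2 ** 10):
    """Map `text` to a fixed-length vector of word counts using the hashing trick.

    Each lowercased word is hashed with CRC32 (stable across processes, unlike
    Python's built-in hash for strings) into one of `num_features` buckets.
    Different words can collide in one bucket; that is the trade-off for not
    storing a vocabulary.
    """
    vector = [0] * num_features
    for word in re.findall(r'[a-z0-9]+', text.lower()):
        bucket = zlib.crc32(word.encode('utf-8')) % num_features
        vector[bucket] += 1
    return vector
```

Vectors like these have a fixed width regardless of vocabulary size, which makes them a convenient input for a tree model such as XGBoost.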

The following image is an example of how the model-based text classification system processed a menu. Green boxes are detections classified as dishes (by the BERT model). Red boxes are detections classified as not dishes (by the BERT model). Blue boxes are detections classified as prices. The final dish detections are on the right.

Of the remaining detections (those not classified as prices or dishes), a final round of classification identified menu section titles. We created features that captured the font size of the detection, the location of the detection on the menu, and the length of the words within the detection. We used these features as inputs to a logistic regression model that predicted if a detection is a menu section title or not.
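The section-title step combines a few hand-built features with logistic regression. The following sketch assumes particular encodings (font size approximated by box height in pixels, location as the normalized top and left coordinates, and mean characters per word); these choices and the function names are hypothetical. The weights and bias would come from training, for example with scikit-learn's `LogisticRegression`.

```python
import math


def title_features(detection, page_height):
    """Return [font size proxy, top, left, mean word length] for one detection.

    detection['box'] uses Textract-style normalized Left/Top/Width/Height.
    """
    box = detection['box']
    words = detection['text'].split()
    mean_word_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return [box['Height'] * page_height, box['Top'], box['Left'], mean_word_len]


def title_probability(features, weights, bias):
    """Logistic regression: sigmoid of the weighted feature sum, computed stably."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for large negative z
    return e / (1.0 + e)
```

A detection would be labeled a section title when its probability exceeds a chosen cutoff, such as 0.5.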

Key features of Amazon SageMaker

In the end, we found that doing OCR was as simple as making an API call to Amazon Textract. However, our use case required additional customization. We selected Amazon SageMaker as an ML platform to develop this customization because it offered several key features:

  • Amazon SageMaker Notebooks made it easy to spin up Jupyter notebook environments for prototyping and testing rules and models.
  • Ground Truth helped us build and deploy a custom image annotation tool with no front-end experience required.
  • Amazon SageMaker automatic tuning enabled us to run massive hyperparameter tuning jobs on powerful hardware, and included an intuitive interface for tracking the results of hundreds of experiments. You can implement tuning jobs with early stopping conditions, which makes experimentation cost-effective.
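The Textract OCR call mentioned above can be sketched with boto3. The post does not say which Textract operation zomato called. This sketch assumes the synchronous DetectDocumentText API and keeps only LINE blocks with their bounding boxes. The helper names and the default region are hypothetical. The boto3 call needs AWS credentials, so parsing lives in a separate function that can be exercised on a saved response.

```python
from typing import Any, Dict, List


def extract_lines(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep LINE blocks from a DetectDocumentText response, with their normalized boxes."""
    lines = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE and WORD blocks
        box = block["Geometry"]["BoundingBox"]
        lines.append({
            "text": block["Text"],
            "left": box["Left"],
            "top": box["Top"],
            "width": box["Width"],
            "height": box["Height"],
        })
    return lines


def detect_menu_lines(image_path: str, region_name: str = "us-east-1") -> List[Dict[str, Any]]:
    """Hypothetical helper: send one menu image to Amazon Textract and parse the result."""
    import boto3  # imported here so extract_lines works without boto3 or AWS credentials

    client = boto3.client("textract", region_name=region_name)
    with open(image_path, "rb") as f:
        response = client.detect_document_text(Document={"Bytes": f.read()})
    return extract_lines(response)
```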

Amazon SageMaker offers additional integration benefits from including all the preceding features in a single platform:

  • Amazon SageMaker Notebooks come pre-installed with all the dependencies needed to build models that can be optimized with automatic tuning.
  • Ground Truth offers easy access to labelers from Mechanical Turk or AWS Marketplace.
  • Automatic tuning can directly ingest the manifest files created by Amazon SageMaker Ground Truth.

Putting it all together

Our menu digitization system can extract text from images of menus, group it by menu section, extract the title of the section, extract the dishes within each section, and pair each dish with its price. The following is a visualization of the end-to-end solution.

The workflow contains the following steps:

  1. The input is an image of a menu.
  2. Amazon Textract performs OCR on the input image.
  3. An ML-based computer vision model predicts bounding boxes for menu sections in the menu image.
  4. A rules-based classifier classifies Amazon Textract detections as price or not price.
  5. A rules-based classifier (5a) attempts to use the location of price detections to classify the not price detections as dish or not dish. If this rule doesn’t successfully classify most of the detections on the page, an ML-based classifier is used instead (5b).
  6. An ML-based classifier uses hand-crafted features to classify not dish detections as menu section title or not menu section title.
  7. The menu text is structured by combining the menu section detections and the text classification results.
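Steps 5a and 5b amount to a page-level fallback. In the sketch below, the rule returns `None` when it cannot decide a detection, for example when no price shares its row. "Most" is read as more than half. Both are assumptions, because the post does not define what counts as a successful rule classification.

```python
from typing import Callable, List, Optional, Sequence


def classify_dishes(
    detections: Sequence[str],
    rule: Callable[[str], Optional[bool]],
    model: Callable[[str], bool],
    min_coverage: float = 0.5,
) -> List[bool]:
    """Use the rule's labels if it decides most detections (5a), otherwise the model's (5b)."""
    if not detections:
        return []
    rule_labels = [rule(d) for d in detections]
    coverage = sum(label is not None for label in rule_labels) / len(detections)
    if coverage > min_coverage:
        # 5a: keep the rule's decisions; undecided detections are treated as "not dish"
        return [bool(label) for label in rule_labels]
    # 5b: the rule's assumptions do not hold on this page, so classify everything with the model
    return [model(d) for d in detections]


decisions = {"Shrimp Masala": True, "Chef's note": False}  # hypothetical rule output
rule = lambda d: decisions.get(d)
model = lambda d: "Shrimp" in d  # stand-in for the BERT classifier
print(classify_dishes(["Shrimp Masala", "Chef's note", "Spicy"], rule, model))
```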

The following image visualizes a sample output of the system. Green boxes are detections classified as dishes. Blue boxes are detections classified as prices. Yellow boxes are detections classified as menu section titles. Purple boxes are predicted menu section bounding boxes.

The following code is the structured output:

[
  {
    "title": { "text": "Shrimp Dishes" },
    "dishes": [
      { "text": "Shrimp Masala", "price": { "text": "140" } },
      { "text": "Shrimp Biryani", "price": { "text": "170" } },
      { "text": "Shrimp Pulav", "price": { "text": "160" } }
    ]
  },
  ...
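Step 7 can be sketched in two parts. First, classified detections are grouped by the section box that contains their center. Second, each dish is paired with the price whose vertical center is nearest. The input dictionaries, the labels, and the nearest-price pairing are simplifying assumptions for illustration. The output follows the schema of the structured output above.

```python
import json
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), normalized


def center_in(box: Box, det: Dict) -> bool:
    """True if the detection's center point lies inside the section box."""
    cx = det["left"] + det["width"] / 2
    cy = det["top"] + det["height"] / 2
    left, top, right, bottom = box
    return left <= cx <= right and top <= cy <= bottom


def build_menu(section_boxes: Sequence[Box], detections: Sequence[Dict]) -> List[Dict]:
    """Group labeled detections ('title', 'dish', 'price') into menu sections."""
    menu = []
    for box in section_boxes:
        inside = [d for d in detections if center_in(box, d)]
        titles = [d for d in inside if d["label"] == "title"]
        prices = [d for d in inside if d["label"] == "price"]
        section = {"title": {"text": titles[0]["text"]} if titles else None, "dishes": []}
        for dish in (d for d in inside if d["label"] == "dish"):
            dish_cy = dish["top"] + dish["height"] / 2
            # Pair with the price whose vertical center is closest (None if no prices)
            price = min(prices, key=lambda p: abs(p["top"] + p["height"] / 2 - dish_cy),
                        default=None)
            entry = {"text": dish["text"]}
            if price is not None:
                entry["price"] = {"text": price["text"]}
            section["dishes"].append(entry)
        menu.append(section)
    return menu


sample = [
    {"text": "Shrimp Dishes", "label": "title", "left": 0.30, "top": 0.05, "width": 0.30, "height": 0.04},
    {"text": "Shrimp Masala", "label": "dish", "left": 0.10, "top": 0.10, "width": 0.30, "height": 0.02},
    {"text": "140", "label": "price", "left": 0.70, "top": 0.10, "width": 0.05, "height": 0.02},
    {"text": "Shrimp Biryani", "label": "dish", "left": 0.10, "top": 0.15, "width": 0.30, "height": 0.02},
    {"text": "170", "label": "price", "left": 0.70, "top": 0.15, "width": 0.05, "height": 0.02},
]
menu = build_menu([(0.0, 0.0, 1.0, 0.5)], sample)
print(json.dumps(menu, indent=2))
```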


We built a system that uses ML to digitize menus without any human input required. This system will improve user experience by powering new features such as advanced dish search and review highlight verification. Our content team will also use it to accelerate creating menus for online ordering.

To explore these capabilities of Amazon Textract and Amazon SageMaker in more depth, see Automatically extract text and structured data from documents with Amazon Textract and Amazon SageMaker Automatic Model Tuning: Using Machine Learning for Machine Learning.

The Amazon ML Solutions Lab helped us accelerate our use of ML by pairing our team with ML experts. The ML Solutions Lab brings to every customer engagement learnings from more than 20 years of Amazon’s ML innovations in areas such as fulfillment and logistics, personalization and recommendations, computer vision and translation, fraud prevention, forecasting, and supply chain optimization. To learn more about the AWS ML Solutions Lab, contact your account manager or visit Amazon Machine Learning Solutions Lab.

About the Authors

Chiranjeev Ghai is a Machine Learning Engineer. In his current role, he has been aiding automation at zomato by leveraging a wide variety of ML optimisations, ranging from image classification to product recommendation and text detection. When not building models, he likes to spend his time playing video games at home.

Ryan Cheng is a Deep Learning Architect in the Amazon ML Solutions Lab. He has worked on a wide range of ML use cases from sports analytics to optical character recognition. In his spare time, Ryan enjoys cooking.

Andrew Ang is a Deep Learning Architect at the Amazon ML Solutions Lab, where he helps AWS customers identify and build AI/ML solutions to address their business problems.

Vinayak Arannil is a Data Scientist at the Amazon Machine Learning Solutions Lab. He has worked on various domains of data science, such as computer vision, natural language processing, and recommendation systems.




How to Improve Your Supply Chain With Deep Reinforcement Learning


The post How to Improve Your Supply Chain With Deep Reinforcement Learning appeared first on TOPBOTS.




What has set Amazon apart from the competition in online retail? Their supply chain. In fact, this has long been one of the greatest strengths of one of their chief competitors, Walmart.

Supply chains are highly complex systems consisting of hundreds if not thousands of manufacturers and logistics carriers around the world who combine resources to create the products we use and consume every day. To track all of the inputs to a single, simple product would be staggering. Yet supply chain organizations inside vertically integrated corporations are tasked with managing inputs from raw materials, to manufacturing, warehousing, and distribution to customers. The companies that do this best cut down on waste, from excess storage and unneeded transportation costs to time lost getting products and materials to later stages in the system. Optimizing these systems is a key component in businesses as dissimilar as Apple and Saudi Aramco.

A lot of time and effort has been put into building effective supply chain optimization models, but due to their size and complexity, they can be difficult to build and manage. With advances in machine learning, particularly reinforcement learning, we can train a machine learning model to make these decisions for us, and in many cases, do so better than traditional approaches!


We train a deep reinforcement learning model using Ray and or-gym to optimize a multi-echelon inventory management model and benchmark it against a derivative-free optimization model using Powell’s Method.

Multi-Echelon Supply Chain

In our example, we’re going to work with a multi-echelon supply chain model with lead times. This means that our supply chain has several stages we need to make decisions for, and each decision we make at one level affects decisions downstream. In our case, we have M stages running from the producer of our raw materials all the way to our customers. Each stage along the way has a different lead time, or time it takes for the output of one stage to arrive and become the input for the next stage in the chain. This may be 5 days, 10 days, whatever. The longer these lead times become, the earlier you need to anticipate customer orders and demand to ensure you don’t stock out or lose sales!
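To make lead times concrete, here is a toy sketch of a single link in the chain, not or-gym's internal implementation. Every order is held in transit for a fixed number of periods before it arrives. The function name and the fixed lead time are assumptions for illustration.

```python
from collections import deque
from typing import List, Sequence


def simulate_arrivals(orders: Sequence[int], lead_time: int) -> List[int]:
    """Quantity arriving each period when every order takes `lead_time` periods to arrive."""
    pipeline = deque([0] * lead_time)  # in-transit quantities, oldest first
    arrivals = []
    for qty in orders:
        pipeline.append(qty)                 # today's order enters the pipeline
        arrivals.append(pipeline.popleft())  # the order placed `lead_time` periods ago arrives
    return arrivals


print(simulate_arrivals([10, 20, 30, 0, 0, 0], lead_time=3))  # [0, 0, 0, 10, 20, 30]
```

With a 3-period lead time, nothing arrives for the first three periods. Demand in that window has to be covered by stock already on hand, which is why longer lead times force earlier ordering.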


Inventory Management with OR-Gym

The OR-Gym library has a few multi-echelon supply chain models ready to go to simulate this structure. For this, we’ll use the InvManagement-v1 environment, which has the structure shown above, but results in lost sales if you don’t have sufficient inventory to meet customer demand.

If you haven’t already, go ahead and install the package with:

pip install or-gym

Once installed, we can set up our environment with:

import or_gym
env = or_gym.make('InvManagement-v1')

This is a four-echelon supply chain by default. The actions determine how much material to order from the echelon above at each time step. Order quantities are limited by the supplier's capacity and its current inventory. So, if you order 150 widgets from a supplier that has a shipment capacity of 100 widgets and only has 90 widgets on hand, only 90 will be sent.
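The fulfillment rule in the example above is a simple element-wise minimum. This sketch covers the arithmetic only, and the function name is hypothetical. The base_stock_policy function later in this post applies the same kind of `np.minimum` capping to its actions.

```python
import numpy as np


def fulfilled_quantity(order, supplier_capacity, supplier_inventory):
    """Units actually shipped: capped by the supplier's shipment capacity and stock on hand."""
    capped = np.minimum(order, np.minimum(supplier_capacity, supplier_inventory))
    return np.maximum(capped, 0)  # shipments cannot be negative


print(fulfilled_quantity(150, 100, 90))  # 90

# Works element-wise for several stages at once
print(fulfilled_quantity(np.array([150, 50, 80]),
                         np.array([100, 100, 100]),
                         np.array([90, 200, 60])))  # [90 50 60]
```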

Each echelon has its own cost structure, pricing, and lead times. The last echelon (Stage 3 in this case) provides raw materials, and we don’t have any inventory constraints on this stage. This assumes that the mine, oil well, forest (or whatever produces your raw material inputs) is large enough that this isn’t a constraint we need to concern ourselves with.

Default parameter values for the InvManagement-v1 environment.

As with all or-gym environments, if these settings don’t suit you, simply pass an environment configuration dictionary to the make function to customize your supply chain accordingly (an example is given here).

Training with Ray

To train an agent on this environment, we’re going to leverage the Ray library to speed up training, so go ahead and import your packages.

import or_gym
from or_gym.utils import create_env
import ray
from ray.rllib import agents
from ray import tune

To get started, we’re going to need a brief registration function to ensure that Ray knows about the environment we want to run. We can register that with the register_env function shown below.

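def register_env(env_name, env_config={}):
    # create_env returns the or-gym environment class for this name
    env = create_env(env_name)
    # Ray calls the creator with its own config object; our env_config is passed explicitly
    tune.register_env(env_name,
        lambda config: env(config, env_config=env_config))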

From here, we can set up our RL configuration and everything we need to train the model.

# Environment and RL Configuration Settings
env_name = 'InvManagement-v1'
env_config = {}  # Change environment parameters here
rl_config = dict(
    env=env_name,
    num_workers=2,
    env_config=env_config,
    model=dict(
        vf_share_layers=False,
        fcnet_activation='elu',
        fcnet_hiddens=[256, 256]
    ),
    lr=1e-5
)

# Register environment
register_env(env_name, env_config)

The rl_config dictionary is where you can set all of the relevant hyperparameters or set your system to run on a GPU. Here, we’re just going to use 2 workers for parallelization and train a two-layer network with an ELU activation function. If you’re going to use Tune for hyperparameter tuning, you can use tools like tune.grid_search() to systematically update learning rates, change the network, or whatever you like.

Once you’re happy with that, go ahead and choose your algorithm and get to training! Below, I use the PPO algorithm because I find it trains well on most environments.

# Initialize Ray and Build Agent
ray.init(ignore_reinit_error=True)
agent = agents.ppo.PPOTrainer(env=env_name, config=rl_config)

results = []
for i in range(500):
    res = agent.train()
    results.append(res)
    if (i+1) % 5 == 0:
        print('\rIter: {}\tReward: {:.2f}'.format(
            i+1, res['episode_reward_mean']), end='')

The code above will initialize Ray, then build the agent according to the configuration you specified previously. If you’re happy with that, let it run for a bit and see how it does!

One thing to note with this environment: if the learning rate is too high, the policy function will begin to diverge and the loss becomes astronomically large. At that point, you’ll wind up getting an error, typically from Ray’s default pre-processor, with the state showing bizarre values because the actions given by the network are all NaN. This is easy to fix by bringing the learning rate down a bit and trying again.

Let’s take a look at the performance.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec

# Unpack values from each iteration
rewards = np.hstack([i['hist_stats']['episode_reward'] for i in results])
pol_loss = [i['info']['learner']['default_policy']['policy_loss'] for i in results]
vf_loss = [i['info']['learner']['default_policy']['vf_loss'] for i in results]

# Rolling mean and standard deviation over the previous p episodes
p = 100
mean_rewards = np.array([np.mean(rewards[i-p:i+1]) if i >= p
                         else np.mean(rewards[:i+1]) for i, _ in enumerate(rewards)])
std_rewards = np.array([np.std(rewards[i-p:i+1]) if i >= p
                        else np.std(rewards[:i+1]) for i, _ in enumerate(rewards)])

fig = plt.figure(constrained_layout=True, figsize=(20, 10))
gs = fig.add_gridspec(2, 4)
ax0 = fig.add_subplot(gs[:, :-2])
ax0.fill_between(np.arange(len(mean_rewards)), mean_rewards - std_rewards,
                 mean_rewards + std_rewards, label='Standard Deviation', alpha=0.3)
ax0.plot(mean_rewards, label='Mean Rewards')
ax0.set_title('Training Rewards')
ax0.legend()

ax1 = fig.add_subplot(gs[0, 2:])
ax1.plot(pol_loss)  # policy loss per training iteration
ax1.set_title('Policy Loss')

ax2 = fig.add_subplot(gs[1, 2:])
ax2.plot(vf_loss)  # value function loss per training iteration
ax2.set_title('Value Function Loss')
plt.show()
Image by author.

It looks like our agent learned a decent policy!

One of the difficulties of deep reinforcement learning for these classic, operations research problems is the lack of optimality guarantees. In other words, we can look at that training curve above and see that it is learning a better and better policy — and it seems to be converging on a policy — but we don’t know how good that policy is. Could we do better? Should we invest more time (and money) into hyperparameter tuning? To answer this, we need to turn to some different methods and develop a benchmark.

Derivative Free Optimization

A good way to benchmark an RL model is with derivative-free optimization (DFO). Like RL, DFO treats the system as a black box: it provides inputs, gets some feedback in return, and tries again as it seeks the optimal value.

Unlike RL, DFO has no concept of a state. This means that we will try to find a fixed re-order policy that brings inventory up to a certain level to balance holding costs and profit from sales. For example, if the policy at stage 0 is to re-order up to 10 widgets and we currently have 4 widgets, then the policy states we’re going to re-order 6. In the RL case, the agent would take into account the current pipeline and all of the other information that we provide in the state. So RL is more adaptive and ought to outperform a straightforward DFO implementation. If it doesn’t, then we know we need to go back to the drawing board.

While it may sound simplistic, this fixed re-order policy isn’t unusual in industrial applications, partly because real supply chains consist of many more variables and interrelated decisions than we’re modeling here. So a fixed policy is tractable and something that supply chain professionals can easily work with.

Implementing DFO

There are a lot of different algorithms and solvers out there for DFO. For our purposes, we’re going to leverage SciPy’s optimize library to implement Powell’s Method. We won’t get into the details here, but it is a way to quickly find minima of functions without computing gradients. Powell’s Method itself searches over continuous values, so we round its answer to apply it to a discrete problem like ours.

from scipy.optimize import minimize
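As a quick illustration of the `minimize(..., method='Powell')` call, separate from the inventory problem, the sketch below optimizes a toy function using only function evaluations. The toy objective and the evaluation counter are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

calls = {"n": 0}  # counts how many times the optimizer evaluates the objective


def black_box(x):
    """Toy objective we can evaluate but do not differentiate (minimum at x = [3, -1])."""
    calls["n"] += 1
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


result = minimize(black_box, x0=np.zeros(2), method="Powell")
print(result.x, result.nfev, calls["n"])
```

Powell's Method only ever asks for function values. That is what lets it treat a full environment episode as the objective, as dfo_func does.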

Because we’re going to be working with a fixed re-order policy, we need a quick function to translate inventory levels into actions to evaluate.

def base_stock_policy(policy, env):
    '''
    Implements a re-order up-to policy. This means that
    for each node in the network, if the inventory at that
    node falls below the level denoted by the policy, we
    will re-order inventory to bring it to the policy level.

    For example, policy at a node is 10, current inventory
    is 5: the action is to order 5 units.
    '''
    assert len(policy) == len(env.init_inv), (
        'Policy should match number of nodes in network ' +
        '({}, {}).'.format(len(policy), len(env.init_inv)))

    # Get echelon inventory levels
    if env.period == 0:
        inv_ech = np.cumsum(env.I[env.period] +
                            env.T[env.period])
    else:
        inv_ech = np.cumsum(env.I[env.period] +
                            env.T[env.period] - env.B[env.period-1, :-1])

    # Get unconstrained actions
    unc_actions = policy - inv_ech
    unc_actions = np.where(unc_actions > 0, unc_actions, 0)

    # Ensure that actions can be fulfilled by checking
    # constraints
    inv_const = np.hstack([env.I[env.period, 1:], np.inf])
    actions = np.minimum(env.c,
                         np.minimum(unc_actions, inv_const))
    return actions

The base_stock_policy function takes the policy levels we supply and calculates the difference between the level and the inventory as described above. One thing to note: when we calculate the inventory level, we include all of the inventory in transit to the stage as well (given in env.T). For example, if the current inventory on hand for stage 0 is 100, and there is a lead time of 5 days between stage 0 and stage 1, then we take all of the orders from the past 5 days into account as well. So, if stage 0 ordered 10 units each day, then the inventory at this echelon would be 150. This makes policy levels greater than capacity meaningful, because we’re looking at more than just the inventory in our warehouse today; we’re looking at everything in transit too.
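A few lines of NumPy can check the arithmetic in that example. They also show the cumulative sum base_stock_policy uses to turn per-stage inventory into echelon inventory. The per-stage numbers and policy levels here are made up for illustration. The backlog term the function subtracts after period 0 is omitted.

```python
import numpy as np

# Single stage: on-hand stock plus everything ordered over the lead time
on_hand, daily_order, lead_time = 100, 10, 5
print(on_hand + daily_order * lead_time)  # 150

# Several stages (index 0 is closest to the customer): hypothetical on-hand (I) and in-transit (T)
I = np.array([100, 80, 60])
T = np.array([50, 20, 10])
echelon = np.cumsum(I + T)  # stage k's echelon also counts stock at every stage below it
print(echelon)  # [150 250 320]

# Re-order up to hypothetical policy levels: order only the shortfall, never a negative amount
policy = np.array([200, 240, 400])
print(np.maximum(policy - echelon, 0))  # [50  0 80]
```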

Our DFO method needs to make function evaluation calls to see how the selected variables perform. In our case, we have an environment to evaluate, so we need a function that will run an episode of our environment and return the appropriate results.

def dfo_func(policy, env, *args):
    '''
    Runs an episode based on current base-stock model
    settings. This allows us to use our environment for the
    DFO optimizer.
    '''
    env.reset()  # Ensure env is fresh
    rewards = []
    done = False
    while not done:
        action = base_stock_policy(policy, env)
        state, reward, done, _ = env.step(action)
        rewards.append(reward)
        if done:
            break

    rewards = np.array(rewards)
    prob = env.demand_dist.pmf(env.D, **env.dist_param)

    # Return negative of expected profit
    return -1 / env.num_periods * np.sum(prob * rewards)

Rather than return the sum of the rewards, we’re returning the negative expectation of our rewards. The negative is there because the SciPy function we’re using seeks to minimize, whereas our environment is designed to maximize the reward, so we invert the sign to ensure everything is pointing in the right direction. We calculate the expected rewards by multiplying by the probability of our demand based on the distribution. We could take more samples to estimate the distribution and calculate our expectation that way (and for many real-world applications, this would be required), but here, we have access to the true distribution, so we can use that to reduce our computational burden.
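A toy single-period problem shows the difference between weighting by known probabilities and estimating by sampling. This is not a reproduction of dfo_func. The Poisson demand with mean 20, the prices, and the profit function are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import poisson

price, unit_cost, stock, mu = 2.0, 1.0, 20, 20


def profit(demand):
    """Toy single-period profit: sell min(demand, stock), pay for all stock."""
    return price * np.minimum(demand, stock) - unit_cost * stock


# Exact expectation: weight each possible demand by its probability
# (demands of 200+ carry negligible probability when the mean is 20)
d = np.arange(0, 200)
exact = np.sum(poisson.pmf(d, mu) * profit(d))

# Monte Carlo estimate: average profit over sampled demands
rng = np.random.default_rng(0)
samples = rng.poisson(mu, size=200_000)
mc = profit(samples).mean()

print(f"exact={exact:.3f}  monte_carlo={mc:.3f}")
```

The exact sum needs one evaluation per demand value. The sampled estimate needs many draws to reach similar precision, which is the computational saving the paragraph above describes.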

Finally, we’re ready to optimize.

The following function will build an environment based on your configuration settings, take our dfo_func to evaluate, and apply Powell’s Method to the problem. It will return our policy and ensure that our answer contains only non-negative integers (e.g. we can’t order half a widget or a negative number of widgets).

def optimize_inventory_policy(env_name, fun,
        init_policy=None, env_config={}, method='Powell'):

    env = or_gym.make(env_name, env_config=env_config)

    if init_policy is None:
        init_policy = np.ones(env.num_stages-1)

    # Optimize policy
    out = minimize(fun=fun, x0=init_policy, args=env,
                   method=method)
    policy = out.x.copy()

    # Policy must be a non-negative integer
    policy = np.round(np.maximum(policy, 0), 0).astype(int)

    return policy, out

Now it’s time to put it all together.

policy, out = optimize_inventory_policy('InvManagement-v1',
    dfo_func)
print("Re-order levels: {}".format(policy))
print("DFO Info:\n{}".format(out))

Re-order levels: [540 216 81]
DFO Info:
 direc: array([[ 0. , 0. , 1. ],
       [ 0. , 1. , 0. ],
       [206.39353826, 81.74560612, 28.78995703]])
 fun: -0.9450780368543933
 message: 'Optimization terminated successfully.'
 nfev: 212
 nit: 5
 status: 0
 success: True
 x: array([539.7995151 , 216.38046861, 80.66902905])

Our DFO model found a fixed-stock policy with re-order levels at 540 for stage 0, 216 for stage 1, and 81 for stage 2. It did this with only 212 function evaluations, i.e. it simulated 212 episodes to arrive at this policy.

We can then feed this policy into our environment, say 1,000 times, to generate some statistics and compare it to our RL solution.

env = or_gym.make(env_name, env_config=env_config)
eps = 1000
rewards = []
for i in range(eps):
    env.reset()
    reward = 0
    while True:
        action = base_stock_policy(policy, env)
        s, r, done, _ = env.step(action)
        reward += r
        if done:
            rewards.append(reward)
            break

Comparing Performance

Before we get into the reward comparisons, note that these are not perfect, 1:1 comparisons. As mentioned before, DFO yields a fixed policy, whereas RL has a more flexible, dynamic policy that changes based on state information. Our DFO approach was also given the demand probabilities to calculate the expectation, while RL had to infer them through additional sampling. So while RL learned from ~65k episodes and DFO only had to make 212 function calls, they aren’t exactly comparable. Enumerating every meaningful fixed policy once would require ~200 million episodes, so RL doesn’t look so sample-inefficient given its task.

So, how do these stack up?

Image by author.

What we can see above is that RL does indeed outperform our DFO policy by 11% on average (460 to 414). The RL model overtook the DFO policy after ~15k episodes and improved steadily after that. The RL policy does show somewhat higher variance, however, with a few terrible episodes thrown into the mix. All things considered, we did get stronger results overall from the RL approach, as expected.

In this case, neither method was very difficult to implement nor computationally intensive. I forgot to change my rl_config settings to run on my GPU and it still only took about 25 minutes to train on my laptop while the DFO model took ~2 seconds to run. More complex models may not be so friendly in either case.

Another thing to note: both methods can be very sensitive to initial conditions, and neither is guaranteed to find the optimal policy in every case. If you have a problem you’d like to apply RL to, maybe use a simple DFO solver first, try a few initial conditions to get a feel for the problem, then spin up the full RL model. You may find that the DFO policy is sufficient for your task.
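One way to try a few initial conditions is a small multi-start loop around `minimize`. The double-well test function and the starting points below are assumptions for illustration. For the inventory problem, the same pattern could be applied by passing dfo_func, with `env` in `args`, and several `init_policy` vectors.

```python
import numpy as np
from scipy.optimize import minimize


def double_well(x):
    """1-D test function: local minimum near x = +2, global minimum near x = -2."""
    x = float(np.ravel(x)[0])
    return (x ** 2 - 4.0) ** 2 + x


def multi_start(fun, starts, method="Powell"):
    """Run the optimizer from each starting point and keep the lowest objective value."""
    runs = [minimize(fun, x0=np.atleast_1d(np.asarray(s, dtype=float)), method=method)
            for s in starts]
    best = min(runs, key=lambda r: float(r.fun))
    return best, runs


best, runs = multi_start(double_well, starts=[-2.5, 0.5, 2.5])
for r in runs:
    print(float(np.ravel(r.x)[0]), float(r.fun))
print("best:", float(np.ravel(best.x)[0]))
```

Runs that start near x = +2 settle in the shallower local minimum. Keeping the best of several starts is a cheap guard against that sensitivity to initial conditions.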

Hopefully this gave a good overview of how to use these methods and the or-gym library. Leave feedback or questions if you have any!

This article was originally published on DataHubbs and re-published to TOPBOTS with permission from the author.

