

Streamlining data labeling for YOLO object detection in Amazon SageMaker Ground Truth




Object detection is a common task in computer vision (CV), and the YOLOv3 model is state-of-the-art in terms of accuracy and speed. In transfer learning, you obtain a model trained on a large but generic dataset and retrain the model on your custom dataset. One of the most time-consuming parts in transfer learning is collecting and labeling image data to generate a custom training dataset. This post explores how to do this in Amazon SageMaker Ground Truth.

Ground Truth offers a comprehensive platform for annotating the most common data labeling jobs in CV: image classification, object detection, semantic segmentation, and instance segmentation. You can perform labeling using Amazon Mechanical Turk or create your own private team to label collaboratively. You can also use one of the third-party data labeling service providers listed on the AWS Marketplace. Ground Truth offers an intuitive interface that is easy to work with. You can communicate with labelers about specific needs for your particular task using examples and notes through the interface.

Labeling data is already hard work. Creating training data for a CV modeling task requires data collection and storage, setting up labeling jobs, and post-processing the labeled data. Moreover, not all object detection models expect the data in the same format. For example, the Faster RCNN model expects the data in the popular Pascal VOC format, which the YOLO models can’t work with. These associated steps are part of any machine learning pipeline for CV. You sometimes need to run the pipeline multiple times to improve the model incrementally. This post shows how to perform these steps efficiently by using Python scripts and get to model training as quickly as possible. This post uses the YOLO format for its use case, but the steps are mostly independent of the data format.

The image labeling step of a training data generation task is inherently manual. This post shows how to create a reusable framework to create training data for model building efficiently. Specifically, you can do the following:

  • Create the required directory structure in Amazon S3 before starting a Ground Truth job
  • Create a private team of annotators and start a Ground Truth job
  • Collect the annotations when labeling is complete and save them in a pandas dataframe
  • Post-process the dataset for model training

You can download the code presented in this post from this GitHub repo. This post demonstrates how to run the code from the AWS CLI on a local machine that can access an AWS account. For more information about setting up AWS CLI, see What Is the AWS Command Line Interface? Make sure that you configure it to access the S3 buckets in this post. Alternatively, you can run it in AWS Cloud9 or by spinning up an Amazon EC2 instance. You can also run the code blocks in an Amazon SageMaker notebook.

If you’re using an Amazon SageMaker notebook, you can still access the Linux shell of the underlying EC2 instance and follow along by opening a new terminal from the Jupyter main page and running the scripts from the /home/ec2-user/SageMaker folder.

Setting up your S3 bucket

The first thing you need to do is to upload the training images to an S3 bucket. Name the bucket ground-truth-data-labeling. You want each labeling task to have its own self-contained folder under this bucket. If you start labeling a small set of images that you keep in the first folder, but find that the model performed poorly after the first round because the data was insufficient, you can upload more images to a different folder under the same bucket and start another labeling task.

For the first labeling task, create the folder bounding_box and the following three subfolders under it:

  • images – You upload all the images in the Ground Truth labeling job to this subfolder.
  • ground_truth_annots – This subfolder starts empty; the Ground Truth job populates it automatically, and you retrieve the final annotations from here.
  • yolo_annot_files – This subfolder also starts empty, but eventually holds the annotation files ready for model training. The script populates it automatically.
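The folder layout above can also be created from Python. The following is a minimal sketch, assuming boto3 is installed and your credentials can write to the bucket; job_folder_keys and create_job_folders are hypothetical helper names, not part of the scripts in this post. S3 has no real folders, so the sketch writes zero-byte objects whose keys end in a slash, which the console displays as folders.

```python
def job_folder_keys(job_id):
    """Return the placeholder keys for one labeling job (hypothetical helper)."""
    subfolders = ["images", "ground_truth_annots", "yolo_annot_files"]
    return [f"{job_id}/{name}/" for name in subfolders]


def create_job_folders(bucket, job_id):
    """Create empty 'folder' objects in S3; needs boto3 and AWS credentials."""
    import boto3  # imported here so job_folder_keys works without boto3

    s3 = boto3.client("s3")
    for key in job_folder_keys(job_id):
        # A zero-byte object whose key ends in "/" shows up as a folder.
        s3.put_object(Bucket=bucket, Key=key)


if __name__ == "__main__":
    print(job_folder_keys("bounding_box"))
```

Calling create_job_folders("ground-truth-data-labeling", "bounding_box") would create the three subfolders for the first labeling task. Note that the images prefix then contains one placeholder object in addition to the images you upload.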

If your images are in .jpg format and available in the current working directory, you can upload the images with the following code:

aws s3 sync . s3://ground-truth-data-labeling/bounding_box/images/ --exclude "*" --include "*.jpg" 

For this use case, you use five images. There are two types of objects in the images—pencil and pen. You need to draw bounding boxes around each object in the images. The following images are examples of what you need to label. All images are available in the GitHub repo.

Creating the manifest file

A Ground Truth job requires a manifest file in JSON format that contains the Amazon S3 paths of all the images to label. You need to create this file before you can start the first Ground Truth job. The format of this file is simple:

{"source-ref": < S3 path to image1 >}
{"source-ref": < S3 path to image2 >}

However, creating the manifest file by hand would be tedious for a large number of images. Therefore, you can automate the process by running a script. You first need to create a file holding the parameters required for the scripts. Create a file input.json in your local file system with the following content:

{
  "s3_bucket": "ground-truth-data-labeling",
  "job_id": "bounding_box",
  "ground_truth_job_name": "yolo-bbox",
  "yolo_output_dir": "yolo_annot_files"
}

Save the following code block in a file called

import json

import boto3


def create_manifest(job_path):
    """
    Creates the manifest file for the Ground Truth job

    Input:
    job_path: Full path of the folder in S3 for GT job

    Returns:
    manifest_file: The manifest file required for GT job
    """
    s3_rec = boto3.resource("s3")
    s3_bucket = job_path.split("/")[0]
    prefix = job_path.replace(s3_bucket, "")[1:]
    image_folder = f"{prefix}/images"
    print(f"using images from ... {image_folder} \n")

    bucket = s3_rec.Bucket(s3_bucket)
    objs = list(bucket.objects.filter(Prefix=image_folder))
    # skip the folder placeholder object (key ending in "/"), if present
    img_files = [obj for obj in objs if not obj.key.endswith("/")]
    n_imgs = len(img_files)
    print(f"there are {n_imgs} images \n")

    TOKEN = "source-ref"
    manifest_file = "/tmp/manifest.json"
    with open(manifest_file, "w") as fout:
        for img_file in img_files:
            fname = f"s3://{s3_bucket}/{img_file.key}"
            fout.write(f'{{"{TOKEN}": "{fname}"}}\n')

    return manifest_file


def upload_manifest(job_path, manifest_file):
    """
    Uploads the manifest file into S3

    Input:
    job_path: Full path of the folder in S3 for GT job
    manifest_file: Path to the local copy of the manifest file
    """
    s3_rec = boto3.resource("s3")
    s3_bucket = job_path.split("/")[0]
    source = manifest_file.split("/")[-1]
    prefix = job_path.replace(s3_bucket, "")[1:]
    destination = f"{prefix}/{source}"

    print(f"uploading manifest file to {destination} \n")
    s3_rec.meta.client.upload_file(manifest_file, s3_bucket, destination)


def main():
    """
    Performs the following tasks:
    1. Reads input from 'input.json'
    2. Collects image names from S3 and creates the manifest file for GT
    3. Uploads the manifest file to S3
    """
    with open("input.json") as fjson:
        input_dict = json.load(fjson)

    s3_bucket = input_dict["s3_bucket"]
    job_id = input_dict["job_id"]

    gt_job_path = f"{s3_bucket}/{job_id}"
    man_file = create_manifest(gt_job_path)
    upload_manifest(gt_job_path, man_file)


if __name__ == "__main__":
    main()

Run the following script:


This script reads the S3 bucket and job names from the input file, creates a list of images available in the images folder, creates the manifest.json file, and uploads the manifest file to the S3 bucket at s3://ground-truth-data-labeling/bounding_box/.
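The core of the manifest creation can be tried out without any AWS calls. The following is a rough sketch, not the script from this post: manifest_lines is a hypothetical helper that turns a list of object keys into manifest lines, using json.dumps so that quoting is handled for you. It assumes that keys ending in a slash are folder placeholders and skips them.

```python
import json


def manifest_lines(bucket, keys):
    """Build one JSON line per image key (hypothetical helper)."""
    lines = []
    for key in keys:
        if key.endswith("/"):
            continue  # assumed to be a folder placeholder, not an image
        lines.append(json.dumps({"source-ref": f"s3://{bucket}/{key}"}))
    return lines


if __name__ == "__main__":
    keys = [
        "bounding_box/images/",
        "bounding_box/images/IMG_20200816_205004.jpg",
    ]
    for line in manifest_lines("ground-truth-data-labeling", keys):
        print(line)
```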

This method illustrates a programmatic control of the process, but you can also create the file from the Ground Truth API. For instructions, see Create a Manifest File.

At this point, the folder structure in the S3 bucket should look like the following:

ground-truth-data-labeling |-- bounding_box |-- ground_truth_annots |-- images |-- yolo_annot_files |-- manifest.json

Creating the Ground Truth job

You’re now ready to create your Ground Truth job. You need to specify the job details and task type, and create your team of labelers and labeling task details. Then you can sign in to begin the labeling job.

Specifying the job details

To specify the job details, complete the following steps:

  1. On the Amazon SageMaker console, under Ground Truth, choose Labeling jobs.
  2. On the Labeling jobs page, choose Create labeling job.
  3. In the Job overview section, for Job name, enter yolo-bbox. It should be the name you defined in the input.json file earlier.
  4. Pick Manual Data Setup under Input Data Setup.
  5. For Input dataset location, enter s3://ground-truth-data-labeling/bounding_box/manifest.json.
  6. For Output dataset location, enter s3://ground-truth-data-labeling/bounding_box/ground_truth_annots.
  7. In the Create an IAM role section, first select Create a new role from the drop down menu and then select Specific S3 buckets.
  8. Enter ground-truth-data-labeling.
  9. Choose Create.

Specifying the task type

To specify the task type, complete the following steps:

  1. In the Task selection section, from the Task Category drop-down menu, choose Image.
  2. Select Bounding box.
  3. Don’t change Enable enhanced image access, which is selected by default. It enables Cross-Origin Resource Sharing (CORS) that may be required for some workers to complete the annotation task.
  4. Choose Next.

Creating a team of labelers

To create your team of labelers, complete the following steps:

  1. In the Workers section, select Private.
  2. Follow the instructions to create a new team.

Each member of the team receives a notification email titled, “You’re invited to work on a labeling project” that has initial sign-in credentials. For this use case, create a team with just yourself as a member.

Specifying labeling task details

In the Bounding box labeling tool section, you should see the images you uploaded to Amazon S3. You should check that the paths are correct in the previous steps. To specify your task details, complete the following steps:

  1. In the text box, enter a brief description of the task.

This is critical if the data labeling team has more than one member and you want to make sure everyone follows the same rules when drawing the boxes. Any inconsistency in bounding box creation may end up confusing your object detection model. For example, if you’re labeling beverage cans and want to create a tight bounding box only around the visible logo, instead of the entire can, you should specify that to get consistent labeling from all the workers. For this use case, you can enter Please enter a tight bounding box around the entire object.

  2. Optionally, you can upload examples of a good and a bad bounding box.

You can make sure your team is consistent in their labels by providing good and bad examples.

  3. Under Labels, enter the names of the labels you’re using to identify each bounding box; in this case, pencil and pen.

A color is assigned to each label automatically, which helps to visualize the boxes created for overlapping objects.

  4. To run a final sanity check, choose Preview.

  5. Choose Create job.

Job creation can take up to a few minutes. When it’s complete, you should see a job titled yolo-bbox on the Ground Truth Labeling jobs page with In progress as the status.

  6. To view the job details, select the job.

This is a good time to verify the paths are correct; the scripts don’t run if there’s any inconsistency in names.

For more information about providing labeling instructions, see Create high-quality instructions for Amazon SageMaker Ground Truth labeling jobs.

Sign in and start labeling

After you receive the initial credentials to register as a labeler for this job, follow the link to reset the password and start labeling.

If you need to interrupt your labeling session, you can resume labeling by choosing Labeling workforces under Ground Truth on the SageMaker console.

You can find the link to the labeling portal on the Private tab. The page also lists the teams and individuals involved in this private labeling task.

After you sign in, start labeling by choosing Start working.

Because you only have five images in the dataset to label, you can finish the entire task in a single session. For larger datasets, you can pause the task by choosing Stop working and return to the task later to finish it.

Checking job status

After the labeling is complete, the status of the labeling job changes to Complete and a new JSON file called output.manifest containing the annotations appears at s3://ground-truth-data-labeling/bounding_box/ground_truth_annots/yolo-bbox/manifests/output/output.manifest.

Parsing Ground Truth annotations

You can now parse through the annotations and perform the necessary post-processing steps to make it ready for model training. Start by running the following code block:

from io import StringIO
import json

import boto3
import pandas as pd
import s3fs


def parse_gt_output(manifest_path, job_name):
    """
    Captures the json Ground Truth bounding box annotations into a pandas dataframe

    Input:
    manifest_path: S3 path to the annotation file
    job_name: name of the Ground Truth job

    Returns:
    df_bbox: pandas dataframe with bounding box coordinates
             for each item in every image
    """
    filesys = s3fs.S3FileSystem()
    with filesys.open(manifest_path) as fin:
        annot_list = []
        for line in fin.readlines():
            record = json.loads(line)
            if job_name in record.keys():  # is it necessary?
                image_file_path = record["source-ref"]
                image_file_name = image_file_path.split("/")[-1]
                class_maps = record[f"{job_name}-metadata"]["class-map"]

                imsize_list = record[job_name]["image_size"]
                assert len(imsize_list) == 1
                image_width = imsize_list[0]["width"]
                image_height = imsize_list[0]["height"]

                for annot in record[job_name]["annotations"]:
                    left = annot["left"]
                    top = annot["top"]
                    height = annot["height"]
                    width = annot["width"]
                    class_name = class_maps[f'{annot["class_id"]}']

                    annot_list.append(
                        [
                            image_file_name,
                            class_name,
                            left,
                            top,
                            height,
                            width,
                            image_width,
                            image_height,
                        ]
                    )

    df_bbox = pd.DataFrame(
        annot_list,
        columns=[
            "img_file",
            "category",
            "box_left",
            "box_top",
            "box_height",
            "box_width",
            "img_width",
            "img_height",
        ],
    )

    return df_bbox


def save_df_to_s3(df_local, s3_bucket, destination):
    """
    Saves a pandas dataframe to S3

    Input:
    df_local: Dataframe to save
    s3_bucket: Bucket name
    destination: Prefix
    """
    csv_buffer = StringIO()
    s3_resource = boto3.resource("s3")

    df_local.to_csv(csv_buffer, index=False)
    s3_resource.Object(s3_bucket, destination).put(Body=csv_buffer.getvalue())


def main():
    """
    Performs the following tasks:
    1. Reads input from 'input.json'
    2. Parses the Ground Truth annotations and creates a dataframe
    3. Saves the dataframe to S3
    """
    with open("input.json") as fjson:
        input_dict = json.load(fjson)

    s3_bucket = input_dict["s3_bucket"]
    job_id = input_dict["job_id"]
    gt_job_name = input_dict["ground_truth_job_name"]

    mani_path = f"s3://{s3_bucket}/{job_id}/ground_truth_annots/{gt_job_name}/manifests/output/output.manifest"

    df_annot = parse_gt_output(mani_path, gt_job_name)
    dest = f"{job_id}/ground_truth_annots/{gt_job_name}/annot.csv"
    save_df_to_s3(df_annot, s3_bucket, dest)


if __name__ == "__main__":
    main()

From the AWS CLI, save the preceding code block in the file and run:


Ground Truth describes each bounding box with four numbers: the x and y coordinates of its top-left corner (left and top), and its width and height, all in pixels. The procedure parse_gt_output scans through the output.manifest file and stores the information for every bounding box for each image in a pandas dataframe. The procedure save_df_to_s3 saves it in a tabular format as annot.csv to the S3 bucket for further processing.
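To make the record layout concrete, here is a minimal sketch that flattens a single output.manifest line without touching S3. The sample record is hypothetical: its field names mirror the ones parse_gt_output reads, but the values are invented, and rows_from_record is an illustrative helper rather than part of the scripts in this post.

```python
import json

# Hypothetical output.manifest line; field names follow what
# parse_gt_output reads, the values are made up for illustration.
SAMPLE_LINE = json.dumps({
    "source-ref": "s3://ground-truth-data-labeling/bounding_box/images/pic01.jpg",
    "yolo-bbox": {
        "image_size": [{"width": 4608, "height": 3456}],
        "annotations": [
            {"class_id": 1, "left": 1000, "top": 500, "width": 400, "height": 2000}
        ],
    },
    "yolo-bbox-metadata": {"class-map": {"1": "pen"}},
})


def rows_from_record(line, job_name):
    """Flatten one manifest record into a list of per-box dictionaries."""
    record = json.loads(line)
    size = record[job_name]["image_size"][0]
    class_map = record[f"{job_name}-metadata"]["class-map"]
    rows = []
    for box in record[job_name]["annotations"]:
        rows.append({
            "img_file": record["source-ref"].split("/")[-1],
            # class-map keys are strings, class_id is an integer
            "category": class_map[str(box["class_id"])],
            "box_left": box["left"],
            "box_top": box["top"],
            "box_width": box["width"],
            "box_height": box["height"],
            "img_width": size["width"],
            "img_height": size["height"],
        })
    return rows


if __name__ == "__main__":
    for row in rows_from_record(SAMPLE_LINE, "yolo-bbox"):
        print(row)
```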

The creation of the dataframe is useful for a few reasons. JSON files are hard to read and the output.manifest file contains more information, like label metadata, than you need for the next step. The dataframe contains only the relevant information and you can visualize it easily to make sure everything looks fine.

To grab the annot.csv file from Amazon S3 and save a local copy, run the following:

aws s3 cp s3://ground-truth-data-labeling/bounding_box/ground_truth_annots/yolo-bbox/annot.csv .

You can read it back into a pandas dataframe and inspect the first few lines. See the following code:

import pandas as pd
df_ann = pd.read_csv('annot.csv')
print(df_ann.head())

The following screenshot shows the results.

You also capture the size of the image through img_width and img_height. This is necessary because the object detection models need to know the location of each bounding box within the image. In this case, you can see that images in the dataset were captured with a 4608×3456 pixel resolution.

There are quite a few reasons why it is a good idea to save the annotation information into a dataframe:

  • In a subsequent step, you need to rescale the bounding box coordinates into a YOLO-readable format. You can do this operation easily in a dataframe.
  • If you decide to capture and label more images in the future to augment the existing dataset, all you need to do is join the newly created dataframe with the existing one. Again, you can perform this easily using a dataframe.
  • As of this writing, the Ground Truth console doesn’t allow more than 30 label categories in a single job. If your dataset has more categories, you have to label them under multiple Ground Truth jobs and combine the results. Ground Truth associates each bounding box with an integer index in the output.manifest file, so the same integer can refer to different categories in different jobs. Having the annotations as dataframes makes combining them easier and avoids conflicts between category indexes across jobs. In the preceding screenshot, you can see that the category column holds the actual names instead of the integer index.
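The last point can be illustrated with a small sketch. It assumes each job's class map has the shape found under class-map in output.manifest (string index to category name); unify_categories is a hypothetical helper and the two class maps are invented. It builds a single name-to-integer mapping so that annotations from several jobs can share one set of integer categories.

```python
def unify_categories(job_class_maps):
    """Build one name -> integer mapping from several per-job class maps."""
    unified = {}
    for class_map in job_class_maps:
        # Visit each job's names in the order of their integer index.
        for _, name in sorted(class_map.items(), key=lambda kv: int(kv[0])):
            if name not in unified:
                unified[name] = len(unified)
    return unified


if __name__ == "__main__":
    job_a = {"0": "pencil", "1": "pen"}  # hypothetical class map of job A
    job_b = {"0": "eraser", "1": "pen"}  # hypothetical class map of job B
    print(unify_categories([job_a, job_b]))
```

Because the dataframe stores category names rather than indexes, the unified mapping can be applied to the concatenated dataframes in one step.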

Generating YOLO annotations

You’re now ready to reformat the bounding box coordinates Ground Truth provided into a format the YOLO model accepts.

In the YOLO format, each bounding box is described by the center coordinates of the box and its width and height. Each number is scaled by the dimensions of the image; therefore, they all range between 0 and 1. Instead of category names, YOLO models expect the corresponding integer categories.
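As a quick worked example of this rescaling, the following sketch converts one box from the Ground Truth representation (left, top, width, height in pixels) to a YOLO annotation line. The function names to_yolo and yolo_line are illustrative, and the box values are made up; only the 4608×3456 image size comes from this post.

```python
def to_yolo(left, top, width, height, img_width, img_height):
    """Convert a pixel box (top-left corner plus size) to normalized YOLO values."""
    x_center = (left + width / 2) / img_width
    y_center = (top + height / 2) / img_height
    return x_center, y_center, width / img_width, height / img_height


def yolo_line(category, left, top, width, height, img_width, img_height):
    """Format one annotation as '<class> <x_center> <y_center> <width> <height>'."""
    values = to_yolo(left, top, width, height, img_width, img_height)
    return f"{category} " + " ".join(f"{v:.4f}" for v in values)


if __name__ == "__main__":
    # A made-up 400x2000 pixel box in a 4608x3456 image
    print(yolo_line(1, 1000, 500, 400, 2000, 4608, 3456))
```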

Therefore, you need to map each name in the category column of the dataframe into a unique integer. Moreover, the official Darknet implementation of YOLOv3 needs to have the name of the image match the annotation text file name. For example, if the image file is pic01.jpg, the corresponding annotation file should be named pic01.txt.

The following code block performs all these tasks:

import os
import json
from io import StringIO

import boto3
import pandas as pd
import s3fs


def annot_yolo(annot_file, cats):
    """
    Prepares the annotation in YOLO format

    Input:
    annot_file: csv file containing Ground Truth annotations
    cats: List of object categories in proper order for model training

    Returns:
    df_ann: pandas dataframe with the following columns
            img_file int_category box_center_w box_center_h box_width box_height

    Note:
    YOLO data format: <object-class> <x_center> <y_center> <width> <height>
    """
    df_ann = pd.read_csv(annot_file)

    df_ann["int_category"] = df_ann["category"].apply(lambda x: cats.index(x))
    df_ann["box_center_w"] = df_ann["box_left"] + df_ann["box_width"] / 2
    df_ann["box_center_h"] = df_ann["box_top"] + df_ann["box_height"] / 2

    # scale box dimensions by image dimensions
    df_ann["box_center_w"] = df_ann["box_center_w"] / df_ann["img_width"]
    df_ann["box_center_h"] = df_ann["box_center_h"] / df_ann["img_height"]
    df_ann["box_width"] = df_ann["box_width"] / df_ann["img_width"]
    df_ann["box_height"] = df_ann["box_height"] / df_ann["img_height"]

    return df_ann


def save_annots_to_s3(s3_bucket, prefix, df_local):
    """
    For every image in the dataset, save a text file with annotation in YOLO format

    Input:
    s3_bucket: S3 bucket name
    prefix: Folder name under s3_bucket where files will be written
    df_local: pandas dataframe with the following columns
              img_file int_category box_center_w box_center_h box_width box_height
    """
    unique_images = df_local["img_file"].unique()
    s3_resource = boto3.resource("s3")

    for image_file in unique_images:
        df_single_img_annots = df_local.loc[df_local.img_file == image_file]
        # pic01.jpg -> pic01.txt
        annot_txt_file = os.path.splitext(image_file)[0] + ".txt"
        destination = f"{prefix}/{annot_txt_file}"

        csv_buffer = StringIO()
        df_single_img_annots.to_csv(
            csv_buffer,
            index=False,
            header=False,
            sep=" ",
            float_format="%.4f",
            columns=[
                "int_category",
                "box_center_w",
                "box_center_h",
                "box_width",
                "box_height",
            ],
        )
        s3_resource.Object(s3_bucket, destination).put(Body=csv_buffer.getvalue())


def get_cats(json_file):
    """
    Makes a list of the category names in proper order

    Input:
    json_file: s3 path of the json file containing the category information

    Returns:
    cats: List of category names
    """
    filesys = s3fs.S3FileSystem()
    with filesys.open(json_file) as fin:
        line = fin.readline()
        record = json.loads(line)
        labels = [item["label"] for item in record["labels"]]

    return labels


def main():
    """
    Performs the following tasks:
    1. Reads input from 'input.json'
    2. Collects the category names from the Ground Truth job
    3. Creates a dataframe with annotation in YOLO format
    4. Saves a text file in S3 with YOLO annotations
       for each of the labeled images
    """
    with open("input.json") as fjson:
        input_dict = json.load(fjson)

    s3_bucket = input_dict["s3_bucket"]
    job_id = input_dict["job_id"]
    gt_job_name = input_dict["ground_truth_job_name"]
    yolo_output = input_dict["yolo_output_dir"]

    s3_path_cats = (
        f"s3://{s3_bucket}/{job_id}/ground_truth_annots/{gt_job_name}/annotation-tool/data.json"
    )
    categories = get_cats(s3_path_cats)
    print("\n labels used in Ground Truth job: ")
    print(categories, "\n")

    gt_annot_file = "annot.csv"
    s3_dir = f"{job_id}/{yolo_output}"
    print("annotation files saved in = ", s3_dir)

    df_annot = annot_yolo(gt_annot_file, categories)
    save_annots_to_s3(s3_bucket, s3_dir, df_annot)


if __name__ == "__main__":
    main()

From the AWS CLI, save the preceding code block in a file and run:


The annot_yolo procedure transforms the dataframe you created by rescaling the box coordinates by the image size, and the save_annots_to_s3 procedure saves the annotations corresponding to each image into a text file and stores it in Amazon S3.
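Before plotting, a quick automated check can catch formatting problems in the generated text files. The following is a sketch under the assumption that each line holds an integer class followed by four values between 0 and 1, as described above; check_yolo_lines is a hypothetical helper, not part of the scripts in this post.

```python
def check_yolo_lines(text, n_classes):
    """Return a list of problems found in YOLO annotation text (empty if none)."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split()
        if len(parts) != 5:
            problems.append(f"line {lineno}: expected 5 fields, got {len(parts)}")
            continue
        try:
            cat = int(parts[0])
            values = [float(p) for p in parts[1:]]
        except ValueError:
            problems.append(f"line {lineno}: could not parse fields as numbers")
            continue
        if not 0 <= cat < n_classes:
            problems.append(f"line {lineno}: class {cat} out of range")
        if any(v < 0 or v > 1 for v in values):
            problems.append(f"line {lineno}: value outside [0, 1]")
    return problems


if __name__ == "__main__":
    print(check_yolo_lines("1 0.2604 0.4340 0.0868 0.5787\n", n_classes=2))
```

You could run it on the contents of each downloaded annotation file, with n_classes set to the number of labels in the job.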

You can now inspect a couple of images and their corresponding annotations to make sure they’re properly formatted for model training. However, you first need to write a procedure to draw YOLO formatted bounding boxes on an image. Save the following code block in

import argparse
import os

import matplotlib.colors as mcolors
import matplotlib.image as mpimg
import matplotlib.pyplot as plt


def visualize_bbox(img_file, yolo_ann_file, label_dict, figure_size=(6, 8)):
    """
    Plots bounding boxes on images

    Input:
    img_file: Path to the image file
    yolo_ann_file: Text file containing annotations in YOLO format
    label_dict: Dictionary of image categories
    figure_size: Figure size
    """
    img = mpimg.imread(img_file)
    fig, ax = plt.subplots(1, 1, figsize=figure_size)
    ax.imshow(img)

    im_height, im_width, _ = img.shape

    palette = mcolors.TABLEAU_COLORS
    colors = list(palette.keys())
    with open(yolo_ann_file, "r") as fin:
        for line in fin:
            cat, center_w, center_h, width, height = line.split()
            cat = int(cat)
            category_name = label_dict[cat]
            # convert normalized center/size back to pixel coordinates
            left = (float(center_w) - float(width) / 2) * im_width
            top = (float(center_h) - float(height) / 2) * im_height
            width = float(width) * im_width
            height = float(height) * im_height

            rect = plt.Rectangle(
                (left, top),
                width,
                height,
                fill=False,
                linewidth=2,
                edgecolor=colors[cat],
            )
            ax.add_patch(rect)
            props = dict(boxstyle="round", facecolor=colors[cat], alpha=0.5)
            ax.text(
                left,
                top,
                category_name,
                fontsize=14,
                verticalalignment="top",
                bbox=props,
            )


def main():
    """
    Plots bounding boxes
    """
    # the order must match the order of labels in the Ground Truth job
    labels = {0: "pencil", 1: "pen"}

    parser = argparse.ArgumentParser()
    parser.add_argument("img", help="image file")
    args = parser.parse_args()
    img_file = args.img
    ann_file = os.path.splitext(img_file)[0] + ".txt"

    visualize_bbox(img_file, ann_file, labels, figure_size=(6, 8))
    plt.show()  # without this, nothing is displayed when run as a script


if __name__ == "__main__":
    main()

Download an image and the corresponding annotation file from Amazon S3. See the following code:

aws s3 cp s3://ground-truth-data-labeling/bounding_box/yolo_annot_files/IMG_20200816_205004.txt .

aws s3 cp s3://ground-truth-data-labeling/bounding_box/images/IMG_20200816_205004.jpg .

To display the correct label of each bounding box, you need to specify the names of the objects you labeled in a dictionary and pass it to visualize_bbox. For this use case, you only have two items in the dictionary. However, the order of the labels is important: it should match the order you used while creating the Ground Truth labeling job. If you can’t remember the order, you can access the information from the s3://ground-truth-data-labeling/bounding_box/ground_truth_annots/yolo-bbox/annotation-tool/data.json file in Amazon S3, which the Ground Truth job creates automatically.

The contents of the data.json file for the task look like the following code:


Therefore, a dictionary with the labels as follows was created in

labels = {0: 'pencil', 1: 'pen'}
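Instead of typing the dictionary by hand, you could derive it from the data.json contents. The following is a minimal sketch: labels_from_data_json is a hypothetical helper, and the sample string is an assumption about the file's shape, based only on the fields that get_cats reads (a labels list whose items have a label key).

```python
import json


def labels_from_data_json(text):
    """Map integer class index -> label name, in the order stored in data.json."""
    record = json.loads(text)
    return {i: item["label"] for i, item in enumerate(record["labels"])}


if __name__ == "__main__":
    # Hypothetical file contents, shaped like what get_cats expects
    sample = '{"labels": [{"label": "pencil"}, {"label": "pen"}]}'
    print(labels_from_data_json(sample))
```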

Now run the following to visualize the image:

python IMG_20200816_205004.jpg

The following screenshot shows the bounding boxes correctly drawn around two pens.

To plot an image with a mix of pens and pencils, get the image and the corresponding annotation text from Amazon S3. See the following code:

aws s3 cp s3://ground-truth-data-labeling/bounding_box/yolo_annot_files/IMG_20200816_205029.txt .

aws s3 cp s3://ground-truth-data-labeling/bounding_box/images/IMG_20200816_205029.jpg .

Override the default image size in the  visualize_bbox  procedure to (10, 12) and run the following:

python IMG_20200816_205029.jpg

The following screenshot shows three bounding boxes correctly drawn around two types of objects.


This post described how to create an efficient, end-to-end data-gathering pipeline in Amazon SageMaker Ground Truth for an object detection model. Try out this process yourself the next time you create an object detection model. You can modify the annotation post-processing to produce labeled data in the Pascal VOC format, which is required for models like Faster RCNN. You can also adapt the basic framework to other data-labeling pipelines with job-specific modifications. For example, you can rewrite the annotation post-processing procedures to adapt the framework for an instance segmentation task, in which an object is labeled at the pixel level instead of drawing a rectangle around the object. Amazon SageMaker Ground Truth is regularly updated with enhanced capabilities, so check the documentation for the most up-to-date features.

About the Author

Arkajyoti Misra is a Data Scientist working in AWS Professional Services. He loves to dig into Machine Learning algorithms and enjoys reading about new frontiers in Deep Learning.



Graph Convolutional Networks (GCN)


The post Graph Convolutional Networks (GCN) appeared first on TOPBOTS.




In this post, we’re gonna take a close look at one of the well-known graph neural networks named Graph Convolutional Network (GCN). First, we’ll get the intuition to see how it works, then we’ll go deeper into the maths behind it.

Why Graphs?

Many problems are graphs by nature. In our world, a lot of data comes in the form of graphs, such as molecules, social networks, and paper citation networks.

Tasks on Graphs

  • Node classification: Predict a type of a given node
  • Link prediction: Predict whether two nodes are linked
  • Community detection: Identify densely linked clusters of nodes
  • Network similarity: How similar are two (sub)networks

Machine Learning Lifecycle

In the graph, we have node features (the data of nodes) and the structure of the graph (how nodes are connected).

For the former, we can easily get the data from each node. But when it comes to the structure, it is not trivial to extract useful information from it. For example, if 2 nodes are close to one another, should we treat them differently to other pairs? How about high and low degree nodes? In fact, each specific task can consume a lot of time and effort just for Feature Engineering, i.e., to distill the structure into our features.

Feature engineering on graphs. (Picture from [1])

It would be much better to somehow get both the node features and the structure as the input, and let the machine figure out what information is useful by itself.

That’s why we need Graph Representation Learning.

We want the model to learn the “feature engineering” by itself. (Picture from [1])


Graph Convolutional Networks (GCNs)

Paper: Semi-supervised Classification with Graph Convolutional Networks (2017) [3]

GCN is a type of convolutional neural network that can work directly on graphs and take advantage of their structural information.

It solves the problem of classifying nodes (such as documents) in a graph (such as a citation network), where labels are only available for a small subset of nodes (semi-supervised learning).

Example of semi-supervised learning on graphs. Some nodes don’t have labels (unknown nodes).

Main Ideas

As the name “Convolutional” suggests, the idea came from images and was then brought to graphs. However, while images have a fixed structure, graphs are much more complex.

Convolution idea from images to graphs. (Picture from [1])

The general idea of GCN: For each node, we get the feature information from all its neighbors and of course, the feature of itself. Assume we use the average() function. We will do the same for all the nodes. Finally, we feed these average values into a neural network.

In the following figure, we have a simple example with a citation network. Each node represents a research paper, while edges are the citations. We have a pre-process step here. Instead of using the raw papers as features, we convert the papers into vectors (by using NLP embedding, e.g., tf–idf).

Let’s consider the green node. First off, we get all the feature values of its neighbors, including itself, then take the average. The result will be passed through a neural network to return a resulting vector.

The main idea of GCN. Consider the green node. First, we take the average of all its neighbors, including itself. After that, the average value is passed through a neural network. Note that, in GCN, we simply use a fully connected layer. In this example, we get 2-dimension vectors as the output (2 nodes at the fully connected layer).
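The idea in the figure can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the three-node graph, the feature values, and the weights are made up, the weights are hand-picked rather than trained, and the ReLU activation in dense_layer is an assumption that the text above doesn't specify.

```python
def mean_aggregate(neighbors, features):
    """For each node, average the feature vectors of its neighbors and itself."""
    aggregated = {}
    for node, feats in features.items():
        group = [feats] + [features[n] for n in neighbors[node]]
        dim = len(feats)
        aggregated[node] = [
            sum(vec[d] for vec in group) / len(group) for d in range(dim)
        ]
    return aggregated


def dense_layer(vector, weights):
    """Fully connected layer with ReLU; weights has one row per output unit."""
    return [max(0.0, sum(w * x for w, x in zip(row, vector))) for row in weights]


if __name__ == "__main__":
    # Hypothetical toy graph: node "a" is linked to "b" and "c"
    neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
    features = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]}
    weights = [[1.0, 0.0], [0.0, -1.0]]  # 2 output units, hand-picked

    for node, vec in mean_aggregate(neighbors, features).items():
        print(node, vec, dense_layer(vec, weights))
```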

In practice, we can use more sophisticated aggregate functions rather than the average function. We can also stack more layers on top of each other to get a deeper GCN. The output of a layer will be treated as the input for the next layer.

Example of 2-layer GCN: The output of the first layer is the input of the second layer. Again, note that the neural network in GCN is simply a fully connected layer (Picture from [2])

Let’s take a closer look at the maths to see how it really works.

Intuition and the Maths behind

First, we need some notation.

Let’s consider a graph G as below.

From the graph G, we have an adjacency matrix A and a Degree matrix D. We also have feature matrix X.

How can we get all the feature values from neighbors for each node? The solution lies in the multiplication of A and X.

Looking at the first row of the adjacency matrix, we see that node A has a connection to E. The first row of the resulting matrix is therefore the feature vector of E, which A connects to (figure below). Similarly, the second row of the resulting matrix is the sum of the feature vectors of D and E. By doing this, we get the sum of all neighbors’ vectors for every node.

Calculate the first row of the “sum vector matrix” AX
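A small NumPy sketch of this multiplication follows. The original figure is not available here, so the edge list is an assumption chosen to agree with the degrees mentioned in the text (A connects to E, B connects to D and E), and the 2-dimensional feature values are made up.

```python
import numpy as np

# Assumed graph: edges A-E, B-D, B-E, C-D, C-E, D-E (node order A, B, C, D, E).
A = np.array([
    [0, 0, 0, 0, 1],  # A
    [0, 0, 0, 1, 1],  # B
    [0, 0, 0, 1, 1],  # C
    [0, 1, 1, 0, 1],  # D
    [1, 1, 1, 1, 0],  # E
])

# Made-up feature matrix X: one 2-dimensional row per node.
X = np.array([
    [1.0, 0.0],  # A
    [0.0, 1.0],  # B
    [2.0, 2.0],  # C
    [3.0, 1.0],  # D
    [1.0, 4.0],  # E
])

# Row i of AX is the sum of the feature vectors of node i's neighbors.
AX = A @ X
```

With these values, the first row of `AX` equals the features of E, and the second row equals the sum of the features of D and E.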
There are still some things that need to be improved here:
  1. We miss the feature of the node itself. For example, the first row of the result matrix should contain the features of node A too.
  2. Instead of the sum() function, we need to take the average, or even better, the weighted average of the neighbors’ feature vectors. Why don’t we use the sum() function? The reason is that with sum(), high-degree nodes are likely to get huge aggregate vectors, while low-degree nodes tend to get small aggregate vectors, which may later cause exploding or vanishing gradients (e.g., when using sigmoid). Besides, neural networks tend to be sensitive to the scale of the input data. Thus, we need to normalize these vectors to get rid of the potential issues.

To fix Problem (1), we add the identity matrix I, scaled by a factor lambda, to A to get a new adjacency matrix Ã = A + lambda·I.

Picking lambda = 1 (the feature of the node itself is just as important as those of its neighbors), we have Ã = A + I. Note that we could treat lambda as a trainable parameter, but for now we simply set it to 1; the paper does the same.

By adding a self-loop to each node, we have the new adjacency matrix
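The self-loop fix can be sketched as below. The helper name `add_self_loops`, the edge list, and the feature values are assumptions made for illustration; the edges were chosen to agree with the degrees mentioned in the text.

```python
import numpy as np

def add_self_loops(A, lam=1.0):
    """Return A + lam * I, i.e. the adjacency matrix with weighted self-loops."""
    return A + lam * np.eye(A.shape[0])

# Assumed graph: edges A-E, B-D, B-E, C-D, C-E, D-E (node order A, B, C, D, E).
A = np.array([
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
])
# Made-up 2-dimensional features, one row per node.
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0], [1.0, 4.0]])

A_tilde = add_self_loops(A)       # lambda = 1
degrees = A_tilde.sum(axis=1)     # diagonal of the new degree matrix
sum_with_self = A_tilde @ X       # each row now includes the node's own features
```

The first row of `sum_with_self` is now the features of A plus the features of E, and `degrees` counts the self-loop as well.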

Problem (2): For matrix scaling, we usually multiply the matrix by a diagonal matrix. In this case, we want to take the average of the summed features or, mathematically, to scale the sum vector matrix ÃX according to the node degrees. Intuition tells us that the diagonal matrix used for scaling is related to the degree matrix D̃ (why D̃ and not D? Because we are now considering the degree matrix D̃ of the new adjacency matrix Ã, not of A).

The problem now becomes: how do we want to scale/normalize the sum vectors? In other words:

How do we pass the information from neighbors to a specific node?

We start with our old friend, the average. In this case, the inverse of D̃ (i.e., D̃^{-1}) comes into play. Basically, each element of D̃^{-1} is the reciprocal of the corresponding diagonal element of D̃.

For example, node A has a degree of 2, so we multiply the sum vector of node A by 1/2, while node E has a degree of 5, so we multiply the sum vector of E by 1/5, and so on.

Thus, by multiplying D̃^{-1} with ÃX, we take the average of all neighbors’ feature vectors (including the node’s own).
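The row-wise averaging can be checked numerically with a short sketch. As before, the edge list is an assumption consistent with the degrees mentioned in the text (A has degree 2 and E has degree 5 once self-loops are counted), and the feature values are made up.

```python
import numpy as np

# Assumed graph: edges A-E, B-D, B-E, C-D, C-E, D-E (node order A, B, C, D, E).
A = np.array([
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
])
# Made-up 2-dimensional features, one row per node.
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0], [1.0, 4.0]])

A_tilde = A + np.eye(A.shape[0])       # add self-loops
degrees = A_tilde.sum(axis=1)          # [2, 3, 3, 4, 5]
D_tilde_inv = np.diag(1.0 / degrees)   # reciprocal of each diagonal entry

# Each row is the mean of the node's own features and its neighbors' features.
mean_features = D_tilde_inv @ A_tilde @ X
```

Every row of `D_tilde_inv @ A_tilde` sums to 1, which is what makes the result an average rather than a sum.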

So far so good. But you may ask: what about the weighted average? Intuitively, it should be better if we treat high- and low-degree nodes differently.

We’re just scaling by rows, but ignoring their corresponding columns (dashed boxes).
Add a new scaler for columns.

The new scaler gives us the “weighted” average. What we are doing here is putting more weight on low-degree nodes and reducing the impact of high-degree nodes. The idea behind this weighted average is that low-degree nodes are assumed to have a bigger impact on their neighbors, whereas high-degree nodes have a lower impact because they spread their influence across too many neighbors.

When aggregating features at node B, we assign the biggest weight to node B itself (degree of 3) and the lowest weight to node E (degree of 5).
Because we normalize twice (once for rows and once for columns), we change the exponent “-1” to “-1/2”, so the scaled matrix becomes D̃^{-1/2} Ã D̃^{-1/2} X.
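The effect of the symmetric normalization on the weights at node B can be sketched as follows. The edge list is an assumption: it was picked so that, with self-loops, B has degree 3, D has degree 4, and E has degree 5, in line with the caption above.

```python
import numpy as np

# Assumed graph: edges A-E, B-D, B-E, C-D, C-E, D-E (node order A, B, C, D, E).
A = np.array([
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
])

A_tilde = A + np.eye(A.shape[0])                    # add self-loops
degrees = A_tilde.sum(axis=1)                       # [2, 3, 3, 4, 5]
D_inv_sqrt = np.diag(1.0 / np.sqrt(degrees))        # D̃^{-1/2}

# Entry (i, j) equals Ã[i, j] / sqrt(deg(i) * deg(j)).
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

# Weights used when aggregating at node B (row index 1).
w_B, w_D, w_E = A_hat[1, 1], A_hat[1, 3], A_hat[1, 4]
```

In this sketch `w_B` is 1/3, `w_D` is 1/sqrt(12), and `w_E` is 1/sqrt(15), so the higher the neighbor’s degree, the smaller its weight.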

For example, if we have a multi-class classification problem with 10 classes, the output dimension F is set to 10. After obtaining the 10-dimensional vectors at layer 2, we pass them through a softmax function for the prediction.

The Loss function is simply calculated by the cross-entropy error over all labeled examples, where Y_{l} is the set of node indices that have labels.
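Putting the pieces together, a 2-layer forward pass with a softmax output and a cross-entropy loss restricted to labeled nodes might look like the sketch below. The graph, features, labels, layer sizes (4 hidden units, 3 classes), and random weights are all made up, and the helper names are hypothetical; no training step is shown.

```python
import numpy as np

def normalize_adjacency(A):
    """Compute D̃^{-1/2} (A + I) D̃^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(H):
    """Row-wise softmax with the usual max-subtraction for stability."""
    e = np.exp(H - H.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(A_hat, X, W0, W1):
    """Two-layer GCN: ReLU on the hidden layer, softmax on the output."""
    H = np.maximum(A_hat @ X @ W0, 0.0)
    return softmax(A_hat @ H @ W1)

def masked_cross_entropy(Z, labels, labeled_idx):
    """Cross-entropy summed over the labeled nodes only."""
    return -np.sum(np.log(Z[labeled_idx, labels[labeled_idx]]))

# Assumed graph: edges A-E, B-D, B-E, C-D, C-E, D-E (node order A, B, C, D, E).
A = np.array([
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
])
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0], [1.0, 4.0]])

labels = np.array([0, -1, -1, -1, 2])   # -1 marks an unlabeled node
labeled_idx = np.array([0, 4])          # indices of nodes that have labels

rng = np.random.default_rng(0)
W0 = rng.normal(size=(2, 4))            # input features -> hidden units
W1 = rng.normal(size=(4, 3))            # hidden units -> F = 3 classes

Z = gcn_forward(normalize_adjacency(A), X, W0, W1)
loss = masked_cross_entropy(Z, labels, labeled_idx)
```

Each row of `Z` is a probability distribution over the classes, and only the rows listed in `labeled_idx` contribute to `loss`.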

The number of layers

The meaning of #layers

The number of layers is the farthest distance that node features can travel. For example, with a 1-layer GCN, each node can only get information from its direct neighbors. The information-gathering process takes place independently and at the same time for all nodes.

When stacking another layer on top of the first one, we repeat the information-gathering process, but this time the neighbors already have information about their own neighbors (from the previous step). This makes the number of layers the maximum number of hops that each node’s information can travel. So, depending on how far we think a node should gather information from the network, we can configure a suitable number of layers. But again, in a graph we normally don’t want to go too far. With 6–7 hops, we reach almost the entire graph, which makes the aggregation less meaningful.

Example: Gathering info process with 2 layers of target node i
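One way to see the link between layers and hops is to look at which entries of the k-th power of Ã are non-zero. The sketch below uses a hypothetical helper `receptive_field` and an assumed edge list (A-E, B-D, B-E, C-D, C-E, D-E) consistent with the degrees mentioned earlier.

```python
import numpy as np

def receptive_field(A, node, num_layers):
    """Indices of nodes whose features can reach `node` after `num_layers` layers."""
    A_tilde = A + np.eye(A.shape[0], dtype=A.dtype)   # self-loops keep the node itself
    reach = np.linalg.matrix_power(A_tilde, num_layers)
    return set(np.flatnonzero(reach[node]).tolist())

# Assumed graph, node order A, B, C, D, E.
A = np.array([
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
])

one_layer = receptive_field(A, node=0, num_layers=1)   # node A with 1 layer
two_layers = receptive_field(A, node=0, num_layers=2)  # node A with 2 layers
```

With one layer, node A only sees itself and E; with two layers, it already sees every node in this small graph, which illustrates why very deep stacks add little on small or dense graphs.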

How many layers should we stack in the GCN?

In the paper, the authors also conducted some experiments with shallow and deep GCNs. From the figure below, we see that the best results are obtained with a 2- or 3-layer model. Besides, with a deep GCN (more than 7 layers), performance tends to degrade (dashed blue line). One solution is to use residual connections between hidden layers (purple line).

Performance over #layers. Picture from the paper [3]
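A residual connection between hidden layers can be sketched as below. This is one common formulation (add the layer input back to its output) and is not necessarily the exact variant used in the experiments; it assumes the input and output of the layer have the same width, and the graph, hidden size, and weights are made up.

```python
import numpy as np

def normalize_adjacency(A):
    """Compute D̃^{-1/2} (A + I) D̃^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, H, W):
    """Plain GCN layer with ReLU."""
    return np.maximum(A_hat @ H @ W, 0.0)

def residual_gcn_layer(A_hat, H, W):
    """GCN layer whose input is added back to its output (W must be square)."""
    return gcn_layer(A_hat, H, W) + H

# Assumed graph: edges A-E, B-D, B-E, C-D, C-E, D-E (node order A, B, C, D, E).
A = np.array([
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
])
A_hat = normalize_adjacency(A)

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))   # hidden representation of the 5 nodes
W = rng.normal(size=(4, 4))   # square weight matrix keeps the width at 4

H_next = residual_gcn_layer(A_hat, H, W)
```

The skip path lets each node keep its previous representation even when many layers are stacked.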

Take home notes

  • GCNs are used for semi-supervised learning on graphs.
  • GCNs use both node features and the graph structure for training.
  • The main idea of the GCN is to take the weighted average of all neighbors’ node features (including itself): Lower-degree nodes get larger weights. Then, we pass the resulting feature vectors through a neural network for training.
  • We can stack more layers to make GCNs deeper. Consider residual connections for deep GCNs. Normally, we go for 2 or 3-layer GCN.
  • Maths Note: When seeing a diagonal matrix, think of matrix scaling.
  • A demo of GCN with the StellarGraph library is available at [5]. The library also provides many other algorithms for GNNs.

Note from the authors of the paper: The framework is currently limited to undirected graphs (weighted or unweighted). However, it is possible to handle both directed edges and edge features by representing the original directed graph as an undirected bipartite graph with additional nodes that represent edges in the original graph.

What’s next?

With GCNs, it seems we can make use of both the node features and the structure of the graph. However, what if the edges have different types? Should we treat each relationship differently? How do we aggregate neighbors in this case? What are the more advanced approaches that have appeared recently?

In the next post of the graph topic, we will look into some more sophisticated methods.

How to deal with different relationships on the edges (brother, friend, ...)?


[1] Excellent slides on Graph Representation Learning by Jure Leskovec (Stanford):

[2] Video Graph Convolutional Networks (GCNs) made simple:

[3] Paper Semi-supervised Classification with Graph Convolutional Networks (2017):

[4] GCN source code:

[5] Demo with StellarGraph library:

This article was originally published on Medium and re-published to TOPBOTS with permission from the author.



Microsoft BOT Framework — Loops



Loops are one of the basic programming structures in any programming language. In this article, I will demonstrate loops within the Microsoft BOT framework.

To follow this article clearly, please have a quick read on the basics of the Microsoft BOT framework. I wrote a couple of articles sometime back and the links are below:

Let’s Get Started.

I will be using the example of a TaxiBot described in one of my previous articles. The BOT asks some general questions and books a taxi for the user. In this article, I will give the user the option to choose their preferred cars for the ride. The flow will look like below:

Create a new Dialog Class for Loops

We would need 2 Dialog classes to be able to achieve this task:

  1. SuperTaxiBotDialog.cs: This is the main dialog class. The waterfall will contain all the steps defined in the previous article.
  2. ChooseCarDialog.cs: A new dialog class will be created which would allow the user to pick preferred cars. The loop will be defined in this class.

The waterfall steps for both classes can be visualized as:

The complete code base is present on the Github page.

Important Technical Aspects

  • Link between the Dialogs: In the constructor initialization of SuperTaxiBotDialog, add a dialog for ChooseCarDialog by adding the line:
AddDialog(new ChooseCarDialog());

  • Call ChooseCarDialog from SuperTaxiBotDialog: SuperTaxiBotDialog calls ChooseCarDialog from the step SetPreferredCars, hence the return statement of the step should be like:
return await stepContext.BeginDialogAsync(nameof(ChooseCarDialog), null, cancellationToken);
  • Return the flow back from ChooseCarDialog to SuperTaxiBotDialog: Once the user has selected 2 cars, the flow has to be sent back to SuperTaxiBotDialog from the step LoopCarAsync. This should be achieved by ending the ChooseCarDialog in the step LoopCarAsync.
return await stepContext.EndDialogAsync(carsSelected, cancellationToken);


Once the project is executed using BOT Framework Emulator, the output would look like:

Hopefully, this article will help the readers in implementing a loop with Microsoft BOT framework. For questions: Hit me.






The Bleeding Edge of Voice




Tapaan Chauhan

This fall, a little known event is starting to make waves. As COVID dominates the headlines, an event called “Voice Launch” is pulling together an impressive roster of start-ups and voice tech companies intending to uncover the next big ideas and start-ups in voice.

While voice tech has been around for a while, as the accuracy of speech recognition improves, it moves into its prime. “As speech recognition moves from 85% to 95% accuracy, who will use a keyboard anymore?” says Voice Launch organizer Eric Sauve. “And that new, more natural way to interact with our devices will usher in a series of technological advances,” he added.

Voice technology is something that has been dreamt of and worked on for decades all over the world. Why? The answer is straightforward. Voice recognition allows consumers to multitask by merely speaking to their Google Home, Amazon Alexa, Siri, etc. Digital voice recording works by recording a sample of a person’s speech and quickly converting it into written text using machine learning and sophisticated algorithms. Voice input is just the more efficient form of computing, says Mary Meeker in her ‘Annual Internet Trends Report.’ As a matter of fact, according to ComScore, 50% of all searches will be done by voice by 2020, and 30% of searches will be done without even a screen, according to Gartner. As voice becomes a part of things we use every day, like our cars and phones, it will become the new “norm.”

The event includes a number of inspiration sessions meant to help start-ups and founders pick the best strategies. Companies presenting here include industry leaders like Google and Amazon, lesser-known hyper-growth voice tech companies like Deepgram and Balto, and VCs like OMERS Ventures and Techstars.

But the focus of the event is the voice tech start-ups themselves, and this year’s event has some interesting participants. Start-ups will pitch their ideas, and the audience will vote to select the winners. The event is a cross between a standard pitchfest and Britain’s Got Talent.

