AI
Amazon Translate ranked as #1 machine translation provider by Intento
Customer obsession, one of the key Amazon Leadership Principles that guides everything we do at Amazon, has helped Amazon Translate be recognized as an industry-leading neural machine translation provider. This year, Intento ranked Amazon Translate #1 on the list of top-performing machine translation providers in its The State of Machine Translation 2020 report. We are excited to be recognized for pursuing our passion: designing the best customer experience in machine translation.
Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. Neural machine translation is a form of machine translation that uses deep learning models to deliver more accurate and more natural-sounding translation than traditional statistical and rule-based translation algorithms. Amazon Translate’s development has been fueled by customer feedback, leading to a steady stream of rich features that help you reach more people in more places without breaking your translation services budget.
Intento is one of the leading organizations helping global companies procure and utilize the best-fit cognitive AI services. In this independent study, Intento evaluated 15 of the most prominent MT providers used by language service providers and localization services. The data used included examples from across 16 industry sectors, with 8 content types, including topics such as financial documentation, patents, and sales and marketing material. These inputs were translated between 14 common language pairs to determine the best engine for a given translation scenario. Intento ranked the results of each MT engine based on how they compared to a reference human translation.
The results showed that no MT service is best in all language pairs across all industry sectors and content types. However, Amazon Translate had the highest number of instances in which it was rated “best.”
At Amazon, we strive to bring the most value to our customers and deliver the world’s best machine translation service! If your company is looking for machine translation, please contact us. We’d love to show you what Amazon Translate can do.
More information on the features and capabilities that Intento considered in its analysis of the top MT providers is available in the full report (registration is required).
About the Author
Greg Rushing is a US Air Force Fellow in Amazon’s BRIDGE program. He is currently working with the Amazon Translate Product Management team, where he focuses on coordinating the activities required to bring Amazon Translate features to market. Outside of work, you can find him spending time exploring the outdoors with his family, doing auto repair, or woodworking.
AI
Graph Convolutional Networks (GCN)
The post Graph Convolutional Networks (GCN) appeared first on TOPBOTS.
In this post, we’re gonna take a close look at one of the well-known graph neural networks named Graph Convolutional Network (GCN). First, we’ll get the intuition to see how it works, then we’ll go deeper into the maths behind it.
Why Graphs?
Many problems are graphs by nature. Much of the data we see in the real world comes in graph form, such as molecules, social networks, and paper citation networks.
Tasks on Graphs
 Node classification: Predict a type of a given node
 Link prediction: Predict whether two nodes are linked
 Community detection: Identify densely linked clusters of nodes
 Network similarity: How similar are two (sub)networks
Machine Learning Lifecycle
In the graph, we have node features (the data of nodes) and the structure of the graph (how nodes are connected).
For the former, we can easily get the data from each node. But when it comes to the structure, it is not trivial to extract useful information from it. For example, if 2 nodes are close to one another, should we treat them differently to other pairs? How about high and low degree nodes? In fact, each specific task can consume a lot of time and effort just for Feature Engineering, i.e., to distill the structure into our features.
It would be much better to somehow take both the node features and the structure as the input, and let the machine figure out by itself what information is useful.
That’s why we need Graph Representation Learning.
Graph Convolutional Networks (GCNs)
Paper: Semi-Supervised Classification with Graph Convolutional Networks (2017) [3]
GCN is a type of convolutional neural network that can work directly on graphs and take advantage of their structural information.
It solves the problem of classifying nodes (such as documents) in a graph (such as a citation network), where labels are only available for a small subset of nodes (semi-supervised learning).
Main Ideas
As the name “Convolutional” suggests, the idea came from images and was then brought to graphs. However, while images have a fixed structure, graphs are much more complex.
The general idea of GCN: For each node, we get the feature information from all its neighbors and of course, the feature of itself. Assume we use the average() function. We will do the same for all the nodes. Finally, we feed these average values into a neural network.
In the following figure, we have a simple example with a citation network. Each node represents a research paper, while edges are the citations. There is a preprocessing step here: instead of using the raw papers as features, we convert the papers into vectors (by using an NLP embedding, e.g., tf–idf).
Let’s consider the green node. First off, we get all the feature values of its neighbors, including itself, then take the average. The result will be passed through a neural network to return a resulting vector.
In practice, we can use more sophisticated aggregate functions rather than the average function. We can also stack more layers on top of each other to get a deeper GCN. The output of a layer will be treated as the input for the next layer.
Let’s take a closer look at the maths to see how it really works.
Intuition and the Maths behind
First, we need some notations
Let’s consider a graph G as below.
How can we get all the feature values from neighbors for each node? The solution lies in the multiplication of A and X.
Take a look at the first row of the adjacency matrix, we see that node A has a connection to E. The first row of the resulting matrix is the feature vector of E, which A connects to (Figure below). Similarly, the second row of the resulting matrix is the sum of feature vectors of D and E. By doing this, we can get the sum of all neighbors’ vectors.
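The multiplication of A and X can be tried out on a tiny example. The sketch below is only an illustration: the graph (edges A-E, B-D, B-E, C-E, D-E) is a hypothetical one chosen to agree with the facts stated in this post, the feature values are made up, and plain Python lists are used to keep it dependency-free.

```python
# Hypothetical 5-node graph (assumption): edges A-E, B-D, B-E, C-E, D-E.
# Node order: A, B, C, D, E.
A = [
    [0, 0, 0, 0, 1],  # A is connected to E
    [0, 0, 0, 1, 1],  # B is connected to D and E
    [0, 0, 0, 0, 1],  # C is connected to E
    [0, 1, 0, 0, 1],  # D is connected to B and E
    [1, 1, 1, 1, 0],  # E is connected to A, B, C and D
]

# Made-up 2-dimensional feature vectors, one row per node.
X = [
    [1.0, 0.0],  # A
    [0.0, 1.0],  # B
    [2.0, 2.0],  # C
    [3.0, 1.0],  # D
    [4.0, 5.0],  # E
]


def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


# Row i of AX is the sum of the feature vectors of node i's neighbors.
AX = matmul(A, X)

print(AX[0])  # features of E only, since A is connected only to E
print(AX[1])  # features of D plus features of E
```

Note that the first row contains nothing from node A itself, which is exactly the first issue discussed next.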
 There are still two things that need to improve here.
 (1) We miss the feature of the node itself. For example, the first row of the result matrix should contain the features of node A too.
 (2) Instead of the sum() function, we need to take the average, or even better, the weighted average of the neighbors’ feature vectors. Why don’t we use the sum() function? The reason is that with sum(), high-degree nodes are likely to end up with huge aggregate vectors, while low-degree nodes tend to get small aggregate vectors, which may later cause exploding or vanishing gradients (e.g., when using sigmoid). Besides, neural networks tend to be sensitive to the scale of the input data. Thus, we need to normalize these vectors to get rid of the potential issues.
Problem (1): We can fix this by adding an identity matrix I, scaled by a factor lambda, to A to get a new adjacency matrix Ã = A + lambda·I.
Picking lambda = 1 (the feature of the node itself is just as important as those of its neighbors), we have Ã = A + I. Note that we could treat lambda as a trainable parameter, but for now we just set lambda to 1; the paper also simply sets lambda to 1.
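A minimal sketch of the self-loop fix, under the same assumptions as before (a hypothetical 5-node graph with edges A-E, B-D, B-E, C-E, D-E and made-up features); the name `lam` stands in for lambda:

```python
# Hypothetical graph and features (assumptions); node order: A, B, C, D, E.
A = [
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
]
X = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0], [4.0, 5.0]]


def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


lam = 1  # weight of a node's own features relative to its neighbors
n = len(A)

# A_tilde = A + lam * I: put lam on the diagonal so every node is its own neighbor.
A_tilde = [[A[i][j] + (lam if i == j else 0) for j in range(n)] for i in range(n)]

# Each row now sums the node's own features together with its neighbors' features.
summed = matmul(A_tilde, X)

print(summed[0])  # features of A plus features of E
```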
Problem (2): For matrix scaling, we usually multiply the matrix by a diagonal matrix. In this case, we want to take the average of the sum feature, or mathematically, to scale the sum vector matrix ÃX according to the node degrees. The gut feeling tells us that our diagonal matrix used to scale here is something related to the Degree matrix D̃ (Why D̃, not D? Because we’re considering Degree matrix D̃ of new adjacency matrix Ã, not A anymore).
The problem now becomes: how do we want to scale/normalize the sum vectors? In other words:
How do we pass the information from neighbors to a specific node?
We start with our old friend, the average. In this case, D̃ inverse (i.e., D̃^{-1}) comes into play. Basically, each diagonal element of D̃ inverse is the reciprocal of the corresponding diagonal element of D̃.
For example, node A has a degree of 2, so we multiply the sum vector of node A by 1/2, while node E has a degree of 5, so we multiply the sum vector of E by 1/5, and so on.
Thus, by multiplying D̃ inverse by the sum matrix ÃX, we take the average of all neighbors’ feature vectors (including the node itself).
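The averaging step can be sketched as follows. Again, the graph and features are assumptions made for illustration; the graph is chosen so that node A has degree 2 and node E has degree 5 once self-loops are counted, matching the example above.

```python
# Hypothetical graph and features (assumptions); node order: A, B, C, D, E.
A = [
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
]
X = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0], [4.0, 5.0]]


def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


n = len(A)
A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]

# The diagonal of the degree matrix of A_tilde is just its row sums.
degrees = [sum(row) for row in A_tilde]

# Multiplying by the inverse degree matrix divides row i of A_tilde X by degree i,
# turning each neighborhood sum into a neighborhood average.
summed = matmul(A_tilde, X)
averaged = [[value / d for value in row] for row, d in zip(summed, degrees)]

print(degrees)      # node A has degree 2, node E has degree 5
print(averaged[0])  # average of the features of A and E
```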
So far so good. But you may ask: how about the weighted average? Intuitively, it should be better if we treat high- and low-degree nodes differently.
The new scaler, the symmetric normalization D̃^{-1/2} Ã D̃^{-1/2} X used in the paper [3], gives us the “weighted” average. What we are doing here is putting more weight on low-degree nodes and reducing the impact of high-degree nodes. The idea of this weighted average is that we assume low-degree nodes have a bigger impact on their neighbors, whereas high-degree nodes have a lower impact because they scatter their influence across too many neighbors.
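To see the weighting in numbers, here is a small sketch of the symmetric normalization from the paper [3], in which entry (i, j) of the normalized adjacency matrix equals Ã_ij / sqrt(d_i · d_j). The graph is the same hypothetical one used for illustration (an assumption, not the figure from the original post).

```python
import math

# Hypothetical graph (assumption); node order: A, B, C, D, E.
A = [
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
]

n = len(A)
A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
deg = [sum(row) for row in A_tilde]

# Scaling rows and columns by deg^(-1/2) gives entry (i, j) = A_tilde[i][j] / sqrt(d_i * d_j).
A_hat = [
    [A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
    for i in range(n)
]

# From node E's point of view (row 4): neighbor A has degree 2 and neighbor B has
# degree 3, so the lower-degree neighbor A receives the larger weight.
print(A_hat[4][0], A_hat[4][1])
```

Unlike the plain average, the rows of this matrix do not sum to 1 in general, which is why “weighted” is in quotes.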
For example, if we have a multi-class classification problem with 10 classes, the output dimension F is set to 10. After obtaining the 10-dimensional vectors at layer 2, we pass these vectors through a softmax function for the prediction.
The loss function is simply the cross-entropy error over all labeled examples, where Y_{l} is the set of node indices that have labels.
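Putting the pieces together, a forward pass of a 2-layer GCN with a cross-entropy loss over the labeled nodes might look like the sketch below. This is an illustration under stated assumptions rather than the authors' implementation: the graph, features, random weights, hidden size, the choice of 3 classes (instead of 10), and the labeled nodes are all made up, and there is no training loop.

```python
import math
import random


def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def normalize(A):
    """Add self-loops, then scale symmetrically by the inverse square-root degrees."""
    n = len(A)
    A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def relu(M):
    return [[max(0.0, v) for v in row] for row in M]


def softmax_rows(M):
    """Apply a numerically stable softmax to every row."""
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_forward(A_hat, X, W0, W1):
    """Layer 1: aggregate, transform, ReLU. Layer 2: aggregate, transform, softmax."""
    H = relu(matmul(A_hat, matmul(X, W0)))
    return softmax_rows(matmul(A_hat, matmul(H, W1)))


def masked_cross_entropy(Z, labels):
    """Cross-entropy summed over labeled nodes only; labels maps node index -> class."""
    return -sum(math.log(Z[i][c]) for i, c in labels.items())


# Hypothetical graph and features (assumptions); node order: A, B, C, D, E.
A = [
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
]
X = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0], [4.0, 5.0]]

random.seed(0)
hidden, F = 4, 3  # hidden size and number of classes (both arbitrary here)
W0 = [[random.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(len(X[0]))]
W1 = [[random.uniform(-0.5, 0.5) for _ in range(F)] for _ in range(hidden)]

labels = {0: 0, 4: 2}  # only nodes A and E are labeled (semi-supervised setting)

Z = gcn_forward(normalize(A), X, W0, W1)
loss = masked_cross_entropy(Z, labels)
print(loss)
```

Every node gets a prediction in Z, but only the labeled nodes contribute to the loss.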
The number of layers
The meaning of #layers
The number of layers is the farthest distance that node features can travel. For example, with a 1-layer GCN, each node can only get information from its direct neighbors. The information-gathering process takes place independently and at the same time for all the nodes.
When stacking another layer on top of the first one, we repeat the information-gathering process, but this time the neighbors already have information about their own neighbors (from the previous step). This makes the number of layers the maximum number of hops that each node’s information can travel. So, depending on how far we think a node should get information from in the network, we can configure a proper number for #layers. But again, in a graph, we normally don’t want to go too far. With 6–7 hops, we almost reach the entire graph, which makes the aggregation less meaningful.
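The link between layers and hops can be checked with a short sketch. The helper `receptive_field` is a hypothetical name introduced here, and the graph is the same assumed 5-node example; the function simply collects the nodes whose features can influence a given node after a number of layers.

```python
# Hypothetical graph (assumption); node order: A, B, C, D, E.
A = [
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
]


def receptive_field(A, node, num_layers):
    """Return the set of nodes whose features can reach `node` after `num_layers` layers."""
    reached = {node}  # a node always sees itself (self-loop)
    for _ in range(num_layers):
        # One more layer lets information travel one more hop.
        reached |= {j for i in reached for j, a in enumerate(A[i]) if a}
    return reached


print(receptive_field(A, 0, 1))  # node A after 1 layer: itself and E
print(receptive_field(A, 0, 2))  # node A after 2 layers: every node in this small graph
```

On such a small graph two layers already cover everything, which is the same effect the text describes for 6–7 hops on larger graphs.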
How many layers should we stack in the GCN?
In the paper, the authors also conducted some experiments with shallow and deep GCNs. From the figure below, we see that the best results are obtained with a 2- or 3-layer model. Besides, a deep GCN (more than 7 layers) tends to perform badly (dashed blue line). One solution is to use residual connections between hidden layers (purple line).
Take home notes
 GCNs are used for semi-supervised learning on graphs.
 GCNs use both the node features and the structure for training.
 The main idea of the GCN is to take the weighted average of all neighbors’ node features (including the node itself): lower-degree nodes get larger weights. Then, we pass the resulting feature vectors through a neural network for training.
 We can stack more layers to make GCNs deeper. Consider residual connections for deep GCNs. Normally, we go for a 2- or 3-layer GCN.
 Maths note: When seeing a diagonal matrix, think of matrix scaling.
 A demo for GCN with the StellarGraph library is available here [5]. The library also provides many other algorithms for GNNs.
Note from the authors of the paper: The framework is currently limited to undirected graphs (weighted or unweighted). However, it is possible to handle both directed edges and edge features by representing the original directed graph as an undirected bipartite graph with additional nodes that represent edges in the original graph.
What’s next?
With GCNs, it seems we can make use of both the node features and the structure of the graph. However, what if the edges have different types? Should we treat each relationship differently? How do we aggregate neighbors in this case? What are the recent advanced approaches?
In the next post of the graph topic, we will look into some more sophisticated methods.
REFERENCES
[1] Excellent slides on Graph Representation Learning by Jure Leskovec (Stanford): https://drive.google.com/file/d/1By3udbOt10moIcSEgUQ0TR9twQX9Aq0G/view?usp=sharing
[2] Video Graph Convolutional Networks (GCNs) made simple: https://www.youtube.com/watch?v=2KRAOZIULzw
[3] Paper Semi-Supervised Classification with Graph Convolutional Networks (2017): https://arxiv.org/pdf/1609.02907.pdf
[4] GCN source code: https://github.com/tkipf/gcn
[5] Demo with StellarGraph library: https://stellargraph.readthedocs.io/en/stable/demos/nodeclassification/gcnnodeclassification.html
This article was originally published on Medium and republished to TOPBOTS with permission from the author.
AI
Microsoft BOT Framework — Loops
Loops are one of the basic programming structures in any programming language. In this article, I will demonstrate loops within the Microsoft BOT framework.
To follow this article clearly, please have a quick read on the basics of the Microsoft BOT framework. I wrote a couple of articles sometime back and the links are below:
Let’s Get Started.
I will be using the example of a TaxiBot described in one of my previous articles. The BOT asks some general questions and books a taxi for the user. In this article, I will give the user an option to choose their preferred cars for the ride. The flow will look like below:
Create a new Dialog Class for Loops
We would need 2 Dialog classes to be able to achieve this task:
 SuperTaxiBotDialog.cs: This would be the main dialog class. The waterfall will contain all the steps as defined in the previous article.
 ChooseCarDialog.cs: A new dialog class will be created which would allow the user to pick preferred cars. The loop will be defined in this class.
The waterfall steps for both classes can be visualized as:
The complete code base is present on the GitHub page.
Important Technical Aspects
 Link between the Dialogs: In the constructor initialization of SuperTaxiBotDialog, add a dialog for ChooseCarDialog by adding the line:
AddDialog(new ChooseCarDialog());
 Call ChooseCarDialog from SuperTaxiBotDialog: SuperTaxiBotDialog calls ChooseCarDialog from the step SetPreferredCars, hence the return statement of the step should be like:
return await stepContext.BeginDialogAsync(nameof(ChooseCarDialog), null, cancellationToken);
 Return the flow back from ChooseCarDialog to SuperTaxiBotDialog: Once the user has selected 2 cars, the flow has to be sent back to SuperTaxiBotDialog from the step LoopCarAsync. This should be achieved by ending the ChooseCarDialog in the step LoopCarAsync.
return await stepContext.EndDialogAsync(carsSelected, cancellationToken);
Once the project is executed using BOT Framework Emulator, the output would look like:
Hopefully, this article will help the readers in implementing a loop with Microsoft BOT framework. For questions: Hit me.
Regards
Tarun
AI
The Bleeding Edge of Voice
This fall, a little-known event is starting to make waves. As COVID dominates the headlines, an event called “Voice Launch” is pulling together an impressive roster of startups and voice tech companies, with the aim of uncovering the next big ideas and startups in voice.
While voice tech has been around for a while, as the accuracy of speech recognition improves, it moves into its prime. “As speech recognition moves from 85% to 95% accuracy, who will use a keyboard anymore?” says Voice Launch organizer Eric Sauve. “And that new, more natural way to interact with our devices will usher in a series of technological advances,” he added.
Voice technology has been dreamt of and worked on for decades all over the world. Why? The answer is straightforward. Voice recognition allows consumers to multitask by merely speaking to their Google Home, Amazon Alexa, Siri, etc. Voice recognition works by recording a sample of a person’s speech and quickly converting it into written text using machine learning and sophisticated algorithms. Voice input is just the more efficient form of computing, says Mary Meeker in her ‘Annual Internet Trends Report.’ As a matter of fact, according to ComScore, 50% of all searches will be done by voice by 2020, and 30% of searches will be done without even a screen, according to Gartner. As voice becomes a part of things we use every day, like our cars and phones, it will become the new “norm.”
The event includes a number of inspiration sessions meant to help startups and founders pick the best strategies. Companies presenting here include industry leaders like Google and Amazon, lesser-known hypergrowth voice tech companies like Deepgram and Balto, and VCs like OMERS Ventures and Techstars.
But the focus of the event is the voice tech startups themselves, and this year’s event has some interesting participants. Startups will pitch their ideas, and the audience will vote to select the winners. The event is a cross between a standard pitchfest and Britain’s Got Talent.
Source: https://chatbotslife.com/thebleedingedgeofvoice67538bd859a9?source=rss—a49517e4c30b—4
