Automating the analysis of multi-speaker audio files using Amazon Transcribe and Amazon Athena

In an effort to drive customer service improvements, many companies record the phone conversations between their customers and call center representatives. These call recordings are typically stored as audio files and processed to uncover insights such as customer sentiment, product or service issues, and agent effectiveness. To provide an accurate analysis of these audio files, the transcriptions need to clearly identify who spoke what and when.

However, given that the average customer service agent handles 30–50 calls a day, the sheer volume of audio files to analyze quickly becomes a challenge. Companies need a robust system for transcribing audio files in large batches to improve call center quality management. Similarly, legal investigations often need to efficiently analyze case-related audio files in search of potential evidence or insights that can help win cases. And in the healthcare sector, there is a growing need for this solution to help transcribe and analyze virtual patient-provider interactions.

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy to convert audio to text. One key feature of the service is speaker identification, which you can use to label each individual speaker when transcribing multi-speaker audio files. You can configure Amazon Transcribe to identify 2–10 speakers in an audio clip; for the best results, specify the actual number of speakers in the audio input.
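
As a rough illustration (not code from the original post), here is how a transcription job with speaker identification might be started using the AWS SDK for Python (Boto3); the job name and bucket names are hypothetical:

import boto3

transcribe = boto3.client("transcribe", region_name="us-east-1")

# Start an asynchronous job that labels each speaker in the clip.
# ShowSpeakerLabels enables speaker identification; MaxSpeakerLabels (2-10)
# should match the actual number of speakers for best results.
transcribe.start_transcription_job(
    TranscriptionJobName="medical-diarization-job",  # hypothetical name
    Media={"MediaFileUri": "s3://audio-raw-bucket/medical-diarization.wav"},
    MediaFormat="wav",
    LanguageCode="en-US",
    OutputBucketName="audio-prcsd-bucket",  # hypothetical bucket
    Settings={"ShowSpeakerLabels": True, "MaxSpeakerLabels": 2},
)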

A contact center, which often records multi-channel audio, can also benefit from using a feature called channel identification. The feature can separate each channel from within a single audio file and simultaneously transcribe each track. Typically, an agent and a caller are recorded on separate channels, which are merged into a single audio file. Contact center applications like Amazon Connect record agent and customer conversations on different channels (for example, the agent’s voice is captured in the left channel, and the customer’s in the right for a two-channel stereo recording). Contact centers can submit the single audio file to Amazon Transcribe, which identifies the two channels and produces a coherent merged transcript with channel labels.
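
For a two-channel recording like the Amazon Connect example above, only the Settings value in the start_transcription_job call sketched earlier changes; note that, at the time of writing, a single job cannot enable both speaker identification and channel identification:

# Replaces the Settings value in the start_transcription_job call above.
# Amazon Transcribe transcribes each channel separately and merges the
# results into one transcript with channel labels.
settings = {"ChannelIdentification": True}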

In this post, we walk through a solution that analyzes audio files involving multiple speakers using Amazon Transcribe and Amazon Athena, a serverless query service for big data. By combining these two services, you can set up a serverless, pay-per-use solution for processing audio files into readable text and analyzing the data using standard SQL.

Solution overview

The following diagram illustrates the solution architecture.

The solution contains the following steps:

  1. You upload the audio file to the Amazon Simple Storage Service (Amazon S3) bucket AudioRawBucket.
  2. The Amazon S3 PUT event triggers the AWS Lambda function LambdaFunction1.
  3. The function invokes an asynchronous Amazon Transcribe API call on the uploaded audio file.
  4. The function also writes a message to Amazon Simple Queue Service (Amazon SQS) with the transcription job information (a minimal sketch of this function follows the list).
  5. The transcription job runs and writes the output in JSON format to the target S3 bucket, AudioPrcsdBucket.
  6. An Amazon CloudWatch Events rule triggers the function LambdaFunction2 to run every 2 minutes.
  7. The function LambdaFunction2 reads the SQS queue for transcription jobs, checks for job completion, converts the JSON file to CSV, and loads an Athena table with the audio text data.
  8. You can access the processed audio file transcription from the AudioPrcsdBucket.
  9. You also query the data with Amazon Athena.
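
To make steps 3 and 4 concrete, the following is a minimal sketch of what LambdaFunction1 could look like. This is our illustration, not the stack's actual code; the environment variable names and the job-naming scheme are assumptions:

import json
import os
import urllib.parse

import boto3

transcribe = boto3.client("transcribe")
sqs = boto3.client("sqs")

def handler(event, context):
    # Step 2: the S3 PUT event carries the bucket and object key.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    # Step 3: kick off an asynchronous transcription job.
    job_name = key.replace("/", "-")  # job names must be unique; hypothetical scheme
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        MediaFormat="wav",
        LanguageCode="en-US",
        OutputBucketName=os.environ["PRCSD_BUCKET"],  # assumed env var
        Settings={"ShowSpeakerLabels": True, "MaxSpeakerLabels": 2},
    )

    # Step 4: record the job in SQS so LambdaFunction2 can poll for completion.
    sqs.send_message(
        QueueUrl=os.environ["TASK_AUDIO_QUEUE_URL"],  # assumed env var
        MessageBody=json.dumps({"job_name": job_name}),
    )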

Prerequisites

To get started, you need the following:

  • A valid AWS account with access to AWS services
  • The Athena database “default” in an AWS account in us-east-1
  • A multi-speaker audio file—for this post, we use medical-diarization.wav

To achieve the best results, we recommend the following:

  • Use a lossless format, such as WAV or FLAC, with PCM 16-bit encoding
  • Use a sample rate of 8000 Hz for low-fidelity audio and 16000 Hz for high-fidelity audio

Deploying the solution

You can use the provided AWS CloudFormation template to launch and configure all the resources for the solution.

  1. Choose Launch Stack:

This takes you to the Create stack wizard on the AWS CloudFormation console. The template is launched in the US East (N. Virginia) Region by default.

The CloudFormation templates used in this post are designed to work only in the us-east-1 Region. These templates are also not intended for production use without modification.

  2. On the Select Template page, keep the default URL for the CloudFormation template, and choose Next.
  3. On the Specify Details page, review and provide values for the required parameters in the template.
    • For EnvName, enter Dev.

Dev is the environment in which you want to deploy the template. AWS CloudFormation uses this value when naming resources in Lambda, Amazon SQS, and other services.

  4. After you specify the template details, choose Next.
  5. On the Options page, choose Next again.
  6. On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  7. Choose Create Stack.

It takes approximately 5–10 minutes for the deployment to complete. When the stack launch is complete, it returns outputs with information about the resources that were created.

You can view the stack outputs on the AWS Management Console or by using the following AWS Command Line Interface (AWS CLI) command:

aws cloudformation describe-stacks --stack-name <stack-name> --region us-east-1 --query Stacks[0].Outputs

Resources created by the CloudFormation stack

  • AudioRawBucket – Stores the raw audio files; a PUT event on this bucket triggers the Lambda function that runs Amazon Transcribe
  • AudioPrcsdBucket – Stores the processed output
  • LambdaRole1 – The Lambda role with required permissions for S3 buckets, Amazon SQS, Amazon Transcribe, and CloudWatch
  • LambdaFunction1 – The initial function to run Amazon Transcribe to process the audio file, create a JSON file, and update Amazon SQS
  • LambdaFunction2 – The post function that reads the SQS queue, converts (aggregates) the JSON to CSV format, and loads it into an Athena table
  • TaskAudioQueue – The SQS queue for storing all audio processing requests
  • ScheduledRule – The CloudWatch schedule for LambdaFunction2
  • AthenaNamedQuery – The Athena table definition for storing processed audio file transcriptions with object information

The Athena table for the audio text has the following definitions (a simplified sketch of the JSON-to-CSV step follows the list):

  • audio_transcribe_job – The job submitted to transcribe the audio
  • time_start – The beginning timestamp for the speaker
  • speaker – Speaker tags (for example, spk_0, spk_1, and so on)
  • speaker_text – The text from the speaker audio
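
The JSON-to-CSV conversion performed by LambdaFunction2 can be sketched as follows. This is a simplified local version, not the stack's actual code, and it assumes the job was run with speaker identification enabled:

import csv
import datetime
import json

def transcript_to_rows(transcript_path):
    """Convert a Transcribe JSON output file into (job, time_start, speaker, text) rows."""
    with open(transcript_path) as f:
        doc = json.load(f)

    job_name = doc["jobName"]
    # Map each pronounced word's start time to its text; punctuation items
    # have no start_time, so they are skipped in this simplified sketch.
    words = {
        item["start_time"]: item["alternatives"][0]["content"]
        for item in doc["results"]["items"]
        if item["type"] == "pronunciation"
    }

    rows = []
    for segment in doc["results"]["speaker_labels"]["segments"]:
        # Join the words whose start times fall inside this speaker segment.
        text = " ".join(
            words[item["start_time"]]
            for item in segment["items"]
            if item["start_time"] in words
        )
        start = str(datetime.timedelta(seconds=int(float(segment["start_time"]))))
        rows.append((job_name, start, segment["speaker_label"], text))
    return rows

# Write the rows as CSV for the Athena table to query.
with open("transcribe_data.csv", "w", newline="") as f:
    csv.writer(f).writerows(transcript_to_rows("medical-diarization.json"))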

Validating the solution

You can now validate that the solution works.

  1. Verify the AWS CloudFormation resources were created (see previous section for instructions via the console or AWS CLI).
  2. Upload the sample audio file to the S3 bucket AudioRawBucket.

The transcription process is asynchronous, so it can take a few minutes for the job to complete. You can check the job status on the Amazon Transcribe console and CloudWatch console.
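
If you prefer to poll from code rather than the consoles, a call like the following (with a hypothetical job name) returns the job status:

import boto3

transcribe = boto3.client("transcribe", region_name="us-east-1")

# Status moves through QUEUED / IN_PROGRESS and ends at COMPLETED or FAILED.
job = transcribe.get_transcription_job(TranscriptionJobName="medical-diarization-job")
print(job["TranscriptionJob"]["TranscriptionJobStatus"])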

When the transcription job is complete and the Athena table transcribe_data has been created, you can run Athena queries to verify the transcription output. See the following select statement:

select * from "default"."transcribe_data" order by 1,2

The following table shows the output of the preceding select statement (the speaker_text column contains the raw ASR output, so it includes some recognition errors):

audio_transcribe_job | time_start | speaker | speaker_text
medical-diarization.wav | 0:00:01 | spk_0 | Hey, Jane. So what brings you into my office today?
medical-diarization.wav | 0:00:03 | spk_1 | Hey, Dr Michaels. Good to see you. I’m just coming in from a routine checkup.
medical-diarization.wav | 0:00:07 | spk_0 | All right, let’s see, I last saw you. About what, Like a year ago. And at that time, I think you were having some minor headaches. I don’t recall prescribing anything, and we said we’d maintain some observations unless things were getting worse.
medical-diarization.wav | 0:00:20 | spk_1 | That’s right. Actually, the headaches have gone away. I think getting more sleep with super helpful. I’ve also been more careful about my water intake throughout my work day.
medical-diarization.wav | 0:00:29 | spk_0 | Yeah, I’m not surprised at all. Sleep deprivation and chronic dehydration or to common contributors to potential headaches. Rest is definitely vital when you become dehydrated. Also, your brain tissue loses water, causing your brain to shrink and, you know, kind of pull away from the skull. And this contributor, the pain receptors around the brain, giving you the sensation of a headache. So how much water are you roughly taking in each day
medical-diarization.wav | 0:00:52 | spk_1 | of? I’ve become obsessed with drinking enough water. I have one of those fancy water bottles that have graduated markers on the side. I’ve also been logging my water intake pretty regularly on average. Drink about three litres a day.
medical-diarization.wav | 0:01:06 | spk_0 | That’s excellent. Before I start the routine physical exam is there anything else you like me to know? Anything you like to share? What else has been bothering you?

Cleaning up

To avoid incurring additional charges, complete the following steps to clean up your resources when you are done with the solution:

  1. Delete the Athena table transcribe_data from the default database.
  2. Delete the prefixes and objects you created from the buckets AudioRawBucket and AudioPrcsdBucket.
  3. Delete the CloudFormation stack, which removes your additional resources.

Conclusion

In this post, we walked through the solution, reviewed a sample implementation of audio file conversion using Amazon S3, Amazon Transcribe, Amazon SQS, Lambda, and Athena, and validated the steps for processing and analyzing multi-speaker audio files.

You can further extend this solution to perform sentiment analytics and improve your customer experience. For more information, see Detect sentiment from customer reviews using Amazon Comprehend. For more information about live call and post-call analytics, see AWS announces AWS Contact Center Intelligence solutions.


About the Authors

Mahendar Gajula is a Big Data Consultant at AWS. He works with AWS customers on their journey to the cloud with a focus on big data, data warehouse, and AI/ML projects. In his spare time, he enjoys playing tennis and spending time with his family.

Rajarao Vijjapu is a data architect with AWS. He works with AWS customers and partners to provide guidance and technical assistance on big data, analytics, AI/ML, and security projects, helping them improve the value of their solutions on AWS.

Source: https://aws.amazon.com/blogs/machine-learning/automating-the-analysis-of-multi-speaker-audio-files-using-amazon-transcribe-and-amazon-athena/


How does it know?! Some beginner chatbot tech for newbies.

Wouter S. Sligter

Most people will know by now what a chatbot or conversational AI is. But how does one design and build an intelligent chatbot? Let’s investigate some essential concepts in bot design: intents, context, flows and pages.

I like using Google’s Dialogflow platform for my intelligent assistants. Dialogflow has a very accurate NLP engine at a cost structure that is extremely competitive. In Dialogflow there are roughly two ways to build the bot tech: one is through intents and context, the other is by means of flows and pages. Each of these design approaches has its own version of Dialogflow: “ES” and “CX”.

Dialogflow ES is the older version of the Dialogflow platform which works with intents, context and entities. Slot filling and fulfillment also help manage the conversation flow. Here are Google’s docs on these concepts: https://cloud.google.com/dialogflow/es/docs/concepts

Context is what distinguishes ES from CX. It’s a way to understand where the conversation is headed. Here’s a diagram that may help understand how context works. Each phrase that you type triggers an intent in Dialogflow. Each response by the bot happens after your message has triggered the most likely intent. It’s Dialogflow’s NLP engine that decides which intent best matches your message.

[Diagram: how active context determines which intent a phrase triggers. Wouter Sligter, 2020]

What’s funny is that even though you typed ‘yes’ in exactly the same way twice, the bot gave you different answers. There are two intents that have been programmed to respond to ‘yes’, but only one of them is selected. This is how we control the flow of a conversation by using context in Dialogflow ES.
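
To see which intent (and which contexts) actually fired for a message, you can call the Dialogflow ES API directly. Here is a minimal sketch using the google-cloud-dialogflow Python client; the project and session IDs are placeholders:

from google.cloud import dialogflow

session_client = dialogflow.SessionsClient()
session = session_client.session_path("my-project-id", "my-session-id")

# Send the user's message; Dialogflow's NLP engine picks the best-matching intent.
response = session_client.detect_intent(
    request={
        "session": session,
        "query_input": dialogflow.QueryInput(
            text=dialogflow.TextInput(text="yes", language_code="en")
        ),
    }
)

# The matched intent depends on which contexts are currently active,
# which is why the same "yes" can trigger different intents.
print(response.query_result.intent.display_name)
print([c.name for c in response.query_result.output_contexts])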

Unfortunately, the way we program context into a bot on Dialogflow ES is not supported by any visual tools like the diagram above. Instead, we need to type this context into each intent without seeing the connection to other intents. This makes the creation of complex bots quite tedious, which is why we map out the design of our bots in other tools before we start building in ES.

The newer Dialogflow CX allows for a more advanced way of managing the conversation. By adding flows and pages as additional control tools we can now visualize and control conversations easily within the CX platform.

source: https://cloud.google.com/dialogflow/cx/docs/basics

This entire diagram is a ‘flow’ and the blue blocks are ‘pages’. This visualization shows how we create bots in Dialogflow CX. It’s immediately clear how the different pages are related and how the user will move between parts of the conversation. Visuals like this are completely absent in Dialogflow ES.

It then makes sense to use different flows for different conversation paths. A possible distinction in flows might be “ordering” (as seen here), “FAQs” and “promotions”. Structuring bots through flows and pages is a great way to handle complex bots and the visual UI in CX makes it even better.

At the time of writing (October 2020), Dialogflow CX only supports English NLP, and its pricing model is surprisingly steep compared to ES. But bots are becoming critical tech for an increasing number of companies, and the potential cost reductions and gains in conversation quality are enormous. Building and managing bots is in many cases an ongoing task rather than a single, rounded-off project. For these reasons, it makes total sense to invest in a tool that can handle increasing complexity in an easy-to-use UI, such as Dialogflow CX.

This article aims to give insight into the tech behind bot creation and Dialogflow is used merely as an example. To understand how I can help you build or manage your conversational assistant on the platform of your choice, please contact me on LinkedIn.

Source: https://chatbotslife.com/how-does-it-know-some-beginner-chatbot-tech-for-newbies-fa75ff59651f?source=rss—-a49517e4c30b—4


Who is chatbot Eliza?


Frédéric Pierron

Between 1964 and 1966 Eliza was born, one of the very first conversational agents. Its creator, Joseph Weizenbaum, was a researcher at the famous Artificial Intelligence Laboratory at MIT (Massachusetts Institute of Technology). His goal was to enable a conversation between a computer and a human user. More precisely, the program simulated a conversation with a Rogerian psychotherapist, whose method consists of reformulating the patient’s words to let them explore their own thoughts.

[Photo: Joseph Weizenbaum, professor emeritus of computer science at MIT, on the balcony of his apartment in Berlin, Germany. By Ulrich Hansen, Germany (journalist) / Wikipedia.]

The program was rather rudimentary by today’s standards. It worked by recognizing key words or expressions and displaying, in return, questions constructed from those key words. When the program did not have an answer available, it displayed an “I understand” that was quite effective, albeit laconic.
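
In modern terms, that core mechanism can be sketched in a few lines. This is only an illustration of the keyword-and-fallback idea, not Weizenbaum’s original MAD-SLIP code:

import re

# A tiny ELIZA-style responder: match a keyword pattern and reflect it back
# as a question; fall back to a laconic default when nothing matches.
RULES = [
    (re.compile(r"\bI need (.*)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]

def respond(message):
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            return template.format(*match.groups())
    return "I understand."  # the fallback the article mentions

print(respond("I am anxious about my exams"))  # -> How long have you been anxious about my exams?
print(respond("The weather is nice today"))    # -> I understand.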

Weizenbaum explained that his primary intention was to show the superficiality of communication between a human and a machine. He was very surprised when he realized that many users were getting caught up in the game, completely forgetting that the program had no real intelligence and was devoid of any feelings or emotions. He even said that his secretary would discreetly consult Eliza about her personal problems, forcing the researcher to unplug the program.

Conversing with a computer while believing it to be a human being is the criterion at the heart of Turing’s famous test: artificial intelligence is said to exist when a human cannot discern whether or not the interlocutor is human. Eliza, in this sense, passed the test brilliantly according to its users.

Eliza thus opened the way (or the voice!) to what have been called chatbots, an abbreviation of chatterbot, itself an abbreviation of chatter robot, literally “talking robot”.

Source: https://chatbotslife.com/who-is-chatbot-eliza-bfeef79df804?source=rss—-a49517e4c30b—4


FermiNet: Quantum Physics and Chemistry from First Principles

We’ve developed a new neural network architecture, the Fermionic Neural Network or FermiNet, which is well-suited to modeling the quantum state of large collections of electrons, the fundamental building blocks of chemical bonds.

Unfortunately, 0.5% error still isn’t enough to be useful to the working chemist. The energy in molecular bonds is just a tiny fraction of the total energy of a system, and correctly predicting whether a molecule is stable can often depend on just 0.001% of the total energy of a system, or about 0.2% of the remaining “correlation” energy. For instance, while the total energy of the electrons in a butadiene molecule is almost 100,000 kilocalories per mole, the difference in energy between different possible shapes of the molecule is just 1 kilocalorie per mole. That means that if you want to correctly predict butadiene’s natural shape, then the same level of precision is needed as measuring the width of a football field down to the millimeter.

With the advent of digital computing after World War II, scientists developed a whole menagerie of computational methods that went beyond this mean field description of electrons. While these methods come in a bewildering alphabet soup of abbreviations, they all generally fall somewhere on an axis that trades off accuracy with efficiency. At one extreme, there are methods that are essentially exact, but scale worse than exponentially with the number of electrons, making them impractical for all but the smallest molecules. At the other extreme are methods that scale linearly, but are not very accurate. These computational methods have had an enormous impact on the practice of chemistry – the 1998 Nobel Prize in chemistry was awarded to the originators of many of these algorithms.

Fermionic Neural Networks

Despite the breadth of existing computational quantum mechanical tools, we felt a new method was needed to address the problem of efficient representation. There’s a reason that the largest quantum chemical calculations only run into the tens of thousands of electrons for even the most approximate methods, while classical chemical calculation techniques like molecular dynamics can handle millions of atoms. The state of a classical system can be described easily – we just have to track the position and momentum of each particle. Representing the state of a quantum system is far more challenging. A probability has to be assigned to every possible configuration of electron positions. This is encoded in the wavefunction, which assigns a positive or negative number to every configuration of electrons, and the wavefunction squared gives the probability of finding the system in that configuration. The space of all possible configurations is enormous – if you tried to represent it as a grid with 100 points along each dimension, then the number of possible electron configurations for the silicon atom would be larger than the number of atoms in the universe!
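
The arithmetic behind that last claim is worth spelling out (this back-of-envelope check is ours, not from the article): silicon has 14 electrons, each with 3 spatial coordinates, so the configuration space has 42 dimensions, and 100^42 = 10^84 grid points comfortably exceeds the commonly cited ~10^80 atoms in the observable universe.

# Back-of-envelope check of the grid claim (our own arithmetic).
electrons = 14              # silicon
dimensions = 3 * electrons  # 3 spatial coordinates per electron -> 42
configurations = 100 ** dimensions
print(configurations)       # 10**84 grid points, versus roughly 10**80 atoms in the universe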

This is exactly where we thought deep neural networks could help. In the last several years, there have been huge advances in representing complex, high-dimensional probability distributions with neural networks. We now know how to train these networks efficiently and scalably. We surmised that, given these networks have already proven their mettle at fitting high-dimensional functions in artificial intelligence problems, maybe they could be used to represent quantum wavefunctions as well. We were not the first people to think of this – researchers such as Giuseppe Carleo and Matthias Troyer have shown how modern deep learning could be used for solving idealised quantum problems. We wanted to use deep neural networks to tackle more realistic problems in chemistry and condensed matter physics, and that meant including electrons in our calculations.

There is just one wrinkle when dealing with electrons. Electrons must obey the Pauli exclusion principle, which means that they can’t be in the same space at the same time. This is because electrons are a type of particle known as fermions, which include the building blocks of most matter – protons, neutrons, quarks, neutrinos, etc. Their wavefunction must be antisymmetric – if you swap the position of two electrons, the wavefunction gets multiplied by -1. That means that if two electrons are on top of each other, the wavefunction (and the probability of that configuration) will be zero.

This meant we had to develop a new type of neural network that was antisymmetric with respect to its inputs, which we have dubbed the Fermionic Neural Network, or FermiNet. In most quantum chemistry methods, antisymmetry is introduced using a function called the determinant. The determinant of a matrix has the property that if you swap two rows, the output gets multiplied by -1, just like a wavefunction for fermions. So you can take a bunch of single-electron functions, evaluate them for every electron in your system, and pack all of the results into one matrix. The determinant of that matrix is then a properly antisymmetric wavefunction. The major limitation of this approach is that the resulting function – known as a Slater determinant – is not very general. Wavefunctions of real systems are usually far more complicated. The typical way to improve on this is to take a large linear combination of Slater determinants – sometimes millions or more – and add some simple corrections based on pairs of electrons. Even then, this may not be enough to accurately compute energies.
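
As a toy illustration of the determinant trick (our own construction, not DeepMind’s code), evaluate a set of single-electron functions at every electron position, take the determinant, and observe that swapping two electrons flips the sign:

import numpy as np

# Toy single-electron "orbitals" (any linearly independent functions will do here).
orbitals = [
    lambda r: np.exp(-np.linalg.norm(r)),
    lambda r: r[0] * np.exp(-np.linalg.norm(r)),
    lambda r: r[1] * np.exp(-np.linalg.norm(r)),
]

def slater_wavefunction(positions):
    """Antisymmetric wavefunction: determinant of the matrix phi_i(r_j)."""
    matrix = np.array([[phi(r) for phi in orbitals] for r in positions])
    return np.linalg.det(matrix)

electrons = np.array([[0.1, 0.2, 0.0], [1.0, -0.5, 0.3], [-0.7, 0.4, 1.1]])
swapped = electrons[[1, 0, 2]]  # exchange electrons 0 and 1

print(slater_wavefunction(electrons))  # some value psi
print(slater_wavefunction(swapped))    # -psi: the sign flips, as required for fermions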

Source: https://deepmind.com/blog/article/FermiNet
