By Dhilip Subramanian
Speech recognition is one of the most fascinating topics in AI. It translates spoken language into text. Speech recognition is used in many day-to-day applications in banking, healthcare, marketing, and IoT (Internet of Things). Familiar examples include Apple Siri, Amazon Alexa, and Google Assistant.
The system takes speech as input from an audio file or a microphone.
It converts the physical sound into an electrical signal.
It converts the electrical signal into digital data with an analog-to-digital converter.
Once the audio is digitized, an ML model can be used to transcribe it into text.
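The digitization step can be illustrated with a small sketch. It is not how any particular recognizer works internally; it only shows the idea of sampling a continuous signal at fixed intervals and rounding each sample to an integer level. The 16 kHz sample rate, the 16-bit depth, the 440 Hz test tone, and the names `digitize` and `tone` are all assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed; a common rate for speech)
BIT_DEPTH = 16        # bits per sample (assumed; 16-bit PCM)


def digitize(signal, duration_s, sample_rate=SAMPLE_RATE, bit_depth=BIT_DEPTH):
    """Sample `signal` (a function of time returning a value in [-1.0, 1.0])
    and quantize every sample to a signed integer level."""
    max_level = 2 ** (bit_depth - 1) - 1          # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                       # time of the n-th sample
        value = max(-1.0, min(1.0, signal(t)))    # clip to the valid range
        samples.append(round(value * max_level))  # quantize to an integer
    return samples


def tone(t):
    """Hypothetical stand-in for the electrical signal: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)


pcm = digitize(tone, duration_s=0.01)
print(len(pcm), "samples, first five:", pcm[:5])
```

A list of integers like `pcm` is, in essence, what a WAV file stores and what a speech model receives as input.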
ML and deep neural network models are used to convert the audio into text. An explanation of how these models work is beyond the scope of this article. Here, I explain how to convert speech into text using Python, with the SpeechRecognition and PyAudio libraries.
The SpeechRecognition library supports several speech recognition engines and APIs.
For more details, please check the SpeechRecognition documentation. In this article, I used the Google Speech Recognition API.
Install the SpeechRecognition and PyAudio Python libraries.
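Both libraries are published on PyPI, so they can normally be installed with `pip install SpeechRecognition` and `pip install PyAudio` (on some systems PyAudio also needs the PortAudio system library). The short check below is a sketch for confirming that both are importable before going further; the helper name `missing_packages` is my own and not part of either library.

```python
import importlib.util

# PyPI package name -> name of the module you import in Python
REQUIRED = {
    "SpeechRecognition": "speech_recognition",
    "PyAudio": "pyaudio",
}


def missing_packages(required=REQUIRED):
    """Return the PyPI names of packages whose module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```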
I used an audio clip of dialogue from the English movie Taken (the I-dont-know.wav file).
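A minimal sketch of transcribing an audio file with the SpeechRecognition library might look like the following. It uses the library's `Recognizer`, `AudioFile`, `record`, and `recognize_google` calls; the wrapper function `transcribe_file` and the file-existence guard are my own additions, and the clip is assumed to sit in the working directory.

```python
import os


def transcribe_file(path, language="en-US"):
    """Return the text recognized in the audio file at `path`."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)

    # Imported here so the path check above works even if the
    # library has not been installed yet.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    # Sends the audio to Google's web service, so this needs internet access.
    return recognizer.recognize_google(audio, language=language)


# Only runs when the clip is actually present next to the script.
if os.path.isfile("I-dont-know.wav"):
    print(transcribe_file("I-dont-know.wav"))
```

`AudioFile` reads WAV, AIFF, and FLAC files, so clips in other formats need to be converted first.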
Let’s try some Indian languages.
First, I tried Tamil. We don’t need to change the entire code: we just add the language option to recognize_google and change the audio file. The language option for Tamil is “ta-IN”, for Hindi “hi-IN”, for Telugu “te-IN”, and for Malayalam “ml-IN”. For more details, please check the SpeechRecognition documentation.
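As a sketch of the language option, the function below passes one of the codes listed above to `recognize_google` through its `language` parameter. The dictionary `LANGUAGE_CODES`, the function `transcribe_in`, and the file name `tamil-clip.wav` in the usage comment are hypothetical names used only for illustration.

```python
# Language codes from the article, keyed by language name.
LANGUAGE_CODES = {
    "Tamil": "ta-IN",
    "Hindi": "hi-IN",
    "Telugu": "te-IN",
    "Malayalam": "ml-IN",
}


def transcribe_in(language_name, path):
    """Transcribe the audio file at `path`, spoken in `language_name`."""
    code = LANGUAGE_CODES[language_name]  # KeyError for an unlisted language

    # Imported lazily so the lookup above works without the library installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)

    # The only change from English transcription is the language argument.
    return recognizer.recognize_google(audio, language=code)


# Example call (hypothetical file name):
# print(transcribe_in("Tamil", "tamil-clip.wav"))
```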
The code above converts speech from an audio file into text. How do we convert speech from the microphone into text? For that, we need to install the PyAudio library, which captures audio input from the microphone.
The code is almost the same; the only change is that we use the Microphone class instead of an audio file as the source.
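A hedged sketch of the microphone version is shown below. It relies on the library's `Microphone`, `adjust_for_ambient_noise`, `listen`, and `recognize_google` calls; the wrapper `listen_and_transcribe`, the one-second noise calibration, and the error handling are my own choices and not necessarily what the original code did.

```python
def listen_and_transcribe(language="en-US"):
    """Record one phrase from the default microphone and return its text,
    or None if the speech could not be understood."""
    # Imported lazily; sr.Microphone additionally requires PyAudio.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate the energy threshold against background noise.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        print("Say something...")
        audio = recognizer.listen(source)  # stops after a pause in speech

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # the service could not make out any words
    except sr.RequestError as err:
        # Raised when the service is unreachable, e.g. with no internet.
        raise RuntimeError(f"Google Speech Recognition request failed: {err}") from err


# Example calls:
# print(listen_and_transcribe())         # English
# print(listen_and_transcribe("ta-IN"))  # Tamil
```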
I said: “Corona changed the world completely”
To use Tamil, we just need to add the language option “ta-IN”, the same as with the audio file.
I said “Welcome, how are you” in Tamil, and it was transcribed exactly.
I said “What is your name” in Hindi.
I said “How are you” in Telugu.
I said “Where are you from”.
This is one of the simplest ways to convert speech into text, using the Google Speech Recognition API. It is very useful for NLP projects. Note that the Google Speech Recognition API requires an internet connection to operate. Please try other languages and explore.
Image Source: Loginworks.com