How to convert speech to text?

By Dhilip Subramanian

Speech recognition is one of the most fascinating topics in AI. It converts spoken language into text. Speech recognition is used in many day-to-day applications in banking, healthcare, marketing, and IoT (Internet of Things). Other examples include Apple Siri, Amazon Alexa, and Google Assistant.

How does it work?

The system takes the speech (input) from an audio file or a microphone.

It converts the physical sound into an electrical signal.

It converts the electrical signal into digital data with an analog-to-digital converter.

Once the audio is digitized, an ML model can be used to transcribe it into text.
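
To make the digitization step concrete, here is a minimal sketch that uses only the Python standard library. It is an illustration, not part of the original article's code: a synthetic sine tone stands in for the electrical signal from a microphone, and the 16 kHz sample rate, the 440 Hz tone, and all function names are assumed for the example.

```python
import math
import struct
import wave

SAMPLE_RATE = 16_000   # samples per second (assumed value for this sketch)
AMPLITUDE = 0.5        # fraction of full scale, kept below 1.0 to avoid clipping


def sample_tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Sample a sine wave (a stand-in for the analog signal) at fixed intervals."""
    n_samples = int(duration_s * sample_rate)
    return [AMPLITUDE * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


def quantize_16bit(samples):
    """Map floats in [-1.0, 1.0] to signed 16-bit integers."""
    return [int(round(s * 32767)) for s in samples]


def write_wav(path, pcm, sample_rate=SAMPLE_RATE):
    """Store the digitized samples as a mono 16-bit WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# 10 ms of a 440 Hz tone, sampled and then quantized.
pcm = quantize_16bit(sample_tone(440.0, 0.01))
```

Calling write_wav("tone.wav", pcm) would save these samples in the same WAV format that the speech recognizer reads later in this article.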

ML and deep neural network models are used to convert the audio into text. An explanation of how these models work is beyond the scope of this article. Here, I explain how to convert speech into text using Python. I used the SpeechRecognition library and the PyAudio library to do this.

The SpeechRecognition library supports the following engines and APIs:

  • CMU Sphinx (works offline)
  • Google Speech Recognition
  • Google Cloud Speech API
  • Wit.ai
  • Microsoft Bing Voice Recognition
  • Houndify API
  • IBM Speech to Text
  • Snowboy Hotword Detection (works offline)

For more details, please check the SpeechRecognition documentation. In this article, I used the Google Speech Recognition API.

Installing the SpeechRecognition and PyAudio Python libraries:

https://jovian.ml/sdhilip/untitled20/v/13&cellId=0
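
Both libraries are published on PyPI, so they can usually be installed with pip install SpeechRecognition and pip install PyAudio. The short check below is a sketch rather than the notebook's own code: it reports which of the two packages cannot be found. The helper name missing_packages is made up for this example.

```python
import importlib.util

# Import names differ from the PyPI package names:
# SpeechRecognition -> speech_recognition, PyAudio -> pyaudio.
REQUIRED = {"SpeechRecognition": "speech_recognition", "PyAudio": "pyaudio"}


def missing_packages(required=REQUIRED):
    """Return the PyPI names of packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("SpeechRecognition and PyAudio are available.")
```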

Convert an audio file into text


Steps:

  1. Import the SpeechRecognition library.
  2. Initialize the Recognizer class, which recognizes the speech. The Google Speech Recognition API is used here.
  3. Audio file formats supported by SpeechRecognition: WAV, AIFF, AIFF-C, FLAC.
  4. The Google Speech Recognition API supports Indian languages such as Hindi, Tamil, Telugu, Malayalam, Punjabi, Urdu, Marathi, Bengali, Gujarati, and Kannada. I have tested Hindi, Tamil, English, Malayalam, and Telugu movie audio clips in this article.

Code:

https://jovian.ml/sdhilip/untitled20/v/2&cellId=1
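
The steps above can be sketched roughly as follows. This is not the exact code from the linked notebook. It assumes the SpeechRecognition package is installed, the helper names is_supported and transcribe_file are invented for illustration, and the list of file extensions is my reading of the formats in step 3.

```python
from pathlib import Path

# Formats listed in step 3; the exact extensions are an assumption.
SUPPORTED_EXTENSIONS = {".wav", ".aiff", ".aif", ".aifc", ".flac"}


def is_supported(path):
    """Check the file extension against the supported audio formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def transcribe_file(path):
    """Transcribe an audio file with the Google Speech Recognition API."""
    if not is_supported(path):
        raise ValueError(f"Unsupported audio format: {path}")

    # Imported here so is_supported() works even before the package is installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(str(path)) as source:
        audio = recognizer.record(source)    # read the whole file into memory

    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return None                          # the speech could not be understood
    except sr.RequestError as error:
        raise RuntimeError(f"Could not reach the API: {error}") from error
```

Calling transcribe_file("I-dont-know.wav") sends the clip to Google's service and returns the recognized text, so it needs an internet connection.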

I used a dialogue clip from the English movie Taken (the I-dont-know.wav file).

Output


Let’s try some of our languages.

First, I try Tamil. We don’t need to change the entire code; we just need to add the language option in recognize_google and change the audio file. The language option for Tamil is “ta-IN”, for Hindi “hi-IN”, for Telugu “te-IN”, and for Malayalam “ml-IN”. For more details, please check the SpeechRecognition documentation.

https://jovian.ml/sdhilip/untitled20/v/5&cellId=2
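
As a rough sketch of that change (again, not the notebook's exact code), the language code is passed through the language parameter of recognize_google. The function name transcribe_file_in and the lookup table are illustrative. The four codes are the ones quoted above.

```python
# Language codes quoted in the text above.
LANGUAGE_CODES = {
    "Tamil": "ta-IN",
    "Hindi": "hi-IN",
    "Telugu": "te-IN",
    "Malayalam": "ml-IN",
}


def transcribe_file_in(path, language_name):
    """Transcribe an audio file spoken in one of the languages listed above."""
    language = LANGUAGE_CODES[language_name]   # KeyError for an unlisted language

    # Requires the SpeechRecognition package; imported only when needed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)

    # The only change from the English version is the language argument.
    return recognizer.recognize_google(audio, language=language)
```

For a Tamil clip, the call would look like transcribe_file_in("tamil-clip.wav", "Tamil"), where the file name is a placeholder.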

Output:

For Hindi:

https://jovian.ml/sdhilip/untitled20/v/8&cellId=2

Output

Microphone speech into text

The code above converts speech from an audio file into text. How do we convert our own speech into text using the microphone? To do that, we need to install the PyAudio library, which lets Python capture audio input from the microphone.

The code is almost the same; the only change is that we use the Microphone class instead of an audio file as the source.

Code:

https://jovian.ml/sdhilip/untitled20/v/10&cellId=7
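
A minimal sketch of the microphone version, assuming SpeechRecognition and PyAudio are installed and a working microphone is available. The function name transcribe_microphone and the ambient-noise calibration step are my additions for illustration and may differ from the notebook.

```python
def transcribe_microphone(language="en-US"):
    """Record one phrase from the default microphone and transcribe it."""
    # Requires SpeechRecognition and PyAudio; imported only when called.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
        print("Talk")
        audio = recognizer.listen(source)            # stops after a pause in speech

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None                                  # the speech could not be understood
```

Calling transcribe_microphone() listens in English by default. Passing a different code, for example transcribe_microphone("ta-IN"), switches the recognition language.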

I said: “Corona changed the world completely”

Output

In Tamil:

We just need to add the language option for Tamil, “ta-IN”, the same as for the audio file.

I said “Welcome, how are you” in Tamil, and it was transcribed exactly.

https://jovian.ml/sdhilip/untitled20/v/13&cellId=11

Output


In Hindi

https://jovian.ml/sdhilip/untitled20/v/12&cellId=10

I said “What is your name” in Hindi.

Output

In Telugu

https://jovian.ml/sdhilip/untitled20/v/13&cellId=8

I said “How are you” in Telugu.

Output

In Malayalam

https://jovian.ml/sdhilip/untitled20/v/12&cellId=9

I said “Where are you from” in Malayalam.

Output


This is one of the simplest ways to convert speech into text using the Google Speech Recognition API, and it is very useful for NLP projects. Please note that the Google Speech Recognition API requires an internet connection to operate. Please try other languages and explore.

About the author

Dhilip Subramanian

Dhilip is a Machine Learning Engineer working in Wellington, NZ, and an AI enthusiast who is passionate about data science, machine learning, and data visualization. He loves to explain AI concepts in simpler terms. He is a contributor to the SAS community and a blogger on various data science platforms.

Reference:

  • https://pypi.org/project/SpeechRecognition/
  • https://cloud.google.com/speech-to-text/docs/languages

Image Source: Loginworks.com
