The researchers launched a state-of-the-art voice separation model that can distinguish the voices of up to five people speaking simultaneously.
Facebook researchers have created an Artificial Intelligence (AI) model that can distinguish up to five different speakers talking simultaneously into a single microphone. The model outperforms previous state-of-the-art systems on this task, and the breakthrough could improve audio technology for phones, voice assistants and hearing aids.
In their paper titled Voice Separation with an Unknown Number of Multiple Speakers, the researchers explain their model. “To build our model, we use a novel recurrent neural network architecture that works directly on the raw audio waveform,” states the Facebook blog post that made the announcement last week. The recurrent network analyses the audio to determine how many people are speaking, and the system then separates the mixture into individual voices, keeping each speaker assigned to a consistent output channel.
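To make the pipeline concrete, here is a minimal sketch of the two-stage idea described above: first estimate the number of speakers, then map the raw waveform to that many output waveforms. The function names and the placeholder logic are hypothetical illustrations of the interface, not Facebook's actual model; a real system would replace both placeholders with trained neural networks.

```python
import numpy as np

def estimate_num_speakers(mixture: np.ndarray) -> int:
    # Placeholder: a real system runs a classifier over the waveform
    # to count active speakers. We return a fixed count for illustration.
    return 2

def separate(mixture: np.ndarray, num_speakers: int) -> np.ndarray:
    # Placeholder separator: a real model maps the raw waveform to
    # num_speakers waveforms of the same length, one per speaker.
    # Here we just split the mixture evenly to show the shapes involved.
    return np.tile(mixture / num_speakers, (num_speakers, 1))

rng = np.random.default_rng(0)
mixture = rng.standard_normal(16000)      # one second of 16 kHz audio
n = estimate_num_speakers(mixture)
sources = separate(mixture, n)            # shape: (n, 16000)
```

The key property of the interface is that the separated channels should reconstruct the original mixture when summed, which holds trivially for this toy splitter.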
Previously, the best available models used a mask and a decoder to sort out each speaker’s voice. The performance of such models degrades rapidly when the number of speakers is large or unknown.
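The mask-based approach mentioned above can be illustrated with a toy example: a network predicts one mask per speaker over a time-frequency representation of the mixture, and multiplying the mixture by each mask yields that speaker's estimate. The arrays below are made-up numbers standing in for predicted masks and a spectrogram, purely to show the mechanics.

```python
import numpy as np

# Toy magnitude "spectrogram" of a two-speaker mixture
# (rows: frequency bins, columns: time frames).
mixture_spec = np.array([[1.0, 4.0],
                         [3.0, 2.0]])

# Per-speaker masks a decoder network would predict; entries in [0, 1]
# and summing to 1 across speakers at every time-frequency point.
mask_a = np.array([[0.8, 0.2],
                   [0.5, 0.9]])
mask_b = 1.0 - mask_a

# Each speaker's estimate is the mixture weighted by that speaker's mask.
spk_a = mask_a * mixture_spec
spk_b = mask_b * mixture_spec
```

Because the masks partition the mixture's energy, the per-speaker estimates add back up to the original spectrogram. The weakness noted in the article is that the number of masks is fixed by the architecture, so performance drops when the true speaker count differs.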
The main aim of the model is to estimate each source in the input and present every speaker on an isolated channel, yielding better sound quality. The technology could enhance voice assistants, call quality and hearing aids, since reliably cancelling surrounding sounds is a problem many big tech players have yet to solve.