Speech Recognition and AI

Part 1: Speech Recognition from Audrey to Alexa

Speech recognition is a technology that recognizes spoken words and converts them to text. Voice recognition, a closely related term, usually refers to identifying who is speaking rather than what is said, although the two terms are often used interchangeably.

History of Speech Recognition

In 1952, Bell Laboratories designed the “Audrey” system which could recognize a single voice speaking digits aloud.

In 1962, IBM introduced Shoebox, an early speech recognition machine. It could understand 16 words: zero, one, two, three, four, five, six, seven, eight, nine, minus, plus, subtotal, total, false, and off.

In the 1970s, the Speech Understanding Research (SUR) program, run by DARPA within the US Department of Defense, supported research in this field. The Harpy Speech Recognition System, designed in the Computer Science Department at Carnegie Mellon University, could understand about 1,000 words.

Bell Laboratories also introduced a system that could understand multiple voices.

In 1980, IBM developed a talking typewriter for sight-impaired individuals, and the next year introduced a talking display terminal.

As graphical user interfaces grew in popularity during the 1980s, IBM developed one of the first screen readers to work with the new technology.

Also in the 1980s, a statistical method called the Hidden Markov Model (HMM) came into wide use in speech recognition. Instead of matching sounds against fixed word templates, an HMM estimates the probability that an unknown sequence of sounds corresponds to a given word.
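To make the HMM idea concrete, here is a minimal sketch of the forward algorithm, which computes how likely a sequence of observed sounds is under a given word model. Everything in it is an assumption made for illustration: the two toy word models ("no" and "on"), the sound labels "N" and "O", and all of the probabilities are invented, and real recognizers work with acoustic feature vectors and far larger models.

```python
# Toy illustration only: the word models, sound labels, and probabilities
# below are invented for this example.

def forward_likelihood(observations, start_prob, trans_prob, emit_prob):
    """Return P(observations | model) using the HMM forward algorithm."""
    states = list(start_prob)
    # alpha[s] = probability of the sounds heard so far, ending in state s
    alpha = {
        s: start_prob[s] * emit_prob[s].get(observations[0], 0.0)
        for s in states
    }
    for symbol in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_prob[prev].get(s, 0.0) for prev in states)
            * emit_prob[s].get(symbol, 0.0)
            for s in states
        }
    return sum(alpha.values())


# Each hidden state mostly produces its own sound, with a little noise.
EMIT = {
    "n": {"N": 0.9, "O": 0.1},
    "o": {"O": 0.9, "N": 0.1},
}

# Hypothetical left-to-right word models: "no" starts in state n, "on" in state o.
WORD_MODELS = {
    "no": {
        "start": {"n": 1.0, "o": 0.0},
        "trans": {"n": {"n": 0.5, "o": 0.5}, "o": {"o": 1.0}},
    },
    "on": {
        "start": {"o": 1.0, "n": 0.0},
        "trans": {"o": {"o": 0.5, "n": 0.5}, "n": {"n": 1.0}},
    },
}


def recognize(observations):
    """Pick the word whose model assigns the sounds the highest likelihood."""
    scores = {
        word: forward_likelihood(observations, m["start"], m["trans"], EMIT)
        for word, m in WORD_MODELS.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    best, scores = recognize(["N", "N", "O"])
    print(best, scores)
```

The recognizer never looks for an exact pattern. It scores every candidate word and keeps the most probable one, which is why this approach copes with noisy or stretched-out sounds better than template matching.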

In the 1990s, the personal computer made big strides possible in the world of speech recognition.

In 1999, IBM introduced the IBM Home Page Reader, a talking web browser that helped users who were sight-impaired hear the full range of web-page content in a logical, understandable manner.

Dragon Dictate software and VAL, a dial-in voice portal from BellSouth, advanced the field further.

In the 2000s, Google introduced the Google Voice Search app, which drew on 230 billion words from user searches. Not only did this app make speech recognition available to millions of people, but Google also used it to collect data on user searches, which helped predict what a user was saying and further improved the app's accuracy.

In the 2010s, Apple launched Siri. Amazon’s Alexa and Google Home were a few more voice recognition products that became available to consumers. With all these advancements, speech recognition accuracy has also been improving rapidly as tech companies work to reduce their word error rates.
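Word error rate (WER) is commonly defined as the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the correct one, divided by the number of words in the correct transcript. The sketch below shows one simple way to compute it with a word-level edit distance. The function name and the example sentences are made up for illustration, and production evaluations usually also normalize punctuation and spelling first.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion: reference word was missed
                    curr[j - 1] + 1,     # insertion: extra word in hypothesis
                    prev[j - 1] + cost,  # substitution, or a match if cost is 0
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: what was said versus what the system heard
    said = "turn on the kitchen lights"
    heard = "turn on kitchen light"
    print(f"WER: {word_error_rate(said, heard):.0%}")
```

A lower WER means a more accurate transcript, and because insertions count as errors, the value can exceed 100% when the system outputs many extra words.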

What are some applications of voice/speech technology?

  1. Driver safety: hands-free dialing for phone users, voice-activated navigation systems, and voice control and search for in-car radios
  2. Accessible computing: support for users with vision, mobility, or other impairments
  3. Virtual assistants: assistants on our phones and smart speakers at home
  4. Speech-to-text software: transcribing interviews, podcasts, and dictation, as well as translating and subtitling content

The Future

With the advancements in artificial intelligence and the increasing amounts of speech data that can be easily mined, voice is on its way to becoming one of the dominant user interfaces in the world of technology.

Today, this technology is ingrained in our day-to-day lives through a multitude of voice-driven applications such as:

  • Microsoft’s Cortana
  • Apple’s Siri
  • Amazon’s Alexa
  • Google’s voice-responsive features

Our everyday gadgets, such as phones, watches, computers, and even refrigerators, are becoming increasingly integrated with voice interactivity enabled by AI and machine learning.

Divya Sikka is a Student Ambassador in the Inspirit AI Student Ambassadors Program. Inspirit AI is a pre-collegiate enrichment program that exposes curious high school students globally to AI through live online classes. Learn more at https://www.inspiritai.com/.
