Voice imitation algorithms

Voice imitation algorithms (also known as speech synthesis^[1]) are a form of synthetic media used to imitate human speech. They do so using machine learning and artificial intelligence techniques.^[2]

History

Commercial implementation

The Speak & Spell was introduced by Texas Instruments in 1978. It featured a keyboard and a speech synthesizer that converted words typed on the keyboard into synthesized audio, which it played through its speaker.

Lyrebird (also known as Lyrebird AI) was a Montreal-based company, founded in 2017, that focused on speech synthesis and voice imitation.^[3] In 2019, it was acquired by Descript, an American company that makes audio editing software tailored to podcast creators.^[4] Lyrebird AI used artificial intelligence and voice samples to replicate human speech.

The China-based technology company Baidu has used deep neural networks to create voice imitations from thousands of collected voice samples.^[5]^[6]

Research

The Applied Science and Engineering Laboratories (also known as ASEL), jointly operated by the University of Delaware and the Nemours Alfred I. duPont Hospital for Children, has researched and developed the Model Talker,^[7]^[8] software used with augmentative and alternative communication (AAC) devices to replicate human speech and assist people with hearing or speech impairments.