Difference between revisions of "Voice imitation algorithms"

Revision as of 19:07, 13 March 2020

Voice imitation algorithms (also known as speech synthesis^[1]) are a form of synthetic media used to imitate human speech. They do this using machine learning and artificial intelligence techniques.^[2]

History

Commercial implementation

The Speak and Spell was introduced in 1978 by Texas Instruments. It combined a keyboard with a speech synthesizer that converted words typed on the keyboard into synthesized speech played through its speaker.

[Image: Lyrebird AI]

Lyrebird (also known as Lyrebird AI) was a Montreal-based company, founded in 2017, that focused on speech synthesis and voice imitation.^[3] In 2019 it was acquired by Descript, an American company that makes audio editing software tailored to podcast creators.^[4] Lyrebird AI used artificial intelligence and voice samples to accurately replicate human speech.

The China-based technology company Baidu has used neural networks and deep learning to create accurate voice imitations from thousands of collected voice samples.^[5]^[6]

Research

The Applied Science and Engineering Laboratories (ASEL), jointly operated by the University of Delaware and the Nemours Alfred I. duPont Hospital for Children, has researched and developed the Model Talker,^[7]^[8] software used with augmentative and alternative communication (AAC) devices to replicate human speech for people with hearing or speech impairments.

The vocoder was invented in 1938 by Bell Labs.^[9] It is a type of voice codec that analyzes and synthesizes human voice signals. It is mainly used for audio data compression, allowing voice data to be stored and transmitted using fewer bits than the original signal.
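The analyze-then-synthesize idea behind this compression can be illustrated with a toy sketch in Python. It is not the Bell Labs design: it assumes a simplified channel-vocoder model in which each frame of audio is reduced to a handful of per-band levels (measured here with a naive discrete Fourier transform rather than an analog filter bank) and then resynthesized as one cosine per band. It omits the pitch and voicing information and the buzz/hiss excitation source that a real vocoder uses, and the function names `frame_signal`, `band_levels` and `synthesize_frame` are hypothetical.

```python
import cmath
import math


def frame_signal(samples, frame_len):
    """Split a signal into non-overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def band_levels(frame, num_bands):
    """Analysis step (toy): sum naive-DFT magnitudes into equal-width frequency bands."""
    n = len(frame)
    half = n // 2  # only bins below the Nyquist frequency are used
    mags = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(half)]
    width = half // num_bands
    return [sum(mags[b * width:(b + 1) * width]) for b in range(num_bands)]


def synthesize_frame(levels, frame_len):
    """Synthesis step (toy): one cosine per band at its centre bin, scaled by the band level.

    Phase is not kept, so only a coarse spectral envelope survives. The 2/n scaling
    recovers the exact amplitude only for a cosine sitting on a band's centre bin.
    """
    half = frame_len // 2
    width = half // len(levels)
    out = [0.0] * frame_len
    for b, level in enumerate(levels):
        k = b * width + width // 2           # centre DFT bin of band b
        amplitude = 2.0 * level / frame_len  # undo the n/2 DFT gain of a real cosine
        for t in range(frame_len):
            out[t] += amplitude * math.cos(2 * math.pi * k * t / frame_len)
    return out


if __name__ == "__main__":
    frame_len, num_bands = 64, 8
    # Test tone placed exactly on DFT bin 6, the centre of band 1 (bins 4-7).
    tone = [math.cos(2 * math.pi * 6 * t / frame_len) for t in range(4 * frame_len)]
    params = [band_levels(f, num_bands) for f in frame_signal(tone, frame_len)]
    rebuilt = [x for p in params for x in synthesize_frame(p, frame_len)]
    print(len(tone), "samples ->", sum(len(p) for p in params), "parameters")
    print("max error:", max(abs(a - b) for a, b in zip(tone, rebuilt)))
```

With 64-sample frames and 8 bands, the 256-sample test tone is reduced to 32 parameters. The tone comes back almost exactly only because it sits on a band's centre frequency with zero phase; real speech would be returned as a rough approximation of its spectral envelope, which is the trade-off that lets a vocoder spend fewer bits.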

Ethical implications

Voice imitation algorithms have been used in grandparent scams, a type of telemarketing fraud in which the scammer calls an elderly person, claims to be a relative who has gotten into trouble, and asks for money. A realistic-sounding synthesized voice makes this type of scam easier to carry out.


References

[1] https://thehill.com/opinion/cybersecurity/470826-perception-wont-be-reality-once-ai-can-manipulate-what-we-see

[2] https://www.sciencedirect.com/science/article/pii/S0007681319301600?via%3Dihub

[3] https://www.wired.com/brandlab/2018/10/lyrebird-uses-ai-find-artificial-voice/

[4] https://www.businessinsider.com/groupon-founder-andrew-mason-new-startup-descript-detour-2017-12

[5] https://www.technologyreview.com/f/610386/a-new-algorithm-can-mimic-your-voice-with-just-snippets-of-audio/

[6] http://research.baidu.com/Blog/index-view?id=91

[7] https://www.asel.udel.edu/

[8] https://www.asel.udel.edu/speech/ModelTalker.html

[9] https://patents.google.com/patent/US2121142A/en

