Amazon Unveils Record-Breaking Text-to-Speech Model: BASE TTS

February 18, 2024 – Amazon’s artificial intelligence research team has announced what it claims is the largest text-to-speech model yet, both in number of parameters and in the size of its training dataset. The model’s development and training process are described in a paper published on the arXiv preprint server.

In recent years, “large language models” such as the one behind ChatGPT have drawn significant attention for their ability to answer questions and generate fluent, sophisticated text. But artificial intelligence is also advancing in other mainstream applications. In this project, the researchers set out to improve text-to-speech quality by scaling up both the number of parameters and the size of the training dataset.

The new model, named “BASE TTS” (Big Adaptive Streamable TTS with Emergent abilities), has 980 million parameters and was trained on 100,000 hours of audio recordings sourced from public websites, predominantly in English. The researchers also gave the model examples of words and phrases in other languages, enabling it to correctly pronounce common expressions such as “au contraire” and “adios, amigo.”

The Amazon team also trained smaller versions of the model on smaller datasets to look for what the artificial intelligence field calls “emergent abilities”: sudden leaps in capability that appear once an AI model, whether a large language model or a text-to-speech model, reaches a certain scale. Their findings showed that for text-to-speech, this jump in performance occurs at a medium scale, with a model of around 150 million parameters.

The researchers further noted that this leap spans a range of linguistic skills, including handling compound nouns, expressing emotion, pronouncing foreign words, applying phonology and punctuation correctly, and placing emphasis on the key words in a sentence.

Citing concerns over potential misuse, the research team has stated that BASE TTS will not be released to the public. Instead, they plan to treat it as a research exercise and apply what they learn to improve the overall sound quality of text-to-speech applications.
