If you’ve had an Alexa long enough, I’m sure you’ve wondered at some point: who is the voice behind your quirky home assistant? Is there a real-life person behind the voice you hear every day? And if so, who is it?
Like you, I wanted answers, so I did a little digging to discover who is the voice behind Amazon’s Alexa. I went to the closest Alexa-enabled device, an Amazon Echo, and simply asked, “Alexa, who is your voice?” Try it out yourself! The answer might surprise you, but after reading this article it should all make sense. You see, unlike Apple’s Siri, Amazon’s Alexa is not modeled after any particular person or voice actor. Instead, it is a fully computer-generated voice.
Crazy, I know, but it’s true! Alexa’s voice, although life-like, is not from any recordings of a human voice. Let’s look into how that is even possible and why Amazon went this route.
What you’ll learn below: why Amazon chose a computer-generated voice, and exactly how Amazon is able to generate such realistic-sounding responses. As an added bonus, I’ll even reveal the inspiration behind the name “Alexa”.
Why did Amazon go with a computer-generated voice?
The most obvious answer could be the cost. Not surprisingly, going with a human-inspired voice could mean that Amazon would’ve had to fork out millions in royalties to the voice actor behind the voice (depending on how the contract was structured). The more realistic and probable reason is the advances in some of the core technologies that allow Alexa to sound so real, like Machine Learning and Text-To-Speech.
In the years since Siri launched, circa 2011, advances in Machine Learning algorithms, namely neural networks (neural nets), and in Text-To-Speech (TTS) have enabled more recent digital assistants to sound more and more life-like, not to mention actually helpful. Apple was slightly ahead of the game when it released Siri and didn’t have the same technological capabilities that Amazon had when Alexa was born, so Siri was voiced by a real person.
Amazon has taken full advantage of those breakthroughs, by either building or acquiring the technology, to leap ahead of Apple’s Siri when it comes to digital voice assistants. Back in 2013, Amazon bought the Poland-based voice technology company Ivona Software and started integrating Ivona’s tech into products like the Kindle Fire.
Every day, Alexa’s voice is becoming more life-like thanks to neural nets. Soon she’ll even be able to change her speech based on the type of content she’s reading out to you.
“A TV newscaster, for example, will use a very different style when conveying the day’s headlines than a parent will when reading a bedtime story. Amazon scientists have shown that our latest text-to-speech (TTS) system, which uses a generative neural network, can learn to employ a newscaster style from just a few hours of training data. This advance paves the way for Alexa and other services to adopt different speaking styles in different contexts, improving customer experiences.”
– Trevor Wood, Tom Merritt via Alexa Blogs
How is Alexa’s speech generated?
Simply put, Alexa uses complex, proprietary Text-To-Speech (TTS) algorithms to transform plain text into easily understandable spoken responses. Without getting too technical, Alexa’s TTS works like most TTS systems. In the Alexa cloud (see diagram below), a two-phase process turns text into speech. Phase 1, aka the front-end, takes in raw text containing abbreviations, numbers, etc. and converts them into their written-out form.
For example, “hello, it is 45 degrees outside today” gets converted to “hello, it is forty-five degrees outside today”.
It then adds phonetic assignments to each word.
For example, “hello” becomes “/həˈloʊ/”. Pro Tip: you can use a tool like the Macmillan Dictionary to look up the phonetic transcription of common words.
Then in phase 2, aka the back-end, these phonetic transcriptions, which are linguistic representations of words, are turned into the sound that is then played on your Amazon Echo, Echo Show, or a host of other Alexa-enabled devices.
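To make the front-end phase a bit more concrete, here is a minimal Python sketch of its two steps: writing out numbers, then attaching a phonetic transcription to each word. This is a toy illustration, not Amazon’s implementation. The function names, the tiny lexicon, the approximate IPA transcriptions, and the 0–99 number limit are all assumptions made for the example. The back-end phase, which turns the phonemes into audio, is not sketched here.

```python
import re

# Hypothetical, hand-written tables for illustration only.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Toy pronunciation lexicon with approximate IPA transcriptions.
# A real TTS front-end uses a far larger dictionary plus rules for unknown words.
LEXICON = {
    "hello": "həˈloʊ",
    "it": "ɪt",
    "is": "ɪz",
    "forty": "ˈfɔːrti",
    "five": "faɪv",
    "degrees": "dɪˈɡriːz",
    "outside": "ˌaʊtˈsaɪd",
    "today": "təˈdeɪ",
}


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text: str) -> str:
    """Step 1: replace one- or two-digit numbers with their written-out form."""
    return re.sub(r"\b\d{1,2}\b",
                  lambda m: number_to_words(int(m.group())), text)


def to_phonemes(text: str) -> list[str]:
    """Step 2: look up each word in the lexicon; unknown words are flagged."""
    words = re.findall(r"[a-z]+", text.lower())
    return [f"/{LEXICON[w]}/" if w in LEXICON else f"<unknown:{w}>"
            for w in words]


normalized = normalize("hello, it is 45 degrees outside today")
print(normalized)  # hello, it is forty-five degrees outside today
print(to_phonemes(normalized))
```

A dictionary lookup keeps the sketch short, but it also shows why real systems need more: any word missing from the lexicon comes back flagged as unknown, so production front-ends pair a large dictionary with rules or models that guess pronunciations for new words.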
Why do most digital assistants have a female voice?
Now that you understand why Alexa’s voice is computer generated and how Amazon accomplishes that, you might be inclined, like I was, to ask why Amazon went with a female voice if they could’ve just as easily made their assistant male. In a world of digital assistants, why are all the major ones female?
Aside from wanting to match the name with the voice, there are societal and psychological reasons why Apple (Siri), Microsoft (Cortana), Google (Assistant), and Amazon (Alexa), all went with female voices for their respective assistants. These large companies, with vast resources, rarely just put something out without first secretly testing it with many potential customers. When testing the voices for a “personal digital assistant”, beta program participants overwhelmingly favored the female voice over the male because, as PCMag puts it,
“they embody what we think of when we picture a personal assistant: a competent, efficient, and reliable woman. She gets you to meetings on time with reminders and directions, serves up reading material for the commute, and delivers relevant information on the way, like weather and traffic. Nevertheless, she is not in charge.”
Amazon noted that during their research, the female voice was better received, coming across as more sympathetic and pleasing to engage with. This echoes other studies showing that both men and women prefer a female voice over a male or a computerized voice, because we find the female voice more cordial.
I’m not going to discuss the potential reinforcement of a harmful societal stereotype here, but the next time you ask Alexa for something, think about (1) how you interact with her and (2) whether you would do the same if you were talking to a man.
The Inspiration behind Alexa’s Name
In Alexandria, Egypt, during the rule of King Ptolemy (285–246 BC), one of the greatest libraries of the ancient world was constructed. During its time, The Great Library of Alexandria (aka Bibliotheca Alexandrina) was regarded as the pillar of higher learning and knowledge, as many of the most influential scholars of that time worked there to standardize texts. It was open to anyone who could prove themselves a worthy scholar.
The library, built to rival any of the scholarly institutions in Athens, Greece, was overflowing with knowledge. It was said that any ship docking in the city had to turn over its books for copying into the library. The library even employed “book hunters” to scour the Mediterranean for literary texts. The library was part of a larger research institution called the “Mouseion”:
mu·se·um – early 17th century (denoting a university building, specifically one erected at Alexandria by Ptolemy Soter): via Latin from Greek mouseion ‘seat of the Muses’, based on mousa ‘muse’.
via Google
Amazon’s naming of Alexa was, in fact, inspired by the Library of Alexandria.
Conclusion
So now you know that:
- Alexa’s not officially based on any one human’s voice
- Amazon uses machine learning to generate more humanized Text-to-Speech
- Alexa’s female voice is due to our general preference for the female voice
- The name “Alexa” comes from the Library of Alexandria
If you could change Alexa’s voice, who or what would you make her sound like? Check out Amazon’s answers!