Artificial intelligence has become part of nearly every aspect of our lives, from content-aware fills for video and photos and facial recognition to unlock your phone, to recommendations for your mobile coffee order. The field is growing so rapidly that it’s becoming increasingly difficult to pin down a definitive definition. Machine learning, deep learning, natural language processing (NLP), computer vision, voice recognition and speech synthesis… all of these and many more fall under the umbrella of artificial intelligence.
IBM, Google, Amazon and many others have created API endpoints that let developers integrate AI into their own projects. Models trained on millions of examples are at your fingertips. Hooking into machine-learning power has never been easier.
Imagine building a web-based app that can not only understand what a user is saying to it, but also respond in a voice customised to their liking. All in real time. Combining chatbot dialog models with voice recognition and now voice synthesis, this scenario has become a reality. You can develop solutions for education, hands-free communications, call-centre automation and engaging games and web experiences.
In this tutorial, you are going to create a simple app that returns AI-powered, human-sounding speech based on values you choose.
SPEECH SYNTHESIS (TEXT-TO-SPEECH)
Speech synthesis, or text-to-speech, is the conversion of text input into human-like speech. Although the concept may seem simple on the surface, making synthesised speech sound genuinely human requires vast amounts of AI training. DeepMind has developed groundbreaking neural-network technology called WaveNet that can create extremely human-sounding voices, and combining it with further machine learning yields an ever-increasing range of voices and options.
SOME KEY FEATURES OF SPEECH SYNTHESIS
- Multi-language support
- Playback rate to adjust how fast or slow a voice is
- Pitch control to set the right style of voice for your app
- Volume control to fine-tune for various scenarios
- Device-specific profiles to target hardware for optimal playback
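With Google’s Cloud Text-to-Speech API (covered below), each of the features above corresponds to a field of the request’s `audioConfig` object. The sketch below uses the v1 REST field names and the value ranges from the API documentation; treat the exact ranges and the example device profile as assumptions to verify against the current docs.

```python
# Sketch: mapping the playback features above onto Google Cloud
# Text-to-Speech's audioConfig object (v1 REST API field names).
# Value ranges are taken from the API docs - double-check before relying on them.

def build_audio_config(rate=1.0, pitch=0.0, volume_gain_db=0.0,
                       profile="handset-class-device"):
    """Return an audioConfig dict, clamping values to their documented ranges."""
    return {
        "audioEncoding": "MP3",
        # Playback rate: 0.25 (slow) to 4.0 (fast); 1.0 is normal speed.
        "speakingRate": min(max(rate, 0.25), 4.0),
        # Pitch: -20.0 to +20.0 semitones relative to the voice's default.
        "pitch": min(max(pitch, -20.0), 20.0),
        # Volume gain: -96.0dB to +16.0dB; 0.0 leaves the voice untouched.
        "volumeGainDb": min(max(volume_gain_db, -96.0), 16.0),
        # Device profile so output is optimised for specific hardware.
        "effectsProfileId": [profile],
    }

config = build_audio_config(rate=1.2, pitch=-2.0)
```

Clamping the inputs keeps the request valid even if a user drags a slider past the limits the API accepts.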
SPEECH SYNTHESIS WITH GOOGLE’S TEXT-TO-SPEECH API
With so many great tools and APIs to choose from, it can be hard to know which to use but, for this tutorial, we’ve chosen to focus on Google’s. Google Cloud Text-to-Speech is one of the most advanced options currently available. It uses a RESTful API, so you can access it from a wide range of platforms and languages, and it combines DeepMind’s WaveNet technology with Google’s own machine learning to produce over 180 different voices spanning more than 30 languages. Results can be returned in multiple formats, as well as tailored to specific devices for optimal playback.
When you combine this API with others in the Google ecosystem, you can create powerful solutions all under one project. That makes it a great choice for this task, and it’s valuable learning regardless of which tools you use in projects ahead.
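To give a feel for the RESTful model described above, the v1 `text:synthesize` endpoint accepts a JSON body with `input`, `voice` and `audioConfig` objects and returns base64-encoded audio. The voice name below (`en-GB-Wavenet-A`) is just an illustrative choice; check the published voice list for your language. This sketch builds the request and decodes a response without actually sending anything:

```python
import base64

# The v1 synthesis endpoint; calls must carry an API key or OAuth token.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"

def build_synthesis_request(text, language_code="en-GB",
                            voice_name="en-GB-Wavenet-A"):
    """Build the JSON body for a text:synthesize call."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }

def save_audio(response_json, path="output.mp3"):
    """The API responds with {"audioContent": "<base64>"}; decode it to
    playable MP3 bytes and write it to disk."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(response_json["audioContent"]))
```

POSTing the body returned by `build_synthesis_request()` to `SYNTHESIZE_URL` (with your credentials attached) and passing the JSON response to `save_audio()` is the whole round trip.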
STEP 1: SET UP A GOOGLE CLOUD PROJECT
To get started, you’ll need to set up a Google Cloud project. Go to the Google Cloud Platform Console (https://console.cloud.google.com/) to create a new project. Or select an existing one, if you want to add this feature to a project you are working on.
The Text-to-Speech API is free to use until you start processing millions of characters per month. You may need to associate billing information with your account when you activate the API, if you haven’t done so before; you can remove the service once you’re done testing, and low-volume usage isn’t charged.
STEP 2: ACTIVATE THE TEXT-TO-SPEECH API
Next, go to the Google API library and select the Text-to-Speech API for your project. If you are having trouble finding it, the URL for it is: https://console.cloud.google.com/apis/library?project=text-tospeech-265814&q=Text