Natural Sounding Text to Speech?

SVOX pico2wave

A very minimalistic TTS that sounds better than espeak or mbrola (to my mind). Some information here.

I don't understand why pico2wave is, compared to espeak or mbrola, rarely discussed. It's small, but sounds really good (natural). Without modification you'll hear a natural sounding female voice.

AND ... compared to Mbrola, it recognises units and speaks them the right way!
For example:

  • 2°C → two degrees
  • 2m → two meters
  • 2kg → two kilograms

After installation I use it in a script:

#!/bin/bash
# Synthesize the first argument to a WAV file, play it, then clean up.
pico2wave -w=/tmp/test.wav "$1"
aplay /tmp/test.wav
rm /tmp/test.wav

Then run it with the desired text:

<scriptname>.sh "hello world"

or read the contents of an entire file:

<scriptname>.sh "$(cat <filename>)"

That's all to have a lightweight, stable working TTS on Ubuntu.
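A slightly more defensive variant of the script above, as a sketch only: it wraps the same pico2wave and aplay calls in a function, writes to a unique temporary file instead of a fixed /tmp/test.wav, and lets you pick a language. The function name speak and the en-US default are my own choices, and mktemp --suffix assumes GNU coreutils (which stock Ubuntu has).

```shell
#!/bin/bash
# speak TEXT [LANG]
# Hypothetical helper: synthesize TEXT with pico2wave, play it with aplay,
# and always remove the temporary WAV file afterwards.
speak() {
    local text=$1
    local lang=${2:-en-US}   # assumed default; pico2wave also ships e.g. en-GB, de-DE
    local wav status

    # Keep the .wav suffix, as in the original script.
    wav=$(mktemp --suffix=.wav) || return 1

    pico2wave -l "$lang" -w "$wav" "$text" && aplay -q "$wav"
    status=$?

    rm -f "$wav"
    return "$status"
}

# Usage, once the function is defined or sourced:
#   speak "hello world"
#   speak "$(cat notes.txt)" en-GB
```

Using a fresh temporary file means two invocations running at the same time no longer overwrite each other's audio.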


SpeakIt!

I believe I've found the best free TTS software: a Google Chrome extension called "SpeakIt". For me it only works in the Chrome browser on Ubuntu; it doesn't work with Chromium for some reason. SpeakIt comes with two female voices, which both sound very realistic compared to everything else out there. There are at least four more male and female voices listed as Chrome extensions if you search the Chrome Web Store using "TTS" as your query.

Usage: on a website, highlight the text you want read, then either right-click and choose "SpeakIt" or click the SpeakIt icon docked on the Chrome top bar.


Firefox users also have two options. Within Firefox addons, do a search for TTS and you should find "Click Speak" and also "Text to Voice". The voices are not as good as the Chrome SpeakIt voices, but are definitely usable.

The SpeakIt extension uses iSpeech technology, and for $20 a year the site can convert text to MP3 audio files. You can input text, URLs, RSS feeds, and documents such as TXT, DOC, and PDF, and get MP3 output. You can make podcasts, embed audio, etc. Here is a link, and a sample of their audio (don't know how long the link will last).


Pico and espeak are fun and easy to get to work, but they're not all that good. The default Festival voices are also not that good. However, Festival is a Scheme-based speech framework for which a number of researchers have built much better plug-in voices. You can easily surpass the pico2wave quality on stock Ubuntu, because one of those voices is available as a ready-made package.

To make Festival sound natural, here's what to do:

sudo apt-get install festival
sudo apt-get install festvox-us-slt-hts
festival -i
festival> (voice_cmu_us_slt_arctic_hts) 
festival> (SayText "Don't hate me, I'm just doing my job!")

You can do it from the command line by using -b (or --batch) and putting each command into single quotes:

festival -b '(voice_cmu_us_slt_arctic_hts)' \
    '(SayText "The temperature is 22 degrees centigrade and there is a slight breeze from the west.")'
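If you use this often, the batch call can be wrapped in a small function. This is only a sketch: the names festival_say, festival_to_wav, and the FESTIVAL_VOICE variable are hypothetical, and the second function assumes the text2wave script that normally ships with Festival is on your PATH. Backslashes and double quotes in the text are escaped so they survive inside the Scheme string.

```shell
#!/bin/bash
# Voice to load; defaults to the HTS voice installed above.
# FESTIVAL_VOICE is a made-up variable name for this sketch.
: "${FESTIVAL_VOICE:=voice_cmu_us_slt_arctic_hts}"

# festival_say TEXT: speak TEXT through Festival in batch mode.
festival_say() {
    local text
    # Prefix every backslash and double quote with a backslash,
    # so the text is a valid Scheme string literal.
    text=$(printf '%s' "$1" | sed 's/[\\"]/\\&/g')
    festival -b "($FESTIVAL_VOICE)" "(SayText \"$text\")"
}

# festival_to_wav TEXT OUTFILE: write the speech to a WAV file instead of playing it.
# Assumes text2wave is installed; it reads the text from standard input here.
festival_to_wav() {
    printf '%s\n' "$1" | text2wave -eval "($FESTIVAL_VOICE)" -o "$2"
}

# Usage:
#   festival_say 'She said "hello" to me.'
#   festival_to_wav "The temperature is 22 degrees." /tmp/weather.wav
```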

You can get other quite good voices from the Nitech repository, but installing them is finicky: the default paths have changed, so the file name references in the bundled Scheme files may need to be edited manually to work on stock Ubuntu.