Text-to-speech

From Tuxisalive

Introduction

Speech synthesis needs two applications:

  • a TTS engine: a program that translates plain text into phonemes
  • a synthesizer, which generates audio from the phonemes the TTS engine sends it


Here is an attempt to summarize the possible/interesting TTS/synth combinations, keeping in mind that the ideal TTS is:

  • light
  • easily understandable by humans :p
  • free (as in freedom)
  • python-based would be a plus

Note: for ARM-based devices, we would want fixed-point arithmetic.

We may want to keep the option of choosing between several packages (i.e. no hard Acapela-specific integration). Is it possible to keep it modular? (from Florent Thiery & David Bourgeois)

TuxDroid's default TTS+Synthesizer package: Acapela

This is the default choice of the TuxDroid team. Its features are quite impressive, such as date/time speaking.

For a list of the current languages and associated constants, see TTS Languages.

Alternatives

Cicero+MBROLA

Blind Linux users can give us reliable feedback about the performance and comfort of Linux speech applications. I was told by the people behind a Linux distribution for blind users that the Cicero+MBROLA combination is popular among them, so we may want to investigate and evaluate it. Plus, Cicero is a very light GPLed Python application!

The only restriction is that MBROLA is free (as in beer) only for non-commercial applications, because of a France Telecom patent.

  • Pros: light, a LOT of languages available
  • Cons: MBROLA not really free

User evaluation feedback:

 None at this time

eSpeak

  • Pros: light footprint, efficient, "plug-n-play", 600 kB, shared library available, lots of languages supported
  • Cons: no "special" pronunciations (e.g. time, date)

User evaluation feedback:

 Florent Thiery: I quickly evaluated eSpeak in English, which gives really good results. Install, and that's it!

Festival+MBROLA

This page is designed to provide the necessary information to get MBROLA voices working with Festival.

Keeping tux's TTS modular & easy to use: speechd

speechd implements a /dev/speech device: any plain text written to this file is spoken aloud. This is done via the Festival or rsynth speech synthesis systems (or others if we add them). It needs Perl. It could allow us to keep the TTS architecture completely independent of the chosen software, providing easy-to-use "glue".

It allows really easy integration into regular applications, by simply executing a shell command:

 exec echo $anything > /dev/speech

Tux's API has tux.sys.shell_free(any shell command), so basically in any application for tux, a simple

 tux.sys.shell_free("exec echo " + self.string_to_be_spoken + " > /dev/speech")

will do the job.
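
One caveat with building the command by plain string concatenation is that spaces, quotes or shell metacharacters in the spoken text would be interpreted by the shell. A minimal sketch of a safer variant follows, assuming speechd is running and /dev/speech exists. The helper names build_speech_command and speak are hypothetical and not part of Tux's API.

```python
import shlex


def build_speech_command(text, device="/dev/speech"):
    """Return a shell command that echoes `text` into the speech device."""
    # shlex.quote() keeps spaces, quotes and shell metacharacters in the
    # text from being interpreted by the shell.
    return "echo " + shlex.quote(text) + " > " + shlex.quote(device)


def speak(text, device="/dev/speech"):
    """Write `text` straight to the device file, bypassing the shell."""
    with open(device, "w") as dev:
        dev.write(text + "\n")
```

The first helper could be passed to the call shown above, as in tux.sys.shell_free(build_speech_command(self.string_to_be_spoken)). The second avoids the shell entirely, which may be simpler when the application runs on the same machine as speechd.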

Speech-enabled/able applications

What can we do with tux's TTS?

Standalone

From The Speakup Project:

  • Speakup is a screen review/reader package for Linux
  • Trplayer is a Text-Mode RealMedia Player for Linux/Unix which has a command-line interface. It can play RealAudio, RealVideo, MP3, and all other media types supported by RealPlayer under Unix

  • Speak Freely is a realtime text and audio IRC type program for the Linux, Unix and Windows platforms
  • TuxTalk is a software-based synthesizer for the GNU/Linux operating system, originally based on rsynth
  • Emacspeak (The Complete Audio Desktop) is a speech interface that allows visually impaired users to interact independently and efficiently with the computer. Audio formatting allows Emacspeak to produce rich aural presentations of electronic information.

speechd-based

  • speech.irc is an IRC script that speaks various elements of a conversation
  • Slashes is a Slashdot news ticker: it displays all the current headlines on Slashdot and speaks each headline aloud during refreshes

Easy programming

Using Python libraries such as feedparser, one can quite easily write scripts like rss2mp3.py, which parses RSS feeds and converts them to MP3.
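
To give an idea of how little code such a script needs, here is a rough sketch of the two core steps: extracting headlines from a feed and preparing a synthesis command. It is not the rss2mp3.py script itself, whose internals are not described here. To stay self-contained it swaps feedparser for the standard library's xml.etree.ElementTree and only handles plain RSS 2.0. It assumes eSpeak is the synthesizer and uses its -w option to write a WAV file. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def headlines_from_rss(rss_xml):
    """Return the <title> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title", default="").strip()
            for item in root.iter("item")]


def espeak_wav_command(text, wav_path):
    """Build the argument list that asks espeak to render text to a WAV file."""
    # -w writes the audio to a file instead of playing it.
    return ["espeak", "-w", wav_path, text]
```

With eSpeak installed, each argument list can be handed to subprocess.run(). Converting the resulting WAV files to MP3 would need a separate encoder, which is left out of this sketch.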

Giving tux's voice emotions/personality

Emofilt is an open-source program that simulates emotional arousal in speech synthesis, based on the free-for-non-commercial-use MBROLA synthesis engine. Adding emotion to tux?

Links

This page groups the available text-to-speech options in GNU/Linux by language, along with their current availability in Ubuntu.
