Speech recognition
From Tuxisalive
Introduction
We are mostly looking for a command-and-control dialog manager, i.e. an application that launches commands by voice, rather than for dictation features.
The most common problems encountered while evaluating the apps are:
- these apps almost never let you choose the sound device (in our case the microphone: /dev/dsp2 for me, different for others), so they require hand-editing the source and recompiling. One possible change to whichever alternative we choose would be to add a --device switch.
- they require or produce audio in various formats, not always compatible with Tux (8-bit, 8 kHz); we may need to find or write a downsampling/filtering daemon.
- the simplest ones (which may suffice) are dead projects, while the still-active ones are somewhat overkill.
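For the case where an application produces 16-bit, 16 kHz audio and Tux expects 8-bit, 8 kHz, the core of such a daemon could be a filter-and-decimate step. This is a minimal sketch, assuming signed 16-bit mono input and unsigned 8-bit mono output; the function name is hypothetical. Averaging each pair of samples is only a crude low-pass filter, so a real daemon would want a proper anti-aliasing filter before dropping every other sample.

```python
from array import array


def downsample_16k16_to_8k8(pcm16: array) -> bytes:
    """Convert signed 16-bit mono samples at 16 kHz to unsigned 8-bit at 8 kHz.

    Hypothetical helper. Each output sample is the average of two consecutive
    input samples: a crude 2-tap low-pass filter followed by decimation by 2.
    A trailing odd sample is dropped.
    """
    out = bytearray()
    for i in range(0, len(pcm16) - 1, 2):
        avg = (pcm16[i] + pcm16[i + 1]) // 2  # low-pass: mean of one pair
        out.append((avg >> 8) + 128)          # keep top 8 bits, shift to unsigned
    return bytes(out)
```

A daemon would read raw samples from the producing application in blocks, pass each block through this function and write the result to Tux's sound device.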
ViaVoice & xvoice
ViaVoice looks very capable on paper, but we still have to test it. It relies on IBM's ViaVoice SDK, which is old and unmaintained. However, we managed to find old RPMs along with a tutorial (last updated in 2004).
+ It is supposed to work quite well
- old, unsupported, no future, non-free
Most projects using voice recognition (such as MisterHouse) use ViaVoice on Linux and Microsoft's speech recognition features on Windows.
Alternatives
Perl Box
A Perl-based command-and-control dialog manager, using Sphinx2 for recognition and Festival for speech synthesis.
Evaluation status:
Florent: I didn't manage to make Sphinx use /dev/dsp2 (my Tux mic entry) instead of /dev/dsp. It could be an interesting base for Tux Droid, even though it has a Tk interface.
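The fix suggested in the introduction, a --device switch instead of a hard-coded /dev/dsp, could look roughly like this. It is a sketch in Python, not Perl Box's actual code; parse_args, open_capture and DEFAULT_DEVICE are hypothetical names.

```python
import argparse
from typing import List, Optional

# The path most of the evaluated tools hard-code (hypothetical constant).
DEFAULT_DEVICE = "/dev/dsp"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical command line for a voice command listener.
    parser = argparse.ArgumentParser(description="Voice command listener (sketch)")
    parser.add_argument("--device", default=DEFAULT_DEVICE,
                        help="OSS capture device, e.g. /dev/dsp2 for the Tux mic")
    return parser.parse_args(argv)


def open_capture(device: str):
    # An OSS device node is opened like a file; reads return raw samples in
    # the device's current format (OSS defaults to 8 kHz, 8-bit unsigned mono).
    return open(device, "rb", buffering=0)
```

Running the tool with --device /dev/dsp2 would then capture from the Tux microphone without touching the source.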
CVoiceControl
- It is really pattern-matching oriented, simple, and intended for launching scripts from the command line.
"CVoiceControl is a tool that gives the user voice control over unix commands. A template matching based speaker dependent isolated word recognition approach is employed."
However, it works with 16 kHz, 16-bit mono audio, not 8 kHz.
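Feeding Tux's microphone into CVoiceControl means going from 8-bit, 8 kHz to 16-bit, 16 kHz. This is a minimal sketch, assuming unsigned 8-bit input (the OSS default format) and using linear interpolation to double the rate; the function name is hypothetical. Upsampling cannot restore anything above 4 kHz, so how well CVoiceControl recognises such input is untested.

```python
from array import array


def upsample_8k8_to_16k16(pcm8: bytes) -> array:
    """Convert unsigned 8-bit mono at 8 kHz to signed 16-bit mono at 16 kHz.

    Hypothetical helper. Each input sample is widened to 16 bits, and the
    midpoint between it and the next sample is inserted after it.
    """
    widened = [(b - 128) << 8 for b in pcm8]  # unsigned 8-bit -> signed 16-bit
    out = array("h")
    for i, s in enumerate(widened):
        out.append(s)
        # The last sample has no right neighbour, so it is simply repeated.
        nxt = widened[i + 1] if i + 1 < len(widened) else s
        out.append((s + nxt) // 2)            # linear interpolation (midpoint)
    return out
```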
Depends on:
- Ncurses library and header files
- Pthreads library (a potential problem on NASes)
- OSS sound library (sys/soundcard.h)
How is it supposed to work?
- Choose devices
- Calibrate mic level
- Record voice sample for script launching
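CVoiceControl describes itself as speaker-dependent, template-matching, isolated-word recognition. A common way to do that matching is dynamic time warping (DTW): compare a new utterance against each recorded template, letting the timing stretch, and pick the closest one. The sketch below assumes each recording has already been turned into a sequence of feature vectors (per-frame energies or cepstral coefficients, for example). We have not checked CVoiceControl's source, so this is not necessarily its algorithm; recognise and reject_above are hypothetical names.

```python
import math
from typing import Dict, List, Optional, Sequence

Frame = Sequence[float]  # one feature vector per audio frame


def frame_distance(a: Frame, b: Frame) -> float:
    # Euclidean distance between two feature vectors of equal length
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq: List[Frame], template: List[Frame]) -> float:
    if not seq or not template:
        return math.inf
    n, m = len(seq), len(template)
    # cost[i][j] = cheapest alignment of seq[:i] with template[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # utterance advances
                                 cost[i][j - 1],      # template advances
                                 cost[i - 1][j - 1])  # both advance together
    # Normalise so long and short words compare fairly
    return cost[n][m] / (n + m)


def recognise(seq: List[Frame], templates: Dict[str, List[Frame]],
              reject_above: float = math.inf) -> Optional[str]:
    # Pick the template with the smallest DTW distance; reject poor matches
    best_word, best = None, math.inf
    for word, template in templates.items():
        d = dtw_distance(seq, template)
        if d < best:
            best_word, best = word, d
    return best_word if best <= reject_above else None
```

The reject_above threshold is the kind of value a calibration step would tune, so that unknown words return None instead of launching the nearest script.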
Evaluation status:
Florent: I installed the .deb on Ubuntu 6.10. Running microphone_config fails with "Error opening terminal: xterm."; running it in another terminal (Eterm) works. I went through the configuration process, which crashes when writing the config file (see the Mandriva bug report). It no longer compiles from source.
We should abandon this alternative.
Update 2008-04-24: on Mandriva 2008.1, CVoiceControl works again. It is fast, simple, and written in pure C. We should consider using it again.
Speech recognition engines and resources
Sphinx
Sphinx exists in several variants (e.g. Sphinx2, Sphinx3, Sphinx-4, PocketSphinx).
Considering Tux's microphone performance, our best chance is with PocketSphinx.
There is already a package (ipkg) for it in OpenEmbedded.
Julius
SpeechIO
VoxForge
A project that helps open-source developers improve their speech recognition: you submit your speech samples, which are added to a speech corpus used to train acoustic models for four open-source speech recognition projects.
Common problems
- When the motors are running or the speaker is in use, the microphone picks up their noise at a high level.
- Opening the mouth raises the mic level (possible calibration problems ahead?).
- There is some 500 Hz noise due to the RF digital modulation: the 2.4 GHz signal is pulsed at 500 Hz (a frame is sent every 2 ms, and 1 / 2 ms = 500 Hz).
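Since the RF noise sits at a known frequency, a notch filter centred on 500 Hz is one way to attenuate it. This is a minimal sketch of a second-order IIR notch using the standard biquad formulas from Robert Bristow-Johnson's Audio EQ Cookbook, assuming an 8 kHz sampling rate; Q = 5 is an untested guess and the function names are hypothetical. A 2 ms pulse train also has harmonics at multiples of 500 Hz, which we have not measured and which may need extra notches.

```python
import math
from typing import Iterable, List, Tuple


def notch_coefficients(f0: float, fs: float, q: float) -> Tuple[tuple, tuple]:
    # Cookbook notch biquad, normalised so that a0 == 1
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = (1 / a0, -2 * cos_w0 / a0, 1 / a0)
    a = (-2 * cos_w0 / a0, (1 - alpha) / a0)
    return b, a


def notch_filter(samples: Iterable[float], f0: float = 500.0,
                 fs: float = 8000.0, q: float = 5.0) -> List[float]:
    (b0, b1, b2), (a1, a2) = notch_coefficients(f0, fs, q)
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs (direct form I)
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

Harmonics could be handled by chaining further calls, e.g. notch_filter(notch_filter(samples), f0=1000.0).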
Links
- KDE: voice commands (in French) http://kubuntu.free.fr/blog/index.php?p=43
- http://www.linux.com/howtos/Speech-Recognition-HOWTO/software.shtml
- Festival & CVoiceControl tutorial
- http://www.linuxjournal.com/article/4723
- The article shows a method where /dev/speech acts as the input of the TTS daemon (just run echo "test" > /dev/speech): a quick and simple solution.
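One way to get the /dev/speech behaviour described above is a named pipe (FIFO) read by a small loop that hands each line to Festival, whose --tts mode speaks the text it reads on standard input. We have not checked that this is how the article implements it. serve_speech_fifo and its max_sessions parameter are hypothetical, and creating /dev/speech itself needs root.

```python
import os
import stat
import subprocess
from typing import Callable


def speak_with_festival(text: str) -> None:
    # Festival's --tts mode speaks the text it reads on standard input
    subprocess.run(["festival", "--tts"], input=text.encode(), check=False)


def serve_speech_fifo(path: str,
                      speak: Callable[[str], None] = speak_with_festival,
                      max_sessions: int = -1) -> None:
    """Speak every non-empty line written to the named pipe at `path`.

    max_sessions < 0 means serve forever; each time the pipe is opened
    and drained counts as one session.
    """
    if not os.path.exists(path):
        os.mkfifo(path)
    elif not stat.S_ISFIFO(os.stat(path).st_mode):
        raise ValueError(f"{path} exists and is not a named pipe")
    sessions = 0
    while max_sessions < 0 or sessions < max_sessions:
        # open() blocks until some writer (e.g. echo) opens the pipe
        with open(path, "r") as fifo:
            for line in fifo:  # iteration ends when the writer closes
                text = line.strip()
                if text:
                    speak(text)
        sessions += 1
```

Started as root with path="/dev/speech", echo "test" > /dev/speech should then make Festival say "test".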

