Speech recognition

From Tuxisalive

Introduction

We are mainly looking for a command-and-control dialog manager, i.e. an app that launches commands by voice, rather than dictation features.

The most common problems encountered while evaluating the apps are:

  • These apps almost never let you choose the sound device (the microphone in our case; /dev/dsp2 for me, reported to differ for others), so the source has to be hand-edited and recompiled. Whichever alternative we choose, one likely modification is to add a --device switch to it.
  • They require or produce audio of varying quality, not always compatible with tux (8-bit, 8 kHz); we may need to find or write a downsampling/filtering daemon.
  • The simplest ones (which may suffice) are dead projects, while those still alive are somewhat overkill.
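
As a rough illustration of the --device idea, here is a minimal Python sketch of a front end that takes the capture device on the command line instead of hard-coding it. The function names and the /dev/dsp default are assumptions made for this example; none of the evaluated apps is known to expose such an option.

```python
import argparse

# Assumed default: the first OSS capture device.
DEFAULT_DEVICE = "/dev/dsp"


def parse_args(argv=None):
    """Parse the command line; argv=None means "use sys.argv"."""
    parser = argparse.ArgumentParser(description="Voice command front end (sketch)")
    parser.add_argument(
        "--device",
        default=DEFAULT_DEVICE,
        help="OSS capture device to read from, e.g. /dev/dsp2",
    )
    return parser.parse_args(argv)


def open_capture(path):
    """Open the device node for raw, unbuffered reading; the caller closes it."""
    return open(path, "rb", buffering=0)
```

Calling parse_args(["--device", "/dev/dsp2"]).device would then give the path to hand to open_capture, so the tux microphone can be selected without recompiling.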

ViaVoice & xvoice

ViaVoice looks great on paper, but we still have to test it. It relies on IBM's ViaVoice SDK, which is old and unmaintained. However, we managed to find old RPMs together with a tutorial (last updated in 2004).

+ It's supposed to work quite well
- old, unsupported, no future, non-free

Most projects using voice recognition (such as MisterHouse) use ViaVoice on Linux and Microsoft's voice recognition features on Windows.

Alternatives

Perl Box

It's a Perl-based command-and-control dialog manager, using Sphinx2 and Festival.

Evaluation status:

 Florent:
 I didn't manage to force Sphinx to use /dev/dsp2 (my own tux mic entry) instead of /dev/dsp.
 It could be an interesting base for tuxdroid, even though it's a Tk interface.

CVoiceControl

  • It's simple and really pattern-matching oriented, intended for launching scripts from the command line.
"CVoiceControl is a tool that gives the user voice control over unix
commands. A template matching based speaker dependent isolated word
recognition approach is employed."

However, it works with 16 kHz, 16-bit mono audio, not the 8 kHz, 8-bit audio of tux.
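
To make the gap between the two formats concrete, here is a minimal Python sketch of the downsampling direction mentioned in the introduction: 16 kHz signed 16-bit mono PCM converted to 8 kHz unsigned 8-bit. The function name, the native byte order and the unsigned 8-bit output are assumptions for this example. Averaging sample pairs is only a crude low-pass filter, and the opposite direction (feeding tux audio to a 16 kHz tool) would need interpolation instead.

```python
import array


def downsample_16k16_to_8k8(pcm16: bytes) -> bytes:
    """Convert 16 kHz signed 16-bit mono PCM (native byte order)
    to 8 kHz unsigned 8-bit mono PCM."""
    samples = array.array("h")
    # Drop a trailing odd byte so the buffer holds whole 16-bit samples.
    samples.frombytes(pcm16[: len(pcm16) - len(pcm16) % 2])
    out = bytearray()
    # Average each pair of samples (crude low-pass), which halves the rate.
    for i in range(0, len(samples) - 1, 2):
        mean = (samples[i] + samples[i + 1]) // 2
        # Keep the top 8 bits and shift from signed to unsigned.
        out.append((mean >> 8) + 128)
    return bytes(out)
```

A real daemon would read blocks from the device in a loop and apply a proper anti-aliasing filter before decimating.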

Depends on:

  • Ncurses library and header files
  • Pthreads library -- potential problem for NASes
  • OSS sound library (sys/soundcard.h)

How is it supposed to work?

  1. Choose devices
  2. Calibrate mic level
  3. Record voice sample for script launching
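
The quoted description only says "template matching", so the following Python sketch is a guess at the general idea, not CVoiceControl's actual code: each command has one recorded template, and a new utterance is assigned to the template with the smallest dynamic time warping (DTW) distance. DTW, the function names and the use of plain numbers as features are all assumptions; a real recogniser would compare sequences of spectral feature vectors.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers."""
    inf = float("inf")
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[len(a)][len(b)]


def recognise(utterance, templates):
    """Return the command whose recorded template is closest to the utterance."""
    return min(templates, key=lambda cmd: dtw_distance(utterance, templates[cmd]))
```

Because DTW tolerates stretching in time, a slowly spoken word can still match a template recorded at normal speed, which fits the speaker-dependent, isolated-word approach described above.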

Screenshot

Tutorial & usage

Evaluation status:

 Florent:
 Installed the .deb on Ubuntu 6.10
 $: microphone_config 
 Error opening terminal: xterm.
 Opening it in another terminal (Eterm) works. I went through the config process, which crashes 
 when writing the config file.
 Mandriva bugreport
 Doesn't compile from source anymore.

We should abandon this alternative.

Update 2008-04-24: on Mandriva 2008.1, CVoiceControl works again. It is fast, simple and written in pure C. We should consider using it again.

Speech recognition engines and resources

Sphinx

Sphinx exists in several variants.

Considering tux's microphone performance, our best chance is with PocketSphinx.

There is already a package (ipkg) in OpenEmbedded.

Julius

SpeechIO

VoxForge

VoxForge is a project that helps open source developers improve their speech recognition software: you submit speech samples, which are added to a speech corpus used to train four open source projects working on that task.

Common problems

  1. When the motors are running or the speaker is in use, the microphone picks up that noise at a high level.
  2. Opening the mouth raises the mic level (upcoming calibration problems?).
  3. There is some 500 Hz noise due to the RF digital modulation: the 2.4 GHz signal is pulsed at 500 Hz, because a frame is sent every 2 ms (1 / 2 ms = 500 Hz).
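
If the 500 Hz tone turns out to hurt recognition, one possible mitigation is a narrow notch filter in the filtering stage. The Python sketch below is only an illustration under assumptions: it uses the standard biquad notch design, an 8 kHz sample rate taken from the tux audio format, and a Q of 5 chosen arbitrarily. Nothing here has been tested on tux audio.

```python
import math

FRAME_PERIOD_S = 0.002            # one RF frame every 2 ms (from the notes above)
NOISE_HZ = 1.0 / FRAME_PERIOD_S   # pulse rate of the RF signal: 500 Hz
SAMPLE_RATE = 8000                # assumed tux microphone sample rate


def notch_coefficients(f0, fs, q=5.0):
    """Biquad notch coefficients (b, a) centred on f0 Hz, with a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def biquad(samples, b, a):
    """Apply a second-order IIR filter (direct form I) to a list of floats."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

With b, a = notch_coefficients(NOISE_HZ, SAMPLE_RATE), a pure 500 Hz tone is suppressed once the filter has settled, while components well away from 500 Hz pass almost unchanged. Speech energy close to 500 Hz is attenuated too, so the effect on recognition accuracy would have to be measured.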
