


comp.speech Frequently Asked Questions - part 3/3


From: andrew.hunt@east.sun.com (Andrew Hunt)
Newsgroups: comp.speech
Subject: comp.speech Frequently Asked Questions - part 3/3
Date: 12 Jul 1998 12:00:30 GMT
Message-ID: <comp-speech-faq/part3_900244804@rtfm.mit.edu>
Reply-To: andrew.hunt@east.sun.com (Andrew Hunt)
Summary: Information on Speech Technology
X-Last-Updated: 1998/07/08

Archive-name: comp-speech-faq/part3
Last-modified: 1998/07/06
URL: http://www.speech.su.oz.au/comp.speech/

                   COMP.SPEECH FAQ POSTING - PART 3/3


[Note: this document has been automatically extracted from a WWW site:
        http://www.speech.su.oz.au/comp.speech/
This may introduce some formatting errors.]


                              Speech Synthesis

                         comp.speech FAQ Section 5

          * SpeechLinks: Speech Synthesis
          * Q5.1: What is speech synthesis?
          * Q5.2: How can speech synthesis be performed?
          * Q5.3: References/Books on Synthesis
          * Q5.4: Speech Synthesis on the WWW
          * Q5.5: Speech Synthesis Software/Hardware


___________________________________________________________________________

                        Q5.1: What is speech synthesis?

   Speech synthesis programs convert written input to spoken output by
   automatically generating synthetic speech. Speech synthesis is often
   referred to as "Text-to-Speech" conversion (TTS).


___________________________________________________________________________

                       Q5.2: Performing speech synthesis

   There are several approaches, and the choice depends on the task they
   are used for. The simplest is to record a person speaking the desired
   phrases. This is useful if only a restricted set of phrases and
   sentences is needed, e.g. announcements in a train station or schedule
   information by phone. The quality then depends on how well the
   recordings were made.
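   As a minimal sketch of this approach, assuming each phrase has been
   recorded once and loaded as a list of audio samples, synthesis reduces
   to looking the phrases up and joining them. The names
   build_announcement and inventory, the toy sample values, and the
   default pause of 800 samples (0.1 s at an assumed 8 kHz sampling rate)
   are illustrative assumptions, not part of any particular system.

```python
from typing import Dict, List, Sequence

def build_announcement(phrases: Sequence[str],
                       inventory: Dict[str, List[float]],
                       gap_samples: int = 800) -> List[float]:
    """Concatenate whole pre-recorded phrases with a short silence between them.

    Raises KeyError for any phrase that was never recorded."""
    output: List[float] = []
    for i, phrase in enumerate(phrases):
        if i > 0:
            output.extend([0.0] * gap_samples)  # pause between phrases
        output.extend(inventory[phrase])          # KeyError if not recorded
    return output

# Usage with toy "recordings" (a real inventory would hold thousands of samples each).
inventory = {"the train to": [0.1] * 3, "Edinburgh": [0.2] * 2, "is delayed": [0.3] * 4}
message = build_announcement(["the train to", "Edinburgh", "is delayed"],
                             inventory, gap_samples=1)
```

   The KeyError is the limitation in miniature: anything outside the
   recorded inventory simply cannot be said.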

   More sophisticated, but lower in quality, are algorithms which split
   the speech into smaller pieces. The smaller the units, the fewer of
   them are needed, but the quality also decreases. A frequently used
   unit is the phoneme, the smallest linguistic unit of sound. Western
   European languages have about 35-50 phonemes, depending on the
   language, i.e. only 35-50 separate recordings are needed. The problem
   lies in combining them, since fluent speech requires fluent
   transitions between the elements. Intelligibility is therefore lower,
   but the memory required is small.
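   The transition problem can be seen in a naive join. This sketch, an
   illustrative assumption rather than a method described in this FAQ,
   concatenates recorded units and smooths each join with a short linear
   crossfade over a hypothetical overlap parameter. A crossfade only
   removes clicks at the boundaries; it cannot recreate the natural
   transition between two sounds, which is why phoneme concatenation
   remains less intelligible.

```python
from typing import List

def crossfade_concat(units: List[List[float]], overlap: int) -> List[float]:
    """Join recorded units, blending the last `overlap` samples of the output
    so far with the first `overlap` samples of the next unit."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        for i in range(n):
            w = (i + 1) / (n + 1)   # weight of the incoming unit rises towards 1
            j = len(out) - n + i    # matching position in the existing output
            out[j] = (1.0 - w) * out[j] + w * unit[i]
        out.extend(unit[n:])        # rest of the unit is appended unchanged
    return out

# Usage: two toy "phoneme recordings" joined with a 2-sample crossfade.
joined = crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2)
```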

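   A solution to this dilemma is to use diphones. Instead of splitting at
   the transitions, the cut is made at the center of each phoneme,
   leaving the transitions themselves intact. Since a diphone spans a
   pair of adjacent phonemes, an inventory of N phonemes needs at most
   N*N diphones: about 400 elements for 20 phonemes (20*20), and
   correspondingly more for larger phoneme sets. In return, the quality
   increases.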
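   A minimal sketch of how a phoneme string maps onto diphones, assuming
   a hypothetical silence symbol "_" at both ends of the utterance and
   ARPAbet-like phoneme symbols chosen only for illustration. It also
   computes the N*N upper bound on the inventory size; real inventories
   can be smaller because not every phoneme pair occurs in a language.

```python
from typing import List, Tuple

SILENCE = "_"  # hypothetical symbol for the silence at utterance boundaries

def to_diphones(phonemes: List[str]) -> List[Tuple[str, str]]:
    """List the diphones needed for a phoneme string. Each diphone runs from the
    middle of one phoneme to the middle of the next, so it contains the whole
    transition between them."""
    padded = [SILENCE] + phonemes + [SILENCE]
    return list(zip(padded, padded[1:]))

def max_diphone_inventory(n_phonemes: int) -> int:
    """Upper bound on the number of diphones: every ordered pair of phonemes."""
    return n_phonemes * n_phonemes

# Usage: "cat" in illustrative ARPAbet-like symbols.
cat = to_diphones(["k", "ae", "t"])
# cat == [("_", "k"), ("k", "ae"), ("ae", "t"), ("t", "_")]
```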

   The longer the units, the more of them are needed; quality increases,
   but so does the memory required. Other widely used units are
   half-syllables, syllables, words, or combinations of them, e.g. word
   stems and inflectional endings.
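   When several unit sizes are stored, the synthesizer must decide which
   units to use. One simple strategy, shown here as an illustrative
   assumption rather than the method of any system listed in this FAQ,
   is greedy longest match: always take the longest stored unit that
   fits. For readability the sketch works on spelling; a real system
   would work on phoneme strings. The function name segment and the toy
   inventory are hypothetical.

```python
from typing import List, Set

def segment(word: str, inventory: Set[str]) -> List[str]:
    """Greedy longest-match split of `word` into stored units
    (e.g. a word stem plus an inflectional ending)."""
    units: List[str] = []
    pos = 0
    while pos < len(word):
        for end in range(len(word), pos, -1):  # try the longest candidate first
            if word[pos:end] in inventory:
                units.append(word[pos:end])
                pos = end
                break
        else:
            raise ValueError(f"no stored unit covers {word[pos:]!r}")
    return units

# Usage: stems and endings stored as separate units.
inventory = {"walk", "ing", "ed", "s"}
walking = segment("walking", inventory)  # ['walk', 'ing']
```

   Greedy matching is fast, but it can miss a split that a search over
   all alternatives would find.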

   The Museum of Speech Analysis and Synthesis has pictures of artificial
   speech systems going back over 150 years and is worth a visit
   (http://mambo.ucsc.edu/psl/smus/smus.html).


___________________________________________________________________________

                      Q5.3: References/Books on Synthesis

  Books and Papers

     * Thierry Dutoit, An Introduction to Text-to-Speech Synthesis,
       Kluwer Academic Publishers (Dordrecht), 1997, ISBN 0-7923-4498-7,
       312 pages. Volume 3 in the series on Text, Speech and Language
       Technology.
     * Douglas O'Shaughnessy, Speech Communication: Human and Machine,
       Addison Wesley series in Electrical Engineering: Digital Signal
       Processing, 1987.
     * T.V. Raman, Auditory User Interfaces: Toward the Speaking
       Computer, Kluwer Academic Publishers, Boston, ISBN 0-7923-9984-6,
       August 1997, 168 pp.
     * D. H. Klatt, "Review of Text-To-Speech Conversion for English",
       Jnl. of the Acoustical Society of America (JASA), Vol 82, pp
       737-793, 1987.
     * "Talking Machines: Theories, Models and Designs", Eds. G. Bailly &
       C. Benoit (Elsevier: North Holland).
     * I. H. Witten. Principles of Computer Speech, London: Academic
       Press, Inc., 1982.
     * W.B. Kleijn and K.K. Paliwal (Eds.), Speech Coding and Synthesis,
       Elsevier, Amsterdam, 1995.
       Contents, preface etc on the WWW:
       http://www.elsevier.nl/section/engtech/scs/menu.htm
     * John Allen, Sharon Hunnicutt and Dennis H. Klatt, "From Text to
       Speech: The MITalk System", Cambridge University Press, 1987.
     * J.P.H. van Santen, R. W. Sproat, J. P. Olive, and J. Hirschberg,
       "Progress in Speech Synthesis", Springer, 1996.

  On the WWW

     * Survey of the State of the Art in Human Language Technology:
       report edited by Ronald A. Cole et al., with a section on
       Text-to-Speech Technologies.
       http://www.cse.ogi.edu/CSLU/HLTsurvey/ch5node1.html

  Bibliographies and Reference Lists

     * WWW-searchable online bibliography for Phonetics and Speech
       Technology with more than 8000 entries. Provided by the Institut
       für Phonetik at Johann Wolfgang Goethe-Universität Frankfurt.
       http://www.uni-frankfurt.de/~ifb/bib_engl.html
     * Computational Speech Processing: Speech Analysis, Recognition,
       Understanding, Compression, Transmission, Coding, Synthesis; Text
       to Speech Systems, Speech to Tactile Displays, Speaker
       Identification, Prosody Processing : BIBLIOGRAPHY, by Conrad F.
       Sabourin, 1994, 2 volumes, 1187p, ISBN 2-921173-21-2, INFOLINGUA
       inc., P.O. Box 187 Snowdon, Montreal, H3X 3T4, Canada.
       See also: http://gomer.mlink.net/infolingua.html


___________________________________________________________________________

                   Q5.4: Speech Synthesis on the WWW

   Most of the following are links to WWW pages with demonstrations of
   speech synthesis. Plenty more links are included in the detailed list
   of speech synthesis software/hardware in Q5.5.

   Speech Synthesis "Museum"
          URL: http://www.cs.bham.ac.uk/~jpi/synth/museum.html
          Maintained by Jon Iles (j.p.iles@cs.bham.ac.uk) at the
          University of Birmingham.
          Information and speech samples for

          + YorkTalk
          + Loughborough Sound Images
          + University of Birmingham - FDFS
          + Eurovocs
          + DECtalk
          + AT&T Bell Labs Synthesiser
          + S.W.A.Ll.C. - Welsh Synthesis from CSTR
          + All-Prosodic Speech Synthesis - IPOX
          + Orator from Bellcore

   The Festival Speech Synthesis System
          http://www.cstr.ed.ac.uk/projects/festival.html
          Pre-synthesized examples in English, Welsh and Spanish, and
          online demo of English.

   Pavarobotti
          http://www.shc.uiowa.edu/fun/pavarobotti/pavarobotti.html
          WWW demo of the Pavarobotti synthesis technology developed at
          the National Center for Voice and Speech
          (http://www.shc.uiowa.edu/ncvs_home.html).

   Say...
          http://wwwtios.cs.utwente.nl/say
          WWW demo of the rsynth speech synthesis software. The WWW
          capability was implemented by Axel Belinfante.

   Musée sonore de la synthèse de la parole en français (sound museum of
          French speech synthesis)
          http://www.icp.grenet.fr/exemples_synthese/ex.html
          Speech synthesis examples from a series of French language
          speech synthesisers plus links to other speech synthesis demo
          pages.

          + ICP-Grenoble
          + CNET-Lannion (with TD-PSOLA)
          + KTH-Stockholm
          + Universite-Mons - several versions

   Lucent Technologies Bell Labs Text-to-Speech
          http://www.bell-labs.com/project/tts/
          Demos and samples of the latest Lucent Technologies Bell Labs
          Text-to-Speech system.

   WATSON FlexTalk from AT&T Advanced Speech Products Group
          http://www.att.com/aspg/demo.html
          WWW interface to the WATSON FlexTalk speech synthesis
          demonstration.

   AT&T Bell Laboratories Voices
          http://www.research.att.com/cgi-bin/cgiwrap/mjm/voices.cgi
          WWW interface to the AT&T Bell Laboratories text to speech
          (TTS) synthesizer

   Laureate from British Telecom
          http://www.labs.bt.com/innovate/speech/laureate/
          Demo of the Laureate speech synthesis system - not yet
          commercially available.

   ORATOR from Bellcore
          Online demo of the ORATOR system developed at Bellcore.
          http://www.bellcore.com/ORATOR/

   SVOX from TIK, ETH in Zurich
          http://www.tik.ee.ethz.ch/cgi-bin/w3svox
          Demo of German speech synthesis from the Institut für Technische
          Informatik und Kommunikationsnetze.

   Speech Synthesis Research at OGI
          http://www.cse.ogi.edu/CSLU/research/TTS
          Examples of diphone speech corpora and algorithms developed at
          OGI for synthesis of American English and Mexican Spanish using
          the Festival framework.

   Lyricos
          http://www.cse.ogi.edu/CSLU/research/TTS/research/sing.html
          Demos of the Lyricos singing voice synthesis system.
          Concatenation-based synthesis of singing voice from MIDI input.

   Multi-Lingual TTS from Gerhard-Mercator University, Duisburg
          http://www.fb9-ti.uni-duisburg.de/demos/speech.html
          Synthesis in German, English or Japanese.

   TMH: Institutionen för Talöverföring och Musikakustik (Department of
          Speech Transmission and Music Acoustics), Kungliga Tekniska
          Högskolan (KTH, Royal Institute of Technology)
          http://www.speech.kth.se/info/software.html
          Synthesis in Swedish, Finnish, Norwegian, Icelandic, Danish,
          British and American English, French, German, Italian, Spanish,
          Latin American Spanish and Greek.

   Haskins Laboratory WWW Site
          http://www.haskins.yale.edu/Haskins/MISC/special.html
          Examples of several types of speech synthesis. Articulatory
          Synthesis by HyperASY. SineWave Synthesis. Gestural
          Computational Model. Pattern Playback system of the 1940s!

   BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
          http://www.bestspeech.com/weblang.html

   Eurovocs Multilingual Speech Synthesis
          http://www.elis.rug.ac.be/ELISgroups/speech/research/eurovocs.html
          Based on Lernout and Hauspie technology.

   HADIFIX German Speech Synthesis
          http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
          Provided by the Institut für Kommunikationsforschung und
          Phonetik, Universität Bonn.

   Centigram's TruVoice Demo
          http://www.centigram.com/centigram/TruVoice/index.html
          Allows control of speech rate, pitch and other prosodic
          characteristics.

   MBROLA: Free Speech Synthesis Project
          http://tcts.fpms.ac.be/synthesis/modelcmp.html
          WWW demo of MBROLA which compares the quality of PSOLA,
          MBR-PSOLA, LPC, and Hybrid Harmonic/Stochastic concatenative
          synthesizers. Provided by the TCTS Lab, Faculté Polytechnique
          de Mons, Belgium.

   Institute of Phonetic Sciences
          http://fonsg3.let.uva.nl/IFA-Features.html
          Links to lots of on-line speech synthesis demonstrations
          provided by the Institute of Phonetic Sciences of the Faculty
          of Arts of the University of Amsterdam.

   Yahoo page on speech generation
          http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Generation/


___________________________________________________________________________

                   Q5.5: Speech Synthesis Software/Hardware

   Please email any updates, corrections or additions to the following
   list. The range of commercially available synthesis software is
   growing rapidly so any help in keeping up to date will be appreciated.

   Other lists of speech synthesis software on the WWW include:

    Kevin Lenzo's list of Macintosh Speech Resources and Apps
          http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html

    Speech Toys Speech Synthesis Information
          http://www.speechtoys.com/spchtoys/spsyn.html

  In the FAQ...

   The following speech synthesis software/hardware is described in the
   comp.speech FAQ.

   _Apple Macintosh_
          * BeSTspeech from Berkeley Speech Technologies, Inc., (BST) 
          * Infovox Product Range 
          * Macintosh Speech Output Applications 
          * Macintosh Speech Synthesis Manager 
          * MacYack Pro 
          * MBROLA: Free Speech Synthesis Project 
          * ProVoice Developer's Speech Toolkit from First Byte 
          * SENSYN speech synthesizer 
          * Sound Bytes Developer's Kit 

   _Windows (including 95, NT, 3.1)_
          * AcuVoice 
          * AT&T Watson Speech Synthesis 
          * BeSTspeech from Berkeley Speech Technologies, Inc., (BST) 
          * Creative TextAssist and TextAssist API 
          * DECtalk: Text-to-Speech from Digital 
          * ETI-Eloquence 
          * HADIFIX 
          * Infovox Product Range 
          * IPOX: All Prosodic Speech Synthesis Architecture 
          * Lernout and Hauspie Text-To-Speech Windows SDK 
          * Listen2 Text Reader 
          * MBROLA: Free Speech Synthesis Project 
          * Monologue for Windows from First Byte 
          * PAM - A Text-To-Speech Application 
          * ProVerbe Speech Engine from ELAN Informatique 
          * ProVoice Developer's Speech Toolkit from First Byte 
          * SENSYN speech synthesizer 
          * Sound Bytes Developer's Kit 
          * Tinytalk 
          * TruVoice from Centigram 
          * WinSpeech 
          * ZMD Speech Synthesis 

   _DOS_
          * CSRE: Computerized Speech Research Environment 
          * Infovox Product Range 
          * MBROLA: Free Speech Synthesis Project 
          * ProVoice Developer's Speech Toolkit from First Byte 
          * SENSYN speech synthesizer 
          * spchsyn.exe 
          * Tinytalk 
          * ZMD Speech Synthesis 

   _OS/2_
          * ProVerbe Speech Engine from ELAN Informatique 
          * ProVoice Developer's Speech Toolkit from First Byte 
          * Sound Bytes Developer's Kit 

   _Unix_
          * AcuVoice 
          * AsTeR 
          * BeSTspeech from Berkeley Speech Technologies, Inc., (BST) 
          * DECtalk: Text-to-Speech from Digital 
          * ETI-Eloquence 
          * Emacspeak - A Speech Output Subsystem For Emacs 
          * Festival Speech Synthesis System 
          * JSRU 
          * Klatt-style synthesiser 
          * KPE80 - A Klatt Synthesiser and Parameter Editor 
          * "learph": Trainable text-to-phoneme software by Antonio Lucca 
          * Lucent Technologies Bell Labs Text-to-Speech system 
          * MBROLA: Free Speech Synthesis Project 
          * Orator from Bellcore 
          * ProVerbe Speech Engine from ELAN Informatique 
          * rsynth 
          * SENSYN speech synthesizer 
          * SGI Developers Toolbox Synthesiser 
          * Speak 
          * TrueTalk 
          * TruVoice from Centigram 

   _Integrated Circuits and Dedicated Hardware_
          * Eurovocs 
          * Infovox Product Range 
          * ProVerbe Speech Engine from ELAN Informatique 
          * RC Systems V8600/V8601 Text to Speech synthesizers 

   _Other Platforms_
          * BeSTspeech from Berkeley Speech Technologies, Inc., (BST) 
          * TheBigMouth (NeXT) 
          * MBROLA: Free Speech Synthesis Project 
          * Narrator Translator Library (Amiga) 
          * Narrator (Amiga) 
          * TextToSpeech Kit (NeXT) 
          * Orator from Bellcore 
          * SENSYN speech synthesizer 
          * WreadFiles: File reader for Commodore Amiga 

   _Unknown_
          * Lernout and Hauspie Text-To-Speech (3 products) 
          * SIMTEL 
          * Text to Phoneme Program 1 
          * Text to phoneme program 2 
          * Text to phoneme program 3 



AcuVoice

     * Platform: Windows, Solaris
     * Description: AcuVoice is a natural-sounding text-to-speech system
       built using a concatenative approach. Currently only an American
       English male voice is available. Software Developer Kits are
       available for the Windows platform (32-bit) and for the Solaris
       platform. More information and samples are available on the
       AcuVoice web site.
     * Contact: AcuVoice, Inc.
       84 W. Santa Clara Street, Suite 720, San Jose, CA 95113-1810
       Ph: 1(408)289-1661, Fax: 1(408)289-1201
       Demo: 1(408)289-1177
       Email: AcuVoice1@AOL.COM
       WWW: http://www.acuvoice.com/



AsTeR

     * Platform: UNIX
     * Description: TTS front-end program which encodes structural
       information about documents in speech synthesis. For more
       information check out:

                http://www.research.digital.com/CRL/personal/raman/aster/aster-toplevel.html

     * Operation requirements: Lisp: Lucid, clisp
     * Contact: T. V. Raman
       WWW: http://www.research.digital.com/CRL/personal/raman/raman.html

       Email: raman@adobe.com



AT&T Watson Speech Synthesis

     * Platform: Windows 95/NT on a Pentium 75 MHz or higher
     * Description: Watson is a software implementation of AT&T Bell
       Laboratories voice processing technology. Watson includes BLASR
       Speech Recognition (see Q6.6) and FlexTalk speech synthesis. It
       requires no special hardware to run other than a standard sound
       card and/or phone card. Technical details for the FlexTalk speech
       synthesis include:
          + Compliant with MS Speech API.
          + Male and Female Voices available
          + 8 kHz and 11 kHz output
          + SoundBlaster compatible sound card and drivers required
          + Context sensitive abbreviation expansion
          + Accurate pronunciation of most proper names
          + Adjustable vocal tract size, speed, volume, pitch, etc.
          + American English only - other languages in development
       The AT&T Advanced Speech Products Group home page provides more
       detailed information including a Frequently Asked Questions list,
       information for application developers on the Independent Software
       Vendor (ISV) Program (including info on the SDK, licensing, and
       the training program).
     * Requirements: Uses 2 MB RAM, 10 MB Disk. Requires a Pentium 75 MHz
       or higher (uses
     * Cost and Availability: WATSON is a software-based speech platform
       with a Software Developers Kit (SDK) that allows application
       developers to use voice processing in their applications. It is
       not available as a stand-alone product.
       Licensing information (incl. price) is provided on the AT&T
       Advanced Speech Products Group home page.
     * See also: Watson BLASR speech recognition in Q6.5, Microsoft
       Speech API, and Advanced Speech API.
     * Contact: AT&T Advanced Speech Products Group
       Suite 700, 44 East Mifflin Street, Madison, WI 53703, USA
       Ph: 1-800-5-WATSON, Fax: 1-608-259-2269
       Email: aspg@attmail.com
       WWW: http://www.att.com/aspg/



BeSTspeech from Berkeley Speech Technologies, Inc., (BST)

     * Platform: available for Macintosh, Sun, Silicon Graphics, Windows
       PC and IBM RS/6000 platforms, and can be ported to others.
     * Description: BeSTspeech reads ASCII text with no vocabulary limits.
       Available for Dutch, English (male and female), French, German,
       Italian, Portuguese, Spanish, Arabic, Cantonese, Japanese, Korean,
       Malay, Mandarin and Russian.
     * Availability: Berkeley Speech Technologies, Inc does not sell end
       user toolkits or products.
     * Contact: Berkeley Speech Technologies, Inc.
       2246 Sixth Street, Berkeley, California 94710, USA
       Ph: (510) 841-5083, Fax: (510) 841-5093
       Email: webmaster@bst.com
       WWW: http://www.bestspeech.com/index.html



TheBigMouth - a Text to Speech Program

     * Platform: NeXT
     * Description: Text to speech program based on concatenation of
       pre-recorded speech segments.
     * Availability:
       ftp://ftp.cs.keio.ac.jp/pub/NeXT/source/



Creative TextAssist

     * Platform: Windows
     * Description: Based on DECtalk speech synthesis. A detailed
       description of TextAssist is provided on the Creative WWW pages.
       TextAssist TextReader provides a convenient Windows user interface
       for text reading.
     * Availability: Creative TextAssist is bundled with most (all?)
       Creative Sound Blaster audio cards. TextAssist preview software is
       available from the Creative Labs TextAssist home page.
     * Contact: Creative Labs, Inc.
       Address, phone, email etc. unknown
       WWW: http://www.creaf.com/ and
       http://www.creaf.com/wwwnew/tech/devcnr/tassist.html

Creative TextAssist API

     * Platform: Windows
     * Description: The TextAssist API (TAAPI) is designed for Microsoft
       Windows 3.1x and Windows 95 developers who intend to develop
       16-bit text-to-speech applications using Creative's TextAssist
       speech engine. It supports direct control of speech output
       characteristics, concurrent playback of text-to-speech and wave
       files, foreign language support, speech synchronization, and
       exception dictionaries. It also includes a voice editing tool for
       creating new custom voices, and a Visual Basic Custom Control for
       high-level support in Visual Basic and other languages.
     * Availability: The TextAssist API is released to registered
       developers at no cost.
     * Contact: WWW: http://www.creaf.com/
       FAQ: http://www.creaf.com/wwwnew/tech/devcnr/tassfaq.html
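   Exception dictionaries are usually consulted before any letter-to-sound
   rules, so that user entries override the rules. This sketch illustrates
   only that lookup order; pronounce, naive_letter_to_sound and the
   phoneme symbols are hypothetical and are not part of the TextAssist
   API, whose actual calls are not described here.

```python
from typing import Callable, Dict, List

def pronounce(word: str,
              exceptions: Dict[str, List[str]],
              letter_to_sound: Callable[[str], List[str]]) -> List[str]:
    """Look the word up in a user exception dictionary first; fall back to rules."""
    key = word.lower()
    if key in exceptions:
        return exceptions[key]
    return letter_to_sound(key)

def naive_letter_to_sound(word: str) -> List[str]:
    """Stand-in for a real rule set: one symbol per letter."""
    return list(word)

# Usage: an acronym the rules would mangle, given a user-supplied pronunciation.
user_dictionary = {"sql": ["eh", "s", "k", "y", "uw", "eh", "l"]}
sql = pronounce("SQL", user_dictionary, naive_letter_to_sound)
```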



CSRE: Computerized Speech Research Environment

     * Platform: DOS
     * Description: CSRE is a software system which includes an
       implementation of the Klatt speech synthesizer. See the CSRE entry
       in Q1.9 and the AVAAZ WWW pages for more detail.
     * Contact: AVAAZ Innovations Inc.
       P.O.Box 8040, 1225 Wonderland Rd. N, London, Ontario, CANADA, N6G
       2B0
       Ph: +1-519-472-7944 , Fax: +1-519-472-7814
       Email: info@avaaz.com
       WWW: http://www.icis.on.ca/homepages/avaaz/
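   The basic building block of a Klatt-style formant synthesizer is a
   second-order digital resonator that shapes a source signal around one
   formant frequency and bandwidth. The sketch uses the standard
   difference equation and coefficient formulas of Klatt's
   cascade/parallel formant synthesizer; it is not CSRE's code, and the
   formant and bandwidth values in the usage lines are illustrative
   assumptions.

```python
import math
from typing import List

def resonator(x: List[float], freq_hz: float, bw_hz: float,
              fs_hz: float) -> List[float]:
    """Second-order digital resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # scales the filter to unity gain at 0 Hz
    y: List[float] = []
    y1 = y2 = 0.0    # the two previous output samples
    for sample in x:
        out = a * sample + b * y1 + c * y2
        y.append(out)
        y2, y1 = y1, out
    return y

# Usage: a crude vowel-like sound from a 100 Hz impulse train passed through
# three cascaded resonators (illustrative formant/bandwidth values).
fs = 10000
source = [1.0 if n % 100 == 0 else 0.0 for n in range(fs // 2)]  # 0.5 s
vowel = source
for formant, bandwidth in [(700.0, 60.0), (1220.0, 70.0), (2600.0, 110.0)]:
    vowel = resonator(vowel, formant, bandwidth, fs)
```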



DECtalk Speech Synthesis

     * Platform: Windows NT, Alpha with Digital UNIX and RS232 ports
     * Description: Converts ordinary text into natural-sounding,
       intelligible speech. Provides personalized voices, and extensive
       user controls. DECtalk technology is available for the following
       packaging options.
          + DECtalk PC card option: An industry-standard ISA/EISA bus
            card implementation that can be integrated with any Intel 486
            processor-based system running DOS or Windows. Applications
            can be interfaced to the bus via a DOS Terminate and Stay
            Resident (TSR) driver or a Windows Dynamic Link Library
            (DLL). This option is available with an external speaker with
            volume control and headphone jack.
          + DECtalk Express external package: An external, portable
            package that you can plug in to any PC or serial port. The
            external package includes a built-in speaker and headphone
            jack, plus combined on/off and volume controls and a
            rechargeable battery pack.
          + DECtalk Software solution: Software-only text to speech for
            Alpha or Intel systems running Windows NT or Alpha systems
            running Digital UNIX. Provides complete speech synthesis
            capabilities so developers can enhance applications with
            DECtalk technology. DECtalk Software output can be directed
            to audio devices, into WAVE files, or into memory buffers.
     * Pricing:
       ://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis-oi.html
     * More Information:
       Digital Equipment Corporation WWW pages: http://www.digital.com/
       DECtalk page:
       http://www.systems.digital.com/DIcatalog/html/DECtalk-Software.html
       Ph: 1-800-DIGITAL
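   DECtalk's own API is not described here. As a generic, hedged
   illustration of directing synthetic speech "into WAVE files, or into
   memory buffers", this sketch uses Python's standard wave module to
   write 16-bit mono PCM either to a file name or to an in-memory buffer.
   The function name write_wave, the test tone and the 11025 Hz sampling
   rate are assumptions, not DECtalk settings.

```python
import io
import math
import struct
import wave
from typing import BinaryIO, List, Union

def write_wave(samples: List[float], target: Union[str, BinaryIO],
               sample_rate: int = 11025) -> None:
    """Write mono 16-bit PCM samples (floats in [-1.0, 1.0]) to a WAVE file
    name or to a writable, seekable binary buffer."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))  # clip, scale to int16
        for s in samples
    )
    with wave.open(target, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(frames)

# Usage: one second of a 440 Hz test tone written into a memory buffer.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 11025) for n in range(11025)]
buf = io.BytesIO()
write_wave(tone, buf)
data = buf.getvalue()
```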

DECtalk Software

     * Platform: Digital UNIX and Windows NT
     * Description: DECtalk converts standard ASCII text into natural,
       intelligible speech. Speech output through any audio device is
       supported by Microsoft Video for Windows or Multimedia Services
       for Digital UNIX. An API gives developers direct access to
       text-to-speech functions. Provides nine voice personalities (4
       female, 4 male, 1 child). Provides punctuation and tonal control,
       supports customized pronunciation of trade jargon and acronyms.
       Common programming interface works with both Alpha and Intel
       platforms.
     * More Information:
       Digital Equipment Corporation WWW pages: http://www.digital.com/
       DECtalk Software page:
       http://www.systems.digital.com/DIcatalog/html/DECtalk-Software.html
       WWW:
       http://www.systems.digital.com/DIcatalog/html/DECtalk-Speech-Synthesis.html
       Ph: 1-800-DIGITAL



ETI-Eloquence

     * Platform: MS Windows (Win95, NT, 3.1), Solaris, SunOS, SGI, RS/6000
     * Description: ETI-Eloquence is a software based text-to-speech
       system. It generates waveforms completely algorithmically instead
       of by concatenating waveforms, for maximum flexibility and
       naturalism. For instance, when the user requests a deeper voice,
       the software simulates a larger vocal tract, instead of simply
       pitch-shifting samples. It uses high-level linguistic parsing,
       which obviates the need for a huge dictionary. It handles numbers,
       acronyms, currency, etc. It includes a set of annotation symbols,
       for placing stress on particular words, expressing
       excitement/boredom, etc. Also allows phonetic input. Supports MS
       SAPI.
       Produces male and female voices for General American English.
       Dialects under development include Alabama and Brooklyn.
     * Price: Flexible license agreements on application.
     * Availability: Eloquent Technology, Inc.
       2389 North Triphammer Road, Ithaca, NY 14850 , USA
       Ph: (607) 266-7025, Fax: (607) 266-7030
       Email: info@eloq.com
       WWW: http://www.eloq.com/
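   ETI-Eloquence's own method is not documented here. As a hedged
   illustration of why a larger vocal tract sounds deeper, the textbook
   model of the vocal tract as a uniform tube closed at the glottis and
   open at the lips puts its resonances (formants) at
   F_n = (2n - 1) * c / (4 * L), so lengthening the tube lowers every
   formant. The speed of sound c = 350 m/s and the 17.5 cm tract length
   are typical textbook values, not ETI parameters.

```python
from typing import List

# Assumed textbook value for the speed of sound in warm, humid air (m/s).
SPEED_OF_SOUND = 350.0

def tube_formants(length_m: float, count: int = 3) -> List[float]:
    """First `count` resonances of a uniform tube closed at one end (glottis)
    and open at the other (lips): F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * SPEED_OF_SOUND / (4.0 * length_m)
            for n in range(1, count + 1)]

# Usage: a longer tract lowers every formant.
typical = tube_formants(0.175)  # roughly 500, 1500, 2500 Hz
longer = tube_formants(0.20)    # roughly 437.5, 1312.5, 2187.5 Hz
```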



Emacspeak - A Speech Output Subsystem For Emacs

     * Platform: UNIX, Emacs
     * Description: Emacspeak is a speech output system that will allow
       someone who cannot see to work directly on a UNIX system.
       Emacspeak is built on top of Emacs. With emacspeak loaded, Emacs
       provides spoken feedback for everything you do. Emacspeak
       currently supports the new DECtalk Express speech synthesizer, as
       well as older versions of the DECtalk, e.g. the MultiVoice. See the
       Emacspeak WWW page, the Emacspeak FAQ or the Emacspeak
       distribution for additional details.
     * Requirements: Requires GNU FSF Emacs 19 (version 19.23 or later)
       and TCLX 7.3B (Extended TCL) to run Emacspeak.
     * Availability:

        Emacspeak WWW page
                http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.html

        Emacspeak source
                http://www.research.digital.com/CRL/personal/raman/emacspeak/emacspeak.tar.gz

     * Contact: T. V. Raman, raman@adobe.com



Eurovocs

     * Platform: Various - RS232 hardware connection
     * Description: Eurovocs is a stand-alone text-to-speech synthesizer
       which uses the text-to-speech technology of Lernout and Hauspie
       Speech Products. Available for Dutch, French, German and American
       English with other languages planned for release soon. One
       Eurovocs device can support two different languages. Eurovocs can
       be connected to any computer via a standard serial interface
       (RS232). It supports personal dictionaries, generation of DTMF
       tones, and pronunciation of special character sequences such as
       digit strings, telephone-numbers, date and time indications,
       abbreviations, alphanumeric strings etc.
     * Contact: Technologie & Revalidatie
       Postbus 128, B-9000 Gent, Belgium
       Ph: +32-9-264 33 97, Fax: +32-9-264 35 94
       E-mail: noe@elis.rug.ac.be
       WWW:
       http://www.elis.rug.ac.be/ELISgroups/speech/research/eurovocs.html
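   Reading special character sequences requires text normalization before
   synthesis. A minimal sketch for one case, telephone numbers read digit
   by digit with a pause (written as a comma) between digit groups; the
   function name expand_phone_number and the grouping rule are
   assumptions for illustration, not a description of Eurovocs behaviour.

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def expand_phone_number(number: str) -> str:
    """Read a telephone number digit by digit. Each run of digits becomes one
    group, and groups are separated by commas so the synthesizer pauses."""
    groups = re.findall(r"\d+", number)  # separators (+, -, spaces) are dropped
    return ", ".join(" ".join(DIGIT_WORDS[int(d)] for d in group)
                     for group in groups)

# Usage with the fax number from the Eurovocs contact details above.
spoken = expand_phone_number("+32-9-264 35 94")
```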



Festival Speech Synthesis System

     * Platform: General Unix (including Solaris (2.4,2.5), SunOS, HPUX,
       SGIs, Linux, Dec Alpha, FreeBSD)
     * Description: Festival is a general multi-lingual speech synthesis
       system developed at CSTR, University of Edinburgh. It offers a
       full text to speech system with various APIs, as well as an
       environment for development and research of speech synthesis
       techniques. It is written in C++ with a Scheme-based command
       interpreter for general control. Festival's home page offers
       demos, the full manual and access to the download page. The
       distribution includes full source and documentation, and lexicons
       and speech databases for British English text to speech.
     * Price: Free for non-commercial use
     * Availability: by anonymous ftp:
       WWW: http://www.cstr.ed.ac.uk/projects/festival/download.html
       ftp: ftp://ftp.cstr.ed.ac.uk/pub/festival/



HADIFIX

     * Platform: Windows
     * Description: German speech synthesis system developed at the
       Institute for Communications Research and Phonetics, University
       of Bonn. Provides conversion of input text to phonemes, automatic
       prediction of stress, phrasing and pitch, and speech generation by
       concatenation of small units of natural speech. Demisyllables and
       similar units are used; they comprise all consonants before the
       vowel and the beginning of the vowel (initial demisyllable) or the
       end of the vowel and the following consonants (final
       demisyllable). For example, the word 'Strolch' is formed by
       concatenating 'Stro' and 'olch'.
     * Demo: Windows demo software available. Limited to synthesis of one
       short text (text.txt) at a time. Speech format limitations too.
       1.3MB file.
       ftp://asl1.ikp.uni-bonn.de/pub/hadifix/
       A 1993 version is available with unlimited synthesis from a string
       of phonemic symbols and accent markers. 6MB file.
       ftp://asl1.ikp.uni-bonn.de/pub/hadifix/
     * WWW: http://asl1.ikp.uni-bonn.de/~tpo/Hadifix.en.html
     * On-line demo: http://asl1.ikp.uni-bonn.de/~tpo/Hadiq.en.html
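   The demisyllable split described above can be sketched as follows,
   assuming a deliberately simplified representation: one symbol per
   letter and a five-letter vowel set. A real system such as HADIFIX works
   on phonemes (the German "ch" in "Strolch" is a single sound), so the
   function name demisyllables and the vowel set are illustrative only.

```python
from typing import List, Sequence, Set, Tuple

# Simplified, assumed vowel set: single lowercase letters only.
VOWELS: Set[str] = {"a", "e", "i", "o", "u"}

def demisyllables(syllable: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split one syllable at its first vowel into (initial, final) demisyllables.
    The vowel occurs in both halves: the initial demisyllable keeps its start,
    the final one its end, so the two units are joined inside the vowel."""
    for i, symbol in enumerate(syllable):
        if symbol.lower() in VOWELS:
            return list(syllable[: i + 1]), list(syllable[i:])
    raise ValueError("syllable has no vowel nucleus")

# Usage: the 'Strolch' example from the description above.
initial, final = demisyllables("Strolch")
# "".join(initial) == "Stro", "".join(final) == "olch"
```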



Infovox Product Range

     * Description: Multilingual Text-to-speech systems, languages
       available: American English, British English, German, French,
       Spanish, Italian, Swedish, Norwegian, Icelandic, Danish and
       Finnish.
     * Product name: INFOVOX 500, PC BOARD
          + Product description: Half length expansion board for IBM PC,
            XT, AT, PS/2 model 30 or compatible personal computers. The
            board can also be connected via the serial port. Language and
            control program for downloading into RAM or mounted on EPROMs
          + Platform: DOS/Windows with IBM PC, XT, AT, PS/2 model 30 or
            compatible
          + Delivered standard interface: MS DOS I/O driver
     * Product name: INFOVOX 600, OEM BOARD
          + Product description: OEM board built with CMOS ICs. Language
            and control program are stored in on-board fixed memory.
          + Platform: any, hardware interface: 9-pole D-SUB (RS 232-C)
            300-9600 Baud.
          + Delivered standard interfaces: MS DOS I/O driver and
            interface to Apple Speech manager.
     * Product name: INFOVOX 700, DESKTOP UNIT
          + Product description: Desktop unit with a built-in Infovox 600,
            to be connected to any computer or terminal via an RS 232-C
            serial interface. Built-in loudspeaker, rechargeable battery
            for 4 hours' use, and control knobs for continuous control of
            speech volume and speed.
          + Platform: various, through the hardware interface
          + Delivered standard interfaces: MS DOS I/O driver and
            interface to Apple Speech Manager
     * Product name: INFOVOX 650, OEM BOARD
          + Product description: OEM board built with CMOS ICs. The
            language and control program is stored in on-board memory.
          + Platform: any; hardware interface: 9-pole D-SUB (RS 232-C),
            300-9600 baud
          + Delivered standard interfaces: MS DOS I/O driver and
            interface to Apple Speech Manager
     * Product name: INFOVOX 750, DESKTOP UNIT
          + Product description: Desktop unit with a built-in Infovox 650,
            to be connected to any computer or terminal via an RS 232-C
            serial interface. Built-in loudspeaker, rechargeable battery
            for 5 hours' use, and a control knob for continuous control of
            speech volume.
          + Platform: various, through the hardware interface. Delivered
            standard interfaces include an MS DOS I/O driver and an
            interface to Apple Speech Manager
     * Product name: Infovox 210, software for Apple Macintosh
          + Product description: Software-based text-to-speech
            conversion. Produces 16-bit and 8-bit sound. Delivered on
            3.5" diskettes with a user lexicon and complete
            documentation.
          + Platform: Apple Macintosh with at least a 33 MHz 68030
            microprocessor.
          + Delivered standard interfaces: Standard interface to Apple
            Speech Manager
     * Product name: Infovox 220, software for Microsoft Windows.
          + Product description: Software-based text-to-speech
            conversion. Produces 16-bit sound and conforms to the
            Microsoft Windows multimedia standard MCI. Delivered on 3.5"
            diskettes with a user lexicon and complete documentation.
          + Platform: Windows on an IBM-compatible PC with at least a
            486/25 MHz microprocessor.
          + Delivered standard interfaces: Standard interface to
            Microsoft Windows 3.1 and sound boards supporting Microsoft
            Windows multimedia driver for audio.
     * Contact: Telia Promotor Infovox AB
       TTS Sales Division
       P.O. Box 2069, S-171 02 Solna, Sweden
       Ph: +46 8 764 35 00, Fax: +46 8 735 78 76
       Email: tts-sales@infovox.se
       WWW: http://www.promotor.telia.se/NYA/cc/t-s/index.html



IPOX: All Prosodic Speech Synthesis Architecture

     * Platform: Windows
     * Description: IPOX is an experimental, all-prosodic speech
       synthesizer, developed by Arthur Dirksen and John Coleman. IPOX is
       freely available (after registration) for evaluation and
       non-profit research purposes.
     * Requirements: PC (preferably a fast 486) running Windows 3.1 or
       higher. Sound output requires a 16-bit Windows-compatible sound
       card
     * Availability: By WWW from
       http://www.tue.nl/ipo/people/adirksen/ipox/ipox.htm



JSRU

     * Platform: UNIX and PC
     * Cost: 100 pounds sterling (from academic institutions and
       industry)
     * Description: A C version of the JSRU system, Version 2.3 is
       available. It's written in Turbo C but runs on most Unix systems
       with very little modification. A Form of Agreement must be signed
       to say that the software is required for research and development
       only.
     * Contact: Dr. E. Lewis (eric.lewis@bristol.ac.uk)



Klatt-style synthesiser

     * Platform: Unix
     * Cost: Free
     * Description: Software posted to comp.speech in late 1992.
     * Availability: By ftp from the comp.speech ftp site
          + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.gz
          + ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/klatt.3.04.tar.Z
     * See also: KPE80 - A Klatt Synthesiser and Parameter Editor.



KPE80 - A Klatt Synthesiser and Parameter Editor

     * Platform: Unix
     * Description: The KPE80 program provides a graphical interface for
       the implementation of the Klatt 1980 formant synthesiser written
       by Jon Iles and Nick Ing-Simmons. It was inspired by IGE, a piece
       of code written by Rob Fletcher
       (http://www.york.ac.uk/~rpf1/IGE.html).
     * Technical Desc.: It consists of an X Window interface and
       version 3.03 of the synthesiser code. The interface allows users
       to display and edit Klatt parameters using a graphical display
       which includes the time-amplitude waveform of both the original
       speech and its synthetic copy, and some signal analysis
       facilities. Most of the work in choosing the parameter values to
       produce the synthetic copy has to be done by the user. KPE will
       estimate the fundamental frequency contour from an original token;
       this estimate will need to be amended where errors occur. It is
       possible to specify the formant trajectories with some precision
       by overlaying the appropriate formant frequency parameter tracks
       on the spectrogram of the target waveform. A number of facilities
       exist to help in the refinement of parameter values: original and
       synthetic waveforms can be compared aurally, spectrally, and
       spectrographically using built-in speech analysis facilities.
     * File formats: KPE will read RIFF (.wav) files and SFS files. (SFS
       is a suite of speech-signal processing programs available free
       from Phonetics and Linguistics, UCL.)
     * Availability:

        KPE for SunOS 4.1.3 (statically compiled libraries)
                ftp://pitch.phon.ucl.ac.uk/pub/kpe/kpe80.sun413.tar.Z

        KPE for Linux (statically compiled libraries)
                ftp://pitch.phon.ucl.ac.uk/pub/kpe/kpe80.linux.tar.Z

        The source code (needs gcc and SUIT to compile)
                ftp://pitch.phon.ucl.ac.uk/pub/kpe/kpe80.src.tar.Z

        A postscript overview of KPE
                ftp://pitch.phon.ucl.ac.uk/pub/kpe/OVERVIEW.ps

        The SFS distribution
                ftp://pitch.phon.ucl.ac.uk/pub/sfs/

     * See also: Public domain Klatt-style speech synthesis code.
     * Contact: Andrew Simpson
       Department of Phonetics and Linguistics, University College London

       Wolfson House, 4 Stephenson Way, London NW1 2HE
       Email: a.simpson@ucl.ac.uk
       WWW: http://www.phon.ucl.ac.uk/home/andrew/home.html
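
   Both Klatt-style packages above are built around second-order digital
   resonators that shape the spectrum at each formant. The sketch below
   implements the resonator difference equation from Klatt's 1980 design
   and cascades three resonators over an impulse-train source. The
   formant frequencies and bandwidths in the usage lines are
   illustrative assumptions, and the sketch omits everything else the
   real synthesisers provide (glottal source shaping, noise sources, the
   parallel branch).

```python
import math


def resonator(x, freq_hz, bw_hz, fs_hz):
    """Klatt (1980) second-order digital resonator.

    y[n] = A*x[n] + B*y[n-1] + C*y[n-2], with
    C = -exp(-2*pi*BW*T), B = 2*exp(-pi*BW*T)*cos(2*pi*F*T), A = 1 - B - C,
    so the gain at 0 Hz is exactly 1.
    """
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade_vowel(f0_hz, formants, fs_hz=10000, dur_s=0.1):
    """Pass an impulse train at f0_hz through formant resonators in cascade."""
    n = int(round(fs_hz * dur_s))
    period = int(round(fs_hz / f0_hz))
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs_hz)
    return signal


# Illustrative values only: roughly vowel-like formants (Hz) and bandwidths.
samples = cascade_vowel(100, [(700, 60), (1200, 90), (2600, 150)])
print(len(samples))
```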



"learph": Trainable text-to-phoneme software by Antonio Lucca

     * Platform: UNIX
     * Description: Experimental software which learns text to phoneme
       translation from examples using decision-tree-like data
       structures. It is based on the assumption that each letter can
       correspond to different phoneme strings depending on the context.
     * Availability: Examples and source are available on the WWW:
       http://www.silab.dsi.unimi.it/~al367212/ttsdoc.html
     * Contact: Antonio Lucca: toninlcc@tesi.dsi.unimi.it
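
   learph's decision-tree method is not documented here, so the sketch
   below substitutes a simpler technique built on the same assumption:
   each letter maps to a phoneme string that depends on its context. It
   memorises letter windows of decreasing width from aligned examples and
   backs off to a narrower window when a wider one was never seen in
   training. The class name, the '#' boundary padding and the toy
   training data are all hypothetical.

```python
from collections import Counter, defaultdict


class ContextG2P:
    """Hypothetical context-window letter-to-phoneme learner with backoff."""

    def __init__(self, max_context=2):
        self.max_context = max_context
        # tables[k] maps a window of k letters either side -> phoneme counts
        self.tables = [defaultdict(Counter) for _ in range(max_context + 1)]

    @staticmethod
    def _window(word, i, k):
        padded = "#" * k + word + "#" * k  # '#' marks the word boundary
        return padded[i : i + 2 * k + 1]   # letter i with k neighbours each side

    def train(self, examples):
        """examples: (word, phonemes) pairs, one phoneme string per letter."""
        for word, phones in examples:
            if len(word) != len(phones):
                raise ValueError(f"unaligned example: {word!r}")
            for i, ph in enumerate(phones):
                for k in range(self.max_context + 1):
                    self.tables[k][self._window(word, i, k)][ph] += 1

    def predict(self, word):
        out = []
        for i in range(len(word)):
            # Try the widest context first, then back off.
            for k in range(self.max_context, -1, -1):
                counts = self.tables[k].get(self._window(word, i, k))
                if counts:
                    out.append(counts.most_common(1)[0][0])
                    break
            else:
                out.append("?")  # letter never seen in training
        return out


g2p = ContextG2P()
g2p.train([
    ("cat", ["k", "ae", "t"]),
    ("cot", ["k", "aa", "t"]),
    ("city", ["s", "ih", "t", "iy"]),
])
print(g2p.predict("cit"))  # ['s', 'ih', 't']
```

   The "c" is read as /s/ before "i" and /k/ before "a" purely because
   of the contexts seen in the three training words.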



Lernout & Hauspie Text-to-Speech (3 products)

   Lernout & Hauspie have three TTS products. Their functionality is
   similar; they differ in hardware implementation and in the other
   details described below.

     * L&H tts2000/T: TTS for the Telephony and Telecommunications Market
     * L&H tts2000/M: TTS for the Computer and Multimedia Market
     * L&H tts3000/C: TTS for the Business and Consumer Electronics
       Market

     * Description: Text to Speech (TTS) software based on parameterized
       segment concatenation (diphones, triphones and tetraphones)
       algorithms. Available for US English, German, Dutch, French,
       Spanish (Castilian), Italian and Korean. General features include:
          + The control of volume, speech rate and speech pitch.
          + The use of control sequences to customize TTS output (adding
            pauses, using phonetic input, etc.).
          + Switching between languages at run time.
          + A personal vocabulary editor is available for building
            exception dictionaries.
          + Readout modes: letter by letter, word by word or sentence by
            sentence.
          + Input formats: orthographic input, phonetic input, phonetic
            input with prosodic information.
     * tts2000/T
          + Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit
            linear PCM.
          + Sampling Frequency: 8kHz
          + Single channel platform examples: SHARP SH7000, ARM6/ARM7,
            Intel i960, TI TMS320C31, AT&T DSP3210
          + Multi-channel platform examples: TI TMS320C31, AT&T DSP3210
     * tts2000/M
          + Output formats: 8/16 bit wave format, 8 bit mu-law PCM, 8 bit
            A-law PCM, 16 bit linear PCM.
          + Sampling Frequency: 8/10/11.025 kHz
          + Single processor platform examples: ARM6/ARM7, Intel
            386/486/Pentium, Motorola 68040
          + Two processor platform examples: {Intel 386/486/Pentium or
            Motorola 68030} and {ADI ADSP21XX or Motorola 5600X or TI
            TMS320C25/C5X}
     * tts3000/C
          + Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit
            linear PCM.
          + Sampling Frequency: 10kHz
          + Single processor platform examples: SHARP SH7000, ARM6/ARM7,
            Intel i960, TI TMS320C31, AT&T DSP3210
          + Two processor platform examples: {SHARP SH7000 or ARM6/ARM7
            or Intel 386EX or Motorola 683XX} and {ADI ADSP21XX or
            Motorola 5600X or TI TMS320C25/C5X or TI TSP50C10}
     * See also: L&H Windows TTS SDK
     * More Information: on the Lernout & Hauspie WWW pages:
       http://www.lhs.com/tts.html
     * Price: Unknown
     * Contact: Lernout and Hauspie Speech Products
       20 Mall Road, 4th Floor
       Burlington, MA 01803, USA
       Ph: +1-617-238-0960, Fax: +1-617-238-0986
       Email: sales@lhs.com
       WWW: http://www.lhs.com/
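
   The output formats listed above include 8 bit mu-law PCM, the
   companded telephony format standardised in ITU-T G.711. To illustrate
   what that format contains (not how the L&H products produce it), the
   sketch below converts 16 bit linear PCM samples to 8 bit mu-law codes
   and back, using the common bias-and-segment formulation of G.711.

```python
BIAS = 0x84    # 132, added to the magnitude before finding the segment
CLIP = 32635   # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit signed linear PCM sample as an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(-sample if sample < 0 else sample, CLIP) + BIAS
    # Segment (exponent) = position of the highest set bit among bits 14..7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the bitwise complement of sign|exponent|mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit mu-law code back to a 16-bit linear sample."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


for s in (0, 1000, -1000, 32767):
    print(s, hex(linear_to_ulaw(s)), ulaw_to_linear(linear_to_ulaw(s)))
```

   Small amplitudes are quantised finely and large ones coarsely, which
   is how 8 bits per sample keep acceptable quality at 8 kHz.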



Lernout & Hauspie Text-to-Speech Windows SDK

     * Platform: Windows
     * Description: The L&H Text-to-Speech software developer's kit lets
       you integrate text-to-speech technology into your own or existing
       PC applications under Microsoft Windows 3.1. The software converts
       written text into clear, human-sounding synthetic speech.
     * Requirements: IBM-compatible PC 386 DX/33 + 8Mb RAM + MS DOS 5.0 +
       MS Windows 3.1 (or higher) + SoundBlaster compatible sound board.
     * See also: L&H TTS Products
     * More Information: on the Lernout & Hauspie WWW pages:
       http://www.lhs.com/tts.html
     * Price: Unknown
     * Contact: Lernout and Hauspie Speech Products
       20 Mall Road, 4th Floor
       Burlington, MA 01803, USA
       Ph: +1-617-238-0960, Fax: +1-617-238-0986
       Email: sales@lhs.com
       WWW: http://www.lhs.com/



Listen2 Text Reader

     * Platform: Windows
     * Description: Listen2 is a multi-voice, multi-language text reader.
       It comes in two versions: an English-only version with high
       quality male and female voices, and an International version that
       can speak up to five languages (English, German, French, Spanish
       and Italian), all in male voices. The basic International program
       comes with English built in; additional language fonts can be
       purchased separately. The English version comes complete.
       Both programs can be switched and configured on the fly: you can
       press a hot key to speed up the speech, make it louder or quieter,
       etc., while it is reading a file. You can also insert flags in
       text files to make it switch voices or languages, depending on
       which version you have.
       Listen2 has all the features of the JTS Reader shareware program
       plus a few more. It will voice your reminder messages or
       appointment list on start-up. It will also speak a reminder
       message on shutting down.
     * WWW: A more complete description is available on the Listen2 web
       page
     * Contact: Tom Slemko: e-mail: tslemko@islandnet.com, or,
       JTS Micro Consulting Ltd
       10931 Lytton Road, RR#4, Ladysmith, B.C., Canada, V0R 2E0
       WWW: http://www.islandnet.com/jts/



Lucent Technologies Bell Labs Text-to-Speech system

     * Platform: UNIX and Win-95/NT
     * Description: Lucent Technologies provides a web site with demos and
       samples of their latest speech synthesis technology. The site has
       interactive demos in American English, German, and Mandarin
       Chinese, and the capability to adjust voice parameters on the fly.
       Pre-synthesized demos for French, Italian, Russian, and Romanian
       are also provided.
       The site includes downloadable papers with detailed system
       descriptions.
     * WWW: http://www.bell-labs.com/project/tts/



Macintosh Speech Output Applications

     * Platform: Macintosh
     * Description: A comprehensive list of Macintosh Speech Applications
       is provided by Kevin Lenzo at CMU:
       http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html
       The Apple Speech WWW Site also has some useful information:
       http://www.speech.apple.com/



Speech Manager and PlainTalk

     * Platform: Macintosh
     * Description: Apple's text-to-speech system extensions that enable
       applications to perform text-to-speech conversion. The Speech
       Manager runs on most Macs, but PlainTalk (and the high quality
       voices) requires a 68020 Mac or better.
     * Availability: By anonymous ftp from:
       ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk 1.4.1/
       This directory contains subdirectories for recent versions of
       PlainTalk. The current release (PlainTalk 1.4.1) contains the
       English Text-To-Speech with about a dozen voices
       (English_Text-to-Speech.hqx: 5.3 MByte), Mexican Spanish
       (Mexican_Spanish_TTS.hqx: 2.8 MByte), and the English Speech
       Recognition software (English_Speech_Recognition.hqx: 2.3MByte).
     * Cost: Free
     * WWW: The latest information is available from Apple's WWW page for
       speech recognition and synthesis:
       http://www.speech.apple.com/
     * Note 1: Check out Kevin Lenzo's list of Macintosh Speech
       Applications.
     * Note 2: Joshua Baer (josh@skyweyr.com) runs a mailing list for
       PlainTalk. For subscription and other information, visit the
       PlainTalk Discussion List home page.
     * Contact: Apple Computer, Inc.
       1 Infinite Loop, Cupertino, CA 95014, USA
       WWW: http://www.speech.apple.com/
       Email: PlainTalk@atg.apple.com



MacYack Pro

     * Platform: Macintosh
     * Description: MacYack Pro is a commercial speech package for
       Macintosh that uses the PlainTalk Text-to-Speech synthesis
       software. Features include:
          + Add speech to any word processor.
          + Hear notification dialogs and other dialog boxes.
          + See and hear a customized message at startup or shutdown.
          + Hear calculations instantly.
          + Correct pronunciation errors.
          + Create custom double-clickable "speech files."
          + Have speaking alert sounds.
          + Add speech to HyperCard stacks.
          + Use AppleScript to add speech to other programs.
     * Price: $29.95 for a limited time, reduced from the regular price
       of $49.95. 30-day money-back guarantee.
     * Contact: Scantron Quality Computers
       20200 Nine Mile Rd. St. Clair Shores, MI 48080
       Ph: 1-800-777-3642, Fax: 810-774-2698
       E-mail: sales@sqc.com
       WWW: http://www.sqc.com/
       Product Info: http://www.lowtek.com/macyack/



MBROLA: Free Speech Synthesis Project

     * Platform: Sun4, Sun/SunOS5.4, HP, VAX/VMS, DEC Alpha/VMS, PC/DOS,
       PC/Windows 3.1, PC/Windows 95, PC/Solaris2.4, PC/Linux, SGI
       INDY/IRIX, NeXT, and soon for Macintosh.
     * Description: MBROLA is a high-quality, diphone-based speech
       synthesizer which is available for free. It is provided by the
       TCTS Lab of the Faculte Polytechnique de Mons (Belgium), which aims
       to obtain a set of speech synthesizers for as many languages as
       possible, free of use for non-commercial, non-military
       applications.
       MBROLA 2.00 takes a list of phonemes as input, together with
       prosodic information (duration of phonemes and a piecewise linear
       description of pitch), and produces 16-bit speech samples at the
       sampling frequency of the diphone database (typically 16kHz). (It
       is therefore NOT a Text-To-Speech (TTS) synthesizer, since it does
       not accept raw text as input.) Databases are now being prepared
       for English, Spanish, Italian, Dutch, and Romanian. Collaborations
       are welcome. More information can be found at the MBROLA project
       homepage.
     * Demonstration: WWW demo of MBROLA which compares the quality of
       PSOLA, MBR-PSOLA, LPC, and Hybrid Harmonic/Stochastic
       concatenative synthesizers is available at
       http://tcts.fpms.ac.be/synthesis/modelcmp.html.
     * Contact: Dr Thierry Dutoit
       Faculte Polytechnique de Mons, TCTS Lab,
       31, bvd Dolez, B-7000 Mons, Belgium.
       Ph: +32-65-374133, Fax: +32-65-374129
       e-mail: mbrola@tcts.fpms.ac.be
       WWW: http://tcts.fpms.ac.be/synthesis/mbrola.html
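
   Because MBROLA expects phonemes plus prosody rather than raw text, a
   front end has to produce that input. The sketch below writes it in
   the line-per-phoneme layout of MBROLA's .pho files (phoneme name,
   duration in ms, then optional pairs of position within the phoneme in
   percent and pitch in Hz), and evaluates the resulting piecewise linear
   pitch contour at a given time. The phoneme symbols are placeholders
   that would have to match the chosen diphone database, and holding the
   first and last pitch values outside the targets is an assumption of
   this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Phone:
    name: str                 # phoneme symbol of the diphone database
    duration_ms: int
    pitch: list = field(default_factory=list)  # (position %, F0 Hz) pairs


def to_pho(phones):
    """Render phones as .pho lines: name, duration, then pos/F0 pairs."""
    lines = []
    for p in phones:
        parts = [p.name, str(p.duration_ms)]
        for pos, f0 in p.pitch:
            parts += [str(pos), str(f0)]
        lines.append(" ".join(parts))
    return "\n".join(lines) + "\n"


def f0_at(phones, t_ms):
    """Linearly interpolate F0 between pitch targets at time t_ms."""
    targets = []
    start = 0
    for p in phones:
        for pos, f0 in p.pitch:
            targets.append((start + p.duration_ms * pos / 100.0, f0))
        start += p.duration_ms
    if not targets:
        return None
    # Assumption: hold the first/last target outside the covered range.
    if t_ms <= targets[0][0]:
        return float(targets[0][1])
    if t_ms >= targets[-1][0]:
        return float(targets[-1][1])
    for (t0, f_a), (t1, f_b) in zip(targets, targets[1:]):
        if t0 <= t_ms <= t1:
            if t1 == t0:
                return float(f_b)
            return f_a + (f_b - f_a) * (t_ms - t0) / (t1 - t0)


utterance = [
    Phone("_", 50),
    Phone("h", 60, [(0, 120)]),
    Phone("@", 80, [(50, 140)]),
    Phone("_", 50),
]
print(to_pho(utterance), end="")
print(f0_at(utterance, 100))
```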



Monologue for Windows from First Byte

     * Platform: Windows
     * Description: Monologue is a software program that reads text from
       the clipboard in 16- or 32-bit Windows applications. It can be
       found as a bundled product with many sound cards and multimedia
       general purpose computer systems. Monologue can add the element of
       speech to virtually any text oriented application. Any
       pronounceable combination of letters and numbers will be spoken
       clearly. It can be applied to tasks such as eyes-free
       proofreading, data verification (e.g. spreadsheets), reading
       E-mail and more. User-changeable parameters provide control over
       the sound quality by allowing for changes in pitch, and the speed
       of speech. An exception dictionary saves preferred pronunciation
       of words and abbreviations.
       Monologue Win32 now includes support for the Microsoft SAPI.
       Monologue male "SpeechFonts" are available for US English, British
       English, German, French, Latin American Spanish, Italian. A US
       English Female SpeechFont is also available.
       For more detailed information and examples go to the First Byte
       WWW pages.
     * Availability: Currently bundled with many sound cards and
       multimedia general purpose computer systems. For pricing,
       licensing details, and release information see the First Byte WWW
       pages or email info@firstbyte.davd.com.
     * See also: ProVoice Developer's Speech Toolkit from First Byte
     * Contact: First Byte
       19840 Pioneer Ave., Torrance, CA 90503
       Ph: 310-793-0610 Fax: 310-793-0611
       Email: info@firstbyte.davd.com
       WWW: http://www.firstbyte.davd.com/



Narrator Translator Library

     * Platform: Amiga
     * Description: A US English text-to-phoneme translator, implemented
       as a resident software library, for use with the Amiga Narrator
       device. This software was supplied as a standard part of the Amiga
       operating system software up to O.S. version 2.04 (translator
       version 37.1, 1991). Approximately 700 translation rules are used
       to create the 'ARPAbet' phonemes. This software is functional on
       all current Amiga systems (O.S. 3.1).
     * Availability: limited to pre-owned system software disks and
       unsold O.S. upgrade kits (pre-O.S. 2.1).
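
   The Narrator translator works from a table of letter-to-sound rules.
   Its actual rule set is not reproduced here; the sketch below only
   illustrates the general technique of context-sensitive rewrite rules
   (the letters to match, a required left and right context, and the
   ARPAbet output), tried in order with the first match winning. The
   rules are a tiny hypothetical sample, with '#' marking a word
   boundary.

```python
# Each rule: (left context, letters, right context, ARPAbet output).
# Hypothetical sample rules; '#' marks a word boundary, "" matches anything.
RULES = [
    ("", "ch", "", "CH"),
    ("", "c", "e", "S"),
    ("", "c", "i", "S"),
    ("", "c", "y", "S"),
    ("", "c", "", "K"),
    ("", "a", "", "AE"),
    ("", "e", "#", ""),   # silent word-final e
    ("", "e", "", "EH"),
    ("", "i", "", "IH"),
    ("", "t", "", "T"),
    ("", "y", "#", "IY"),
]


def translate(word, rules=RULES):
    """Translate a word to phonemes; the first matching rule wins."""
    text = "#" + word.lower() + "#"
    i = 1
    phonemes = []
    while i < len(text) - 1:
        for left, letters, right, out in rules:
            if (text.startswith(letters, i)
                    and text[:i].endswith(left)
                    and text.startswith(right, i + len(letters))):
                if out:
                    phonemes.append(out)
                i += len(letters)
                break
        else:
            i += 1  # no rule for this letter: skip it
    return phonemes


print(translate("city"))
print(translate("chat"))
```

   Rule order matters: the specific "c before e/i/y" rules must come
   before the general "c" rule, which acts as the default.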

Replacement Library: Translator42

     * Platform: Amiga
     * Description: an independent replacement for the Commodore-supplied
       "translator.library" which is a part of the Narrator speech
       synthesis package. It implements multi-lingual text-to-speech for
       an Amiga. The translation rules for each language are defined in a
       plain text 'Accent' file.
       There is a provision for selecting a different language for
       segments of text by inserting in-line markup codes, e.g.
       "Hello there! \french{Bonjour} \deutsch{guten Morgen}".
       'Accent' files for American English, British English, Swedish,
       Maori, Finnish, German, Icelandic, Klingon, Polish, Italian, and
       Welsh are included in the archive.
     * Availability: The most current version, 42.4, of the library
       and source is available by anonymous ftp from Aminet:
       ftp://ftp.doc.ic.ac.uk/pub/aminet/util/libs/
       ftp://ftp.doc.ic.ac.uk/pub/aminet/dev/src/
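
   The in-line language markup shown above can be split into
   language-tagged segments before each segment is handed to the
   matching rule set. This is a sketch of one way to do that, assuming
   the \name{text} syntax from the example, no nesting, and the
   hypothetical label "default" for unmarked text; Translator42's own
   parser may work differently.

```python
import re

# Assumed syntax: \language{text}, not nested.
MARKUP = re.compile(r"\\(\w+)\{([^}]*)\}")


def split_languages(text, default="default"):
    """Return (language, text) segments in reading order."""
    segments = []
    pos = 0
    for m in MARKUP.finditer(text):
        if m.start() > pos:
            segments.append((default, text[pos : m.start()]))
        segments.append((m.group(1), m.group(2)))
        pos = m.end()
    if pos < len(text):
        segments.append((default, text[pos:]))
    return segments


for lang, chunk in split_languages(r"Hello there! \french{Bonjour} \deutsch{guten Morgen}"):
    print(lang, repr(chunk))
```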



Narrator

     * Platform: Amiga
     * Description: Formant-based speech synthesis. Includes an
       English-to-phoneme translation library and a SPEAK: pseudo-device
       for speech output.
     * Hardware: Standard Amiga hardware
     * Availability: Part of AmigaOS
     * See Also: The Narrator Translation library



TextToSpeech Kit

     * Platform: NeXT Computers
     * Description: The TextToSpeech Kit does unrestricted conversion of
       English text to synthesized speech in real-time. The user has
       control over speaking rate, median pitch, stereo balance, volume,
       and intonation type. Text of any length can be spoken, and
       messages can be queued up, from multiple applications if desired.
       Real-time controls such as pause, continue, and erase are
       included. Pronunciations are derived primarily by dictionary
       look-up. The Main Dictionary has nearly 100,000 hand-edited
       pronunciations which can be supplemented or overridden with the
       User and Application dictionaries. A number parser handles numbers
       in any form. A letter-to-sound knowledge base provides
       pronunciations for words not in the Main or customized
       dictionaries. Dictionary search order is under user control.
       Special modes of text input are available for spelling and
       emphasis of words or phrases. The actual conversion of text to
       speech is done by the TextToSpeech Server. The Server runs as an
       independent task in the background, and can handle up to 50 client
       connections.
     * Misc: The TextToSpeech Kit comes in two packages: the Developer
       Kit and the User Kit. The Developer Kit enables developers to
       build and test applications which incorporate text-to-speech. It
       includes the TextToSpeech Server, the TextToSpeech Object, the
       pronunciation editor PrEditor, several example applications,
       phonetic fonts, example source code, and developer documentation.
       The User Kit provides support for applications which incorporate
       text-to-speech. It is a subset of the Developer Kit.
     * Hardware: Uses standard NeXT Computer hardware.
     * Cost:
          + TextToSpeech User Kit: $175 CDN ($145 US)
          + TextToSpeech Developer Kit: $350 CDN ($290 US)
          + Upgrade from User to Developer Kit: $175 CDN ($145 US)
     * Availability: Trillium Sound Research
       1500, 112 - 4th Ave. S.W., Calgary, Alberta, Canada, T2P 0H3
       Tel: (403) 284-9278 Fax: (403) 282-6778
       Order Desk: 1-800-L-ORATOR (US and Canada only)
       Email: TTSInfo@trillium.ab.ca
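
   The pronunciation strategy described above (a user-controlled
   dictionary search order, with letter-to-sound rules as the fallback
   for words found in no dictionary) can be sketched as a simple lookup
   chain. The dictionary names, the phoneme notation and the stand-in
   fallback function are hypothetical; this is not the TextToSpeech
   Kit's API.

```python
def pronounce(word, dictionaries, search_order, letter_to_sound):
    """Return the first pronunciation found, searching in the given order.

    dictionaries: name -> {lower-case word: pronunciation}
    search_order: dictionary names, e.g. ["user", "application", "main"]
    letter_to_sound: fallback used when no dictionary contains the word
    """
    key = word.lower()
    for name in search_order:
        pron = dictionaries.get(name, {}).get(key)
        if pron is not None:
            return pron
    return letter_to_sound(key)


def spell_out(word):
    # Placeholder fallback: a real system would apply letter-to-sound rules.
    return " ".join(word)


dicts = {
    "main": {"read": "r iy d", "speech": "s p iy ch"},
    "user": {"read": "r eh d"},  # user override of the main entry
}
print(pronounce("Read", dicts, ["user", "main"], spell_out))
print(pronounce("Read", dicts, ["main", "user"], spell_out))
print(pronounce("zyx", dicts, ["user", "main"], spell_out))
```

   Putting the user dictionary first lets it override the main
   dictionary without editing it; reversing the order restores the
   default pronunciation.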



Orator Text-to-Speech Synthesizer

     * Platform: SUN SPARC, DECstation 5000. Written in C, and therefore
       portable to other UNIX platforms. Some successful ports: HP,
       RS-6000, PC-Unix [Linux].
     * Description: Sophisticated speech synthesis package. Has text
       preprocessing (for abbreviations, numbers), acronym rules, and
       human-like spelling routines. Natural-sounding synthesis based on
       demisyllable concatenation. Has high accuracy for pronunciation of
       names of people, places and businesses in America; good accuracy
       for English text; rules for stress and intonation marking; various
       methods of user control and customization at most stages of
       processing.
       A new version of the ORATOR system is under development. Both
       ORATOR and this new "ORATOR II" system are capable of general text
       synthesis. The ORATOR II system has a more natural-sounding voice.
     * Hardware: Runs on common SPARC or DECstation workstations, using
       their internal audio output capability. At least 16 MB of memory
       is recommended.
     * WWW: More detailed information plus examples of ORATOR synthesis
       are available on the ORATOR WWW pages:
       http://www.bellcore.com/ORATOR/
     * Misc 1: A free demo cassette is available.
     * Misc 2: Examples of Orator are also available on the University of
       Birmingham Speech Synthesis "Museum" WWW site (see Q5.4).
     * Availability and Pricing: Contact Bellcore's Licensing Office
       Tel: 1-800-521-CORE (521-2673)
       Fax: 1-908-336-2559
       Email: Anthony Lindsey: alin1@panix.com
       WWW: http://www.bellcore.com/ORATOR/
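
   Text preprocessing of the kind Orator performs turns abbreviations and
   digit strings into speakable words before pronunciation rules are
   applied. The sketch below is a deliberately small illustration, not
   Orator's method: the abbreviation table is hypothetical, it handles
   only whole-token abbreviations and non-negative integers, and it
   ignores real ambiguities such as "St." meaning either "Saint" or
   "Street".

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = [(10**9, "billion"), (10**6, "million"), (1000, "thousand")]

# Hypothetical abbreviation table; a real system needs context to choose.
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister", "st.": "street"}


def number_to_words(n: int) -> str:
    """Spell out a non-negative integer in English words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        return TENS[n // 10] + ("" if n % 10 == 0 else " " + ONES[n % 10])
    if n < 1000:
        rest = n % 100
        return ONES[n // 100] + " hundred" + ("" if rest == 0 else " " + number_to_words(rest))
    for value, name in SCALES:
        if n >= value:
            rest = n % value
            head = number_to_words(n // value) + " " + name
            return head + ("" if rest == 0 else " " + number_to_words(rest))


def normalize(text: str) -> str:
    """Expand known abbreviations and plain digit strings, token by token."""
    words = []
    for token in text.split():
        low = token.lower()
        if low in ABBREVIATIONS:
            words.append(ABBREVIATIONS[low])
        elif token.isdecimal():
            words.append(number_to_words(int(token)))
        else:
            words.append(token)
    return " ".join(words)


print(normalize("Dr. Smith lives at 221 Baker St."))
```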



PAM - A Text-To-Speech Application

     * Platform: Windows
     * Description: PAM is a talking personal assistant and text reader
       application. It uses the ProVoice TTS package. PAM will verbally
       advise about appointments and reminder messages at specified times
       during the day. It can read text files, clipboard text, and text
       sent in DDE messages. Using the full verbal interface, PAM can be
       used by visually challenged individuals. Shareware - thirty day
       free trial.
     * Requirements: Any Windows sound card, speakers or headphones.
       Minimum memory 4 MB; 8 MB recommended.
     * WWW: A more complete description is available on the JTS homepage:
       http://www.islandnet.com/~tslemko/
     * Availability: The shareware can be downloaded by ftp from
       ftp://ftp.islandnet.com/jts/pam_en3c.zip. The file size is approx.
       1 MByte.
     * Price: $US40 for the registered version.
     * Contact: Tom Slemko: e-mail: tslemko@islandnet.com, or,
       JTS Micro Consulting Ltd
       10931 Lytton Road, RR#4, Ladysmith, B.C., Canada, V0R 2E0



ProVerbe Speech Engine from ELAN Informatique

     * Platform: Windows 3.x, NT, 95, OS/2, Unix Solaris, Unix SCO and
       hardware
     * Description: The ProVerbe Speech Engine from ELAN Informatique
       produces natural sounding speech from written text. Naturalness is
       achieved by using the TD-PSOLA process from CNET (France
       Telecom's research lab), which is based on the concatenation of
       elementary speech units (including diphones). Supported languages
       are British English, American English, Russian, German, French and
       Spanish. For multi-channel applications Elan Informatique also
       provides hardware platforms.
       Elan Informatique provides an SDK reference document (sdken.doc:
       WinWord6 format).
     * Demo versions: Telephone demonstration: +33-561 17 67 01
       Sample sound files and demonstration software available.
       A CD-ROM with all these demonstrations is available by
       registration.
     * Contact: Elan Informatique
       4 rue Jean Rodier, 31400 TOULOUSE FRANCE
       Contact person: Pierre Delrat
       Phone: +33-561-36-0777 Fax: +33-61-36-0770
       BBS: +33-561-36-0788
       E-mail: sales@elan.fr
       ftp: ftp://ftp.elan.fr
       WWW: http://www.elan.fr/



ProVoice Developer's Speech Toolkit from First Byte

     * Platform: ProVoice Developer's Toolkits are available for DOS,
       Windows 3.1, Windows 95, Windows NT, OS/2, and Macintosh.
     * Description: ProVoice allows programmers to add synthesized speech
       to their applications. Your program passes text strings to the
       ProVoice speech engine that translates text into audible speech.
       Male and/or female "SpeechFonts" are available for many languages:
       English, French, German, UK British English, Italian, and Spanish.

       ProVoice converts text to speech in two phases using a set of
       phonetic translation and pronunciation rules. First, the software
       analyzes and translates text into "sound descriptors", a phonetic
       language with pitch, duration, and amplitude codes which are
       needed to produce stress patterns in phrases and sentences. Rules
       are used to analyze words, numbers, and punctuation. The second
       phase converts the intermediate phonetic language into speech
       signals; algorithms join the distinct speech signals into
       smooth-flowing, continuous, clear speech. Real-time synchronization of
       mouth movement and word boundaries allows animation of a graphical
       talking character, or highlighting of displayed text as it is
       spoken.
       The necessary tools and examples are provided for programmers to
       use the ProVoice speech technology, including installation
       instructions, extensive sample programs, and complete
       documentation. In addition, sample code is provided on disk to
       illustrate speech programming techniques.
     * Note 1: First Byte will perform custom work for embedded systems.
     * Note 2: ProVoice Windows includes support for the Microsoft SAPI.
       It will speak through any Windows-supported wave audio device.
     * Note 3: Distribution of ProVoice for commercial use is subject to
       execution of a Commercial Product Distribution License Agreement.
     * WWW: For more detailed information and examples go to the First
       Byte WWW page: http://www.firstbyte.davd.com/
     * See also: Monologue for Windows from First Byte
     * Price and Availability: Contact First Byte
     * Contact: First Byte
       19840 Pioneer Ave., Torrance, CA 90503
       Ph: 310-793-0610, Fax: 310-793-0611
       Email: info@firstbyte.davd.com
       WWW: http://www.firstbyte.davd.com/
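
   The first ProVoice phase produces "sound descriptors" carrying pitch,
   duration and amplitude codes, and the durations are what make
   lip-sync and text highlighting possible. The sketch below models such
   descriptors with a hypothetical data class (the word_index field is an
   assumption added for this illustration; First Byte's actual structures
   and API are not described here) and derives the start and end time of
   each word.

```python
from dataclasses import dataclass


@dataclass
class SoundDescriptor:
    phoneme: str
    duration_ms: int
    pitch_hz: float
    amplitude: float
    word_index: int  # hypothetical: which input word this sound belongs to


def word_timings(descriptors):
    """Map word index -> (start_ms, end_ms) by summing phoneme durations."""
    timings = {}
    t = 0
    for d in descriptors:
        start, _ = timings.get(d.word_index, (t, t))
        timings[d.word_index] = (start, t + d.duration_ms)
        t += d.duration_ms
    return timings


phrase = [
    SoundDescriptor("h", 60, 120.0, 0.6, 0),
    SoundDescriptor("ay", 120, 130.0, 1.0, 0),
    SoundDescriptor("dh", 50, 125.0, 0.5, 1),
    SoundDescriptor("eh", 90, 115.0, 0.9, 1),
    SoundDescriptor("r", 70, 105.0, 0.7, 1),
]
print(word_timings(phrase))
```

   A player could compare its current playback position against these
   intervals to decide which word to highlight.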



RC Systems V8600/V8601 Text to Speech synthesizers

     * Platform 1: IBM PC: ISA card.
     * Platform 2: Interface to PC/104 standard microcontrollers.
     * Platform 3: Standalone (or embedded) hardware through RS232 or
       parallel printer port or processor bus.
     * Description: Converts plain ASCII text to speech. Programmable
       voice, pitch, rate, volume, etc. Built-in DTMF and tone
       generators.
     * Price: $151-$299 US (qty 1)
     * Contact: RC Systems
       1609 England Avenue, Everett, WA 98203, USA
       Ph: (206) 355-3800 Fax: (206) 355-1098
       Europe: +44181 539-0285



rsynth

     * Platform: Various (including Solaris2.3, SunOS4.1.3, HPUX, SGI
       Irix4.x, Linux)
     * Description: Public domain text-to-speech system assembled from a
       variety of sources. It supports CMU and BEEP format dictionaries
       (as described in Q1.10) and now uses the stress marks in the
       dictionary when synthesising intonation.
     * Price: Free
     * Misc: Axel Belinfante has implemented a WWW rsynth demo:
       http://wwwtios.cs.utwente.nl/say.
     * Availability: by anonymous ftp from

                ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.Z

                ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/rsynth-2.0.tar.gz
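
   The stress marks used for intonation come from the dictionary entries
   themselves. In the CMU Pronouncing Dictionary each line holds a word
   followed by ARPAbet phonemes, with a digit on each vowel (1 primary
   stress, 2 secondary, 0 unstressed), and lines starting with ";;;" are
   comments. The sketch below extracts that stress pattern from one
   entry; it is not rsynth's code and does not handle the BEEP format.

```python
def parse_cmudict_line(line):
    """Return (word, phonemes, stress digits), or None for comments/blanks."""
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    word, *phonemes = line.split()
    word = word.split("(")[0]  # alternate pronunciations look like WORD(1)
    stresses = [int(p[-1]) for p in phonemes if p[-1].isdigit()]
    return word, phonemes, stresses


def primary_stress_vowel(phonemes):
    """Index (counting vowels only) of the primary-stressed vowel, or None."""
    vowels = [p for p in phonemes if p[-1].isdigit()]
    for i, v in enumerate(vowels):
        if v.endswith("1"):
            return i
    return None


entry = parse_cmudict_line("SYNTHESIS  S IH1 N TH AH0 S AH0 S")
print(entry)
print(primary_stress_vowel(entry[1]))
```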



SENSYN speech synthesizer

     * Platform: PC/DOS/Windows, Macintosh, Sun, and NeXT
     * Rough Cost: $300
     * Description: This formant synthesizer produces speech waveform
       files based on the (Klatt) KLSYN88 synthesizer. It is intended for
       laboratory and research use. Note that this is NOT a
       text-to-speech synthesizer, but creates speech sounds based upon a
       large number of input variables (formant frequencies, bandwidths,
       glottal pulse characteristics, etc.) and would be used as part of
       a TTS system. Includes full source code.
     * Availability: Sensimetrics Corporation
       Sidney Street, Cambridge MA 02139.
       Fax: (617) 225-0470; Tel: (617) 225-2442.
       Email: sensimetrics@sens.com
       WWW: http://www.sens.com/
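   A minimal sketch of what a formant synthesizer does with formant
   frequencies, bandwidths and a voicing source; it is not SENSYN's code.
   It implements the second-order digital resonator commonly used in
   Klatt-style synthesizers (y[n] = A*x[n] + B*y[n-1] + C*y[n-2]) and
   cascades one resonator per formant over a crude impulse-train source.
   The impulse train is an assumed simplification of a real glottal
   pulse model, and the formant values in the example are illustrative
   assumptions only.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients (A, B, C) of a second-order digital resonator."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # scales the filter to unity gain at 0 Hz
    return a, b, c


def resonate(x, freq_hz, bw_hz, fs_hz):
    """Apply y[n] = A*x[n] + B*y[n-1] + C*y[n-2] to the sequence x."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, fs_hz, n_samples):
    """Crude voicing source (an assumption): one unit impulse per pitch period."""
    period = int(round(fs_hz / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def synthesize_vowel(formants, f0_hz=100.0, fs_hz=10000, n_samples=2000):
    """Cascade: pass the source through one resonator per (freq, bandwidth) pair."""
    signal = impulse_train(f0_hz, fs_hz, n_samples)
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, fs_hz)
    return signal


# Illustrative (frequency, bandwidth) pairs in Hz for an /a/-like vowel.
samples = synthesize_vowel([(700, 130), (1220, 70), (2600, 160)])
```

   Because A = 1 - B - C, each resonator passes a constant input
   unchanged, so the cascade only shapes the spectrum around each formant.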



SGI Developers Toolbox Synthesiser

     * Platform: SGI
     * Description: The SGI Developer Toolbox 4.0 CDROM contains a
       basic public domain text-to-speech program in the publics/speak
       directory. The directory includes man pages and source.
     * Availability: on the SGI Developer Toolbox 4.0 CDROM



SIMTEL

   A wide range of speech related software, sound-blaster software and
   signal processing software for PCs is available on SimTel and its
   mirror sites. It can be obtained by ftp from:

          ftp://ftp.coast.net/SimTel/msdos/voice/

   and is now on the WWW:

          http://www.acs.oakland.edu/oak/SimTel/win3/sound.html

    Voicemaker

   The archives include the program Voicemaker which synthesises speech
   from phonemes using "concatenation" of phonemes recorded by the user.
   Voicemaker is a freeware program. It requires an IBM PC or compatible
   with 512KB RAM and a Sound Blaster compatible sound card.

          ftp://ftp.coast.net/SimTel/msdos/voice/



Sound Bytes Developer's Kit

     * Platform: Subroutine library for Windows, OS/2 and Macintosh
     * Hardware: Windows - 16 MHz 80386 (minimum) running Windows 3.1; 4
       Mb RAM with at least 1.4 Mb RAM free. Disk space 1.4 Mb.
       OS/2 - 16 MHz 80386 (minimum) running OS/2 2.0 or above; 8 Mb RAM
       with at least 1.4 Mb RAM free.
       Mac - Any Mac with at least 2.5 Mb of RAM running 6.0.4 or higher.
       Telephone compatible. Compatible with commonly used sound cards.
     * Description: SBDK is a software-only sentence-level synthesizer
       that converts unrestricted English text (ASCII) into synthesized
       voice through diphone concatenation. SBDK utilizes parsing to
       incorporate the intonational and rhythmic patterns of normal
       speech. The developer's kit includes two voices, one female and
       one male. The product has a 55,000-word built-in dictionary and a
       tool for creating customized user dictionaries. It converts
       numbers, dates, dollars, phone numbers and times to words, and has
       a SoundOut facility that provides a choice of pronouncing unknown
       words phonetically or spelling them out. Developers can vary voice
       pitch (130-220 Hz) and rate (65-200 wpm), synchronize speech to
       other events, have multiple channels of speech to the same or
       different boards, etc. Speech sampling options: 8-bit linear;
       8-bit companded at 11 kHz (Windows); 8-bit mu-law PCM at 8 or 11
       kHz; 16-bit linear at 11 kHz.
     * Cost: Sound Bytes may be licensed for internal use or resale. Site
       license fee: $3750. Resale or internal runtime fees: 2% of net
       sales price per runtime sold, OR $150 per telephone port, OR per
       unit pricing for internal use determined case-by-case.
     * Misc: Demo disks are available for Windows and the Mac.
     * Availability: Natural Speech Technologies, Inc.
       Ph: (619) 457-2526.
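   Converting numbers and amounts to words is a text normalisation step
   that runs before pronunciation. The sketch below is a hypothetical,
   English-only illustration of that step; SBDK's actual rules are not
   described here, and the function names are invented for this example.
   It expands plain integers and simple "$<digits>" amounts and leaves
   dates, times and phone numbers aside.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = [(10**9, "billion"), (10**6, "million"), (10**3, "thousand")]


def number_to_words(n):
    """Spell out a non-negative integer in American English (no "and")."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else "-" + ONES[rest])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        words = ONES[hundreds] + " hundred"
        return words if rest == 0 else words + " " + number_to_words(rest)
    for value, name in SCALES:
        if n >= value:
            head, rest = divmod(n, value)
            words = number_to_words(head) + " " + name
            return words if rest == 0 else words + " " + number_to_words(rest)
    raise AssertionError("unreachable: n >= 1000 always matches a scale")


def dollars(match):
    """Replacement for a "$<digits>" match, with singular/plural unit."""
    amount = int(match.group(1))
    return number_to_words(amount) + (" dollar" if amount == 1 else " dollars")


def normalize(text):
    """Expand dollar amounts first, then any remaining digit strings."""
    text = re.sub(r"\$(\d+)", dollars, text)
    return re.sub(r"\d+", lambda m: number_to_words(int(m.group(0))), text)
```

   For example, normalize("It costs $25 for 3 tickets") gives
   "It costs twenty-five dollars for three tickets". Handling money
   before bare numbers matters, because otherwise "$25" would become
   "$twenty-five".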



spchsyn.exe

     * Platform: DOS
     * Availability: By anonymous ftp as a self-extracting DOS archive.
       ftp://evans.ee.adfa.oz.au/mirrors/tibbs/applications/
     * Requirements: May require special TI product(s), but the full
       source code is included.



"Speak" - a Text to Speech Program

     * Platform: Sun SPARC
     * Description: Text to speech program based on concatenation of
       pre-recorded speech segments. A function library can be used to
       integrate speech output into other code.
     * Hardware: SPARC audio I/O
     * Availability: by anonymous ftp
       ftp://wilma.cs.brown.edu/pub/speak.tar.Z



Speech Manager and PlainTalk

     * Platform: Macintosh
     * Description: Apple's text-to-speech system extensions that enable
       applications to perform text-to-speech conversion. The Speech
       Manager runs on most Macs, but PlainTalk (and the high quality
       voices) requires a 68020 Mac or better.
     * Availability: By anonymous ftp from:
       ftp://ftp.support.apple.com/pub/apple_sw_updates/US/Macintosh/System/PlainTalk 1.4.1/
       This directory contains subdirectories for recent versions of
       PlainTalk. The current release (PlainTalk 1.4.1) contains the
       English Text-To-Speech with about a dozen voices
       (English_Text-to-Speech.hqx: 5.3 MByte), Mexican Spanish
       (Mexican_Spanish_TTS.hqx: 2.8 MByte), and the English Speech
       Recognition software (English_Speech_Recognition.hqx: 2.3MByte).
     * Cost: Free
     * WWW: The latest information is available from Apple's WWW page for
       speech recognition and synthesis:
       http://www.speech.apple.com/
     * Note 1: Check out Kevin Lenzo's list of Macintosh Speech
       Applications.
     * Note 2: Joshua Baer (josh@skyweyr.com) runs a mailing list for
       PlainTalk. For subscription and other information, visit the
       Plaintalk Discussion List Home page.
     * Contact: Apple Computer, Inc.
       1 Infinite Loop, Cupertino, CA 95014, USA
       WWW: http://www.speech.apple.com/
       Email: PlainTalk@atg.apple.com



Text to phoneme program (1)

     * Platform: unknown
     * Description: Text to phoneme program. Based on Naval Research
       Lab's set of text to phoneme rules.
     * Availability: by anonymous ftp
       ftp://shark.cse.fau.edu/pub/src/



Text to phoneme program (2)

     * Platform: unknown
     * Description: Text to phoneme program.
     * Availability: by anonymous ftp
       ftp://ftp.doc.ic.ac.uk/packages/unix-c/utils/



Text to phoneme program (3)

     * Description: A public domain version of the same Naval Research
       Lab text to phoneme rules.
     * Availability: By anonymous ftp
       ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/synthesis/english2phoneme.tar.gz
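   The NRL rules are, roughly, an ordered list of context-sensitive
   rewrite rules: a letter string is replaced by phonemes when its left
   and right contexts match, and the first matching rule wins. The sketch
   below shows only that mechanism. Its handful of rules, its fallback
   letter table and its ARPAbet-style phoneme symbols are hypothetical
   stand-ins, not the NRL rule set, and it will mispronounce many words.

```python
import re

# Hypothetical rules: (left-context regex, letters, right-context regex, phonemes).
# Words are padded with spaces, so " " marks a word boundary.
RULES = [
    ("", "th", "", ["TH"]),
    ("", "sh", "", ["SH"]),
    ("", "ch", "", ["CH"]),
    ("", "ee", "", ["IY"]),
    ("", "a", "[^aeiou]e ", ["EY"]),   # "magic e", as in cake, made
    ("[a-z]", "e", " ", []),           # silent word-final e
]

# Hypothetical fallback: one default pronunciation per letter.
DEFAULT = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"],
    "f": ["F"], "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"],
    "k": ["K"], "l": ["L"], "m": ["M"], "n": ["N"], "o": ["AA"],
    "p": ["P"], "q": ["K"], "r": ["R"], "s": ["S"], "t": ["T"],
    "u": ["AH"], "v": ["V"], "w": ["W"], "x": ["K", "S"], "y": ["Y"],
    "z": ["Z"],
}


def to_phonemes(word):
    """Scan left to right, applying the first rule whose contexts match."""
    text = " " + word.lower() + " "
    pos, out = 1, []
    while pos < len(text) - 1:
        for left, letters, right, phones in RULES:
            if (text.startswith(letters, pos)
                    and re.search(left + "$", text[:pos])
                    and re.match(right, text[pos + len(letters):])):
                out.extend(phones)
                pos += len(letters)
                break
        else:  # no rule matched: fall back to the single-letter table
            out.extend(DEFAULT.get(text[pos], []))
            pos += 1
    return out
```

   For example, to_phonemes("cake") yields ['K', 'EY', 'K']: the "a"
   rule sees the consonant plus final "e" on its right, and the final
   "e" rule then makes that "e" silent. Rule order is the design point;
   more specific rules must precede general ones.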



Tinytalk

     * Platform: DOS / Windows (unconfirmed)
     * Description: A shareware speech 'screen reader' used by many
       blind users.
     * Price: Tinytalk is now $150. There are package deals on Tinytalk
       with various speech synthesizers.
     * Availability: Tinytalk is available by anonymous ftp from the
       following site

        Files: ttexe167.zip and ttdoc167.zip (executable and
                documentation)
                ftp://ftp.netcom.com/pub/eb/ebohlman/

       (Note: it is a busy ftp server.)
     * Contact: Eric Bohlman

    OMS Development
    610-B Forest Ave., Wilmette, IL 60091
    Ph: (800)831-0272 Fax: 708-251-5793
    Outside North America: (708)-251-5787
    Email: ebohlman@netcom.com



TrueTalk

     * Platform: Sun Sparcstation 1+/2/LX/5/10/20 with SunOS 4.1.3, or
       SGI Indy/Indigo/Indigo2 with IRIX 5.2. More platforms in
       development.
     * Description: Personal TrueTalk, by Entropic Research Laboratory,
       Inc., is an all-software Text-to-Speech (TTS) system designed to
       voice-enable UNIX X-Windows workstations. It combines a graphical
       interface with a powerful TTS engine based on technology developed
       by AT&T Bell Laboratories. Features include:
          + Intelligible, prosodically natural speech.
          + Text taken from file input, highlighted X selections, the
            interface scratch pad, other programs connected through a
            TCP/IP socket, or Tcl/Tk applications via the Tk "send"
            mechanism.
          + Stop, pause and resume while speech is in progress.
          + Visual indication of corresponding text position when paused.
          + Nine speaking voices, with Male and Female versions of each
            voice.
          + Adjustable speaking rate and volume.
          + Supports drop-in text filters; "email" and "lively" examples
            included.
          + Audio output through workstation headphones or speaker.
          + Complete on-line documentation, including mouse-activated
            help windows.
     * Misc: A more detailed description of TrueTalk is available on the
       Entropic WWW server: http://www.entropic.com/truetalk.com
     * Availability: You can obtain Personal TrueTalk through the
       Internet. For details, see

                ftp://ftp.entropic.com/pub/truetalk/README.ptt

       Personal TrueTalk is available free of charge for evaluation
       purposes. You can fully enable your evaluation copy at any time by
       purchasing a license key from Entropic.
     * Requirements: 12MB disk space, 8MB process size (24MB system RAM
       recommended).
     * Cost: US$495; US$395 academic
     * Contact: Entropic Research Laboratory, Inc.,
       Washington, D.C.
       Voice: 1-800-ENTROPIC (North America), (202) 547 1420
       Fax: (202) 547-6648
       Email: truetalk@entropic.com
       WWW: http://www.entropic.com/



TruVoice from Centigram

     * Platform: Windows-NT, Windows 95, Windows 3.1 (limited release),
       Sun Solaris 2.x
     * Description: TruVoice, an advanced text-to-speech converter, is
       available for multiple environments. It adds intelligible,
       natural-sounding speech to sound-enabled platforms.
          + Small memory footprint (1.5MB)
          + Advanced text pre-processing
          + No vocabulary restrictions
          + User-definable pronunciation dictionary
          + Accurately pronounces surnames and place names
          + Preprocessor provides e-mail and spreadsheet reading
            capabilities and expands abbreviations.
          + Multiple languages available: American English, Latin
            American Spanish, German, French, Italian
          + Flexible pitch, volume and speech rate
          + Intonation support for punctuation
          + Supports navigational capabilities such as pause, resume, and
            jump forward / jump back to sentence or word boundaries
       More detailed information is provided in the brochure page on the
       Centigram WWW site.
       A demonstration of TruVoice is available on the Centigram WWW
       pages.
     * Cost:
          + Windows versions are $495 for the SDK
          + Solaris versions are $995
          + Contact Centigram for other pricing.
     * Contact: TruVoice Sales
       Centigram Communications Corporation
       91 East Tasman Drive, San Jose, CA 95134
       Ph: (408) 944-0250 Fax: (408) 428-3732
       Demo: 800-746 1632
       Email: webmaster@centigram.com
       WWW: http://www.centigram.com/



WinSpeech

     * Platform: Windows
     * Description: WinSpeech is a text-to-speech application that reads
       text and sends the speech to the audio output. Features include
       basic text editing tools, speaking directly from the editing
       window, a DDE server that allows other Windows applications to
       send text to be spoken, a coach mode that provides audio
       instructions throughout the program, and dictionary editing tools
       for customizing pronunciation.
       WSPLIB text-to-speech DLL is a speech functions library for
       developers. More information is available by email.
     * Requirements: IBM PC or compatible computer with Windows 3.1 or
       higher. A sound card is recommended but not required.
     * Availability: Freeware available through the PC WholeWare WWW
       page.
     * Contact: PC WholeWare
       33 Justin Street, Lexington, MA 02173, U.S.A.
       Email: info@pcww.com
       WWW: http://www.pcww.com/index.html



WreadFiles: File reader for Commodore Amiga

     * Platform: Commodore Amiga
     * Description: WreadFiles is a vocal text file reader program for
       use on the Commodore Amiga. The text is printed to the screen and
       spoken. Features include:
          + Text is read in sentences rather than lines.
          + Dynamic Speech Correction on over 4000 words or word
            fragments.
          + Pronunciations for many place names, personal names, foreign
            names, foreign expressions and abbreviations.
          + Run from Workbench or CLI.
          + Used with A1000 (OS 1.3), A3000 (OS 2.04-2.1), and A4000 (OS
            3.0)
     * Requirements: Standard Amiga Translator.library and
       Narrator.device required (2.04 versions recommended). 1MB or more
       RAM recommended. External speakers required.
     * Availability: No fee requested for non-commercial use. From:
          + GEnie: Page 555,3 File Number 24627
          + Aminet
            ftp://ftp.wustl.edu/pub/aminet/util/misc/
     * Contact: Written by Michael L. Barlow
       Email: M.Barlow1@GEnie.geis.com or mbarlow@pacific.telebyte.com or
       MikeB@cuix.pscu.com



ZMD Speech Synthesis

  "Speaky" Speech Synthesis from ZMD

     * Platform: DSP solution for platform independent speech synthesis
       implementation
     * Description: "Speaky" provides a German speech synthesis system as
       a DSP solution. It includes pre-processing of input ASCII text with
       unlimited vocabulary, both parametric and non-parametric speech
       synthesis algorithms, and prosody modelling. More detailed
       information and audio samples can be found at the ZMD WWW Site.
     * Contact: Zentrum Mikroelektronik Dresden GmbH
       Grenzstrasse 28, D-01109 Dresden, Germany
       Ph: +49-351-8822-306, Fax: +49-351-8822-337
       Email: assp@zmd-gmbh.de
       WWW: http://www.zmd-gmbh.de/

  ZMD PCMCIA Speech Synthesis Card

     * Platform: MS-DOS, Windows
     * Description: Complete text-to-speech synthesis system for the
       German language with unlimited vocabulary using VOICE Processor
       "Speaky". The required pre-processing of the input ASCII text is
       performed by a software program that is downloaded automatically
       from the PCMCIA Speech Synthesis Card during the card's
       initialising routine. Headphone or active loudspeaker can be
       connected directly for signal output. More detailed information
       and audio samples can be found at the ZMD WWW Site.
     * Requirements: PC Card slot, Card & Socket Services Software
     * Contact: Zentrum Mikroelektronik Dresden GmbH
       Grenzstrasse 28, D-01109 Dresden, Germany
       Ph: +49-351-8822-306, Fax: +49-351-8822-337
       Email: assp@zmd-gmbh.de
       WWW: http://www.zmd-gmbh.de/


___________________________________________________________________________

                             Speech Recognition

                         comp.speech FAQ Section 6

          * SpeechLinks: Speech Recognition
          * Q6.1: What is speech recognition?
          * Q6.2: How is speech recognition performed?
          * Q6.3: How can I build a simple speech recogniser?
          * Q6.4: References & books on speech recognition
          * Q6.5: Speech Recognition Hardware/Software
          * Q6.6: Speaker Recognition (Verification and Identification)
          * Q6.7: Integrated Speech Products


___________________________________________________________________________

                   Q6.1: What is speech recognition?

Automatic Speech Recognition

   Automatic speech recognition is the process by which a computer maps
   an acoustic speech signal to text.

   Automatic speech understanding is the process by which a computer maps
   an acoustic speech signal to some form of abstract meaning of the
   speech.

What does speaker dependent / adaptive / independent mean?

   A speaker dependent system is developed to operate for a single
   speaker. These systems are usually easier to develop, cheaper to buy
   and more accurate, but not as flexible as speaker adaptive or speaker
   independent systems.

   A speaker independent system is developed to operate for any speaker
   of a particular type (e.g. American English). These systems are the
   most difficult to develop and the most expensive, and their accuracy
   is lower than that of speaker dependent systems. However, they are
   more flexible.

   A speaker adaptive system is developed to adapt its operation to the
   characteristics of new speakers. Its difficulty lies somewhere
   between speaker independent and speaker dependent systems.

What does small/medium/large/very-large vocabulary mean?

   The size of vocabulary of a speech recognition system affects the
   complexity, processing requirements and the accuracy of the system.
   Some applications only require a few words (e.g. numbers only), others
   require very large dictionaries (e.g. dictation machines). There are
   no established definitions; however, the following is a rough guide:

     * small vocabulary - tens of words
     * medium vocabulary - hundreds of words
     * large vocabulary - thousands of words
     * very-large vocabulary - tens of thousands of words.

What does continuous speech or isolated-word mean?

   An isolated-word system operates on single words at a time - requiring
   a pause between saying each word. This is the simplest form of
   recognition to perform because the end points are easier to find and
   the pronunciation of a word tends not to affect others. Thus, because
   the occurrences of words are more consistent, they are easier to
   recognise.

   A continuous speech system operates on speech in which words are
   connected together, i.e. not separated by pauses. Continuous speech is
   more difficult to handle because of a variety of effects. First, it is
   difficult to find the start and end points of words. Another problem
   is "coarticulation". The production of each phoneme is affected by the
   production of surrounding phonemes, and similarly the start and
   end of words are affected by the preceding and following words. The
   recognition of continuous speech is also affected by the rate of
   speech (fast speech tends to be harder).
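   Finding the end points of an isolated word is often done by comparing
   short-term energy against a threshold. The sketch below assumes the
   samples are floats, uses non-overlapping frames, and sets the
   threshold as a fixed fraction of the loudest frame. Practical systems
   usually estimate the background noise level instead and often add
   zero-crossing measures, so treat the frame length and ratio as
   illustrative assumptions.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def find_endpoints(samples, frame_len=80, threshold_ratio=0.1):
    """Return (start, end) sample indices spanning all frames whose energy
    exceeds threshold_ratio * the peak frame energy, or None if silent."""
    energies = frame_energies(samples, frame_len)
    if not energies or max(energies) == 0.0:
        return None
    threshold = threshold_ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

   With 8 kHz sampling, frame_len=80 corresponds to 10 ms frames. The
   result is only as precise as the frame length, and a single fixed
   threshold tends to clip weak sounds such as word-initial fricatives.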


___________________________________________________________________________

               Q6.2: How is speech recognition performed?

   A wide variety of techniques are used to perform speech recognition.
   There are many types of speech recognition system, and many levels of
   processing: recognition, analysis and understanding.

   Typically speech recognition starts with the digital sampling of
   speech. The next stage is acoustic signal processing. Most techniques
   include spectral analysis; e.g. LPC analysis (Linear Predictive
   Coding), MFCC (Mel Frequency Cepstral Coefficients), cochlea modelling
   and many more.
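   MFCC analysis warps the frequency axis onto the mel scale before the
   cepstral coefficients are computed. A commonly used approximation is
   mel(f) = 2595 log10(1 + f/700), although other variants exist. The
   sketch below shows only this warping step and the filter-bank centre
   frequencies that follow from spacing filters evenly in mels; the
   filter count and frequency range in the example are arbitrary
   assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale approximation: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filter_centres(n_filters, f_low_hz, f_high_hz):
    """Centre frequencies (Hz) of n_filters filters spaced evenly in mels,
    excluding the two band edges themselves."""
    low, high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (high - low) / (n_filters + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(n_filters)]


# e.g. 20 filters between 0 Hz and 4 kHz (telephone-band speech sampled at 8 kHz)
centres = filter_centres(20, 0.0, 4000.0)
```

   The centres come out closely spaced at low frequencies and widely
   spaced at high frequencies, which is the point of the warping.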

   The next stage is recognition of phonemes, groups of phonemes and
   words. This stage can be achieved by many processes such as DTW
   (Dynamic Time Warping), HMM (hidden Markov modelling), NNs (Neural
   Networks), expert systems and combinations of techniques. HMM-based
   systems are currently the most commonly used and most successful
   approach.
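   Dynamic time warping compares two utterances of different lengths by
   finding the lowest-cost alignment between their frames. A minimal
   sketch of the classic dynamic-programming recurrence follows. It
   assumes each frame is a single number (real systems compare feature
   vectors, for which a Euclidean distance can be passed as dist) and
   allows the three standard steps with no slope constraints; the
   template names in the example are hypothetical.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Cumulative cost of the best alignment of sequences a and b:
    D[i][j] = dist(a[i-1], b[j-1]) + min(D[i-1][j], D[i][j-1], D[i-1][j-1])
    """
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def classify(unknown, templates):
    """Return the name of the stored template closest to unknown under DTW."""
    return min(templates, key=lambda name: dtw_distance(unknown, templates[name]))
```

   For instance, dtw_distance([1, 2, 3], [1, 2, 2, 3]) is 0, because the
   repeated frame is absorbed by the warping path rather than counted as
   a mismatch.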

   Most systems utilise some knowledge of the language to aid the
   recognition process.

   Some systems try to "understand" speech. That is, they try to convert
   the words into a representation of what the speaker intended to mean
   or achieve by what they said.


___________________________________________________________________________

           Q6.3: How can I build a simple speech recogniser?

    QUICKY RECOGNIZER sketch:

   Doug Danforth provides a detailed account in article 253 in the
   comp.speech archives. A summary is provided below. It is also
   available by anonymous ftp

          ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/DIY_SpeechRecognition

   This is a simple recognizer that should give you 85%+ recognition
   accuracy. The accuracy is a function of the words you have in your
   vocabulary. Long distinct words are easy. Short similar words are
   hard. You can get 98%+ accuracy on the digits with this recognizer.

   Overview:

     * Find the beginning and end of the utterance.
     * Filter the raw signal into frequency bands.
     * Cut the utterance into a fixed number of segments.
     * Average data for each band in each segment.
     * Store this pattern with its name.
     * Collect a training set of about 3 repetitions of each pattern
       (word).
     * Recognize unknown by comparing its pattern against all patterns in
       the training set and returning the name of the pattern closest to
       the unknown.

   Many variations upon the theme can be made to improve the performance.
   Try different filtering of the raw signal and different processing
   methods.
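   The steps of the overview can be sketched in a few dozen lines. This
   is not Doug Danforth's code. It assumes the utterance is a list of
   float samples, uses a naive DFT grouped into equal-width bands in
   place of a proper filter bank, compares patterns by Euclidean
   distance, and takes the frame length, band count and segment count as
   illustrative defaults.

```python
import cmath
import math


def frames_of(samples, frame_len):
    """Complete, non-overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def trim_silence(samples, frame_len=64, ratio=0.05):
    """Step 1: keep the span of frames whose energy exceeds ratio * peak."""
    energies = [sum(x * x for x in f) for f in frames_of(samples, frame_len)]
    peak = max(energies, default=0.0)
    active = [i for i, e in enumerate(energies) if e > ratio * peak]
    if not active:
        return samples
    return samples[active[0] * frame_len:(active[-1] + 1) * frame_len]


def band_energies(frame, n_bands):
    """Step 2: log power in n_bands equal-width bands of a naive DFT."""
    n = len(frame)
    power = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                     for t in range(n))) ** 2
             for k in range(n // 2)]
    width = len(power) // n_bands
    return [math.log(1e-10 + sum(power[b * width:(b + 1) * width]))
            for b in range(n_bands)]


def make_pattern(samples, frame_len=64, n_bands=4, n_segments=5):
    """Steps 1-4: trim, filter, cut into n_segments, average each band."""
    spectra = [band_energies(f, n_bands)
               for f in frames_of(trim_silence(samples, frame_len), frame_len)]
    if len(spectra) < n_segments:
        raise ValueError("utterance too short for the number of segments")
    pattern = []
    for s in range(n_segments):
        seg = spectra[s * len(spectra) // n_segments:
                      (s + 1) * len(spectra) // n_segments]
        pattern.extend(sum(f[b] for f in seg) / len(seg) for b in range(n_bands))
    return pattern


class QuickyRecognizer:
    def __init__(self):
        self.templates = []  # (name, pattern) pairs; about 3 per word (step 6)

    def train(self, name, samples):
        """Step 5: store the pattern with its name."""
        self.templates.append((name, make_pattern(samples)))

    def recognize(self, samples):
        """Step 7: return the name of the closest stored pattern."""
        unknown = make_pattern(samples)
        return min(self.templates, key=lambda t: math.dist(t[1], unknown))[0]
```

   In use, train() would be called about three times per vocabulary word
   with recorded samples, then recognize() on each new utterance. Cutting
   every utterance into the same number of segments is a crude linear
   time normalisation; dynamic time warping is a common alternative.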

    Public Domain Recognition Software

   Q6.5 contains information on public domain speech recognition
   software, including Lotec and Myers' Hidden Markov Model software.

    Discrete Hidden Markov Model Demonstration Software

   Hidden Markov Models (HMMs) are widely used in speech recognition
   systems. Joe Picone has put together some demonstration software for
   basic discrete HMMs including Viterbi and Baum-Welch training and
   evaluation, random sequence generation (generating data from a model),
   and model updating (useful for incremental training). There is a
   simple demo program that supports all of these modes from command line
   arguments. This allows experiments to test the classic coin-toss
   examples commonly described in textbooks. The code closely parallels
   the following textbook:

     * J.R. Deller, Jr., J.G. Proakis, and J.H.L. Hansen, Discrete-Time
       Processing of Speech Signals, MacMillan, 1993, ISBN:
       0-02-328301-7.

   The code is written in C++ and is intended to facilitate learning and
   understanding of the algorithms. The code is available on the ISIP web
   site:
   http://www.isip.msstate.edu/software/

   Lecture notes corresponding to the examples are also available:
   http://www.isip.msstate.edu/publications/1996/speech_recognition_short_course
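   For readers who want to try the coin-toss example without the ISIP
   code, the following is a minimal, independent sketch of Viterbi
   decoding for a discrete HMM; it is not Joe Picone's C++ code. It works
   in the log domain to avoid underflow, so every probability must be
   non-zero. The two-coin model and its probabilities are made up for
   illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence for observations obs, and its log probability."""
    # score[s]: best log probability of any path ending in state s so far
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []  # one dict per step: state -> best previous state
    for symbol in obs[1:]:
        prev_best, new_score = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + math.log(trans_p[p][s]))
            prev_best[s] = best_prev
            new_score[s] = (score[best_prev] + math.log(trans_p[best_prev][s])
                            + math.log(emit_p[s][symbol]))
        backpointers.append(prev_best)
        score = new_score
    last = max(states, key=lambda s: score[s])
    path = [last]
    for prev_best in reversed(backpointers):  # trace back from the end
        path.append(prev_best[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical two-coin model: a fair coin and a coin biased towards heads.
STATES = ("fair", "biased")
START = {"fair": 0.5, "biased": 0.5}
TRANS = {"fair": {"fair": 0.9, "biased": 0.1},
         "biased": {"fair": 0.1, "biased": 0.9}}
EMIT = {"fair": {"H": 0.5, "T": 0.5},
        "biased": {"H": 0.9, "T": 0.1}}

path, log_prob = viterbi("TTTHHHHHHH", STATES, START, TRANS, EMIT)
```

   With these numbers a long run of heads is attributed to the biased
   coin and a run of tails to the fair one; changing TRANS makes the
   decoder more or less willing to switch coins.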


___________________________________________________________________________

             Q6.4: References & books on speech recognition

     * Product Reviews and Comparisons
     * Using Speech Recognition: Health Issues
     * On the WWW
     * Technology: General and Introductory
     * Technical
     * Course Notes
     * Bibliographies and Reference Lists

  Product Reviews and Comparisons

     * "Talk Show", Wayne Rash Jr., PC Magazine (USA), Dec 20, 1994.
     * "Seybold Report on Desktop Publishing" published a nine-page,
       head-to-head comparison of Dragon's DOS software with IBM's OS/2
       software. March 7, 1994; Volume 8, Number 7; Pages 3-11;
       ISSN:0889-9762; Seybold Publications, P.O. Box 644, Media, PA
       19063 USA, phone (610) 565-2480.
     * McGraw-Hill Inc.'s "BYTE, the Magazine of Technology Integration,"
       published a two-page review of IBM's Personal Dictation System
       software. May 1994; Volume ?, Number ?; Pages 145-146;
       ISSN:0360-5280; Editorial, Executive, and Circulation address: One
       Phoenix Mill Lane, Peterborough, NH 03458 USA, phone ?

  Using Speech Recognition: Health Issues

     * The National Center for Voice and Speech provides some basic
       information on preserving "Vocal Health" on their WWW site:
       http://www.shc.uiowa.edu/hygiene/home.html
     * Voice Users Mailing List: details in Q1.4 of the FAQ.
     * Typing Injury FAQ: http://www.cs.princeton.edu:80/~dwallach/tifaq/
       has a range of information on Typing Injuries, avoiding them,
       alternatives and more.
     * Typing Injuries Page:
       http://alumni.caltech.edu/~dank/typing-archive.html has links to
       dozens of useful resources.
     * Voice Problems -- Prevention and Correction: advice on preserving
       your voice with specific hints for using speech recognition.
       ftp://ftp.csua.berkeley.edu/pub/typing-injury/voice-problems
     * "Talking to a PC May Be Hazard To Your Throat", by Julie Chao in
       the Wall Street Journal.
     * "Talking to Computers Has its Hazards", by Gordon Arnaut in The
       Globe and Mail.

  On the WWW

     * Survey of the State of the Art in Human Language Technology:
       Report edited by Ronald A. Cole et al. with a section on Spoken
       Input Technologies.
       http://www.cse.ogi.edu/CSLU/HLTsurvey/ch1node2.html

  Technology: General and Introductory

   Some general introduction books on speech recognition technology:

     * Fundamentals of Speech Recognition; Lawrence Rabiner & Biing-Hwang
       Juang; Englewood Cliffs NJ: PTR Prentice Hall (Signal Processing
       Series), c1993, ISBN 0-13-015157-2
     * Speech recognition by machine; W.A. Ainsworth; London: Peregrinus
       for the Institution of Electrical Engineers, c1988
     * Speech synthesis and recognition; J.N. Holmes; Wokingham: Van
       Nostrand Reinhold, c1988
     * Speech Communication: Human and Machine, Douglas O'Shaughnessy;
       Addison Wesley series in Electrical Engineering: Digital Signal
       Processing, 1987.
     * Electronic speech recognition: techniques, technology and
       applications, edited by Geoff Bristow, London: Collins, 1986
     * Readings in Speech Recognition; edited by Alex Waibel & Kai-Fu
       Lee. San Mateo: Morgan Kaufmann, c1990

  Technical

     * Hidden Markov models for speech recognition; X.D. Huang, Y. Ariki,
       M.A. Jack. Edinburgh: Edinburgh University Press, c1990
     * Speech Recognition: The Complete Practical Reference Guide; T.
       Schalk, P. J. Foster: Telecom Library Inc, New York; ISBN
       O-9366648-39-2; 377 pages; paperback only. Covers speech
       recognition in a telephony environment, for readers who wish to
       use PC-based call processing hardware. Dialogic hardware is used
       throughout as the example.
     * Automatic speech recognition: the development of the SPHINX
       system; by Kai-Fu Lee; Boston; London: Kluwer Academic, c1989
     * An Introduction to the Application of the Theory of Probabilistic
       Functions of a Markov Process to Automatic Speech Recognition, S.
       E. Levinson, L. R. Rabiner and M. M. Sondhi; in Bell Syst. Tech.
       Jnl. v62(4), pp. 1035-1074, April 1983
     * Review of Neural Networks for Speech Recognition, R. P. Lippmann;
       in Neural Computation, v1(1), pp. 1-38, 1989.
     * Automatic Speech and Speaker Recognition: Advanced Topics, C.H.
       Lee, F.K. Soong and K.K. Paliwal (Eds.), Kluwer, Boston, 1996.

  Course Notes

     * Joseph Picone of the Institute for Signal and Information
       Processing (ISIP) at Mississippi State University has put the
       course notes for "Fundamentals of Speech Recognition" on the WWW.
       The course covers background probability and phonetics/acoustics,
       speech signal analysis, dynamic programming, dynamic time warping,
       hidden Markov modelling, language modelling, neural networks, etc.
       The WWW site provides the syllabus and lecture notes.
       WWW: http://www.isip.msstate.edu/publications/1996/ee_8993/

  Bibliographies and Reference Lists

     * WWW searchable online bibliography for Phonetics and Speech
       Technology with more than 8000 entries. Provided by Institut für
       Phonetik at Johann Wolfgang Goethe-Universität Frankfurt.
       http://www.uni-frankfurt.de/~ifb/bib_engl.html
     * Computational Speech Processing: Speech Analysis, Recognition,
       Understanding, Compression, Transmission, Coding, Synthesis ; Text
       to Speech Systems, Speech to Tactile Displays, Speaker
       Identification, Prosody Processing : BIBLIOGRAPHY, by Conrad F.
       Sabourin, 1994, 2 volumes, 1187p, ISBN 2-921173-21-2, INFOLINGUA
       inc., P.O. Box 187 Snowdon, Montreal, H3X 3T4, Canada.
       See also: http://gomer.mlink.net/infolingua.html


___________________________________________________________________________

             Q6.5: Speech Recognition Hardware and Software

   Speech recognition packages, and the information about them, are
   changing rapidly. Any help with keeping this information up to date
   will be appreciated.

     * Products in the FAQ
     * Speech Recognition Processors (ICs)
     * Recognition Information on the WWW
     * Speech Recognition Resellers and Value-Added Services

  In the FAQ:

   The following speech recognition software/hardware is described in the
   comp.speech FAQ.

   _Apple Macintosh_
          * Digital Dreams Speech Recognition Plug-Ins 
          * Dragon Dictation Products 
          * Macintosh Speech Recognition Manager 
          * PowerSecretary 

   _Windows (including 95, NT, 3.1)_
          * AT&T Watson Speech Recognition 
          * Cambridge Voice for Windows 
          * CustomVoice and CustomTelephone: A&G Graphics Interface Inc. 
          * DragonDictate for Windows 
          * Dragon Dictation Products 
          * Dragon Developer Tools 
          * Ficomp Interpreter 6000 
          * IBM VoiceType Dictation and Control 
          * IN CUBE 
          * Kurzweil Speech Recognition (2 products) 
          * Lernout & Hauspie ASR SDK 
          * Listen for Windows 2.0 from Verbex Voice Systems 
          * Microsoft Speech Recognition 
          * NCC Dictate 
          * Phonetic Engine 500 (PE500) from Speech Systems, Inc. 
          * Philips Speech Recognition (2 products) 
          * ProNotes Voice Tools 
          * PureSpeech 
          * smARTspeak from Advanced Recognition Technologies, Inc. 
          * Visual Voice from Stylus Innovation 
          * VoiceAssist for Windows from Creative Labs, Inc. 
          * VoiceServer for Windows 
          * Whisper 
          * WildCard Speech Products 

   _DOS_
          * DATAVOX - French 
          * Dragon Developer Tools 
          * Ficomp Interpreter 6000 
          * Jialong He's Speech Recognition Research Tool 
          * smARTspeak from Advanced Recognition Technologies, Inc. 
          * Votan VPC2100 Voice Card and VSP 1010 Speech Processor 

   _OS/2_
          * IBM VoiceType Dictation and Control 

   _Unix_
          * AbbotDemo 
          * BBN Hark Telephony Recognizer 
          * EARS: Single Word Recognition Package 
          * Ficomp Interpreter 6000 
          * Hidden Markov Model Toolkit (HTK) from Entropic 
          * IN CUBE 
          * Jialong He's Speech Recognition Research Tool 
          * Lotec Speech Recognition Package 
          * Myers' Hidden Markov Model software 
          * NICO Artificial Neural Network Toolkit 
          * Nuance Speech Recognition System 
          * PureSpeech 
          * recnet 

   _Integrated Circuits and Dedicated Hardware_
          * HM2007 - Speech Recognition Chip 
          * OKI VRP6679 - Speech Recognition Chip 
          * Sensory Inc. Integrated Circuits 
          * Speech Commander - Verbex Voice Systems 
          * Voice Control Systems Recognition 
          * VCS 2030 & 2060 Voice Dialer 

   _Other Platforms_
          * Simon Says (NeXT) 
          * Voice Command Line Interface (Amiga) 
          * Visus SpeechKit 

   _Unknown_
          * Berkeley Restaurant Project (BeRP) 
          * Lernout & Hauspie ASR (3 products) 
          * Voice-Trek 2.0 
          * Voicetek Corp. 
          * Voice Processing Corporation Speech Recognition Product Line 

  Speech Recognition Processors (ICs)

   Jean-Pierre Lereboullet has put together a detailed list of Voice
   Recognition Processors which covers about 15 ICs and pieces of related
   hardware (including D6106, HM2007, MSM6679, RSC-164, TC8860F/64F/65F,
   5A128).
   The document is available on the comp.speech ftp server:
   ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/VoiceRecognitionProcessors

  Recognition Information on the WWW

   In addition to the entries on speech recognition in this FAQ, the
   following WWW sites provide information on speech recognition:

    Commercial Speech Recognition: Russ Wilcox of PureSpeech Inc.
          http://www.tiac.net/users/rwilcox/speech.html

    Macintosh Speech Resources and Apps
          http://www.cs.cmu.edu/~lenzo/mac_speech_apps.html

    Speech Recognition Information: 21st Century Eloquence
          http://www.voicerecognition.com/

    Applied Speech Technology Laboratory of CSLI at Stanford
          http://csli-www.stanford.edu/users/bscott/SRTech.html

    Speech Toys Speech Recognition Page
          http://www.speechtoys.com/spchtoys/sprec.html

    Speech recognition product lists: postings to comp.speech
          ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/info/SpeechRecognitionProducts

    Search Alta Vista for Speech Recognition

    Search Lycos for Speech Recognition

    Yahoo pages on Speech Recognition
          http://www.yahoo.com/business/corporations/computers/software/voice_recognition/
          http://www.yahoo.com/Science/Computer_Science/Artificial_Intelligence/Natural_Language_Processing/Speech_Recognition/

  Speech Recognition Resellers and Value-Added Services

    1stVoice
          2470 El Camino Real, Suite 110, Palo Alto CA 94306-1701
          Ph: 415-857-1320, Fax: 415-856-6996
          WWW: http://www.1stvoice.com/
          Email: mail@1stvoice.com
          Dragon Dictation Products

    21st Century Eloquence
          325-A Royal Poinciana Plaza, Palm Beach, Florida 33480, USA
          Ph: 800-245-2133, Fax: 407-835-4901
          WWW: http://www.voicerecognition.com/
          Kurzweil, IBM VoiceType, Dragon, Kolvox

    Auscript (Australia)
          Suite 2, Level 3, 60-70 Elizabeth St, Sydney, NSW 2000,
          Australia
          Ph: +61-2-238 6565, Fax: +61-2-238 6566
          WWW: http://www.auscript.com.au/
          Dragon Systems

    BRITE
          WWW: http://www.brite.com/
          Computer Telephony Integration & Interactive Voice Response

    DAX Systems, Inc.
          30 Chapin Road, Unit 1201, P.O. Box 778, Pine Brook, NJ 07058,
          USA
          Ph: +1-201-227-8111, Fax: +1-201-227-8197
          Email: info@daxsystems.com
          WWW: http://www.daxsystems.com/
          Computer Telephony and Integrated Voice Response

    HealthCare Resources
          1444 Aviation Blvd, #103, Redondo Beach, CA 90278, USA
          Ph: +1-310-937-5156, Fax: +1-310-937-5159
          Email: Scalif@AOL.COM
          PowerSecretary and DragonDictate. Specializing in medical/dental
          users, the motion picture industry, people with carpal tunnel
          syndrome, and persons with disabilities.

    O'Brien Resources
          Ph: (540) 347-4988 (Address unknown)
          Email: obrien@crosslink.net
          WWW: http://www.crosslink.net/~obrien/
          Kurzweil Voice Recognition Products

    SCI VoiceAutomated
          215 1/2 Main Street, Huntington Beach, CA 92648, USA
          Ph: 800-597-6600, Ph: +1-714-969-7632, Fax: +1-714-969-0122
          WWW: http://www.voiceautomated.com/
          IBM VoiceType, Kurzweil Voice, DragonDictate and Philips
          speech.

    Synapse
          3095 Kerner Blvd., Suite S, San Rafael, CA 94901, USA
          Ph: (415) 455-9700, Fax: (415) 455-9801
          Email: SYNAPSE_ADAPTIVE@msn.com
          WWW: http://www.synapseadaptive.com/
          Dragon Systems, Kurzweil and IBM products.

    Talk Technology
          Ph: 1-800-270-1672, Fax: 1-516-360-1213
          Email: info@talktechnology.com
          WWW: http://www.talktechnology.com/

    Talk Technology, Inc.
          Tel: +1-718-745-9199, Fax: +1-718-499-6480
          Email: mnm@pipeline.com
          WWW: http://www.usbusiness.com/talk/
          Dragon Dictate and portable (notebook) solutions

    ToppCopy Telecom
          Email: ffalzett@toppcopy.com
          WWW: http://www.toppcopy.com/
          Philips Digital Dictation

    VoiceWare Systems
          230 California Street, Suite 410, San Francisco, CA 94111
          Ph: (415) 433-2001, Fax: (415) 433-6909
          Email: info@talk2type.com
          WWW: http://www.talk2type.com/home.htm
          IBM, Dragon Systems, Kurzweil Applied Intelligence, WildCard
          Technologies

    WorkLink
          A.D.A. Solutions by WorkLink
          2566-A Telegraph Avenue, Berkeley, California 94704 USA
          Ph: 510-848-8363, Fax: 510-848-7322
          WWW: http://www.worklink.net/
          Email: wayne@worklink.net
          Dragon Dictation Products



AbbotDemo

     * Platform: SunOS4, IRIX, Linux, HP-UX
     * Description: Large vocabulary, speaker independent, continuous
       automatic speech recognition system. Uses recurrent neural
       networks and hidden Markov models with a 5,000-word vocabulary
       (upgradable) and a trigram word grammar. Includes a front end for
       waveform capture and display (including spectrogram), a
       graphical display of the phoneme representation, and a
       rewriting display of the best-guess word sequence.
     * Requirements: UN*X, X, 8 Mbyte free RAM, 486DX or faster
       processor, 16 bit soundcard, reasonable quality microphone and a
       copy of the Wall Street Journal newspaper.
     * Price: Free for non-commercial use
     * Availability: By anonymous ftp from

        ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/AbbotDemo

     * Note 1: This is not a complete system for dictation.
     * Note 2: At present there are no sources with this distribution.
       For sources for an earlier version see the recnet entry.
     * Note 3: Not supported.
     * Contact: AbbotDemo@compute.demon.co.uk
       Tony Robinson
       Cambridge University Engineering Department
       Trumpington Street, Cambridge, CB2 1PZ, UK
       Tel: +44-1223-332815 Fax: +44-1223-332662
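   A trigram word grammar, as listed for AbbotDemo, scores each word given
   the two words before it. The Python sketch below estimates plain
   maximum-likelihood trigram probabilities from a toy corpus. It is an
   illustration under stated assumptions, not AbbotDemo's code: the
   function names and corpus are hypothetical, and a real recognizer
   would add smoothing or back-off so that unseen trigrams do not get
   zero probability.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word histories over tokenized sentences.

    Each sentence is padded with two start markers and one end marker so the
    first words of a sentence also have a two-word history.
    """
    trigrams = Counter()
    histories = Counter()
    for words in sentences:
        padded = ["<s>", "<s>"] + words + ["</s>"]
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            trigrams[history + (padded[i],)] += 1
            histories[history] += 1
    return trigrams, histories


def trigram_prob(trigrams, histories, w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1, w2), with no smoothing.

    Returns 0.0 when the history (w1, w2) was never seen in training.
    """
    count_history = histories[(w1, w2)]
    if count_history == 0:
        return 0.0
    return trigrams[(w1, w2, w3)] / count_history


# Hypothetical toy training text.
corpus = [
    "stocks rose sharply".split(),
    "stocks rose slightly".split(),
    "stocks fell".split(),
]
tri, hist = train_trigram(corpus)
print(trigram_prob(tri, hist, "<s>", "stocks", "rose"))      # 2/3, about 0.667
print(trigram_prob(tri, hist, "stocks", "rose", "sharply"))  # 0.5
```

   During recognition, such probabilities are combined with the acoustic
   scores so that word sequences that are likely in the training text
   (here, a stand-in for newspaper text) are preferred.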



AT&T Watson Speech Recognition

     * Platform: Windows 95/NT on a Pentium 75 MHz or higher
     * Description: Watson is a software implementation of AT&T Bell
       Laboratories' voice processing technology. Watson includes BLASR
       Speech Recognition and FlexTalk speech synthesis (see Q5.5). It
       requires no special hardware to run other than a standard sound
       card and/or phone card. Technical details for BLASR Speech
       Recognition include:
          + Compliant with Microsoft Speech API and Telephone API
          + Speaker independent, continuous speech recognition
          + Fast, run-time vocabulary change
          + Open mic and telephone line environments
          + SoundBlaster compatible sound card and drivers required
          + Subword models and whole-word digit models
          + Background, silence, and filler/garbage models
          + 50-word name vocabulary or 100-word phrase real-time
            recognition with 95% accuracy
          + Rejection of out-of-vocabulary words
          + American English only - other languages in development
          + Barge-in speech begin/end notification - requires hardware
            echo cancellation
       The AT&T Advanced Speech Products Group home page provides more
       detailed information including a Frequently Asked Questions list,
       information for application developers on the Independent Software
       Vendor (ISV) Program (including info on the SDK, licensing, and
       the training program).
     * Requirements: Uses 2 MB RAM, 10 MB Disk. Requires a Pentium 75 MHz
       or higher CPU (uses
     * Cost and Availability: WATSON is a software-based speech platform
       with a Software Developers Kit (SDK) that allows application
       developers to use voice processing in their applications. It is
       not available as a stand-alone product.
       Licensing information (including price) is provided on the AT&T
       Advanced Speech Products Group home page.
     * See also: Watson FlexTalk speech synthesis in Q5.5, Microsoft
       Speech API, and Advanced Speech API.
     * Contact: AT&T Advanced Speech Products Group
       Suite 700, 44 East Mifflin Street, Madison, WI 53703, USA
       Ph: 1-800-5-WATSON, Fax: 1-608-259-2269
       Email: aspg@attmail.com
       WWW: http://www.att.com/aspg/



BBN Hark Telephony Recognizer

     * Platform: Available for Unix-based workstation and PC platforms
       including IBM RS6000/AIX and Pentium/SCO Unix.
     * Description: Large vocabulary (2,000+ words), speaker independent,
       continuous ASR software. Specifically designed for large scale
       telephony applications. It uses a client/server architecture, with
       all features and capabilities integrated in one software product
       instead of on separate boards. It is very memory efficient: the
       Hark Telephony Recognizer runs in as little as 2 MB of physical
       memory. Multiple recognizers can be run on a single platform. Uses
       Hidden Markov Model and phoneme-based BBN recognition algorithms.
       An API is provided for integration with existing applications. A
       developer's toolkit is available.
     * Price and availability: Price varies depending on vocabulary size.
       Version 3.0 available immediately.
     * Misc: BBN Hark provides application design and human factors
       consulting services. Regular monthly training classes on
       developing speech-enabled applications are held at BBN Hark's
       Cambridge (Mass) headquarters.
     * WWW: For additional information see BBN Hark's home page.
     * Contact: BBN Hark Systems
       70 Fawcett Street, Cambridge, MA 02138, USA
       Tel: 617-873-4636 Fax: 617-873-2473
       WWW: http://www.bbn.com/bbn_hark/HarkHome.html



Berkeley Restaurant Project (BeRP)

     * Description: BeRP is a testbed for a speech recognition system
       being developed by the International Computer Science Institute in
       Berkeley, CA. BeRP is a medium-vocabulary, speaker-independent
       spontaneous continuous speech understanding system. BeRP functions
       as a knowledge consultant whose domain is the restaurants in the
       city of Berkeley. The system serves as a testbed for several
       research projects, including robust feature extraction,
       connectionist phonetic likelihood estimation, automatic induction
       of multiple pronunciation lexicons, foreign accent detection and
       modeling, advanced language models, and lip-reading.
     * Note: As far as I know, the BeRP software is in-house software;
       it is not available for distribution.
     * More information: http://www.icsi.berkeley.edu/real/berp.html



Cambridge Voice for Windows

     * Platform: Windows
     * Description: Speaker-independent recognition of continuous speech
       in real time. Vocabularies can range from small to very large
       (more than 60,000 word forms). Support is planned for languages
       including English, Danish, Dutch, French, German, Italian,
       Norwegian, Spanish, Swedish, and Japanese. The engine complies
       with the Microsoft Speech API.
     * Contact: Cambridge Group Research, Ltd.
       Box 7290, Buffalo Grove, IL 60089
       Ph: (708) 821-1040, Fax: (708) 821-1041
       E-mail: 76061.3350@compuserve.com



CustomVoice and CustomTelephone: A&G Graphics Interface Inc.

     * Platform: Windows
     * CustomVoice: Speech recognition custom control for Visual Basic,
       Visual C++, Borland C++, and other development platforms that
       support VBX controls. Provides an engine-independent,
       non-proprietary development platform for speech recognition.
       Currently supports ICSS, but should soon support other platforms.
       Includes a grammar debugger and parser APIs that convert
       recognized speech into useful data types.
       Requirements: 486/DX or better PC, Windows 3.1 or Windows for
       Workgroups, 8 MB RAM (minimum), SoundBlaster 16, microphone, and
       mouse. Supports Visual Basic, Visual C++, Borland C++, and Delphi.
     * CustomTelephone: Windows-based developers tool that allows
       programmers to build speech enabled "telephony" applications via
       standard custom control properties (VBX). It supports IBM
       VoiceType Application Factory (VTAF), a continuous speech, speaker
       independent speech recognizer, and supports voice response boards
       such as Dialogic. Comes with a VB custom control, pre-built
       grammar sets for common data types, an interactive grammar
       debugger to identify valid speech patterns, and parser API
       functions that convert recognized speech into data types supported
       by VB, C++ and Delphi. Includes sample applications with source
       code, and VBX, VCL and DLLs. Bundled with speech recognition
       engines.
       Requirements: 486/DX or better, Windows 3.1 or Windows for
       Workgroups, 8 MB RAM (minimum), SoundBlaster or compatible sound
       card, Dialogic D2X or D4X board, and mouse. Microphone and speaker
       optional. Supports Visual Basic, Visual C++, Borland C++, and
       Delphi.
     * Contact: A&G Graphics Interface
       51 Gore Street, Cambridge, MA 02141-1213, USA
       Ph: +1-617-492-0120, Fax: +1-617-427-2133
       Email: customvc@world.std.com
       CompuServe: 74774,273 (GO SPEECH)
       WWW: http://www.customvoice.com/



DATAVOX - French

     * Platform: PC / DOS
     * Description: Continuous speech - speaker independent or dependent.
     * Requirements: 2 PC format boards (RdF1000 and TdS 96/25) and an
       A/D - D/A module (ASA116)
     * Misc: Application software may interact with DATAVOX through two
       types of interfaces:
          + Keyboard overlay: DATAVOX may be used with any PC-compatible
            application package. No specific adaptation is necessary;
            you only need to define your configuration with the
            application software.
          + C library: Allows a user-written program to drive the
            recognition system.
       DATAVOX is based on the AMADEUS speech recognition software
       developed at LIMSI. It provides:
          + Continuous speech recognition with 500 words speaker
            dependent, 50 words speaker independent (custom-made
            vocabulary).
          + Grammar of the application language (syntax acquisition,
            verification and simplification software).
          + Large vocabulary: DATAVOX can recognize vocabularies of
            several thousand words as long as there are no more than 500
            words in the active vocabulary at any given node. It takes
            less than 1 second to change syntax and vocabulary.
          + Training controlled by the system (use of co-articulation
            models).
          + Response time less than 500 ms for any phrase length.
          + Synthesized (ADPCM) speech can be heard while recognition
            is being carried out.
     * Contact: VECSYS
       Le Chene rond, 91570 Bievres, France
       Voice: 33 1 69 41 15 04, Fax: 33 1 69 41 24 30
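   The DATAVOX limit of 500 words in the active vocabulary "at any given
   node" implies a syntax in which each grammar node restricts which words
   the recognizer considers next, so the total vocabulary can be much
   larger than any one active set. The Python sketch below models that
   idea with a hypothetical finite-state grammar. It is an assumption
   about how such a syntax can be organized, not DATAVOX's C library or
   data format; all names here are illustrative.

```python
MAX_ACTIVE_WORDS = 500  # per-node limit quoted for DATAVOX

# Hypothetical finite-state syntax: each node maps the words allowed there
# to the node that follows. Only the words at the current node are "active".
grammar = {
    "start": {"open": "object", "close": "object"},
    "object": {"file": "end", "window": "end", "door": "end"},
    "end": {},
}


def check_grammar(grammar, limit=MAX_ACTIVE_WORDS):
    """Return the full vocabulary, raising if any node exceeds the active limit."""
    vocabulary = set()
    for node, arcs in grammar.items():
        if len(arcs) > limit:
            raise ValueError(f"node {node!r} has {len(arcs)} active words (limit {limit})")
        vocabulary.update(arcs)
    return vocabulary


def accepts(grammar, words, start="start", final="end"):
    """Follow the syntax word by word; a word outside the active set is rejected."""
    node = start
    for word in words:
        if word not in grammar[node]:
            return False
        node = grammar[node][word]
    return node == final


print(sorted(check_grammar(grammar)))        # ['close', 'door', 'file', 'open', 'window']
print(accepts(grammar, ["open", "window"]))  # True
print(accepts(grammar, ["window", "open"]))  # False
```

   Restricting each step to a small active set keeps the search space
   per decision bounded, which is one plausible reason such systems can
   switch syntax and vocabulary quickly.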



Digital Dreams Speech Recognition Plug-Ins

     * Platform: Apple Macintosh
     * Description (General): A suite of speech plug-ins for the
       interactive multimedia market which enable developers to quickly
       incorporate speech recognition into their titles without having to
       resort to a low-level programming language, such as C. Speech
       plug-ins bridge the gap between a speech recognition API, such as
       Apple's PlainTalk Speech Recognition technology, and
       authoring/development environments, such as Macromedia Director or
       HyperCard. Digital Dreams currently offers Macintosh speech
       plug-ins for Macromedia Director and HyperCard. Support for other
       environments, including AppleScript, Apple Media Tool, Authorware,
       and Windows is being developed. Currently available for North
       American Adult English. More information is available on the
       Digital Dreams WWW site.
     * ShockTalk: a combination of Netscape, Shockwave, and speech
       recognition technologies for the Power Macintosh and Quadra AVs
       that lets you navigate web sites and hyperlinks using spoken
       commands, as well as create Shockwave movies that respond to
       spoken user interactions.
     * Requirements: Power Macintosh (PowerPC w/ MacOS)
       Microphone (PlainTalk compatible)
       PlainTalk Speech Synthesis and PlainTalk Speech Recognition
       Netscape Navigator
     * Contact: Digital Dreams
       4308 Harbord Drive, Oakland, CA, 94618, USA
       Tel: (510) 547-6929 Fax: (510) 547-6799
       email: dreams@surftalk.com
       WWW: http://www.surftalk.com/
       FTP: ftp://ftp.surftalk.com/



DragonDictate for Windows

     * Platform: Windows
     * Description: Information has moved to the Dragon Dictation
       Products entry, which includes DragonDictate for Windows.



Dragon Dictation Products

     * Dragon NaturallySpeaking
     * DragonDictate for Windows
     * Dragon PowerSecretary
     * General Information

  Dragon NaturallySpeaking

     * Platform: Windows
     * Description: General purpose, continuous speech dictation system.
       Personal Edition has a 30,000 word active vocabulary and comes
       with a 200,000+ word pronunciation dictionary; users can also add
       their own words or phrases.
       More information on Dragon's NaturallySpeaking web site.
     * Requirements: 133 MHz Pentium, 32 MB RAM (Windows 95) or 48 MB RAM
       (Windows NT 4.0), supported sound card.
     * Price: see Dragon's NaturallySpeaking web site.
     * Related products: see general information below
     * Contact: see general information below

  DragonDictate for Windows

     * Platform: Windows
     * Description: Speech-to-text dictation system. Discrete dictation;
       continuous command/control; speaker-adaptive. Also provides mouse
       movement for hands-free operation of Windows. Comes with a 120,000
       word pronunciation dictionary; users can also add their own words
       or phrases. Dictate directly into any application. Available in US
       and UK English, French, Italian, German, Spanish, and Swedish.
       Add-on vocabularies for medicine, law, business and finance,
       computers and technology, journalism.
       Available as DragonDictate Singles Editions (10,000 words active),
       DragonDictate Personal Edition (10,000 words active),
       DragonDictate Classic Edition (30,000 words active), DragonDictate
       Power Edition (60,000 words active).
       Includes Office97 support.
       More information on the Dragon Systems web site.
     * Requirements: 486/66, 7-10 MB dedicated RAM (depending on
       edition), Windows 3.1x, NT 3.51, or 95.
       Supported sound boards: Creative Labs Sound Blaster 16, Microsoft
       Windows Sound System, IBM M-Audio Capture/Playback Adapter, many
       notebooks with built-in audio.
       See Dragon Systems Compatibility list for details.
     * Price: Check at the Dragon Systems web site.
     * Related products: see general information below
     * Contact: see general information below

  Dragon PowerSecretary

     * Platform: Apple Macintosh
     * Description: Speaker dependent/adaptive system requiring words to
       be separated by short pauses. Available as PowerSecretary Power
       Edition, Personal Edition, PowerSecretary MED for Healthcare
       Professionals.
       Vocabulary: 30,000-60,000 words active at any one time,
       automatically selected from a 120,000-word dictionary.
     * Requirements: Power Macintosh 6100, 7100, 8100, Performa 6100
       series, Powerbook 540, 68040 class Macintosh such as Quadra 660AV,
       700, 800, 840AV, 900, 950, Centris 650 and 660AV.
       Hard disk with at least 25 MB free.
       System 7.5 or greater
       (Some systems require add-on hardware)
     * More information: PowerSecretary home page
     * Related products: see general information below
     * Contact: see general information below

  General Information

    Dragon Dictation Products

     * Dragon NaturallySpeaking
     * DragonDictate for Windows
     * Dragon PowerSecretary
     * General Information

    Dragon Developer Products

     * Dragon PhoneQuery
     * DragonXTools
     * Dragon SpeechTool
     * Dragon VoiceTools

    Related Web Sites

     * Simon Crosby's FAQ for DragonDictate

    Contact:

     * Dragon Systems, Inc.
       320 Nevada Street, Newton, MA 02160, USA
       Tel: 1-617-965-5200 or 1-800-TALK-TYP
       Fax: 1-617-527-0372
       Email: info@dragonsys.com
       WWW: http://www.dragonsys.com/
       CompuServe: GO DRAGON



Dragon Developer Tools

     * Dragon PhoneQuery
     * DragonXTools
     * Dragon SpeechTool
     * Dragon VoiceTools

  Dragon PhoneQuery

     * Platform: Windows NT
     * Description: Software for building voice response systems. Callers
       can:
          + Ask for information using completely natural and continuous
            language.
          + Have a spoken dialog to fine-tune a request.
          + Request information to be faxed, sent by electronic mail, or
            read over the phone using text-to-speech.
       More information on the Dragon Systems telephony pages.
     * Requirements: Pentium or Pentium Pro PC running Windows NT 4.0.
       Telephone interconnect requirements vary by application.
     * Related products: see general information below
     * Contact: see general information below

  DragonXTools

     * Platform: Windows
     * Description: VBX and OCX controls that allow an application to
       control DragonDictate's capabilities, ranging from small
       vocabulary command and control to customized large vocabulary
       dictation. More information is available on the Dragon Developer
       pages.
     * Related products: see general information below
     * Contact: see general information below

  Dragon SpeechTool

     * Platform: Windows
     * Description: Create small, optimized vocabularies for your
       speech-enabled applications, or supplement DragonDictate's
       extensive built-in vocabularies with specialized terms and names.
       More information is available on the Dragon Developer pages.
     * Related products: see general information below
     * Contact: see general information below

  Dragon VoiceTools

     * Platform: Windows, DOS
     * Description: Integrates small-vocabulary speech recognition
       directly into your DOS and Windows 3.1x applications. More
       information is available on the Dragon Developer pages.
     * Related products: see general information below
     * Contact: see general information below

  General Information

    Dragon Dictation Products

     * Dragon NaturallySpeaking
     * DragonDictate for Windows
     * Dragon PowerSecretary
     * General Information

    Dragon Developer Products

     * Dragon PhoneQuery
     * DragonXTools
     * Dragon SpeechTool
     * Dragon VoiceTools

    Related Web Sites

     * Simon Crosby's FAQ for DragonDictate

    Contact:

     * Dragon Systems, Inc.
       320 Nevada Street, Newton, MA 02160, USA
       Tel: 1-617-965-5200 or 1-800-TALK-TYP
       Fax: 1-617-527-0372
       Email: info@dragonsys.com
       WWW: http://www.dragonsys.com/
       CompuServe: GO DRAGON



EARS: Single Word Recognition Package

     * Platform: Linux and other Unix systems with the Voxware sound
       driver
     * Description: Intended as a limited, ready-to-use single-word
       recognizer; however, its design also aims to make it a platform
       for various speech recognition (SR) methods. EARS is designed to
       be a flexible environment for recognition system components: for
       example, take this feature extractor, that recognition method,
       and this list of words. New methods for single-word recognition
       can be integrated easily, as EARS uses C++ abstract base classes.
       You speak the words you want to be recognized later. Your
       utterances can be saved to RIFF WAV files, where you can inspect,
       change, or delete them before they are processed into the pattern
       files on which the recognizer is finally trained. As of version
       0.20, the feature extractors are: Rasta-PLP, PLP, LPC,
       Mel-Cepstrum. The implemented recognizers are: DTW and
       non-recurrent neural nets on fixed-size sound patterns.
     * Requirements: Soundcard with mic
     * Misc 1: The current version is an Alpha release.
     * Misc 2: For more information subscribe to the EARS mailing list.
       Send email to majordomo@phil.uni-sb.de with "subscribe ears-list"
       in the body.
     * Misc 3: Niels Thorwirth (thorwir@pi4.informatik.uni-mannheim.de)
       has made changes to Version 0.14 that support the AF audio server
       software (see Q1.11) and the OGI Speech Tools (see Q1.9), making
       EARS more portable to other UNIX platforms. The changes are
       available by email from Niels.
     * Availability: Source and Linux binaries are available by anonymous
       ftp:
       ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/recognition/ears-0.26.tar.gz
       ftp://sunsite.unc.edu/pub/Linux/apps/sound/speech/ears-0.26.tar.gz
     * Contact: Ralf W. Stephan: ralf@ark.franken.de
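   EARS lists DTW (dynamic time warping) among its recognizers. In DTW
   template matching, each vocabulary word is stored as a sequence of
   feature vectors recorded during training, and an unknown utterance is
   assigned to the word whose template aligns with it at the lowest
   cumulative cost, allowing for differences in speaking rate. The Python
   sketch below shows the textbook algorithm with a Euclidean frame
   distance. It is not EARS's C++ code: the names are hypothetical, and
   the feature frames are assumed to come from an extractor such as the
   ones listed above (here they are toy 2-D vectors).

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors.

    Uses Euclidean distance between frames and the classic symmetric step
    pattern (diagonal match, or advance in only one of the sequences).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Python 3.8+
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both
                                 cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1])      # advance in b only
    return cost[n][m]


def recognize(utterance, templates):
    """Return the vocabulary word whose stored template is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical 2-D feature frames standing in for real cepstral vectors.
templates = {
    "yes": [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0)],
    "no": [(3.0, 3.0), (3.0, 2.0)],
}
spoken = [(0.0, 1.0), (0.1, 1.0), (1.0, 1.1), (2.0, 0.1)]  # "yes", spoken slower
print(recognize(spoken, templates))  # yes
```

   The utterance has four frames and the "yes" template only three, yet
   DTW still matches them closely by letting one template frame absorb
   two spoken frames; a plain frame-by-frame comparison could not do this.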



Ficomp Interpreter 6000

     * Platform: DOS, Windows 3.1, Win95, Win NT, UNIX
     * Description: Ficomp Systems, Inc. is a systems integrator that
       has developed commercial speaker-dependent, continuous-speech
       recognition applications for use in high-noise environments on
       several platforms. The applications specialize in the finance
       industry: exchange floors, banks and brokerage firms.
     * Contact: Ficomp Systems, Inc.
       117 Docks Corner Road, Dayton, NJ 08810
       Ph: (732) 274-2600, Fax: (732) 274-2601
       E-Mail: fsisales1@aol.com
       WWW: http://www.ficompsystems.com/



HM2007 - Speech Recognition Chip

     * Platform: Integrated circuit.
     * Description: HM2007 is a 48-pin single chip CMOS voice recognition
       LSI circuit with on-