comp.speech Frequently Asked Questions - part 1/3

Archive-name: comp-speech-faq/part1
Last-modified: 1998/07/06
URL: http://www.speech.su.oz.au/comp.speech/

                   COMP.SPEECH FAQ POSTING - PART 1/3


[Note: this document has been automatically extracted from a WWW site:
        http://www.speech.su.oz.au/comp.speech/
This may introduce some formatting errors.]


                    Comp.Speech Frequently Asked Questions

   The Frequently Asked Questions (FAQ) is a regular posting to
   comp.speech which attempts to answer some of the regular questions in
   the comp.speech newsgroup. It covers speech synthesis, speech
   recognition, speech coding and a range of related material. It
   contains lists of speech technology software and hardware, including
   commercial products and public domain and freeware software, and
   over 500 links to speech technology sites and software.

   The FAQ is not meant to discuss any topic exhaustively. It will
   hopefully provide readers with pointers on where to find useful
   information, especially material available on the Internet.

   If you have not already read the Usenet introductory material posted
   to news.announce.newusers, please do. For help with FTP (file transfer
   protocol), look for the regular posting of the anonymous FTP FAQ in
   comp.misc, comp.archives.admin or news.answers.

   This FAQ is posted every 4 weeks to comp.speech, comp.answers and
   news.answers.

   It is also available on the World Wide Web:

     * Australia: http://www.speech.su.oz.au/comp.speech/
     * Britain: http://svr-www.eng.cam.ac.uk/comp.speech/
     * Japan: http://www.itl.atr.co.jp/comp.speech/
     * USA: http://www.speech.cs.cmu.edu/comp.speech/

   Or by anonymous ftp from the comp.speech archive site:

     * ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/FAQ-complete

   Or from the news.answers ftp site (and its mirrors):

     * ftp://rtfm.mit.edu/pub/usenet/comp.speech/*

   Or by sending email to mail-server@rtfm.mit.edu with the following
   line in the body of the message:

     * send usenet/news.answers/comp-speech-faq/*

   If you only have email access to the Internet, I suggest you
   obtain the Internet-by-email guide. Send email to
   mail-server@rtfm.mit.edu with the following line in the body of the
   message:

     * send usenet/news.answers/internet-services/access-via-email
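   Both requests above are ordinary plain-text emails whose body is a
   single "send" command. The following is a minimal Python sketch that
   builds such a request with the standard library. The sender address
   is a placeholder, no Subject header is set because the FAQ only
   specifies the message body, and the SMTP relay host ("localhost") is
   an assumption about your local setup, not something the FAQ gives.

```python
import smtplib
from email.message import EmailMessage

MAIL_SERVER = "mail-server@rtfm.mit.edu"


def mail_server_request(command: str, sender: str) -> EmailMessage:
    """Build a plain-text request whose body is one mail-server command."""
    msg = EmailMessage()
    msg["From"] = sender          # placeholder: your own address
    msg["To"] = MAIL_SERVER
    # The FAQ specifies only the body, so no Subject header is added here.
    msg.set_content(command)      # text/plain body; a trailing newline is added
    return msg


def send(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    """Hand the message to an SMTP relay (the default host is an assumption)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

   For example, mail_server_request("send usenet/news.answers/comp-speech-faq/*",
   "you@example.org") builds the FAQ request; pass the result to send()
   only if a relay is actually available on your machine.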

                                   Admin

   Minor changes each month. Thanks to all the companies and individuals
   who send in information.

                              Acknowledgements

   Hundreds of people and companies have made contributions to the
   comp.speech FAQ over the last few years - too many to name
   individually. Special thanks go to Tony Robinson and Kevin Lenzo who
   have provided a wide range of information and assistance. Tony
   Robinson also maintains the comp.speech ftp site, which is an excellent
   resource for all people working with speech technology. I am grateful
   to the people at Sydney University, Cambridge University, ATR ITL and
   CMU for supporting the FAQ on their WWW sites.

                                 Disclaimer

   The comp.speech FAQ and WWW pages are provided as is without any
   express or implied warranties. While every effort has been made to
   ensure the accuracy of the information presented here, the author
   assumes no responsibility for errors or omissions, or for damages
   resulting from the use of the information contained herein.
   The comp.speech FAQ and WWW pages should not be construed as
   representing the views or products of my employer, Sun Microsystems,
   Inc.

                         Copyright and Reproduction

   Copyright (c) 1994-6 by Andrew Hunt, all rights reserved.
   The comp.speech FAQ posting may not be distributed for financial gain.

   The comp.speech FAQ posting may not be included in any collections or
   compilations without express permission from the author.
   The comp.speech FAQ posting may be posted to any USENET newsgroup,
   on-line service, or BBS as long as it is posted in its entirety with
   this copyright statement and a current version is always maintained.
   [Note: hyperlinks to the comp.speech WWW pages are encouraged.]

Maintainer

   The FAQ posting and the Comp.Speech WWW Site are maintained on a
   volunteer basis by

    Andrew Hunt
    Speech Applications Group, Sun Microsystems Laboratories
    Two Elizabeth Drive, Chelmsford, MA, 01824-4195, USA
    Ph: (508) 442 2681 Fax: (508) 250 5067
    andrew.hunt@east.sun.com


___________________________________________________________________________

                              comp.speech FAQ

                             Table of Contents

  + SpeechLinks: Speech Technology Hyperlinks Pages

          * SpeechLinks: 500+ Speech Technology Links 
          * SpeechLinks: General Speech Technology Links 
          * SpeechLinks: Signal Processing for Speech 
          * SpeechLinks: Speech Coding 
          * SpeechLinks: Speech Synthesis 
          * SpeechLinks: Speech Recognition 

  + List Of Software/Hardware

  + Update Times 

  + Availability

  + Odds 'n Ends

  + FAQ Section 1: General Information on Speech Technology

          * SpeechLinks: General
          * Q1.1: What is comp.speech?
          * Q1.2: comp.speech ftp site
          * Q1.3: Common abbreviations and jargon
          * Q1.4: Related newsgroups and mailing lists
          * Q1.5: Associations, publications and conferences
          * Q1.6: Handicap Aids
          * Q1.7: Speech Databases
          * Q1.8: Speech File Formats and Conversion
          * Q1.9: Speech Laboratory Environments and Audio Editors
          * Q1.10: Speech Research Sites
          * Q1.11: Miscellaneous Software and Resources

  + FAQ Section 2: Signal Processing

          * SpeechLinks: Signal Processing for Speech
          * Q2.1: What sampling do I need for speech?
          * Q2.2: Finding the pitch of a speech signal
          * Q2.3: How do I find the start and end points of a speech signal?
          * Q2.4: Where can I find FFT software?
          * Q2.5: Signal processing in speech technology
          * Q2.6: Speech sampling and signal processing hardware
          * Q2.7: How do I convert to/from mu-law format?
          * Q2.8: Signal Processing Software

  + FAQ Section 3: Speech Coding and Compression

          * SpeechLinks: Speech Coding
          * Q3.1: Speech compression techniques
          * Q3.2: Information on speech coding and compression
          * Q3.3: Speech Compression / Coding Software

  + FAQ Section 4: Natural Language Processing

          * Q4.1: NLP References and Books
          * Q4.2: NLP Software

  + FAQ Section 5: Speech Synthesis

          * SpeechLinks: Speech Synthesis
          * Q5.1: What is speech synthesis?
          * Q5.2: How can speech synthesis be performed?
          * Q5.3: References/Books on Synthesis
          * Q5.4: Speech Synthesis on the WWW
          * Q5.5: Speech Synthesis Software/Hardware

  + FAQ Section 6: Speech Recognition

          * SpeechLinks: Speech Recognition
          * Q6.1: What is speech recognition?
          * Q6.2: How is speech recognition performed?
          * Q6.3: How can I build a simple speech recogniser?
          * Q6.4: References & books on speech recognition
          * Q6.5: Speech Recognition Hardware/Software
          * Q6.6: Speaker Recognition (Verification and Identification)
          * Q6.7: Integrated Speech Products


___________________________________________________________________________

                   List of Software/Hardware/Information

   The comp.speech FAQ provides information on a range of software,
   hardware and resources.

Q1.6: Handicap Aids

          * Man-Machine Interfacing
          * SpeechViewer II

Q1.7: Speech Data

          * Bavarian Archive for Speech Signals
          * BUPT Spoken Digit Database (Chinese)
          * Center for Spoken Language Understanding (CSLU)
          * Examples of IPA Symbols
          * Linguistic Data Consortium (LDC)
          * NOISEX
          * Oxford Acoustic Phonetic Database
          * Phonemic Samples
          * RELATOR project
          * ShATR
          * University of Victoria Phonetic Database

Q1.9: Speech Processing Environments

          * CSRE: Computerized Speech Research Environment
          * DADiSP from DSP Development Corporation
          * Entropic Signal Processing System (ESPS) and Waves
          * GoldWave
          * Kay Elemetrics Computer Speech Lab
          * Khoros
          * Matlab plus Signal Processing Toolbox
          * MacSpeech Lab II
          * N!Power
          * OGI Speech Tools
          * Ptolemy
          * Quadravox Speech Processing Products - Qbox
          * Speech Filing System (SFS)
          * Signalyze 3.0 from InfoSignal
          * SoundScope

Q1.11: Miscellaneous Software and Resources

  Speech Application Interfaces

          * ASAPI: Advanced Speech API (AT&T)
          * SAPI: Microsoft Windows Speech API
          * SRAPI: Speech Recognition API
          * TAPI: Microsoft Windows Telephony API

  Network "Phone" Software

          * CUSeeMe
          * CyberPhone
          * DigiPhone
          * InterFACE from Hijinx
          * FAQ: How can I use the Internet as a telephone?
          * Nautilus: Secure Computer Telephony
          * NEVOT (1.4v) from AT&T BL
          * PGPfone
          * Speak Freely
          * Internet Phone from VocalTec
          * WebPhone
          * WebTalk

  Audio Processing Software

          * AF version AF3R1
          * Voice E-Mail from Bonzi Software
          * MicNotePad Recording Software for Macs
          * MixViews
          * Network Audio System Release 1.1
          * NIST Software - SPHERE and SCORE
          * Sound Processing Kit
          * TCPplay

  Human Audio Perception

          * Auditory Modeller 1
          * Auditory Modeller 2
          * Auditory Toolbox for Matlab
          * Human Audio Perception Document

  Dictionaries and other Lexical Tools

          * BEEP dictionary
          * CMU dictionary
          * CUVOALD dictionary (Oxford Dictionary)
          * Comprehensive Word List
          * EAT: Edinburgh Associative Thesaurus
          * Homophone List
          * Moby Lexical Resources
          * MRC Psycholinguistic Database
          * WordNet
          * Dictionaries on the WWW

  Phonetic Fonts and Phonetic Samples

          * International Phonetic Alphabet
          * WWW: Phonetic Fonts and Examples Online
          * Summer Institute of Linguistics IPA Fonts
          * Phonetic Fonts for TeX and LaTeX
          * Yamada Language Center

  Very Miscellaneous Software

          * The vOICe
          * The Learning Company's Language Training
          * Wildfire - an Electronic Assistant

Q2.6: Audio Hardware

          * Macintosh Audio Hardware
          * PC Audio Hardware
          * Unix Audio Hardware

Q2.8: Signal Processing Software

          * SigLib from Numerix Ltd.

Q3.3: Compression Software and Hardware

          * 32 kbps ADPCM
          * Castleton Network Systems - G.729 Voice Coder
          * CELP 3.2a & LPC-10
          * 8 Kbit/s CELP on the TMS320C5x family of DSP chips
          * CyberVoice
          * Rockwell's DigiTalk
          * File format conversion
          * G.711/721/723 Compression
          * G.728 LD-CELP vocoder
          * G.728 Compression
          * GSM 06.10 Compression
          * Lernout & Hauspie Speech Coding (5 products)
          * Lernout & Hauspie Speech Coding SDK
          * MPEG Audio
          * shorten - a lossless compressor for speech signals
          * Sipro Lab Telecom Inc. Coding
          * Sonarc: Digital Audio Compression
          * StarAudio Compressor/Player
          * TrueSpeech from DSP Group
          * U.S.F.S. 1016 CELP vocoder for DSP56001
          * ToolVox from Voxware

Q4.2: Natural Language Processing

     * Natural Language Software Registry (NLSR) - NLP Tools
     * Part of Speech Tagger

Q5.5: Speech Synthesis

   _Apple Macintosh_
          * BeSTspeech from Berkeley Speech Technologies, Inc., (BST) 
          * Infovox Product Range 
          * Macintosh Speech Output Applications 
          * Macintosh Speech Synthesis Manager 
          * MacYack Pro 
          * MBROLA: Free Speech Synthesis Project 
          * ProVoice Developer's Speech Toolkit from First Byte 
          * SENSYN speech synthesizer 
          * Sound Bytes Developer's Kit

   _Windows (including 95, NT, 3.1)_
          * AcuVoice 
          * AT&T Watson Speech Synthesis 
          * BeSTspeech from Berkeley Speech Technologies, Inc., (BST) 
          * Creative TextAssist and TextAssist API 
          * DECtalk: Text-to-Speech from Digital 
          * ETI-Eloquence 
          * HADIFIX 
          * Infovox Product Range 
          * IPOX: All Prosodic Speech Synthesis Architecture 
          * Lernout and Hauspie Text-To-Speech Windows SDK 
          * Listen2 Text Reader 
          * MBROLA: Free Speech Synthesis Project 
          * Monologue for Windows from First Byte 
          * PAM - A Text-To-Speech Application 
          * ProVerbe Speech Engine from ELAN Informatique 
          * ProVoice Developer's Speech Toolkit from First Byte 
          * SENSYN speech synthesizer 
          * Sound Bytes Developer's Kit
          * Tinytalk 
          * TruVoice from Centigram 
          * WinSpeech 
          * ZMD Speech Synthesis 

   _DOS_
          * CSRE: Computerized Speech Research Environment 
          * Infovox Product Range 
          * MBROLA: Free Speech Synthesis Project 
          * ProVoice Developer's Speech Toolkit from First Byte 
          * SENSYN speech synthesizer 
          * spchsyn.exe 
          * Tinytalk 
          * ZMD Speech Synthesis 

   _OS/2_
          * ProVerbe Speech Engine from ELAN Informatique 
          * ProVoice Developer's Speech Toolkit from First Byte 
          * Sound Bytes Developer's Kit

   _Unix_
          * AcuVoice 
          * AsTeR 
          * BeSTspeech from Berkeley Speech Technologies, Inc., (BST) 
          * DECtalk: Text-to-Speech from Digital 
          * ETI-Eloquence 
          * Emacspeak - A Speech Output Subsystem For Emacs 
          * Festival Speech Synthesis System 
          * JSRU 
          * Klatt-style synthesiser 
          * KPE80 - A Klatt Synthesiser and Parameter Editor 
          * "learph": Trainable text-to-phoneme software by Antonio Lucca
          * Lucent Technologies Bell Labs Text-to-Speech system 
          * MBROLA: Free Speech Synthesis Project 
          * Orator from Bellcore 
          * ProVerbe Speech Engine from ELAN Informatique 
          * rsynth 
          * SENSYN speech synthesizer 
          * SGI Developers Toolbox Synthesiser 
          * Speak 
          * TrueTalk 
          * TruVoice from Centigram 

   _Integrated Circuits and Dedicated Hardware_
          * Eurovocs 
          * Infovox Product Range 
          * ProVerbe Speech Engine from ELAN Informatique 
          * RC Systems V8600/V8601 Text to Speech synthesizers 

   _Other Platforms_
          * BeSTspeech from Berkeley Speech Technologies, Inc., (BST) 
          * TheBigMouth (NeXT) 
          * MBROLA: Free Speech Synthesis Project 
          * Narrator Translator Library (Amiga) 
          * Narrator (Amiga) 
          * TextToSpeech Kit (NeXT) 
          * Orator from Bellcore 
          * SENSYN speech synthesizer 
          * WreadFiles: File reader for Commodore Amiga 

   _Unknown_
          * Lernout and Hauspie Text-To-Speech (3 products) 
          * SIMTEL 
          * Text to Phoneme Program 1 
          * Text to phoneme program 2 
          * Text to phoneme program 3 

Q6.5: Speech Recognition

   _Apple Macintosh_
          * Digital Dreams Speech Recognition Plug-Ins 
          * Dragon Dictation Products 
          * Macintosh Speech Recognition Manager 
          * PowerSecretary 

   _Windows (including 95, NT, 3.1)_
          * AT&T Watson Speech Recognition 
          * Cambridge Voice for Windows 
          * CustomVoice and CustomTelephone: A&G Graphics Interface Inc. 
          * DragonDictate for Windows 
          * Dragon Dictation Products 
          * Dragon Developer Tools 
          * Ficomp Interpreter 6000 
          * IBM VoiceType Dictation and Control 
          * IN CUBE 
          * Kurzweil Speech Recognition (2 products) 
          * Lernout & Hauspie ASR SDK 
          * Listen for Windows 2.0 from Verbex Voice Systems 
          * Microsoft Speech Recognition 
          * NCC Dictate 
          * Phonetic Engine 500 (PE500) from Speech Systems, Inc. 
          * Philips Speech Recognition (2 products) 
          * ProNotes Voice Tools 
          * PureSpeech 
          * smARTspeak from Advanced Recognition Technologies, Inc. 
          * Visual Voice from Stylus Innovation 
          * VoiceAssist for Windows from Creative Labs, Inc. 
          * VoiceServer for Windows 
          * Whisper 
          * WildCard Speech Products 

   _DOS_
          * DATAVOX - French 
          * Dragon Developer Tools 
          * Ficomp Interpreter 6000 
          * Jialong He's Speech Recognition Research Tool 
          * smARTspeak from Advanced Recognition Technologies, Inc. 
          * Votan VPC2100 Voice Card and VSP 1010 Speech Processor 

   _OS/2_
          * IBM VoiceType Dictation and Control 

   _Unix_
          * AbbotDemo 
          * BBN Hark Telephony Recognizer 
          * EARS: Single Word Recognition Package 
          * Ficomp Interpreter 6000 
          * Hidden Markov Model Toolkit (HTK) from Entropic 
          * IN CUBE 
          * Jialong He's Speech Recognition Research Tool 
          * Lotec Speech Recognition Package 
          * Myers' Hidden Markov Model software 
          * NICO Artificial Neural Network Toolkit 
          * Nuance Speech Recognition System 
          * PureSpeech 
          * recnet 

   _Integrated Circuits and Dedicated Hardware_
          * HM2007 - Speech Recognition Chip 
          * OKI VRP6679 - Speech Recognition Chip 
          * Sensory Inc. Integrated Circuits 
          * Speech Commander - Verbex Voice Systems 
          * Voice Control Systems Recognition 
          * VCS 2030 & 2060 Voice Dialer 

   _Other Platforms_
          * Simon Says (NeXT) 
          * Voice Command Line Interface (Amiga) 
          * Visus SpeechKit 

   _Unknown_
          * Berkeley Restaurant Project (BeRP) 
          * Lernout & Hauspie ASR (3 products) 
          * Voice-Trek 2.0 
          * Voicetek Corp. 
          * Voice Processing Corporation Speech Recognition Product Line 

Q6.6: Speaker Verification and Identification

          * ImagineNation: Voice Activated UnLock Technology 
          * Jialong He's Speaker Recognition (Identification) Tool
          * Keyware Biometric Security Products
          * SpeakerKey Voice Verifier from ITT
          * SpeakEZ Voice Print Speaker Verification
          * Voice Control Systems: Speaker Verification Technology

Q6.7: Integrated Speech Products

          * SpeechWorks from Applied Language Technologies, Inc.
          * Nortel Speech Technology Products


___________________________________________________________________________

                         General Speech Technology

                         comp.speech FAQ Section 1

          * SpeechLinks: General
          * Q1.1: What is comp.speech?
          * Q1.2: comp.speech ftp site
          * Q1.3: Common abbreviations and jargon
          * Q1.4: Related newsgroups and mailing lists
          * Q1.5: Associations, publications and conferences
          * Q1.6: Handicap Aids
          * Q1.7: Speech Databases
          * Q1.8: Speech File Formats and Conversion
          * Q1.9: Speech Laboratory Environments and Audio Editors
          * Q1.10: Speech Research Sites
          * Q1.11: Miscellaneous Software and Resources



                          Q1.1: What is comp.speech?

   Comp.speech is an unmoderated newsgroup for discussion of speech
   technology and speech science. It covers a wide range of issues from
   the application of speech technology, to research, to products and
   lots more. By its nature, speech technology is an inter-disciplinary
   field and the newsgroup reflects this. However, computer application
   is the basic theme of the group.

   Note: If you don't know what a newsgroup is, then talk to your local
   system administrator about how to get access. A useful newsgroup for
   beginners is news.announce.newusers. You might also find the following
   documents useful.

          ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/What_is_Usenet?

          ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Answers_to_Frequently_Asked_Questions_about_Usenet

          ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/Rules_for_posting_to_Usenet

          ftp://rtfm.mit.edu/pub/usenet/news.announce.newusers/FAQs_about_FAQs

   The following is a list of some of the topics covered by comp.speech.

     * Speech Recognition - discussion of methodologies, training,
       techniques, results and applications. This should cover the
       application of techniques including HMMs, neural-nets and so on to
       the field.
     * Speech Synthesis - discussion concerning theoretical and practical
       issues associated with the design of speech synthesis systems.
     * Speech Coding and Compression - both research and application
       matters.
     * Phonetic/Linguistic Issues - coverage of linguistic and phonetic
       issues which are relevant to speech technology applications. Could
       cover parsing, natural language processing, phonology and prosodic
       work.
     * Speech System Design - issues relating to the application of
       speech technology to real-world problems. Includes the design of
       user interfaces, the building of real-time systems and so on.
     * Other matters - relevant conferences, jobs, books, software,
       hardware, and products.


___________________________________________________________________________

                       Q1.2: comp.speech ftp site

   Tony Robinson maintains the comp.speech ftp site. The ftp site is a
   comprehensive repository of software and information related to speech
   technology. The site is

     * ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/
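   Any anonymous FTP client can fetch files from the site. As a minimal
   Python sketch using the standard ftplib module, the code below splits
   an ftp:// URL into host and path, logs in anonymously and saves one
   file in binary mode. The helper names are hypothetical, and the
   1998-era host may no longer be reachable.

```python
from ftplib import FTP
from typing import Tuple
from urllib.parse import urlparse


def split_ftp_url(url: str) -> Tuple[str, str]:
    """Split an ftp:// URL into (host, remote path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"not an ftp URL: {url!r}")
    return parts.hostname, parts.path


def fetch_anonymous(url: str, dest: str) -> None:
    """Download one file by anonymous FTP and write it to dest."""
    host, path = split_ftp_url(url)
    with FTP(host) as ftp:
        ftp.login()  # no arguments: logs in as user "anonymous"
        with open(dest, "wb") as out:
            # RETR transfers the file; each received block is written to disk
            ftp.retrbinary(f"RETR {path}", out.write)
```

   For example, fetch_anonymous("ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/FAQ-complete",
   "FAQ-complete") would retrieve the complete FAQ, given network access.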

  Comp.speech Archives

   The comp.speech ftp site provides full archives of the comp.speech
   newsgroup dating back to the creation of the group in 1991. The
   postings are stored in the order in which they arrive. Batches of 1000
   articles are grouped into gzip'ed tar files, and matching files listing
   the subjects are also provided.

     * ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/archive/
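   Once a batch has been downloaded, its contents can be scanned without
   unpacking it to disk. The Python sketch below reproduces a subject
   listing from a single batch. It assumes each tar member is a raw Usenet
   article with RFC 822-style headers; the function name and any batch
   file name you pass in are hypothetical.

```python
import email
import tarfile
from typing import BinaryIO, List, Optional, Tuple


def batch_subjects(path: Optional[str] = None,
                   fileobj: Optional[BinaryIO] = None) -> List[Tuple[str, str]]:
    """Return (member name, Subject header) for every article in one batch.

    Pass either a local file path or an already-open binary file object.
    Assumes each member is a raw article with RFC 822-style headers.
    """
    subjects = []
    # "r:gz" reads a gzip-compressed tar archive
    with tarfile.open(name=path, fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            handle = tar.extractfile(member)
            if handle is None:
                continue
            # Parse the article as an RFC 822 message and read its Subject
            msg = email.message_from_bytes(handle.read())
            subjects.append((member.name, msg.get("Subject", "")))
    return subjects
```

   Reading the archive as a stream keeps memory use to one article at a
   time, which matters little for 1000 articles but scales to larger batches.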

  Software and Other Resources

   The comp.speech ftp site includes a wide range of useful software and
   resources. Tony has arranged it into a series of sub-directories:

   /analysis : Speech analysis software
          FFT code, a pitch tracker, RASTA code, and IEEE DSP code.

   /auditory : Auditory model software
          AIM, Auditory Toolbox and Lutear.

   /coding : Speech coding software
          ADPCM, CELP 3.2a, G711, G721, G723, GSM, LDCELP, LPC10,
          Shorten.

   /data : Repository for (small) speech-related databases
          BEEP, CMUDict, Homophone list, hVd database, Peterson Barney
          database

   /dictionaries : Phonetic dictionaries
          BEEP, CMUDict, CUVOALD, Homophone list, MRC database

   /info : Key postings to comp.speech archives by subject
          Lots of interesting info!

   /recognition : Speech recognition software
          AbbotDemo, Ears, Lotec, recnet, sound blaster recognition,
          whistle

   /simtel_sound : Mirror of the simtel/msdos/sound directory
          Range of useful software

   /simtel_voice : Mirror of the simtel/msdos/voice directory
          Another range of useful software

   /synthesis : Speech synthesis software
          Klatt synthesis software, Klatt parameter editor and rsynth.

   /tools : Miscellaneous tools
          Part-of-speech tagger, OGI speech tools, sox audio file format
          conversion, SPHERE software and more.


___________________________________________________________________________

                 Q1.3: Common abbreviations and jargon.

     * ANN - Artificial Neural Network.
     * ASR - Automatic Speech Recognition.
     * ASSP - Acoustics, Speech and Signal Processing.
     * AVIOS - American Voice I/O Society.
     * CELP - Code-book Excited Linear Prediction.
     * COLING - COmputational LINGuistics.
     * DTW - Dynamic Time Warping.
     * FAQ - Frequently Asked Questions.
     * HMM - Hidden Markov Model.
     * IEEE - Institute of Electrical and Electronics Engineers.
     * JASA - Journal of the Acoustical Society of America.
     * LPC - Linear Predictive Coding.
     * LVQ - Learning Vector Quantisation.
     * MFCC - Mel Frequency Cepstral Coefficients.
     * NLP - Natural Language Processing.
     * NN - Neural Network.
     * TIMIT - A speech corpus with phoneme labels - see Q1.7.
     * TTS - Text-To-Speech (i.e. speech synthesis).
     * VQ - Vector Quantisation.


___________________________________________________________________________

              Q1.4: Related newsgroups and mailing lists.

Newsgroups

   comp.ai - Artificial Intelligence newsgroup.
          Postings on general AI issues, language processing and AI
          techniques. The comp.ai FAQ covers NLP, NN and other AI
          information.

   comp.ai.nat-lang - Natural Language Processing Group
          Postings regarding Natural Language Processing. Set up to cover
          a broad range of related issues and different viewpoints. A
          comp.ai.nat-lang FAQ posting is available.

   comp.ai.nlang-know-rep - Natural Language Knowledge Representation
          Moderated group.

   comp.ai.neural-nets - discussion of Neural Networks and related
          issues.
          There are often postings on speech-related matters - phonetic
          recognition, connectionist grammars and so on. A
          comp.ai.neural-nets FAQ posting is available.

   comp.compression - occasional articles on compression of speech.
          The comp.compression FAQ has some info on audio compression
          standards.

   comp.dcom.telecom - Telecommunications newsgroup.
          Has occasional articles on voice products.

   comp.dsp - discussion of signal processing - hardware and algorithms
          and more.
          Has a good FAQ posting which is also available on the WWW and
          by ftp (addresses below). Has a regular posting of a
          comprehensive list of Audio File Formats.

          + http://www.bdti.com/faq/dsp_faq.htm
          + ftp://rtfm.mit.edu/pub/usenet/comp.dsp/

   comp.multimedia - Multi-Media discussion group.
          Has occasional articles on voice I/O.

   sci.lang - Language.
          Discussion about phonetics, phonology, grammar, etymology and
          lots more. A sci.lang FAQ is available.

   alt.sci.physics.acoustics
          Some discussion of speech production & perception.

   alt.binaries.sounds.* - posting and discussion of sound samples.

Mailing Lists

   Voice-Users Mailing List
          For discussion of any aspect of using voice recognition
          systems.

          + Using such systems safely, without muscle or voice strain
          + Techniques for improving recognition accuracy
          + How to set up the physical voice workstation
          + Tips for effective use of voice interfaces
          + Configuration of specific systems, troubleshooting, etc.

          To subscribe, fill out the web-based subscription form.
          Posts to the list should go to:
          voice-users@voicerecognition.com

   Colibri
          News about language, speech, logic and information.
          Email: colibri@let.ruu.nl
          WWW: http://colibri.let.ruu.nl/

   ECTL - Electronic Communal Temporal Lobe
          Founder & Moderator: David Leip. Moderated mailing list for
          researchers with interests in computer speech interfaces. This
          list serves a broad community including persons from signal
          processing, AI, linguistics and human factors. To subscribe,
          send your name, institute, department, daytime phone and email
          address to:

          + ectl-request@snowhite.cis.uoguelph.ca

          The ECTL archive site is
          ftp://snowhite.cis.uoguelph.ca/pub/ectl

   Prosody Mailing List
          Unmoderated mailing list for discussion of prosody. The aim is
          to facilitate the spread of information relating to the
          research of prosody by creating a network of researchers in the
          field. To participate, send the following one-line message to
          listserv@msu.edu:

          + subscribe prosody Your Name

   foNETiks
          A moderated monthly newsletter distributed by e-mail. It
          carries job advertisements, notices of conferences, and other
          news of general interest to phoneticians, speech scientists and
          others. The editors are Linda Shockey and Gerry Docherty. To
          subscribe, send the following one-line message to
          mailbase@mailbase.ac.uk:

          + join fonetiks your_first_name your_second_name

   Digital Mobile Radio
          Covers many areas, including some speech topics such as
          speech coding and speech compression. Mail Peter Decker
          (dec@dfv.rwth-aachen.de) to subscribe.


___________________________________________________________________________

              Q1.5: Associations, Journals and Conferences

   [Note: Also see the list provided in Shikano's WWW site on Speech and
   Acoustics:
   http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html.]

Associations

    Institute of Electrical and Electronics Engineers (IEEE)

     * Publications: include IEEE Transactions on Signal Processing, IEEE
       Transactions on Speech and Audio Processing (from Jan 93), IEEE Transactions
       on Acoustics, Speech, and Signal Processing (now obsolete), IEEE
       Signal Processing Magazine. (More information on the WWW:
       http://www.ieee.org/sp/index.html).
     * Speech-Related Conferences: ICASSP - Intl. Conf. Acoustics,
       Speech, and Signal Processing. IEEE also runs speech technology
       related workshops and many other conferences. (Does anyone have a
       list?)
     * Contact: IEEE Service Center
       445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855, USA
       Phone: 1-800-678-IEEE or (201) 981-0060
     * WWW: IEEE: http://www.ieee.org/
       IEEE Signal Processing Society http://www.ieee.org/sp/index.html

    The Acoustical Society of America (ASA)

     * Publications: Journal of the Acoustical Society of America (JASA)
     * Conferences: ASA holds four meetings a year. Information is
       available on the WWW: http://asa.aip.org/meetings.html.
     * Contact: ASA Office Manager,
       500 Sunnyside Blvd, Woodbury, NY 11797-2999, USA
       Ph: (516) 576-2360, FAX (516) 576-2377
       Email: asa@aip.org
     * WWW: http://asa.aip.org/

    European Speech Communication Association (ESCA)

     * Publications: Speech Communication
     * Conferences: EUROSPEECH is held every two years. E'97 will take
       place in Patras, Greece, in September 1997. ESCA organises regular
       speech-related workshops: see their WWW pages for details.
     * Contact: Secretariat ESCA
       ICP, Universite Stendhal,
       BP 25X, F38400 Grenoble Cedex 9, France
       Ph: (+33).76.82.43.36 Fax (+33).76.82.43.35
       Email: esca@icp.grenet.fr
     * WWW: http://ophale.icp.grenet.fr/esca/esca.html

    Association for Computational Linguistics (ACL)

     * Publications: Computational Linguistics
     * SIGPHON: Special Interest Group for Computational Phonology. The
       home page is provided by the Centre for Cognitive Science at the
       University of Edinburgh. A special issue on Computational
       Phonology appeared in Vol 20, Num 3 of Computational Linguistics
       and included an Introduction to Computational Phonology by Steven
       Bird.
     * Conferences: COLING is held every two years. ACL also organises a
       range of workshops. See the WWW pages for details.
     * Contact: P.O. Box 6090
       Somerset, NJ 08875, USA
       Ph: (908) 873 3893
       Email: acl@bellcore.com
     * WWW: http://www.cs.columbia.edu:80/~acl/

    American Voice Input/Output Society (AVIOS)

     * Description: AVIOS is a not-for-profit organization, dedicated to
       disseminating information about applications using speech
       technology. It aims "to bridge the gap between emerging voice
       technology and its application, by providing an interactive forum
       for the technologists, students, system developers, business
       managers, and users actively involved in or with an interest in
       the field of voice processing."
     * Publications: International Journal of Speech Technology (with
       Kluwer Academic Publishers)
       The Journal of the American Voice Input/Output Society was
       published from 1984 to 1994.
     * Conferences: The International Voice Input/Output Applications
       Conference is held annually (since 1982): Sept 10-12, San Jose,
       CA.
     * Contact: 4010 Moorpark Avenue, Suite 105M, San Jose, CA 95117, USA
       Ph: +1-408-248-1353, Fax: +1-408-248-0251
       Email: avios@pilot.net
       WWW: http://www.avios.com/

    European Language Resources Association

     * Description: The European Language Resources Association was
       established in Luxembourg in February, 1995, with the goal of
       creating an organization to promote the creation, verification,
       and distribution of language resources in Europe. A non-profit
       organization, ELRA aims to serve as a focal point for
       information related to language resources in Europe. It will help
       users and developers of European language resources, as well as
       government agencies and other interested parties, exploit language
       resources for a wide variety of uses. It will also oversee the
       distribution of language resources via CD-ROM and other means and
       promote standards for such resources.
     * More info: see the ELRA Home page for membership information,
       lists of resources etc.
     * Contact: K. Choukri, Executive Director ELRA
       87, Avenue d'Italie, 75013 Paris, FRANCE
       Ph: +33 1 45 86 53 00, Fax: +33 1 45 86 44 88
       Email: elra@calvanet.calvacom.fr
       WWW: http://www.icp.grenet.fr/ELRA/home.html

    ASSTA: Australian Speech Science and Technology Association

     * Conference: SST, the Australian conference on Speech Science and
       Technology, is held every two years. SST-96 will be held in Adelaide.
     * WWW: Home Page: http://cslab.anu.edu.au/~bruce/assta/
       List of members: http://ciips.ee.uwa.edu.au/~roberto/assta-users/

    SALT: UK Speech and Language Technology Club

     * WWW home page: http://salt.essex.ac.uk/salt/

    Linguistic Associations

     * A comprehensive list of linguistic associations and linguistic WWW
       links is available at
       http://engserve.tamu.edu/files/linguistics/linguist/associations.html

Industry Publications

    ASR News

     * Description: Monthly newsletter covering developments in the
       speech recognition and speech synthesis marketplace.
     * Note: Voice Information Associates also publish "Automatic Speech
       Recognition: A study of the world-wide market" (revised 1995) and
       "Text-to-Speech Technology Markets: 1995-2000" (revised 1995)
     * Contact: Voice Information Associates, Inc.
       14 Glen Road South, P.O. Box 625, Lexington, MA 02173, USA
       Ph: +1-617-861-6680, Fax: +1-617-863-8790
       Email: asrnews@tiac.net
       WWW: http://www.tiac.net/users/asrnews/

    Voice News

     * Description: Monthly newsletter reporting on voice mail, voice
       response, speech recognition, speech synthesis, digital voice
       record/playback and related technologies, markets and company
       activities. Review copy available on request.
     * Contact: Stoneridge Technical Services
       P.O. Box 1891, Rockville, MD, 20849, USA
       Ph: +1-301-424-0114, Fax: +1-301-424-8971
       Email: info@stoneridgetech.com
       WWW: http://www.stoneridgetech.com/

    Speech Recognition Update

     * Description: Monthly news and analysis of speech recognition
       markets, applications and technology.
       A free sample copy is available by contacting TMA Associates.
     * Also: TMA Associates publishes market studies, including The
       Advanced Speech Technology Market: Recognition, Synthesis and
       Compression (1996) and Voice ID (1996).
     * Contact: TMA Associates
       6021 Wish Avenue, Encino, CA 91316, USA
       Ph: +1-818-708-0962, Fax: +1-818-345-2980
       Email: 72162.3172@compuserve.com
       WWW: http://www.tmaa.com/

    Voice Technology and Services News

     * Description: Follows integrated PC LAN messaging (voice, fax,
       mail, video) and speech technology, tracking the merging of
       computer and telephone technologies. It provides insights into
       business and marketing opportunities and offers executives timely
       analysis of industry trends.
     * Contact: Phillips Business Information
       1201 Seven Locks Rd., Potomac, Maryland, 20854, USA
       Ph: 1-800-777-5006 OR +1-301-340-1520
       Subscription FAX: +1-301-309-3847
       Editorial FAX: +1-424-4297

    Teleconnect

     * Contact: +1-212-691-8215

    Computer Telephony

     * Contact: +1-212-691-8215

    Voice Processing Magazine

     * Contact: 1-800-854-3112

    Speech Technology

     * Description: No longer published

Technical and Research Publications

    Computer Speech and Language

     * Price: $US170 (Institutions), $US75 (Individuals), 4 issues per
       year.
     * Publisher: Academic Press Limited
       24-28 Oval Road, London NW1, England
       WWW: http://www.apnet.com/

    Speech Communication

     * Contact: ESCA (see above)
     * Publisher: Elsevier Science B.V.
       P.O. Box 521, 1000 AM Amsterdam, The Netherlands.
       WWW: http://www.elsevier.com/

    IEEE Transactions on Speech and Audio Processing

    IEEE Signal Processing Magazine

    IEEE Transactions on Acoustics, Speech, and Signal Processing: OBSOLETE

     * Contact: IEEE (see above)

    Free Speech Journal

     * Description: A Web Journal dedicated to the state of the art in
       human language technology. Past volumes, editorial and submission
       information, and so on are available on the journal's WWW site.
     * Contact: Editor-In-Chief: Ron Cole: cole@cse.ogi.edu
       WWW: http://www.cse.ogi.edu/CSLU/fsj/html/masthead.html

    Linguistics Abstracts Online

     * Description: Online access to all abstracts published in
       Linguistics Abstracts since 1985, plus all current material as it
       becomes available. Over 250 publications are indexed. Free trial
       available.
       http://www.blackwellpublishers.co.uk/labs/

    Computational Linguistics

     * Contact: Published by the Association for Computational
       Linguistics (see above)

    Journal of the Acoustical Society of America (JASA)

     * Contact: Published by Acoustical Society of America (see above)

    International Journal of Speech Technology (was the AVIOS Journal)

     * Description: Focuses on speech technology and its applications,
       and promotes research and description of all aspects of speech
       input and output: applications, base technology, theory, approach,
       experiment, and testing.
     * Publisher: Kluwer Academic Publishers
       101 Philip Drive, Norwell, MA 02061, USA
       Ph: +1-617-871-6300, Fax: +1-617-871-0449
     * Submissions to: International Journal of Speech Technology
       Journals Editorial Office, Ms. Kelly Riddle
       Kluwer Academic Publishers
       (Address, phone, fax as above)
       Email: krkluwer@world.std.com

Conferences

   ICSLP: Intl. Conference on Spoken Language Processing
          Next: 30 Nov to 4 Dec, 1998, Sydney, Australia
          Held in even years.

   ICASSP - Intl. Conf. Acoustics, Speech, and Signal Processing

   Eurospeech

   Computational Linguistics (COLING), held every two years

   International Voice Input/Output Applications Conference

   SST: Australian Speech Science and Technology Conference

   Also see the following lists on the WWW:

   Shikano's WWW site on Speech and Acoustics
          http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html

   Institute of Phonetic Sciences WWW list
          http://fonsg3.let.uva.nl/Other_pages.html#Meetings


___________________________________________________________________________

                          Q1.6: Handicap Aids

   The following are products and companies which support users who can
   benefit from the use of speech technology in a user interface. Please
   feel free to submit information on relevant products, names of
   companies and links to useful information on the Internet (especially
   WWW sites).
   [Of course, most of the products listed in Q5.5 and Q6.5 are useful.]

          * Man-Machine Interfacing
          * SpeechViewer II



Man-Machine Interfacing

     * Description: Offers a service designed for people with physical
       challenges, implementing computerized voice-controlled systems
       adapted to each user's needs.
       They have developed a free-standing microphone and signal
       processing system to compensate for speech/articulation
       distortions and for background noise produced by electronic
       devices such as wheelchairs and respirators.
     * Contact: Man-Machine Interfacing
       P.O. Box 5371, Evanston, IL 60204
       Ph: 1-888-425-2001, Fax: (847) 328-7975
       Email: jwhite@mcs.com
       WWW: http://www.speechrec.com/



SpeechViewer II

     * Platform: IBM machines, Model 25 and later.
     * Description: SpeechViewer II is a speech therapy tool. It provides
       graphical feedback of various speech features so that speech
       impaired individuals can improve their speech. It works with an
       audio bandwidth of 7.3 kHz and thus allows the therapist to work
       with sustained vowels and fricatives. A wide range of graphics is
       used to provide enough variety to hold client interest. An
       extensive set of statistics is gathered, allowing a therapist to
       do research or keep therapy records. The speech therapy modules
       are:
          + Awareness - Sound, Loudness, Pitch, Voicing Onset, Voicing
          + Skill Building - Pitch, Voicing, Phonology
          + Patterning - Pitch & Loudness - Waveform & Spectrogram,
            Spectra
          + Clinical Management - Profiles, Models, Client Data
       A multilingual option is available which provides support for 12
       languages: Danish, Dutch, Finnish, French, German, Icelandic,
       Italian, Norwegian, Portuguese, Spanish, Swedish, and UK English.
       With the Multilingual Option, clinicians can use SpeechViewer II
       as a training tool for English as a second language and for
       foreign language training.
     * Hardware: Requires an IBM M-ACPA (Multimedia-Audio Capture
       Playback Adapter). It has a TI TMS320C25 DSP chip. The input
       sampling rate is 44.1 kHz stereo, 88.2 kHz mono. This is a 16-bit
       card. It has the following jacks: mic in, stereo line in, stereo
       line out, speaker out. Note: This card is being replaced by Mwave
       technology. For more info on Mwave contact Texas Instruments.
     * Price:
          + The software is $2130 list, $1491 educational, part number
            92F2066.
          + The M-ACPA is $370 list, $222 educational, part number
            92F3378.
          + The MicroChannel adapter part number is 92F3379 (same price).
     * Contact: IBM Special Needs Information
       1000 N. W. 51st Street, Internal Zip 5432, Boca Raton, Florida
       33431, USA
       Ph: 1-800-426-4832, TDD: 1-800-426-4833, Fax: 1-407-982-6059
       Email: IBM_SPEC_NEEDS_INFO@vnet.ibm.com
       WWW: http://www.austin.ibm.com/pspinfo/snsspv2.html


___________________________________________________________________________

                         Q1.7: Speech databases

   Many speech databases have been collected, primarily for the
   development of speech synthesis and recognition systems and for
   linguistic research.

   Some databases are free but most are not. The databases normally
   require a lot of storage space (hundreds of megabytes is not
   unusual), so do not expect to be able to ftp large amounts of speech
   data.
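   To see why speech corpora grow so large, a rough storage estimate is
   simply duration x sample rate x bytes per sample x channels. The
   sketch below assumes uncompressed linear PCM with no file headers,
   and assumes each sample is padded to whole bytes; the function name
   and the 16 kHz, 16-bit mono figures are illustrative, not taken from
   any particular database.

```python
def pcm_storage_bytes(seconds: float, sample_rate_hz: int,
                      bits_per_sample: int, channels: int = 1) -> int:
    """Approximate size of uncompressed linear PCM audio, ignoring file headers."""
    # Assumption: each sample is padded to whole bytes (e.g. 14-bit -> 2 bytes).
    bytes_per_sample = (bits_per_sample + 7) // 8
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


# One hour of 16 kHz, 16-bit mono speech (illustrative figures only).
one_hour = pcm_storage_bytes(3600, 16000, 16)
print(one_hour)               # 115200000
print(one_hour / 1e6, "MB")   # 115.2 MB
```

   At that rate, a few hours of 16 kHz recordings already occupy several
   hundred megabytes.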

   In addition to the descriptions of speech databases and speech
   database providers below, information can be obtained from

    LDC: Linguistic Data Consortium
          Provides a very wide range of speech and text data to research
          and commercial users: see below.

    COCOSDA Home Page: http://www.itl.atr.co.jp/cocosda/
          The International Committee for the Co-ordination and
          Standardisation of Speech Databases and Assessment Techniques
          for Speech Input/Output.

    Shikano's WWW site on Speech and Acoustics
          http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html

    RELATOR Project
          European resource initiative: see below.

   The following speech data resources are described in the FAQ.

          * Bavarian Archive for Speech Signals
          * BUPT Spoken Digit Database (Chinese)
          * Center for Spoken Language Understanding (CSLU)
          * Examples of IPA Symbols
          * Linguistic Data Consortium (LDC)
          * NOISEX
          * Oxford Acoustic Phonetic Database
          * Phonemic Samples
          * RELATOR project
          * ShATR
          * University of Victoria Phonetic Database



Bavarian Archive for Speech Signals

     * Description: The Bavarian Archive for Speech Signals (BAS) was
       founded in January 1995 as an initiative of the Institute of
       Phonetics at the University of Munich, Germany. The BAS will
       develop, validate, administrate and disseminate corpora of spoken
       German to the speech community as well as to the speech
       engineering industry. The following German speech corpora are
       currently available on ISO 9660 CDROM:

        Siemens 1000 - SI1000
                5 CDROMs, newspaper corpus, read speech, 10 speakers x
                1000 utterances

        Siemens 100 - SI100
                7 CDROMs, read speech, 101 speakers x 100 sentences

        PhonDat 1 - PD1
                6 CDROMs, new edition in preparation, read speech, 201
                speakers x 450+ sentences

        PhonDat 2 - PD2
                1 CDROM, read speech, 2nd edition, 16 speakers x 200
                sentences, various labelled information

        Verbmobil
                Spontaneous speech recorded in a dialog task (appointment
                scheduling). More information on the VERBMOBIL project:
                http://www.dfki.uni-sb.de/verbmobil/

       Corpora in Preparation

        PhonDat I - PD1: 2nd extended edition (Jul 1995)

        Strange Corpora - SC
                Reference Corpora that reflect certain well known
                problems in speech processing, like accents, repair,
                breaks, hesitations, repetitions, extreme F0, background
                noise, pathological speech, speaker adaptation. The first
                SC corpus (SC1 Accents) will be edited in Jul 1995.

        BAS Edition of Verbmobil Corpora - VM: 2nd extended edition

        Articulatory data - AD: EMA data of speakers of SI1000 corpus

        ERBA: 10000 utterances from a train inquiry task

     * Misc: BAS is currently developing tools for the automatic
       annotation and segmentation of very large speech corpora. This
       includes the automatic detection of variants of pronunciation, a
       statistically based alignment and a rule-based refinement of the
       outcome. The BAS seeks to cooperate with public institutions as
       well as with industrial partners to develop new German speech
       databases. BAS can also serve as a platform for redistributing
       existing German speech corpora.
     * Contact and More Information: The BAS is located at the University
       of Munich, Germany.
       BAS c/o Institut fuer Phonetik
       Schellingstr. 3/II
       80799 Muenchen, Germany
       Ph: +49-89-21802758, Fax: +49-89-2800362
       Email: bas@sun1.phonetik.uni-muenchen.de
       WWW: http://www.phonetik.uni-muenchen.de/BASSeng.html



BUPT Spoken Digit Database (Chinese)

     * Vocabulary : {0, 1/yi/, 2, 3, 4, 5, 6, 7, 8, 9, 1/yao/, /dui/,
       /cuo/ }, 13 words in total.
     * Size: 1202 speakers in total, 789 Males and 413 Females. Each
       speaker utters each word 2 times. Total of 31252 utterances.
     * Format: 8000 Hz, 14-bit sampling. One utterance per file.
     * Contact: GLuck Co.
       195 Berlioz 1C, Nun's Island
       Verdun H3E 1C1, Canada
       Email: weigang@zaphod.math.mcgill.ca



Center for Spoken Language Understanding (CSLU)

     * The ISOLET speech database of spoken letters of the English
       alphabet. The speech is high quality (16 kHz with a noise
       cancelling microphone): 150 speakers each say all 26 letters of
       the English alphabet twice, in random order. The ISOLET database
       can be purchased for $100 by sending an email request to
       vincew@cse.ogi.edu. (This covers handling, shipping and medium
       costs.) The database comes with a technical report describing the
       data.
     * CSLU has a telephone speech corpus of 1000 recitations of the
       English alphabet. Callers recite the alphabet with brief pauses
       between letters. This database is available to not-for-profit
       institutions for $100. The database is described in the
       proceedings of the International Conference on Spoken Language
       Processing.
          + Contact vincew@cse.ogi.edu if interested.
     * CSLU has released for universities its Continuous English Speech
       Corpus. The corpus contains recorded speech from 690 different
       speakers, with label files at various levels - including word
       level and phonetic labels. The data were collected as part of the
       OGI Multi-language telephone corpus. CSLU provides speech corpora
       to all universities without charge. To order a corpus, print the
       license agreement/order form, complete it, and fax it to the CSLU.
       A description of the corpora and an order form are available:

                http://www.cse.ogi.edu/CSLU/
                ftp://speech.cse.ogi.edu/pub/releases

     * Contact: Mike Noel: noel@cse.ogi.edu



Examples of IPA Symbols

  UCLA Sounds of the World's Languages

     * Description: The UCLA Sounds of the World's Languages are
       available for Macintosh users (no DOS based system currently
       available). The sounds are stored in a Hypercard database
       developed at the UCLA Phonetics Laboratory. The aim is to
       illustrate and teach about the range of sounds used in human
       languages with material on more than 80 languages. The set
       demonstrates particular highlights of the sound systems focusing
       especially on rarer sounds that students may not otherwise have a
       chance to hear from a native speaker. The recordings are based on
       the archives of recordings collected at UCLA, with additional
       contributions from outside collaborators. All the languages can be
       accessed from the list of language names, or by clicking on the
       language name in a set of maps. Support for part of this work was
       provided by NSF. The database currently includes examples of
       languages from Agul and Akan to Zulu.
     * Availability: 15 DSDD disks, requiring about 35 MB of disk space
       when expanded. Available for $50 (individuals) or $100
       (institutions). Prepayment in US dollars (checks or international
       money orders payable to "UC Regents") must accompany all orders.
     * Contact: The UCLA Phonetics Laboratory
       Linguistics Department, UCLA, Los Angeles, CA 90095-1543
       Tel: (310) 825-1254
       E-mail: oldfogey@ucla.edu

  John Esling's "IPA Labels"

     * Description: A HyperCard stack which is available for free or a
       nominal fee.
     * Contact: John Esling can be reached by email: pdb@uvvm.uvic.ca.



Linguistic Data Consortium (LDC)

   The LDC was established to broaden the collection and distribution of
   speech and natural language databases for the purposes of research
   and technology development in automatic speech recognition, natural
   language processing and other areas where large amounts of linguistic
   data are needed. Detailed information on the LDC is now available on
   the WWW: http://www.ldc.upenn.edu/. The LDC WWW server provides
   information on membership agreements, license agreements, and
   summaries of speech and text corpora available.

    Speech Corpora

     * TIMIT Acoustic-Phonetic Continuous Speech Corpora and NYNEX
       Telephone Version of TIMIT Corpus (NTIMIT)
     * Resource Management Corpora
     * Air Travel Information System (ATIS) Corpora (multiple)
     * ARPA Continuous Speech Recognition Corpora (WSJ etc)
     * Switchboard Corpus of Recorded Telephone Conversations and
       Switchboard Corpus Excerpts (Credit Card Conversations)
     * Texas Instruments 46-Word Speaker-Dependent Isolated Word Corpus
       (TI46)
     * Texas Instruments Speaker-Independent Connected-Digit Corpus
       (TIDIGITS)
     * Road Rally Conversational Speech Corpus
     * HCRC Map Task Corpus
     * Air Traffic Control Corpus (ATC0)
     * SPIDRE Speaker Identification Corpus
     * YOHO Speaker Verification Corpus
     * OGI Multi-Language Corpus and OGI Spelled and Spoken Telephone
       Corpus
     * BRAMSHILL
     * MACROPHONE
     * King Corpus for Speaker Verification Research
     * WSJCAM0: Cambridge Read News Corpus
     * TRAINS Spoken dialog corpus
     * NYNEX PhoneBook Database
     * Frontiers in Speech Processing

    Text Corpora

     * Association for Computational Linguistics Data Collection
       Initiative (ACL/DCI)
     * The Penn Treebank Project - Release 2
     * TIPSTER Information Retrieval Text Research Collection
     * United Nations Parallel Text Corpus (English, French, Spanish)
     * Japanese Language Financial News
     * European Corpus Initiative-1

    Lexical Databases

     * CELEX Lexical Database
     * COMLEX : COMmon LEXical Database of English (English syntax and
       pronunciation)

    Contact information:

   Linguistic Data Consortium
   3615 Market Street, Suite 200, Philadelphia, PA, 19104-2608, USA.
   Phone: +1 (215) 898-0464 Fax: +1 (215) 573-2175
   e-mail: ldc@ldc.upenn.edu
   WWW: http://www.ldc.upenn.edu/


NOISEX-92

     * Description: Database of recordings of various noises, available
       on 2 CDROMs. Some material from the same source is available by
       anonymous ftp in the IEEE's Signal Processing Information Base.
       The samples include
          + Voice babble
          + Factory noise
          + HF radio channel noise, pink noise, white noise
          + Various military noises; fighter jets (Buccaneer, F16),
            destroyer noises (engine room, operations room), tank noise
            (Leopard, M109), machine gun
          + Volvo 340
     * Availability 1: The cost of this database is 135 Pounds Sterling
       for the set of two CD-ROMs. Send payment with order to:
       The Speech Research Unit,
       Ex1, DRA Malvern, St.Andrew's Road,
       Malvern, Worcestershire, WR14 3PS, UK
       Tel +44-684-894074 Fax +44-684-894384
       Note: The supply of CD-ROMs is limited so please check that they
       are still available before placing an order. The only acceptable
       methods of payment are cheques (from the UK only) or bank drafts
       in Pounds Sterling drawn on a UK bank. They should be made payable
       to: Public Sub Account HMG 4768.
     * Availability 2: Information on how to obtain a copy of the NATO
       RSG.10 NOISE-ROM-0 can be obtained from the DRA Speech Research
       Unit (address above) or from:
       Dr. Herman Steeneken,
       TNO Institute for Perception,
       P.O. Box 23, 3769 ZG Soesterberg,
       The Netherlands.
     * Availability 3 (WWW): Examples of the NOISEX database are
       available on the Rice University Digital Signal Processing (DSP)
       group home page. (Note: the files are large, >20 MB.)
       http://spib.rice.edu/spib/select_noise.html



Oxford Acoustic Phonetic Database

     * Available on compact disc, from J. Pickering and B. Rosner. It
       contains data on vowel-consonant and consonant-vowel combinations
       in both stressed and unstressed locations. The languages covered
       include French, German, Hungarian, Italian, Japanese, British
       English, Spanish and English. For further information write to:
       Electronic Publishing, Oxford University Press,
       Walton Street, Oxford OX2 6DP, UK.
       The ISBN is 0-19-268086-2.
     * Contact: Prof. B. Rosner
       Dept. of Experimental Psychology
       South Parks Rd, Oxford, OX1 3UD, UK
       Email: burton.rosner@wolfson.ox.ac.uk



Phonemic Samples

     * Some basic data. The following ftp sites have samples of English
       phonemes (American accent, I believe) in Sun audio format files.
       See Question 1.8 for information on audio file formats.

          ftp://sounds.sdsu.edu/.1/: This ftp site appears to be
          obsolete. Does anyone know a new address?

          ftp://phloem.uoregon.edu/pub/Sun4/lib/phonemes: There appears
          to be some config problem with this ftp server.

          ftp://sunsite.unc.edu/pub/multimedia/sun-sounds/
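   The Sun audio (.au) files mentioned above begin with a small
   big-endian header. As a minimal sketch, assuming the commonly
   documented layout of six 32-bit big-endian fields (magic ".snd", data
   offset, data size, encoding code, sample rate, channel count), the
   header can be read with Python's standard struct module. Only a few
   encoding codes are mapped; the function and dictionary names are
   illustrative, not part of any existing library.

```python
import struct

AU_MAGIC = 0x2E736E64  # the ASCII bytes ".snd"

# A few common encoding codes from the Sun/NeXT header; others are left unmapped.
AU_ENCODINGS = {
    1: "8-bit mu-law",
    2: "8-bit linear PCM",
    3: "16-bit linear PCM",
    27: "8-bit A-law",
}


def parse_au_header(data: bytes) -> dict:
    """Decode the fixed 24-byte header at the start of a Sun .au file."""
    if len(data) < 24:
        raise ValueError("too short for a Sun .au header")
    magic, offset, size, encoding, rate, channels = struct.unpack(">6I", data[:24])
    if magic != AU_MAGIC:
        raise ValueError("bad magic number: not a Sun .au file")
    return {
        "data_offset": offset,  # audio samples start at this byte offset
        "data_size": None if size == 0xFFFFFFFF else size,  # all ones = unknown
        "encoding": AU_ENCODINGS.get(encoding, f"code {encoding}"),
        "sample_rate": rate,
        "channels": channels,
    }


# Build a tiny 8 kHz mono mu-law file in memory and read its header back.
header = struct.pack(">6I", AU_MAGIC, 24, 4, 1, 8000, 1)
info = parse_au_header(header + b"\xff\xff\xff\xff")
print(info["encoding"], info["sample_rate"], info["channels"])  # 8-bit mu-law 8000 1
```

   Reading the header first tells you which decoder to apply to the
   samples that follow at data_offset.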



The RELATOR project

     * Description: RELATOR is a European-wide consortium of researchers
       who, with the support of the European Commission, are striving to
       establish a European repository of linguistic resources.
       Linguistic resources comprise a variety of spoken and written
       language materials, including lexicons, grammars, corpora, and
       spoken language databases. RELATOR will ensure that the
       requirements of the European language processing community receive
       attention.
       The RELATOR WWW pages provide information on the consortium. The
       languages currently covered by the RELATOR consortium include
       Danish, Dutch, English, French, German, Greek, Italian,
       Portuguese, Spanish plus multilingual resources. The resources
       include both text and speech.
     * WWW: http://cristal.icp.grenet.fr/Relator/homepage.html



ShATR

     * Description: Multi-simultaneous-speaker corpus available on one
       CDROM. This specialised corpus is primarily intended to provide
       acoustic material for studies in auditory scene analysis. However,
       many researchers in the speech sciences, ranging from acoustics to
       discourse analysis, may find it a valuable source of information.
       The corpus has been transcribed and aligned at four different
       levels of analysis. Word counts and an overlap analysis of the
       individual speaker channels are available. There is also a
       general tool for accessing concurrent events in transcribed
       multi-sound-source databases.
     * Cost: 30 Pounds Sterling for one CD-ROM. Availability, licensing
       and ordering information is provided on ShATR's home page.
     * Examples: Samples of the ShATR database are available on ShATR's
       home page and by anonymous ftp
       ftp://ftp.dcs.shef.ac.uk/share/spandh/ShATR/
     * Contact: Speech and Hearing Research Group
       Department of Computer Science, University of Sheffield
       Regents Court, 211 Portobello Street, Sheffield S1 4DP, U.K.
       WWW:
       http://www.dcs.shef.ac.uk/research/groups/spandh/pr/ShATR/ShATR.html



University of Victoria Phonetic Database

     * Platform: Computerized Speech Lab CSL4300, MultiSpeech on Winxx or
       Win95 with any multimedia card, or a SoundBlaster16 option with
       support from the PDBAUDIO program.
     * Description: Phonetic database consisting of proprietary format
       digitized speech samples from 45 world languages on CDROM. The
       CDROM is supported by hardcopy documentation containing the
       phonetic inventory of each language, transcriptions and
       orthography of each digitized speech sample. The PDB depicts and
       compares the sounds, symbols and conventions of transcription
       used by these languages. More information is available from the
       STR web site.
     * Contact: Speech Technology Research Ltd.,
       Suite B - 1623 McKenzie Avenue, Victoria, B.C. V8N 1A6, Canada
       Ph: +1-250-477-0544
       Email: products@speechtech.com
       WWW: http://www.speechtech.com/home/speechtech/


___________________________________________________________________________

                Q1.8: Speech File Formats and Conversion

   Q2.7 of this FAQ has information on mu-law coding.

   Guido van Rossum maintains a very good and comprehensive list of
   audio file formats. The list is posted regularly to comp.dsp and
   alt.binaries.sounds.misc, amongst others. It includes information on
   sampling rates, hardware, compression techniques, file format
   definitions, format conversion, standards, programming hints and lots
   more. It is also available from:

          WWW: ftp://ftp.cwi.nl/pub/audio/index.html

          Text: ftp://ftp.cwi.nl/pub/audio/AudioFormats.part1,2

   A useful source of software (Sox, ulaw conversion, SoundKit etc) is:

          http://peace.wit.com/sounds/SoundConversion/
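   Since mu-law (ulaw) conversion comes up often (see also Q2.7), here
   is a minimal sketch of the standard G.711 mu-law-to-linear expansion,
   written in Python for illustration; the function names are
   hypothetical. It returns samples on the usual 16-bit scale, where
   code 0xFF decodes to 0 and codes 0x80/0x00 decode to +32124/-32124.

```python
BIAS = 0x84  # 132: the bias added during G.711 mu-law encoding


def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a 16-bit linear sample."""
    code = ~code & 0xFF            # mu-law bytes are stored bit-inverted
    sign = code & 0x80             # top bit: sign
    exponent = (code >> 4) & 0x07  # next 3 bits: segment number
    mantissa = code & 0x0F         # low 4 bits: step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def decode_ulaw(data: bytes) -> list[int]:
    """Decode a buffer of mu-law bytes, e.g. the body of an 8-bit mu-law .au file."""
    return [ulaw_to_linear(b) for b in data]


print(decode_ulaw(bytes([0xFF, 0x80, 0x00])))  # [0, 32124, -32124]
```

   A lookup table of all 256 decoded values is the usual optimisation
   when converting large files.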


___________________________________________________________________________

         Q1.9: Speech Laboratory Environments and Audio Editors

   First, what is a Speech Laboratory Environment? A speech lab is a
   software package which provides the capability of recording, playing,
   analysing, processing, displaying and storing speech. Your computer
   will require audio input/output capability. The different packages
   vary greatly in features and capability - best to know what you want
   before you start looking around.

   Most general-purpose audio editing packages can process speech, but
   they do not necessarily have specialised capabilities for speech
   (e.g. formant analysis).

   The following article provides a good survey.

     * Read, C., Buder, E., & Kent, R. "Speech Analysis Systems: An
       Evaluation" Journal of Speech and Hearing Research, pp 314-332,
       April 1992.

   The following is a list of the speech labs described in the FAQ.

          * CSRE: Computerized Speech Research Environment
          * DADiSP from DSP Development Corporation
          * Entropic Signal Processing System (ESPS) and Waves
          * GoldWave
          * Kay Elemetrics Computer Speech Lab
          * Khoros
          * Matlab plus Signal Processing Toolbox
          * MacSpeech Lab II
          * N!Power
          * OGI Speech Tools
          * Ptolemy
          * Quadravox Speech Processing Products - Qbox
          * Speech Filing System (SFS)
          * Signalyze 3.0 from InfoSignal
          * SoundScope



CSRE: Computerized Speech Research Environment

     * Platform: DOS
     * Description: CSRE (pronounced "Caesar") is a speech processing
       system for the PC. It provides
          + Signal recording and playback
          + Signal editing
          + Pitch, spectral and formant analysis
          + Speech synthesis with an implementation of the Klatt-1980
            parametric speech synthesizer
     * Requirements: PC compatible (80486DX), 1 Meg RAM (recommend 4M),
       DOS 3.2 (recommend 6.22), VGA graphics (640x480; 16 colors), 30 Meg
       of hard disk space (5 Meg for CSRE plus space for audio
       recordings), and a supported audio card.
     * Cost: See AVAAZ WWW Pages
     * Contact: AVAAZ Innovations Inc.
       P.O.Box 8040, 1225 Wonderland Rd. N, London, Ontario, CANADA, N6G
       2B0
       Ph: +1-519-472-7944, Fax: +1-519-472-7814
       Email: info@avaaz.com
       WWW: http://www.icis.on.ca/homepages/avaaz/
     * Note: See also the CSRE entry in Q5.5 on speech synthesisers.
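
   CSRE, like most packages in this section, offers pitch (F0) analysis.
   To give a rough idea of what such a routine does, here is a basic
   autocorrelation pitch estimator in Python. The function name, the
   60-400 Hz search range and the single-frame interface are assumptions
   made for this sketch, and CSRE's own method is not documented here.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Rough F0 estimate (Hz) for one voiced frame, by autocorrelation.

    Returns None for a silent frame or when no positive correlation peak
    is found in the search range.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove any DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None
    lag_min = int(sample_rate / f0_max)      # shortest period searched
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalised by the frame energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else None
```

   Practical pitch trackers add a voicing decision, guard against octave
   errors and smooth the F0 contour across frames; this sketch does none
   of that.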



DADiSP from DSP Development Corporation

     * Platform: Windows and various Unix
     * Description: DADiSP is designed for scientists and engineers to
       collect, analyze, and display scientific and technical data.
       Packages available include AdvDSP, Controls, DADiMP, Filters,
       GPIBLab, NeuralNet, and Stats.
       A description of the application of DADiSP to speech processing is
       provided on the DSP Development Corporation WWW site.
       Detailed product information is available on the DSP Development
       Corporation WWW site and by filling out a WWW form.
     * Cost: Unknown
     * Availability: See the DSP Development Corporation WWW site
       A free, fully featured demo of DADiSP 4.0 is available from the
       DSP Development Corporation WWW site and can be mailed on floppy
       disk.
       A special Student Edition of DADiSP is available for free.
     * Contact: DSP Development Corporation
       One Kendall Square, Cambridge, MA 02139, USA
       Ph: (617) 577-1133 Fax: (617) 577-8211
       EMail: info@dadisp.com
       WWW: http://www.dadisp.com/



Entropic Signal Processing System (ESPS) and Waves

     * Platform: Range of Unix platforms.
     * Description: ESPS is a comprehensive set of speech
       analysis/processing tools for the UNIX environment. The package
       includes UNIX commands, and a comprehensive C library (which can
       be accessed from other languages). Waves is a graphical front-end
       for speech processing. Speech waveforms, spectrograms, pitch
       traces, etc. can be displayed, edited and processed in X Windows
       and OpenWindows (versions 2 and 3). Waves also includes a signal
       labelling utility which provides multiple feature labelling and
       useful features for fast labelling of large speech databases.
       Other Entropic products are HTK (see Q6.5) and TrueTalk (see
       Q5.5).
     * Misc: A more detailed description is provided on the Entropic WWW
       pages (http://www.entropic.com/esps.html).
     * Cost: On request.
     * Contact:

    Entropic Research Laboratory, Washington Research Laboratory
    600 Pennsylvania Ave, S.E. Suite 202, Washington, D.C. 20003
    (202) 547-1420
    email: info@entropic.com
    WWW: http://www.entropic.com/
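
   Spectrogram displays such as the one in Waves are built from
   short-time spectra. The Python sketch below shows the basic idea:
   split the signal into overlapping frames, apply a window, and take the
   magnitude spectrum of each frame in dB. The Hamming window, 256-sample
   frames and 128-sample hop are illustrative assumptions, and the naive
   DFT stands in for the FFT a real package would use.

```python
import cmath
import math


def spectrogram(signal, frame_len=256, hop=128):
    """Return one row per frame, each holding magnitudes (dB) for
    DFT bins 0 .. frame_len // 2."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]     # Hamming window
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of bin k; a real tool would use an FFT instead.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(20 * math.log10(abs(acc) + 1e-12))
        rows.append(row)
    return rows
```

   Bin k corresponds to k * sample_rate / frame_len Hz; longer frames
   give finer frequency resolution (a "narrowband" display) at the cost
   of time resolution.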



GoldWave

     * Platform: Windows
     * Description: GoldWave is a digital audio editor for Microsoft
       Windows. Its features include:
          + Editing of multiple waveforms and large waveforms
          + Realtime amplitude/spectrum oscilloscopes
          + Resizable device controls window for accessing audio devices
          + Realtime fast forward and rewind playback
          + Effects: distortion, Doppler, echo, filter, mechanize,
            offset, pan, volume shaping, invert, resample, transpose, etc
          + Multiple file formats and conversions: .WAV, .AU, .IFF, .VOC,
            .SND, .MAT, .AIFF, and raw data
          + CD-ROM controls window
       More information is available on the GoldWave home page.
     * Cost: Shareware
     * Availability: Through the GoldWave home page:
       http://web.cs.mun.ca/~chris3/goldwave/goldwave.html
     * Contact: Chris Craig: chris3@cs.mun.ca
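
   Effects like GoldWave's echo are simple sample-by-sample operations.
   A minimal Python sketch of a single feed-forward echo,
   y[n] = x[n] + g * x[n - D], follows; the function name and the default
   delay and gain are hypothetical, and GoldWave's own effect algorithms
   are not documented here.

```python
def add_echo(samples, sample_rate, delay_s=0.25, gain=0.5):
    """Return samples with one echo added: y[n] = x[n] + gain * x[n - D]."""
    d = int(round(delay_s * sample_rate))   # echo delay D in samples
    out = list(samples) + [0.0] * d         # extend to hold the echo tail
    for n, x in enumerate(samples):
        out[n + d] += gain * x
    return out
```

   A gain below 1 keeps the echo quieter than the original; feeding the
   output back into the delay line instead would give a decaying
   multi-echo (reverberation-like) effect.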



Kay Elemetrics CSL (Computer Speech Lab) 4300

     * Platform: Minimum IBM PC-AT compatible with extended memory (min
       2MB) with at least VGA graphics. More powerful machines
       preferable.
     * Description: Speech analysis package, with optional separate LPC
       program for analysis/synthesis. Uses its own file format for data,
       but has some ability to export data as ASCII. The main
       editing/analysis program (but not the LPC part) has its own macro
       language, making it easy to perform repetitive tasks.
       Options (more information on the Kay Elemetrics Corp. WWW site):
          + Multi-Dimensional Voice Program (MDVP)
          + Voice Range Profile (Phonetograph)
          + Real-Time Spectrogram
          + Sona-Match
          + Palatometer Database
          + IPA Transcription Tutorial
          + Delayed Auditory Feedback (DAF)
          + Disordered Voice Database
          + Auditory Perception Program and Database
          + Motor Speech Profile Program
          + CSL-Pitch
          + Real-Time EGG Processing
          + Signal Enhancement in Noise Program
          + Synthesis Program
          + DAT Interface and Four Channel Input
          + Phonetic Database
          + Direct-to-Disk Program
          + Programmers Kit
          + Condenser Microphone
          + Multi-Speech
     * Cost: Contact Kay Elemetrics Corp.
     * Contact: Kay Elemetrics Corp.
       2 Bridgewater Lane, Lincoln Park, NJ 07035, USA
       Ph: +1-201-628-6200, Fax: +1-201-628-6363
       Toll free tel. 1-800-289-5297
       [WWW: http://www.kayelemetrics.com/ - available soon]



Khoros

     * Platform: Any Unix - source code available.
     * Description: Khoros is a technical computing environment for image
       and signal processing, visual programming and software
       development.
     * Price: On request.
     * Availability: Khoral Research Inc.
       6001 Indian School Rd. NE Suite 200, Albuquerque, NM 87110, USA
       Ph: (505) 837-6500, Fax: (505) 881-3842
       Email: info@khoral.com
       ftp: ftp://ftp.khoral.com/
       WWW: http://www.khoral.com/



Matlab plus Signal Processing Toolbox

     * Platform: Wide range
     * Description: Matlab (MATrix LABoratory) is a technical computing
       environment for numerical computation and visualization based on a
       matrix oriented, interpreted programming language. The programming
       environment provides support for the development of customized
       operations, along with debugging facilities and a graphical user
       interface toolkit. Audio output is provided.
       A specialised Signal Processing Toolbox is available which
       provides many functions which are useful for speech analysis. It
       includes filter design, spectral estimation, statistical signal
       processing, waveform generation, and signal and spectrogram
       display.
       A specialised Auditory Toolbox is available which contains
       functions useful to people interested in auditory/cochlear models.
       A more detailed description is given in Q1.11.
     * Price: On request.
     * Contact: The Math Works Inc. 24 Prime Park Way, Natick, MA
       01760-1500 USA
       Ph: 1-508-653 1415 Fax: 1-508-653 6284
       Email: info@mathworks.com
       ftp: ftp://ftp.mathworks.com
       WWW: http://www.mathworks.com/



MacSpeech Lab II (MSL II)

     * Platform: Macintosh
     * Description: A sound acquisition and analysis package for Macs.
       MSL II delivers the most common functions for speech analysis
       (FFTs, LPCs, f0 extraction, etc.) and produces grayscale
       spectrographic displays. Can be used for various speech technology
       and phonetic training tasks.
     * Hardware: Requires MacADIOS ("Macintosh Analog/Digital
       Input/Output System") hardware for speech I/O at 12/16 bits.
     * Misc: Software no longer updated by GW Instruments; MSL
       software/hardware will not perform input/output on Quadras, for
       example, though analysis seems fine. Known to operate properly on
       models up to the IIcx and IIfx.
     * Availability: MSL has been replaced by SoundScope; see the
       SoundScope entry for more detail.
     * Contact:

    GW Instruments
    35 Medford Street, Somerville, MA 02143, USA
    Phone: (617) 625-4096 Fax: (617) 625-1322



N!Power

     * Platform: SUN, DEC and HP workstations.
     * Description: An object-oriented software package with a Motif GUI
       and a range of functionality for data analysis/editing,
       signal analysis, speech processing, real-time A/D and D/A, and
       2D/3D interactive graphics. N!Power replaces ILS.
       N!Power can provide a Block Diagram user interface, menus,
       pop-ups, and a high-level IEEE standard symbolic scripting
       language. You can customize the blocks, menus and pop-ups with
       mouse point-and-click operations.
     * Contact: Signal Technology, Inc.
       104 W. Anapamu, Suite J, Santa Barbara, CA 93101-3126
       Phone: +1-805-899-8300, Fax: +1-805-899-4344
       Email: stisales@signal.com
       WWW: http://www.silcom.com/~stilarry/



OGI Speech Tools

     * Developers: Center for Spoken Language Understanding (CSLU) at the
       Oregon Graduate Institute of Science and Technology (Portland,
       Oregon)
     * Platform: Unix
     * Description: The OGI Speech tools include:
          + An X windows display tool (LYRE) for displaying data in a
            time synchronous fashion for a. the speech signal b.
            spectrograms c. phoneme labels, and other information.
          + A Neural Network (NOPT) training package.
          + A set of C library routines (LIBNSPEECH) for the
            manipulation of speech data, including: a. PLP Analysis, b.
            Rasta PLP Analysis, c. Linear Predictive Coding, d. Mel
            Cepstrum Coding, e. Fast Fourier Transform
          + A set of utilities for converting file formats such as ADC,
            NIST, mu-law, binary files, and ASCII. Includes filtering.
          + A database utility (find_phone) to automate speech database
            related enquiries. It allows the user to specify a particular
            label or set of labels in a given context, display all
            occurrences of the label, and relabel the occurrences if
            desired.
          + A Vector-Quantizer based on the Linde Buzo and Gray (LBG)
            algorithm.
          + A set of Perl scripts which have been used mainly to automate
            the use of the OGI Speech Tools.
          + Man pages for all routines and programs developed, as well as
            a user manual in both PostScript and TeX format.
     * Misc: Software is written in ANSI C.
     * Contact: Email: tools@cse.ogi.edu
       WWW: http://www.cse.ogi.edu/CSLU/
       ftp: ftp://speech.cse.ogi.edu/pub/tools/
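
   The OGI tools include a vector quantiser based on the Linde, Buzo and
   Gray (LBG) algorithm. The Python sketch below shows the general
   technique: start from the centroid of all training vectors, split each
   codeword into a perturbed pair, then refine with nearest-neighbour /
   centroid (Lloyd) iterations until the relative drop in distortion is
   small. The additive split, the squared-error distortion and all names
   are assumptions for the sketch; it is not the OGI implementation.

```python
def _dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(vectors, dim):
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def lbg(data, size, eps=0.01, tol=1e-6, max_iter=100):
    """Train a codebook of `size` codewords (a power of two) by LBG splitting."""
    dim = len(data[0])
    codebook = [_centroid(data, dim)]
    while len(codebook) < size:
        # Split every codeword y into y + eps and y - eps.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        prev = float("inf")
        for _ in range(max_iter):
            # Nearest-neighbour partition of the training vectors.
            cells = [[] for _ in codebook]
            total = 0.0
            for v in data:
                i = min(range(len(codebook)), key=lambda j: _dist(v, codebook[j]))
                cells[i].append(v)
                total += _dist(v, codebook[i])
            # Centroid update; an empty cell keeps its previous codeword.
            codebook = [_centroid(cell, dim) if cell else cw
                        for cell, cw in zip(cells, codebook)]
            distortion = total / len(data)
            if prev - distortion <= tol * distortion:
                break                        # relative improvement too small
            prev = distortion
    return codebook
```

   For example, `lbg([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)`
   should settle on one codeword near each of the two clusters.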



Ptolemy

     * Platform: Sun SPARC, DECstation (MIPS), HP (hppa).
     * Description: Ptolemy provides a highly flexible foundation for the
       specification, simulation, and rapid prototyping of systems. It is
       an object oriented framework within which diverse models of
       computation can co-exist and interact. Ptolemy can be used to
       model entire systems.
       Ptolemy has been used for a broad range of applications including
       signal processing, telecommunications, parallel processing,
       wireless communications, network design, radio astronomy, real
       time systems, and hardware/software co-design. Ptolemy has also
       been used as a lab for signal processing and communications
       courses. Ptolemy has been developed at UC Berkeley over the past 3
       years. Further information, including papers and the complete
       release notes, is available from the FTP site.
     * Cost: Free
     * Availability: The source code, binaries, and documentation are
       available by anonymous ftp from

                 ftp://ptolemy.berkeley.edu/pub/



Quadravox Speech Processing Products - Qbox

     * Platform: Windows 3.1, Windows 95
     * Description: Qbox comprises a Windows-based LPC-12 analysis and
       editing system and a parallel-port driven programmer for
       one-time-programmable TI TSP50P11 synthesis chips. The analysis
       software takes standard 11025 Hz, 16-bit monaural .wav files as
       input and allows graphical editing of the coded pitch, gain, and
       reflection coefficients. It can also be used to define
       concatenation sequences of individual phrases. Data rates depend
       on the original sound, but are typically below 2000 bits/sec. The
       processed data can then be merged with synthesis and control
       routines and programmed into the TI synthesizer. The
       Quadravox-developed synthesis routine accepts run-time
       modifications of pitch and frame-length (speed), as well as
       externally defined concatenation sequences. The synthesis chip
       interface can be defined as a matrixed-keyboard drive, a simple
       parallel control, or a serial bus control supporting up to 31
       individually addressed devices and modules.
     * Cost: $90-$150 depending on options selected.
     * Contact: Quadravox, Inc.
       1701 N. Greenville Ave., Suite 608, Richardson, TX, 75081 USA
       Ph: 214-669-4002
       Email: info@quadravox.com
       WWW: http://www.quadravox.com/
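
   The reflection coefficients that Qbox lets you edit come out of LPC
   analysis. A common way to obtain them is the Levinson-Durbin recursion
   on a frame's autocorrelation, sketched below in Python. The names, the
   prediction convention x[n] ~ sum a[j] x[n-j] and the sign of the
   reflection coefficients are assumptions (sign conventions differ
   between texts and chips); Qbox's actual analysis and coefficient
   quantisation are not documented here.

```python
def autocorrelation(frame, order):
    """Autocorrelation r[0..order] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Return (a, k, err): predictor coefficients a[1..p], reflection
    coefficients k[1..p] and final prediction-error energy."""
    if r[0] == 0.0:                          # silent frame: nothing to predict
        return [0.0] * order, [0.0] * order, 0.0
    a = [0.0] * (order + 1)
    k = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = acc / err                     # i-th reflection coefficient
        new_a = a[:]
        new_a[i] = k[i]
        for j in range(1, i):
            new_a[j] = a[j] - k[i] * a[i - j]
        a = new_a
        err *= 1.0 - k[i] ** 2               # error shrinks at each order
    return a[1:], k[1:], err
```

   For a stable predictor every |k| is below 1, which is one reason
   reflection coefficients, rather than the predictor coefficients, are
   usually the parameters that get quantised and edited.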



Speech Filing System (SFS)

     * Platform: Unix and DOS
     * Description: SFS provides a computing environment for conducting
       speech research. It comprises software tools, file and data
       formats, subroutine libraries, graphics, standards and special
       programming languages. It performs standard operations such as
       recording, replay, waveform editing and labelling, spectrographic
       and formant analysis and fundamental frequency estimation. For
       more information, see
       ftp://pitch.phon.ucl.ac.uk/pub/sfs/README
     * Misc: SFS is copyrighted by University College London, but is
       currently supplied free of charge to research establishments for
       non-profit use.
     * Availability: SFS source code is available by anonymous FTP from:
       ftp://pitch.phon.ucl.ac.uk/pub/sfs/
     * Contact: Mark Huckvale
       University College London, Gower Street, London WC1E 6BT, UK
       Email: SFS@phonetics.ucl.ac.uk
       ftp: ftp://pitch.phon.ucl.ac.uk/pub/sfs/



Signalyze 3.0 from InfoSignal

     * Platform: Macintosh
     * Description: Signalyze is an interactive program for the analysis
       of speech and other acoustic material. Signalyze's basic concept
       revolves around the display of up to 100 signals in HyperCard
       fashion. The program offers a range of signal editing features,
       spectral analysis tools, manual scoring tools, pitch extraction
       routines, signal manipulation tools, and extensive input-output
       capacity. It also has a range of capabilities for creating,
       editing and manipulating label files with flexibility in labelling
       format.
       Signalyze handles the following file formats: Signalyze, MacSpeech
       Lab, AudioMedia, SoundDesigner II, SoundEdit/MacRecorder,
       SoundWave, sound resource formats, and ASCII-text.
       Sound I/O: direct sound input from Apple 8- or 16-bit sound input;
       sound output via Macintosh 8- or 16-bit sound.
     * Compatibility: MacPlus and higher. Takes advantage of large
       screens, multiple screens and 16/256 color/grayscales. System 7.0
       compatible. Runs in background with adjustable priority.
     * Misc: Manuals and tutorials included (250 pp.). Program is
       switchable to English, French, and German. For more information
       and demo:
       WWW: http://www.agoralang.com:2410/pubdirsoftware.html
       WWW: http://www.agoralang.com:2410/signalyze.html
       Gopher: gopher://uldns1.unil.ch:70/11/unilgophers/gopher_lett/LAIP
     * Cost: Individual license US$450, departmental license US$750,
       organisational license US$1250, plus shipping. Upgrades from
       version 2.0 are available.
     * Contact: The Americas: Network Technology Corporation
       91 Baldwin St., Charlestown, MA 02129, USA
       Phone: +1-617-241-9205, Fax: +1-617-241-5064
       ---
       Elsewhere: InfoSignal Inc.
       C.P. 73, 1015 LAUSANNE, Switzerland,
       Fax: +41 21 691-1372,
       Email: 76357.1213@COMPUSERVE.COM



SoundScope

     * Platform: Macintosh: 68K and PowerPC native
     * Description: The SoundScope product family is used primarily in
       speech teaching & research, with some applications in animal
       sounds, forensics, and general acoustic analysis. It can record,
       view, analyze, play, copy, paste, store and print sound waveforms.
       Analysis functions include spectrogram, fundamental frequency
       (Fo), Linear Predictive Coding (LPC) including formant tracking,
       LPC residual, jitter (pitch perturbation), shimmer (amplitude
       perturbation), harmonics-to-noise ratio (HNR), frequency spectrum,
       spectral slice, envelope, energy and zero crossing. Includes
       limited built-in filtering and runs any filter created with
       WLFDAP. An integrated text editor
       stores notes and calculation results. SoundScope lets you design
       your own custom "instrument" screen, tasks (macros) and menus.
       Supplied instruments include 1 channel analyser (dual snap, dual
       time, spectrogram, spectrum), 2 channel analyser, segment
       analyser, multi-channel recorder, etc.
     * Note: Supersedes MacSpeech Lab II.
     * Price: $490 to $4990, less educational discount
     * Availability: In North America, directly from GW Instruments.
       Contact the company for international distributors.
     * Contact: GW Instruments
       35 Medford Street, Somerville, MA 02143, USA
       Ph: +1-617-625-4096, Fax: +1-617-625-1322
       Email: info@gwinst.com
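
   SoundScope reports jitter (pitch perturbation) and shimmer (amplitude
   perturbation). One common "local" definition of both is the mean
   absolute difference between consecutive cycles divided by the mean
   value, sketched below in Python. This definition and the function
   names are assumptions; SoundScope's exact formulas are not given here,
   and the per-cycle periods and peak amplitudes are taken as already
   measured.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, as a percentage
    of their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    return 100.0 * mean_diff / (sum(values) / len(values))


def jitter_percent(periods_s):
    """Local jitter from successive pitch periods (seconds)."""
    return local_perturbation(periods_s)


def shimmer_percent(peak_amplitudes):
    """Local shimmer from successive cycle peak amplitudes."""
    return local_perturbation(peak_amplitudes)
```

   For example, periods alternating between 10 ms and 11 ms give a local
   jitter of about 9.5%, far above what a steady sustained vowel would
   show.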


___________________________________________________________________________

                      Q1.10: Speech Research Sites

   Rather than try to list the places around the world which perform
   speech research, this FAQ lists sites on the WWW where other
   comprehensive lists are maintained. Try the following:

    Shikano's WWW site on Speech and Acoustics
          http://www.aist-nara.ac.jp/IS/Shikano-lab/database/internet-resource/e-www-site.html
          Lists of speech research sites by country. Currently includes
          around 100 sites. The list of Japanese sites is particularly
          comprehensive.

    Mambo Speech Research List
          http://mambo.ucsc.edu/psl/speech.html
          Lists about 50 speech research sites and related information
          sources. Very nice presentation!

    ESCA: European Speech Communication Association
          http://ophale.icp.grenet.fr/esca/labos.html
          Links to around 15 European speech research sites and around 15
          related sources of information.

    Institute for Perception Research: Speech on the Web
          http://www.tue.nl/ipo/hearing/webspeak.htm
          Jan Roelof de Pijper at the Institute for Perception Research
          has a long list of research sites plus links to lots of other
          speech material on the WWW.

    Russ Wilcox's list of Commercial Speech Recognition
          http://www.tiac.net/users/rwilcox/speech.html
          Links to information on speech technology vendors, speech
          research labs, speech resources, on-line demos and more.

    Speech Groups List: Leeds University Cognitive Psychology
          Research Group
          http://lethe.leeds.ac.uk/research/cogn/speechlab/other.html
          List of about 25 research sites.

    Institute of Phonetic Sciences, Amsterdam
          http://fonsg3.let.uva.nl/Other_pages.html#Phonetics
          Good list of European sites.

    Speech and Hearing Research Group, University of Sheffield,
          UK
          http://www.dcs.shef.ac.uk/research/groups/spandh/world/misclinks.html
          Links to sites in the UK, USA, Europe and the rest of the
          world.

    Duncan M. Forrest's Speech Recognition Resource List
          http://www.skye.co.za/dmf/speech/

   Most speech research sites have links to other speech research sites
   somewhere in their WWW pages.


___________________________________________________________________________

              Q1.11: Miscellaneous Software and Resources

   Speech Interface Standards: APIs etc

          * ASAPI: Advanced Speech API (AT&T)
          * SAPI: Microsoft Windows Speech API
          * SRAPI: Speech Recognition API
          * TAPI: Microsoft Windows Telephony API

   Network "Phone" Software

          * CUSeeMe
          * CyberPhone
          * DigiPhone
          * InterFACE from Hijinx
          * FAQ: How can I use the Internet as a telephone?
          * Nautilus: Secure Computer Telephony
          * NEVOT (1.4v) from AT&T BL
          * PGPfone
          * Speak Freely
          * Internet Phone from VocalTec
          * WebPhone
          * WebTalk

   Audio Processing Software

          * AF version AF3R1
          * Voice E-Mail from Bonzi Software
          * MicNotePad Recording Software for Macs
          * MixViews
          * Network Audio System Release 1.1
          * NIST Software - SPHERE and SCORE
          * Sound Processing Kit
          * TCPplay

   Human Audio Perception

   Other useful information on Auditory Modeling can be found at:

   Malcolm Slaney's home page
          http://www.interval.com/~malcolm/

   Martin Cooke's home page
          Speech and Hearing Research Group, Dept of Computer Science,
          University of Sheffield, UK.
          http://www.dcs.shef.ac.uk/~martin/

          * Auditory Modeller 1
          * Auditory Modeller 2
          * Auditory Toolbox for Matlab
          * Human Audio Perception Document

   Dictionaries and other Lexical Tools

          * BEEP dictionary
          * CMU dictionary
          * CUVOLAD dictionary (Oxford Dictionary)
          * Comprehensive Word List
          * EAT: Edinburgh Associative Thesaurus
          * Homophone List
          * Moby Lexical Resources
          * MRC Psycholinguistic Database
          * WordNet
          * Dictionaries on the WWW

   Phonetic Fonts and Phonetic Samples

          * International Phonetic Alphabet
          * WWW: Phonetic Fonts and Examples Online
          * Summer Institute of Linguistics IPA Fonts
          * Phonetic Fonts for TeX and LaTeX
          * Yamada Language Center

   Subjective Evaluation of Speech Quality

   Dynastat, Inc.
          Speech Intelligibility Testing with Diagnostic Rhyme Test
          (DRT), Modified Rhyme Test (MRT), Phonetically Balanced Word
          Lists (PB), Diagnostic Medial Consonant Test (DMCT), Diagnostic
          Alliteration Test (DALT), ICAO Spelling Alphabet Test (SpAT)
          Speech Quality (Acceptability) Evaluation with Diagnostic
          Acceptability Measure (DAM), Mean Opinion Score (MOS),
          Degradation Mean Opinion Score (DMOS)
          Contact: Dynastat, Inc.
          2704 Rio Grande, Suite 4, Austin, TX 78705, USA
          Ph: +1-512-476-4797, Fax: +1-512-472-2883
          Email: sharpley@dynastat.com
          WWW: http://www.bga.com/dynastat/
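
   A Mean Opinion Score is the arithmetic mean of listeners' ratings on a
   five-point scale (1 = bad to 5 = excellent). The Python sketch below
   computes it along with an approximate 95% confidence interval; the
   normal-approximation interval (1.96 standard errors) and the function
   name are assumptions for illustration, not Dynastat's procedure.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers on the 1-5 scale")
    mos = statistics.mean(ratings)
    if len(ratings) < 2:
        return mos, float("nan")             # spread undefined for one rating
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width
```

   With small listener panels the interval is wide, so differences of a
   few tenths of a MOS point between systems may not be meaningful.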

   ANSI S3.2-1989: American National Standard for Measuring the
          Intelligibility of Speech Over Communication Systems
          Available from American National Standards Institute (ANSI)
          Ph: +1-212-642-4900, Fax: +1-212-398-0023
          WWW: http://www.ansi.org/

   Louis Pols' List of References on Synthesis Development And Assessment

          700 references:
          http://www.itl.atr.co.jp/cocosda/output/synth.refs

   Very Miscellaneous

          * The vOICe
          * The Learning Company's Language Training
          * Wildfire - an Electronic Assistant



ASAPI: Advanced Speech API (AT&T)

     * Description: The AT&T ASAPI Specification is an open,
       cross-platform, easy-to-use speech API that can support speech
       engines from AT&T and other vendors. ASAPI does not replace the
       Microsoft Speech API, but it provides extensions and enhancements
       to the Microsoft SAPI Specification including support for
       SAPI-compatible applications.
       The ASAPI Specification defines two types of interfaces. The
       "ASAPI Extensions" interface provides extensions to the
       MS-SAPI interface as well as C++ class encapsulation of SAPI
       functionality. The "Visual ASAPI" interface provides an even
       higher-level abstraction of SAPI/ASAPI low-level functionality
       such that application developers can quickly and easily embed
       speech technology into existing or new applications. Special
       Purpose Recognizers are examples of Visual ASAPI interfaces which
       integrate lower-level functionality that an application developer
       can access via a simple interface.
     * More information: Contact Jose Garcia at AT&T on (908) 957-5457 or
       by email: jrg@att.com. For more information on the WATSON Speech
       Engine which supports ASAPI and news about ASAPI please visit the
       AT&T Advanced Speech Products Group home page or call
       1-800-5-WATSON.



SAPI: Microsoft Windows Speech API

     * Platform: Windows 95 and Windows NT 3.51
     * Description: The Microsoft Speech API provides applications with
       the ability to incorporate speech recognition (command & control or
       dictation) or text-to-speech, using either C/C++ or Visual Basic.
       SAPI follows the OLE Component Object Model (COM) architecture. It
       is supported by many major speech technology vendors. The major
       interfaces are
          + Voice Commands: high level speech recognition API for command
            and control.
          + Voice Text: simple high level text-to-speech API.
          + Speech Recognition: provides detailed control of a speech
            recognition engine for both command-and-control and
            dictation.
          + Text-to-Speech: provides detailed interface to a
            text-to-speech engine for control of playback, speaking
            style, voice quality etc.
          + Multimedia Audio Objects: audio I/O for microphones,
            headphones, speakers, telephone lines, files etc.
     * Availability: Download Microsoft's latest speech technology,
       including the Microsoft Speech SDK, command and control
       recognition, the Microsoft dictation research demonstration and
       text-to-speech.
     * More information: Email: MSSpeech@Microsoft.Com
       WWW: The Microsoft Speech API
       WWW: An Overview of the Microsoft Speech API
       Documentation included with the Microsoft SDK.
     * See also: TAPI: Microsoft Windows Telephony API



SRAPI: Speech Recognition API

     * Platform: Various
     * Description: The SRAPI provides support for speech recognition,
       text-to-speech and other media playback. The SRAPI Committee is a
       nonprofit Utah corporation with the goal of providing solutions
       for interaction of speech technology with applications.
       Core members include: Novell, Inc., Dragon Systems, IBM, Kurzweil
       AI, Intel, and Philips Dictation Systems. Additional contributing
       members include Articulate Systems, DEC, Kolvox Communications,
       Lernout and Hauspie, Syracuse Language Systems, Voice Control
       Systems, Corel, Verbex and Voice Processing Corporation.
     * More information: WWW: http://www.srapi.com/
       Email: For more information on the SRAPI Developer CD, send email
       to srapi@srapi.com with Subject "SRAPI CD Info".



TAPI: Microsoft Windows Telephony API

     * Description: TAPI allows applications to support telephone
       communication. TAPI facilities include:
          + Connecting directly to a telephone network.
          + Automatic phone dialing.
          + Transmission of data (files, faxes, electronic mail).
          + Access to data (news, information services).
          + Conference calling.
          + Voice mail.
          + Caller identification.
          + Control of a remote computer.
          + Collaborative computing over telephone lines.
       Windows 95 comes with a telephony application, DIALER.EXE, that
       can dial voice calls, act as a proxy for applications making
       simple telephony requests, and maintain a call log.
     * More information: The Win32 Software Development Kit (SDK)
       contains documentation, tools, and sample code for TAPI including
       the Microsoft Telephony Programmer's Reference and the Microsoft
       Telephony Service Provider Interface (TSPI) for Telephony.
       WWW: Tapping in TAPI, TAPI White Paper
     * See also: SAPI: Microsoft Speech API



CUSeeMe

     * Platform: Macintosh and Windows
     * Description: Cornell University software for audio and video
       conferencing over the Internet.
     * Requirements: Macintosh to RECEIVE video:
          + Macintosh platform with a 68020 processor or higher
          + System 7 or higher operating system
          + Minimum 16-level grayscale display (or color)
          + IP network connection and MacTCP
          + Apple's QuickTime, to receive slides with SlideWindow
       Macintosh to SEND video:
          + All the above plus
          + QuickTime installed
          + Video digitizer (with vdig software) and camera
       For Windows:
          + Video receive only 386SX, Video send & receive 386DX, Video
            receive w/Audio 486SX, Video send & receive w/Audio 486DX
          + Windows 3.1 or higher running in Enhanced Mode.
          + Winsock
          + 256 color (8 bit) video driver
          + Video camera and a video capture board that supports
            Microsoft Video For Windows
          + For audio: Windows Sound board that conforms to the Windows
            MultiMedia Specification, speakers and a microphone
     * Availability: Mac: http://cu-seeme.cornell.edu/get_cuseeme.html
       Windows: http://cu-seeme.cornell.edu/PC.CU-SeeMeCurrent.html
     * More information: http://cu-seeme.cornell.edu/



CyberPhone

     * Platform: Sun Workstations running Solaris 2.x (SunOS 5.x)
     * Description: Provides voice communications over the internet. Has
       a graphical user interface and requires no additional hardware. An
       optional centralized server system is available to make finding
       and connecting to other users easier.
     * Availability: A free demonstration is available by anonymous ftp from:

                ftp://magenta.com/pub/

     * Contact: Email: cyberphone@magenta.com. More information is
       available on the WWW: http://magenta.com/cyberphone/.



DigiPhone

     * Platform: Macintosh, Windows 3.1 and Windows 95
     * Description: DigiPhone provides two-way phone conversations by
       dialing direct and over the Internet. Includes encryption for
       privacy, caller ID, call screening, call timer, adjustable sound
       and compression quality, messaging, and access to the Global
       Directory providing a database of DigiPhone users.
          + DigiPhone v1.03: provides the standard features listed above.
          + DigiPhone Deluxe: provides the standard features of DigiPhone
            v1.03 and adds conference calling, mute, speed dial, call
            recording and playback, voice effects, customizations, and
            internet tools.
          + DigiPhone for Mac: provides the standard features listed
            above, plus cross-platform compatibility and mute.
     * Requirements: DigiPhone v1.03 requires a 386DX/33 or faster, 4MB
       RAM, a 9,600 bps modem, a Sound Blaster 16 card (or any compatible
       half- or full-duplex card), and a local internet connection with
       SLIP or PPP. A 486DX/33 and a 14,400 bps modem are recommended.
       DigiPhone Deluxe has the same requirements as v1.03 but requires a
       486DX/33 or faster.
       DigiPhone for Mac requires a 68030 at 33 MHz, a 68040 at 25 MHz or
       a PowerPC, 4 MB RAM, System 7.x, a 14,400 bps or faster modem,
       Sound Manager 3.x for System 7, a microphone and speakers, MacTCP
       or Open Transport, and a local internet connection with SLIP or
       PPP.
     * Price and Availability: Contact Third Planet Publishing for
       pricing. Trial software is available from Third Planet Publishing.
       Orders and Upgrades can be made on the Web. Also available through
       many retailers.
     * Contact: Third Planet Publishing, Inc.
       17770 Preston Rd, Dallas, Texas 75252, USA
       Ph: +1-972-733-3005, Fax: +1-972-380-8712
       Email: 3pp@planeteers.com
       WWW: http://www.planeteers.com/



InterFACE from Hijinx

     * Platform: Windows
     * Description: InterFACE provides voice communication on the
       Internet through IRC (Internet Relay Chat) services.
     * Requirements: A 486DX, 8 MB RAM, Windows, a VGA monitor and a
       16-bit sound card are recommended.
     * Availability: Available on CD only for US$60.00, including postage
       and handling.
       Demo versions available from the HiJiNX WWW site.
     * Contact: HiJiNX, Brisbane, Australia
       Email: jester@hijinx.com.au
       WWW: http://www.hijinx.com.au/



FAQ: How can I use the Internet as a telephone?

     * Description: Kevin M. Savetz and Andrew Sears have prepared an FAQ
       document titled _FAQ: How can I use the Internet as a telephone?_
       The current document has the following sections:
          + Can I use the Internet as a telephone?
          + What do I need to call others on the Internet?
          + How does it work?
          + How do I make calls using a modem?
          + Is the sound quality as good as a regular telephone?
          + Is there a noticeable delay in hearing the other user?
          + What is the difference between full duplex and half duplex?
          + What is multicasting?
          + Can I talk to users of other phone software?
          + What software is available?
       The section on available software covers the following:
          + Mac: Maven, NetPhone, CU-Seeme, PGPfone
          + Windows: Speak Freely, CU-Seeme, Internet Phone, Digiphone,
            Internet Voice Chat, Internet Global Phone, Web Phone
          + UNIX: Speak Freely, nevot, vat, mtalk, ztalk
     * Availability:

        By Email
                Mail voice-faq-request@northcoast.com
                with "Subject: archive"
                and "Body: send voice-faq"

        FTP
                ftp://rtfm.mit.edu/pub/usenet/alt.internet.services/FAQ:_How_can_I_use_the_Internet_as_a_telephone?

        WWW:
                http://rpcp.mit.edu/~asears/voice-faq.html

     * Contact: Andrew Sears: asears@mit.edu
       Kevin Savetz: savetz@northcoast.com



Nautilus: Secure Computer Telephony

     * Platform: DOS, Linux, SunOS, Solaris.
     * Description: Nautilus is software which allows two users to hold a
       secure conversation either over ordinary phone lines or over
       a computer network. Nautilus uses your computer's audio hardware
       to digitize and play back your speech using speech compression
       algorithms built into the program. It encrypts the compressed
       speech using your choice of the Blowfish, Triple DES, or IDEA
       block ciphers, and transmits the encrypted packets over the
       internet or your modem to another computer. At the other end, the
       process is reversed. Nautilus operates in half duplex mode like a
       speakerphone -- only one person can talk at a time. Either user
       can hit a key to switch between talking and listening. Audio
       quality ranges from fair to very good depending on which of the
       four speech coders is selected. The Nautilus WWW page provides
       more detailed information.
     * Requirements: Nautilus runs on IBM PC-compatible computers
       (386DX/25 or faster) under MSDOS or Linux as well as audio-capable
       Sun workstations running SunOS or Solaris. The MSDOS version of
       Nautilus requires a Soundblaster compatible sound card and
       currently only runs over ordinary phone lines with a modem. To use
       Nautilus over ordinary telephone lines, a modem capable of
       connecting at 4800 bps or faster is required.
     * Availability: Nautilus is available in three different formats. As
       a DOS executable, it is available as an archive in zip format
       along with its associated documentation. In source format, it is
       available as either a zip archive or a gzip-compressed tar
       archive.
       Nautilus is distributed freely (subject to US export restrictions)
       with full source code. This ensures that its security can be
       independently examined and verified. Follow the instructions in
       the following README files to obtain Nautilus.
          + ftp://ftp.csn.org/mpj/README
          + ftp://ripem.msu.edu/pub/crypt/
     * More information: WWW: http://www.lila.com/nautilus/
     * Contacts: The Nautilus development team includes Bill Dorsey, Paul
       Rubin, Andy Fingerhut, Paul Kronenwetter, Bill Soley, and Pat
       Mullarky. To contact the developers, send email to
       nautilus@lila.com.



NEVOT (1.4v) from AT&T BL

     * Platforms: Sun Sparc Station (SunOS 4.1.x) and Silicon Graphics
     * Description: Audio-conferencing tool which supports both
       point-to-point and broadcasting of audio using multicast IP. Audio
       encoding:
          + PCM 64 kb/s: 8-bit u-law encoded 8 kHz PCM (G.711)
          + ADPCM 32 kb/s [Sun only] (G.721)
          + DVI ADPCM 32 kb/s
          + ADPCM 24 kb/s [Sun only] (G.723)
          + CELP 4.8 kb/s
          + LPC 2.4 kb/s
     * Availability: by anonymous ftp from

                 ftp://gaia.cs.umass.edu/pub/hgschulz/nevot

     * Contact: Henning Schulzrinne (hgs@research.att.com)
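   The 64 kb/s PCM mode listed above is 8-bit codes at 8,000 samples per
   second (8 x 8000 = 64,000 bit/s). As a rough illustration of how u-law
   packs a 16-bit linear sample into 8 bits (a sign bit, a 3-bit segment
   number and a 4-bit step within the segment), here is a minimal Python
   sketch of the classic public-domain formulation with a bias of 0x84
   and a clip level of 32635. It is not taken from NEVOT, and the
   constants are assumptions that may differ from any given codec.

```python
# Sketch of G.711-style u-law companding: 16-bit linear <-> 8-bit code.
# Constants follow the classic public-domain formulation (bias 0x84,
# clip 32635); they are assumptions, not values read from NEVOT.
BIAS = 0x84
CLIP = 32635


def linear_to_ulaw(sample: int) -> int:
    """Encode a signed 16-bit linear sample as an 8-bit u-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment number = index of the highest set bit among bits 7..14.
    exponent = max(0, (magnitude >> 7).bit_length() - 1)
    # Four bits select the step within the segment.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # u-law codes are stored with every bit inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit u-law code to a signed linear sample."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # Rebuild the biased magnitude at the middle of its step, then unbias.
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    for x in (0, 100, 1000, -1000):
        code = linear_to_ulaw(x)
        print(f"{x:6d} -> 0x{code:02X} -> {ulaw_to_linear(code)}")
```

   The step size doubles from one segment to the next, so the absolute
   error grows with signal level (100 decodes as 104, 1000 as 988) while
   staying within about 1/32 of the biased magnitude.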



PGPfone

     * Platform: Macintosh and Windows
     * Description: Pretty Good Privacy Phone is free secure audio
       connection software for the internet. It uses speech compression
       and strong cryptography protocols to give you the ability to have
       a real-time secure telephone conversation via a modem-to-modem
       connection.
     * Requirements (Mac): Fast modem: at least 14.4 Kbps V.32bis (28.8
       Kbps V.34 recommended). An Apple Macintosh with at least a 25MHz
       68LC040 processor (PowerPC recommended), running System 7.1 or
       above, Thread Manager 2.0.1, ThreadsLib 2.1.2, and Sound Manager
       3.0. (These are available from Apple's FTP sites.)
     * Requirements (Windows): Fast modem: at least 14.4 Kbps V.32bis
       (28.8 Kbps V.34 recommended). A multimedia PC running Windows 95
       or NT, with at least a 66 MHz 486 CPU (Pentium recommended), sound
       card, microphone, and speakers or headphones.
     * Contact: Jeffrey I. Schiller
       Email: jis@mit.edu
       WWW: http://web.mit.edu/network/pgpfone/



Speak Freely

     * Platform: Windows and Unix
     * Description: Free "Internet Phone" software supporting voice mail,
       multicasting, encryption and several coding methods. Includes 4
       forms of data compression and encryption with DES, IDEA and PGP.
       The Windows and Unix versions are compatible. You can designate a
       bitmap file to be sent to users who connect so they can see who
       they're talking to. The Unix version does not have the graphical
       user interface of the Windows edition, but supports all its
       compression and encryption modes.
     * More information:
       http://www.fourmilab.ch/netfone/windows/speak_freely.html



Internet Phone from VocalTec

     * Platforms: IBM PC compatibles
     * Description: Supports real-time conversations with Internet users
       by compressing speech. Offers a voice-activation feature, an
       interactive display, a graphical interface and on-line help, plus
       an up-to-date listing of all on-line users running Internet Phone.
       Join or create topics for conversation with people from all over
       the globe. Supports private topics for private conversations with
       family or with business associates.
     * Requirements: 486SX/25 MHz PC, 8MB RAM (recommended)
       An Internet Winsock 1.1 compatible TCP/IP connection (minimum
       connection: a 14,400 baud modem SLIP/PPP connection)
       Windows 3.1
       Windows-compatible sound card
     * Cost: $US59 + shipping. You can order on the internet:
       http://www.vocaltec.com/order.html
     * More Information: WWW: http://www.vocaltec.com/
     * Availability:

                Demo version: 
                ftp://ftp.vocaltec.com/pub/

     * Contact: VocalTec Inc.
       157 Veterans Drive, Northvale, NJ 07647
       Tel: 201-768-9400 Fax: 201-768-8893
       E-mail: info@vocaltec.com



WebPhone

     * Platform: Windows
     * Description: WebPhone provides telephone quality, real-time, full
       duplex, encrypted, point-to-point voice communication over the
       Internet and other TCP/IP based networks. (More detail provided on
       the NetSpeak WWW pages).
     * Requirements: 80486DX-33 MHz running Windows 3.1 or higher, 4 MB
       of RAM, MCI compliant sound card, Winsock 1.1 compliant stack,
       14.4Kbps modem, VGA card capable of displaying 256 colors. Full
       duplex audio card required for full duplex.
     * Price: $49.95 (US)
     * Availability: via the WWW: http://www.netspeak.com/getphone.html
     * Contact: NetSpeak Corporation
       902 Clint Moore Rd., Boca Raton, Fl. 33487, USA
       Ph: +1-407-997-4001, Fax: +1-407-997-2401
       Email: info@netspeak.com
       WWW: http://www.netspeak.com/



WebTalk

     * Platform: Windows 3.1/95
     * Description: Full- or half-duplex telephone-quality voice;
       supports many commercial web browsers.
     * Contact: Quarterdeck Corporation
       13160 Mindanao Way, 3rd Floor, Marina Del Rey, CA 90292-9705, USA
       Ph: +1-310-309-3700, Fax: +1-310-309-4217
       Email: info@quarterdeck.com
       WWW: http://www.quarterdeck.com/



AF version AF3R1

     * Platforms: DEC workstations (Alpha and MIPS), SparcStation, SGI
     * Description: The AF System is a device-independent
       network-transparent system including client applications and audio
       servers. With AF, multiple audio applications can run
       simultaneously, sharing access to the actual audio hardware.
       The AF3R1 distribution of AF includes server support for Digital
       RISC systems running Ultrix, Digital Alpha AXP systems running
       OSF/1, SGI Indigo running IRIX 4.0.5, Sun Microsystems
       SPARCstations running SunOS 4.1.3, and Sun Microsystems
       SPARCstations running Solaris 2.3. The servers support audio
       hardware ranging from the built-in CODEC audio on SPARCstations
       and Personal DECstations to 48 kHz stereo audio using the DECaudio
       TURBOchannel module or the SPARCstation DBRI interface.
     * Availability: The source kit is distributed by anonymous ftp from

                 ftp://crl.dec.com/pub/DEC/

                WWW:
                http://www.research.digital.com/CRL/projects/AF/home.html

     * Contact: af-request@crl.dec.com



Voice E-Mail from Bonzi Software

     * Description: Voice E-Mail is an extension to regular e-mail which
       allows recorded voice messages to be transmitted in the same way
       as normal text messages. Voice E-Mail is available in several
       forms: Voice E-Mail 3.0 for WinCIM, Voice E-Mail 3.0 for America
       Online, Voice E-Mail 3.0 for Eudora, and Voice E-Mail 3.0 for
       Netscape. Voice E-Mail uses digital audio and image compression
       technology to compress messages before transferring them through
       CompuServe, America Online, and the Internet.
     * Availability: Go to the Bonzi home page - http://www.bonzi.com/ -
       and follow the links to the Internet Shopping Network's
       "Downloadable Software Division."
     * Further Information: Bonzi Software
       WWW: http://www.bonzi.com/
       Email: info@bonzi.com
       Fax: 805-238-5798



MicNotePad Recording Software for Macs

     * Platforms: Macintosh
     * Description: MicNotePad is an audio recording tool designed to
       improve dictation (a digital replacement for the old-style
       mechanical tape systems used by typists). It uses the built-in
       microphone or sound input port and the hard disk to record
       conversations or speech of arbitrary length. Speech compression
       techniques are used to reduce disk space. Once it is recorded,
       single keystrokes control playback while you type in your word
       processor.
     * Contact: Nirvana Research
       WWW: http://moof.com/nirvana/
       Email: nirvana@got.net



MixViews

     * Description: A Unix/X sound editor. Supports waveform playback,
       recording and cut/splice editing. Has various filters, handles
       native file formats, and offers FFT, LPC and more.
     * Availability: by anonymous ftp including SunOS 4 and IRIX 5
       binaries.

                 ftp://foxtrot.ccmrc.ucsb.edu/pub/



Network Audio System Release 1.1

     * Platforms: Various (includes SunOS, Solaris, SGI)
     * Description: A device-independent mechanism for transferring,
       playing and recording audio signals over a network. Has a range of
       features suited to networks.
     * Cost: Free
     * Availability: By anonymous ftp from

                ftp://ftp.x.org/contrib/audio/nas/

       Also available in the same directory are document files and some
       sample sounds.



NIST SPeech HEader REsources Package (SPHERE)

     * Description: Standard speech header software from the National
       Institute of Standards & Technology (NIST). SPHERE headers
       represent information about sample frequency, sample format, etc.
     * Availability: By anonymous ftp from

        Readme File
                ftp://jaguar.ncsl.nist.gov/pub/sphere.README 

        Source Code
                ftp://jaguar.ncsl.nist.gov/pub/sphere_2.5.tar.Z 
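   A SPHERE file starts with a plain ASCII header: a "NIST_1A" magic
   line, a line giving the header length in bytes (commonly 1024), then
   one "name -type value" field per line, ended by "end_head". The
   Python sketch below parses such a header, assuming the type codes -i
   (integer), -r (real) and -sN (string of N characters). It only
   illustrates the layout and is no substitute for the NIST SPHERE
   library itself.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse a NIST SPHERE header (minimal sketch, not the NIST library)."""
    first_nl = data.index(b"\n")
    if data[:first_nl].strip() != b"NIST_1A":
        raise ValueError("missing NIST_1A magic line")
    second_nl = data.index(b"\n", first_nl + 1)
    # The second line holds the total header length in bytes.
    header_size = int(data[first_nl + 1:second_nl])
    fields = {"header_size": header_size}
    text = data[:header_size].decode("ascii", errors="replace")
    for line in text.splitlines()[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line:
            continue
        name, type_code, value = line.split(None, 2)
        if type_code == "-i":
            fields[name] = int(value)
        elif type_code == "-r":
            fields[name] = float(value)
        elif type_code.startswith("-s"):
            # "-sN" declares a string of N characters.
            fields[name] = value[:int(type_code[2:])]
        else:
            raise ValueError(f"unknown type code {type_code!r} for {name}")
    return fields


if __name__ == "__main__":
    # Illustrative header, padded with spaces to its declared 1024 bytes.
    header = (b"NIST_1A\n   1024\n"
              b"sample_rate -i 16000\n"
              b"channel_count -i 1\n"
              b"sample_coding -s3 pcm\n"
              b"end_head\n").ljust(1024, b" ")
    info = parse_sphere_header(header)
    print(info["sample_rate"], info["sample_coding"])  # 16000 pcm
```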

NIST Speech Recognition Scoring Package (SCORE)

     * Description: Software for scoring results of speech recognition
       systems from the National Institute of Standards & Technology
       (NIST).
     * Availability: By anonymous ftp from

        README File
                ftp://jaguar.ncsl.nist.gov/pub/score.README 

        Source Code
                ftp://jaguar.ncsl.nist.gov/pub/score_3.6.2.tar.Z 
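   Scoring a recognizer generally means aligning each hypothesis with a
   reference transcription and counting substitutions, deletions and
   insertions; word error rate is their sum divided by the number of
   reference words. The Python sketch below shows that general technique
   with a plain Levenshtein alignment over words. It does not reproduce
   SCORE's own alignment options or report formats.

```python
def align_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) for a minimum-edit
    word alignment of hypothesis against reference."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (edits, subs, dels, ins) aligning reference[:i] with hypothesis[:j]
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # delete every reference word
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, k = cost[i - 1][j - 1]
            miss = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            match_or_sub = (e + miss, s + miss, d, k)
            e, s, d, k = cost[i - 1][j]
            deletion = (e + 1, s, d + 1, k)
            e, s, d, k = cost[i][j - 1]
            insertion = (e + 1, s, d, k + 1)
            # Fewest edits wins; ties fall back to tuple order.
            cost[i][j] = min(match_or_sub, deletion, insertion)
    _, subs, dels, ins = cost[n][m]
    return subs, dels, ins


def word_error_rate(reference, hypothesis):
    if not reference:
        raise ValueError("reference must contain at least one word")
    subs, dels, ins = align_counts(reference, hypothesis)
    return (subs + dels + ins) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sat on mat".split()
    print(align_counts(ref, hyp))              # (0, 1, 0)
    print(f"{word_error_rate(ref, hyp):.3f}")  # 0.167
```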



Sound Processing Kit

     * Platforms: UNIX
     * Description: Sound Processing Kit (SPKit) is an object-oriented
       class library for audio signal processing. SPKit includes classes
       for various signal processing tasks and a way of implementing
       sound processing algorithms in a simple object-oriented manner.
       Sound Processing Kit is implemented in C++ and is designed to be
       portable. The current version requires a bare-bones C++ 2.0
       compatible compiler (templates and exceptions are not needed).
       ANSI C standard libraries are required. SPKit includes classes for:
          + Sound input and output
          + Basic signal processing
          + Dynamics processing (compressor, gating etc)
          + Filtering
          + Delay and reverberation
          + Distortion
          + Signal routing
     * Availability:

        Full documentation on the WWW:
                http://www.music.helsinki.fi/research/spkit/documentation/SPKit.html

        Software distribution:
                http://www.music.helsinki.fi/research/spkit/distribution/spkit.tar.Z

     * Contact: Kai Lassfolk
       University of Helsinki Music Research Laboratory
       Email: spkit@elisir.helsinki.fi
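   The object-oriented style described for SPKit, where each processing
   stage is a small object connected to the stage before it, can be
   illustrated in Python. The sketch below is hypothetical: the class
   names (Source, Gain, Delay) and the pull-style read() method are
   assumptions chosen for illustration and are not SPKit's C++ API.

```python
class Source:
    """Hypothetical source stage: plays a list of samples, then silence."""
    def __init__(self, samples):
        self.samples = list(samples)
        self.pos = 0

    def read(self) -> float:
        if self.pos < len(self.samples):
            value = self.samples[self.pos]
            self.pos += 1
            return value
        return 0.0


class Gain:
    """Scales every sample pulled from its input stage."""
    def __init__(self, source, gain: float):
        self.source = source
        self.gain = gain

    def read(self) -> float:
        return self.gain * self.source.read()


class Delay:
    """Feedback echo: y[n] = x[n] + feedback * y[n - delay]."""
    def __init__(self, source, delay: int, feedback: float):
        self.source = source
        self.feedback = feedback
        self.buffer = [0.0] * delay  # circular buffer of past outputs
        self.index = 0

    def read(self) -> float:
        # buffer[index] was written exactly `delay` samples ago.
        out = self.source.read() + self.feedback * self.buffer[self.index]
        self.buffer[self.index] = out
        self.index = (self.index + 1) % len(self.buffer)
        return out


if __name__ == "__main__":
    # Each stage pulls samples from the one it wraps.
    chain = Delay(Gain(Source([1.0]), 0.5), delay=3, feedback=0.5)
    print([chain.read() for _ in range(8)])
```

   Pulling eight samples through this chain turns a unit impulse into
   echoes of 0.5, 0.25 and 0.125 spaced three samples apart. Because
   every stage only needs a read() method, new stages can be inserted
   without changing the others.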



TCPplay

     * Description: TCPPlay lets you use your Mac as an audio server for
       your Unix box. Provided with source code. Written by Bill
       Stafford, Rich Tsoi and Malcolm Slaney.
     * Availability: Anonymous ftp from
       ftp://ftp.apple.com/pub/malcolm/
       ftp://worldserver.com/pub/malcolm/TcpPlay.sit.hqx



Auditory Modeller 1

     * Description: John Holdsworth's implementation of a gammatone
       filter bank and Roy Patterson's spiral model, in C (with X-window
       display).
     * Availability: By anonymous ftp from

                 ftp://ftp.mrc-apu.cam.ac.uk/pub/
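   For readers new to gammatone filter banks, the impulse response of one
   channel is commonly written g(t) = t^(n-1) exp(-2 pi b t) cos(2 pi fc t),
   with order n = 4 and bandwidth b = 1.019 ERB(fc), where
   ERB(f) = 24.7 (4.37 f / 1000 + 1) Hz is Glasberg and Moore's equivalent
   rectangular bandwidth; channel centres are usually spaced evenly on the
   ERB-rate scale. The Python sketch below just evaluates those textbook
   formulas directly. It is not Holdsworth's implementation, and the
   25 ms response length is an arbitrary assumption.

```python
import math


def erb(f_hz: float) -> float:
    """Equivalent rectangular bandwidth (Glasberg & Moore), in Hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def erb_rate(f_hz: float) -> float:
    """Number of ERBs below f_hz."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_rate_to_hz(e: float) -> float:
    """Inverse of erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def centre_frequencies(low_hz: float, high_hz: float, channels: int):
    """Channel centres spaced uniformly on the ERB-rate scale."""
    lo, hi = erb_rate(low_hz), erb_rate(high_hz)
    step = (hi - lo) / (channels - 1)
    return [erb_rate_to_hz(lo + k * step) for k in range(channels)]


def gammatone_ir(fc: float, fs: int, duration: float = 0.025, order: int = 4):
    """Sampled gammatone impulse response, normalised to a peak of 1."""
    b = 1.019 * erb(fc)
    n_samples = int(round(duration * fs))
    ir = []
    for i in range(n_samples):
        t = i / fs
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc * t))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]


if __name__ == "__main__":
    print([round(f) for f in centre_frequencies(100.0, 4000.0, 8)])
    print(len(gammatone_ir(1000.0, 16000)))
```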



Auditory Modeller 2

     * Description: Lowel O'Mard's implementation of peripheral filtering,
       Ray Meddis's hair cell model and other components, in C (as a
       library of routines).
     * Availability: By anonymous ftp from

                 ftp://suna.lut.ac.uk/public/hulpo/ 



Auditory Toolbox for Matlab

     * Description: This toolbox provides extensions to Matlab which are
       useful to people interested in auditory/cochlear modeling. [Matlab
       is described in the previous section.] This toolbox has been
       tested on both Macintosh and Unix computers. It includes the
       following major models:
          + Lyon's Passive Long Wave Cochlear Model (our conventional
            model)
          + Patterson-Holdsworth ERB Filter bank with Meddis Hair cell
          + Seneff's Auditory Model (Stages I and II)
          + MFCC (Mel-scale frequency cepstral coefficients from the ASR
            world)
          + Spectrogram
          + Correlogram generation and pitch modeling
          + Simple vowel synthesis
     * Availability: From Malcolm Slaney home page and by anonymous FTP:
       ftp://ftp.apple.com/pub/
       The following files are available:
          + AuditoryToolbox.mif.Z
          + AuditoryToolbox.psc.Z
          + AuditoryToolbox.sea.hqx
          + AuditoryToolbox.tar
          + AuditoryToolbox.tar.Z
       The ".mif.Z" file is a Unix compressed version of the FrameMaker
       documentation. The ".psc.Z" file is a Unix compressed version of
       the Postscript documentation. The ".tar" and ".tar.Z" files are
       Unix TAR archives containing all of the m-functions and C-MEX
       source code. Finally, the ".sea.hqx" file is a Macintosh
       self-extracting archive that has been encoded using BinHex. There
       are precompiled versions of the three MEX functions for the
       Macintosh.
     * Misc: Our lawyers ask us to remind you that there is no warranty.
       We've done some testing but we undoubtedly missed things.
     * Contact: Malcolm Slaney, Interval Research.
       Email: malcolm@interval.com
       WWW: http://www.interval.com/~malcolm/
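   The MFCC item above follows a standard recipe: weight the power
   spectrum with triangular filters spaced on a mel scale, take the log of
   each filter's energy, and decorrelate with a discrete cosine transform.
   The Python sketch below shows that generic recipe using the common
   formula mel = 2595 log10(1 + f/700) and a DCT-II. The toolbox uses its
   own filter layout, so this is an illustration of the technique rather
   than a port of its code; the 20 filters and 13 coefficients are
   assumptions.

```python
import math


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_bins: int, sample_rate: float):
    """Triangular filters; the n_bins spectrum spans 0 .. sample_rate/2."""
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 equally spaced mel points give each triangle's edges.
    edges = [mel_to_hz(top * k / (n_filters + 1)) for k in range(n_filters + 2)]
    bin_hz = [sample_rate / 2.0 * b / (n_bins - 1) for b in range(n_bins)]
    bank = []
    for j in range(n_filters):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        weights = []
        for f in bin_hz:
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))    # rising edge
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))    # falling edge
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def mfcc_from_power_spectrum(power, sample_rate, n_filters=20, n_coeffs=13):
    bank = mel_filterbank(n_filters, len(power), sample_rate)
    # A small floor avoids log(0) for bands with no energy.
    log_energies = [math.log(max(sum(w * p for w, p in zip(weights, power)), 1e-10))
                    for weights in bank]
    # DCT-II of the log filter-bank energies.
    return [sum(e * math.cos(math.pi * k * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for k in range(n_coeffs)]


if __name__ == "__main__":
    print(round(hz_to_mel(1000.0), 1))
    print(len(mfcc_from_power_spectrum([1.0] * 257, 16000)))
```

   One property worth noting: scaling the whole spectrum by a constant
   changes only the zeroth coefficient, which is why c0 is often treated
   as an overall loudness term.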



Human Audio Perception Document

     * Description: Document prepared by Argiris Kranidiotis on the human
       audio perception system. It lists a number of references, gives
       plenty of numbers and some equations.
     * Availability: by anonymous ftp from the comp.speech archive site

                ftp://svr-ftp.eng.cam.ac.uk/comp.speech/info/HumanAudioPerception

     * Contact: Argiris A. Kranidiotis
       University Of Athens, Informatics Department
       email: akra@zeus.di.uoa.ariadne-t.gr



BEEP dictionary

     * Description: Phonemic transcriptions of over 250,000 English
       words. (British English pronunciations)
     * Availability: By anonymous ftp:

        BEEP dictionary README file
                ftp://svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/beep-0.7.README

        BEEP Dictionary (1.1M)
                ftp://svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/beep.tar.gz



CMU dictionary

     * Description: Phonemic transcriptions of 100,000 words with
       American English pronunciation.
     * Availability - WWW: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
     * Availability - ftp: By anonymous ftp from the directory

                ftp://ftp.cs.cmu.edu/project/fgdata/dict/

       with the files README, cmudict.0.2.Z, cmulex.0.1.Z, phoneset.0.1
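   Later releases of the CMU dictionary use a simple line format: the
   word, whitespace, then ARPAbet phones, with the digits 0, 1 and 2 on
   vowels marking stress, alternate pronunciations written as WORD(1),
   and comment lines starting with ";;;". Assuming that layout (the 0.2
   files listed above may differ, so check their README), a minimal
   Python loader looks like this. Grouping words by identical
   transcription also gives a quick way to find homophones. The sample
   entries are written in that style purely for illustration.

```python
import re
from collections import defaultdict

ALT_SUFFIX = re.compile(r"\(\d+\)$")  # "READ(1)" marks an alternate pronunciation


def load_cmudict(lines):
    """Map each word to a list of pronunciations (each a list of phones)."""
    prons = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        prons[ALT_SUFFIX.sub("", word)].append(phones)
    return dict(prons)


def strip_stress(phones):
    """Drop the 0/1/2 stress digits from vowel symbols."""
    return [p.rstrip("012") for p in phones]


def homophones(prons):
    """Groups of two or more words sharing an identical transcription."""
    groups = defaultdict(set)
    for word, variants in prons.items():
        for phones in variants:
            groups[tuple(phones)].add(word)
    return [sorted(words) for words in groups.values() if len(words) > 1]


if __name__ == "__main__":
    sample = ["READ  R EH1 D", "READ(1)  R IY1 D", "RED  R EH1 D"]
    d = load_cmudict(sample)
    print(d["READ"])
    print(homophones(d))
```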



CUVOLAD dictionary (Oxford Dictionary)

     * Description: Computer Usable Version of the Oxford Advanced
       Learner's Dictionary containing 70,000+ entries. Has British
       English pronunciations and parts of speech.
     * Availability: Anonymous ftp
       ftp://ota.ox.ac.uk/pub/ota/public/dicts/710/
       Documentation:
       ftp://ota.ox.ac.uk/pub/ota/public/dicts/710/text710.doc



Comprehensive Word List

     * Description: A comprehensive word list which should contain most
       common American words, abbreviations, hyphenations, and even
       incorrect spellings. The word lists were compiled from a number of
       sources: commercial news services, Usenet news postings, existing
       dictionaries, name lists, company lists, UNIX man pages, project
       Gutenberg's E-texts, project Wordnet, received mailings, etc. The
       current size is 460,000 words.
     * Availability: anonymous ftp
       ftp://wocket.vantage.gte.com/pub/
       Note 1: There seems to be some sort of network problem reaching
       the server.
       Note 2: There is a README file which explains the file formats.



EAT: Edinburgh Associative Thesaurus

     * Description: A set of word association norms showing the counts of
       word association as collected from subjects.
     * Availability: Source and WWW interactive versions

        Interactive version
                Provided by Computing and Information Systems Department
                (CISD) of Rutherford Appleton Laboratory, UK
                http://www.cis.rl.ac.uk/proj/psych/eat.html

        Set of word association norms
                ftp directory. 6 MB
                http://www.cis.rl.ac.uk/proj/psych/eat/eat/



Homophone List

     * A list of homophones in General American English is available by
       anonymous FTP from the comp.speech archive site:

                ftp://svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/homophones-1.01.txt



Moby Lexical Resources

     * Description: A set of lexical resources compiled by Grady Ward.
       3449 Martha Ct., Arcata, CA 95521-4884, USA
       Email: grady@netcom.com OR grady@northcoast.com
     * Availability: Mirrored by Malcolm Crawford
       (m.crawford@dcs.shef.ac.uk) at the Institute for Language Speech
       and Hearing, the University of Sheffield.
       WWW: http://www.dcs.shef.ac.uk/research/ilash/Moby/
       FTP: ftp://ftp.dcs.shef.ac.uk/share/ilash/Moby/
     * Contents:

        Moby Hyphenator: mhyph.tar.Z
                185,000 entries fully hyphenated. 980kB.

        Moby Language: mlang.tar.Z
                Word lists in five major languages. 2.3MB.

        Moby Part-of-Speech: mpos.tar.Z
                230,000 entries with part(s) of speech listed in priority
                order. 1.2MB.

        Moby Pronunciator: mpron.tar.Z
                175,000 entries fully International Phonetic Alphabet
                coded. 3.1MB.

        Moby Shakespeare: mshak.tar.Z
                The complete unabridged works of Shakespeare. 2.3MB.

        Moby Thesaurus: mthes.tar.Z
                30,000 root words, 2.5 million synonyms and related
                words. 12MB.

        Moby Words: mwords.tar.Z
                610,000+ words and phrases. 4.0MB.



MRC Psycholinguistic Database

     * Description: A machine-usable dictionary containing over 150,000
       words with up to 26 linguistic and psycholinguistic attributes for
       each (e.g. pronunciation, part of speech, word frequency). The MRC
       Psycholinguistic Database was the basis for the "Oxford
       Psycholinguistic Database" available for Apple Macs from Oxford
       University Press.
     * Availability: Several versions with different formats:

        Interactive Version of MRC Psycholinguistic Database
                Produces lists of words meeting user-definable selection
                criteria. Provided by the Dept. of Psychology, University
                of Western Australia.
                http://www.psy.uwa.edu.au/uwa_mrc.htm

        ftp'able MRC Psycholinguistic Database
                Approximately 12M of data.
                ftp://ota.ox.ac.uk/pub/ota/public/dicts/1054/
                README:
                ftp://ota.ox.ac.uk/pub/ota/public/dicts/1054/readme.
                Information: ftp://ota.ox.ac.uk/pub/ota/public/dicts/info



WordNet

     * Description: WordNet is an on-line lexical reference system in
       which English nouns, verbs, adjectives and adverbs are organized
       into synonym sets, each representing one underlying lexical
       concept. Different relations link the synonym sets.
       WordNet was developed in the Cognitive Science Laboratory at
       Princeton University under the direction of Professor George
       Miller.
     * Availability:

        WWW Interface
                http://www.cogsci.princeton.edu/~wn/w3wn.html

        Source Distributions
                Unix (9.1MB), PC (5.8MB), Macintosh (7.5MB), Prolog
                (database only, 4.2MB).
                ftp://clarity.princeton.edu/pub/wordnet/

       Extended interfaces developed by WordNet users (for X, Lisp etc)
       are listed in the WordNet home page.
     * Further information: Email: wordnet@princeton.edu
       WWW: WordNet home page: http://www.cogsci.princeton.edu/~wn/
       README: ftp://clarity.princeton.edu/pub/wordnet/
       Publications: ftp://clarity.princeton.edu/pub/wordnet/
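   The organization described above, with synonym sets as nodes and
   semantic relations as labeled links, is essentially a graph. The
   Python sketch below models that structure with a tiny hand-made
   lexicon; the synset identifiers, the relation name "hypernym" as a
   dictionary key, and the entries themselves are invented for
   illustration and do not reflect WordNet's file formats or interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    synset_id: str
    words: list          # synonyms sharing one underlying concept
    relations: dict = field(default_factory=dict)  # relation name -> synset ids


class ToyLexicon:
    def __init__(self, synsets):
        self.synsets = {s.synset_id: s for s in synsets}
        self.index = {}  # word -> ids of every synset containing it
        for s in synsets:
            for w in s.words:
                self.index.setdefault(w, []).append(s.synset_id)

    def senses(self, word):
        return [self.synsets[i] for i in self.index.get(word, [])]

    def hypernym_chain(self, synset_id):
        """Follow the first hypernym link upwards (assumes no cycles)."""
        chain = []
        current = self.synsets[synset_id]
        while current.relations.get("hypernym"):
            current = self.synsets[current.relations["hypernym"][0]]
            chain.append(current.synset_id)
        return chain


if __name__ == "__main__":
    lex = ToyLexicon([
        Synset("car-1", ["car", "automobile"], {"hypernym": ["motor_vehicle-1"]}),
        Synset("motor_vehicle-1", ["motor vehicle"], {"hypernym": ["vehicle-1"]}),
        Synset("vehicle-1", ["vehicle"]),
    ])
    print([s.words for s in lex.senses("automobile")])
    print(lex.hypernym_chain("car-1"))
```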



Dictionaries on the WWW

   For a while, there was a range of dictionaries and other lexical
   resources on the WWW and elsewhere on the Internet. However, for
   copyright reasons, fewer sites are now publishing dictionary
   information. When last checked, the following sites provided
   dictionaries or links to dictionaries on the net:

   CMU Dictionary
          http://www.speech.cs.cmu.edu/cgi-bin/cmudict

   Institute of Phonetic Sciences, Amsterdam
          Electronic dictionaries, including French, Norwegian, Swahili
          and English.
          http://fonsg3.let.uva.nl/Other_pages.html

   1913 Webster's Revised Unabridged Dictionary
          Available as a searchable HTML form at the University of
          Chicago ARTFL project site, and as a tagged working file and
          downloadable version (45MB) of the HTML at Project Gutenberg.

   Martin Ramsch's Englisch-Wörterbücher aller Art (English dictionaries of all kinds)
          Lists of on-line dictionaries, translation dictionaries,
          technical dictionaries, etc.
          http://www.uni-passau.de/forwiss/mitarbeiter/freie/ramsch/englisch.html

   Galaxy's list of dictionaries etc.
          A comprehensive list of dictionaries, acronym lists,
          translation resources, and a Thesaurus.
          http://galaxy.einet.net/galaxy/Reference-and-Interdisciplinary-Information/Dictionaries-etc.html

   Webster's dictionary online
          http://c.gp.cs.cmu.edu:5103/prog/webster



International Phonetic Alphabet

     * Description: The International Phonetic Association
       (http://www.arts.gla.ac.uk/IPA/ipa.html) defines the International
       Phonetic Alphabet. It is a standard set of symbols for
       transcribing the sounds of spoken languages. The full chart of IPA
       symbols is published on the International Phonetic Association WWW
       site. Also provided are charts for consonants, vowels, tones and
       accents, suprasegmentals, diacritics and other symbols. A cassette
       of sounds is available: see
       http://www.phon.ucl.ac.uk/home/wells/cassette.htm



WWW: Phonetic Fonts and Examples Online

    George L. Dillon's list of phonetic resources
          [http://weber.u.washington.edu/~dillon/PhonResources.html]

         Vowel sounds of American English
                Examples of standard American vowels along with the IPA
                phonetic symbols and links to recordings.
                http://weber.u.washington.edu/~dillon/vowels.html

         Consonant sounds of English
                Examples of consonants along with the IPA phonetic
                symbols and links to recordings.
                http://weber.u.washington.edu/~dillon/consonants.html

         Vowel Quadrilaterals for American and British English
                Charts and audio.
                http://weber.u.washington.edu/~dillon/newstart.html

         IPA-ASCII
                A scheme for representing IPA transcriptions in ASCII for
                use in Usenet articles and email.
                http://weber.u.washington.edu/~dillon/ipaascii.html

    Some things about studying Speech
          Information on speech physiology, acoustic phonetics, speech
          perception, speech recognition and voice recognition.
          http://www.ccp.uchicago.edu/grad/Francis_Alex/speech.html



Summer Institute of Linguistics IPA Fonts

     * Platform: Apple Macintosh and Microsoft Windows
     * Description: International Phonetic Alphabet (IPA) fonts are
       available as freeware from the Summer Institute of Linguistics
       (SIL). The SIL Encore IPA Fonts are a set of scalable IPA fonts
       containing the full International Phonetic Alphabet with 1990 Kiel
       revisions. Three typefaces are included: SIL Doulos (similar to
       Times), SIL Sophia (similar to Helvetica), and SIL Manuscript
       (monowidth). Each font contains all the standard IPA discrete
       characters and non-spacing diacritics as well as some
       suprasegmental and punctuation marks. Each font comes in both
       PostScript Type 1 and TrueType formats.
     * Availability: Via the WWW and Gopher:
          + WWW: http://www.sil.org/
          + Gopher:
            gopher://gopher.sil.org/11/gopher_root/computing/software/fonts/
          + Ftp for Windows: ftp://ftp.sil.org/fonts/win/silip12a.exe
          + Ftp for Mac: ftp://ftp.sil.org/fonts/mac/silipa12.sea_hqx
       Also available through the SIL email server. Send either of the
       following commands to MAILSERV@sil.org.

        Windows:
                SEND/MODE=BLOCK/ENCODING=UUENCODE
                [FTP.FONTS.WIN]SILIP12A.EXE

        Mac:
                SEND [FTP.FONTS.MAC]SILIPA12.SEA_HQX

       Finally, they are available on diskette from the address below
       for US$5 to cover the cost of shipping.
     * Contact: International Academic Bookstore
       Summer Institute of Linguistics
       7500 W. Camp Wisdom Road, Dallas, TX 75236 U.S.A.
       Ph: 214-709-2404, Fax: 214-709-2433
       e-mail: academic.books@sil.org
       WWW: http://www.sil.org/



Phonetic Fonts for TeX and LaTeX

    Linguistics/Tex mailing list
          ling-tex@ifi.uio.no
          Subscription method unknown.

    TIPA
          Created by Rei Fukui: fkr@tooyoo1.l.u-tokyo.ac.jp.
          Source: ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/
          Postscript manual:
          ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/
          Compressed postscript manual:
          ftp://tooyoo.L.u-tokyo.ac.jp/pub/TeX/tipa/

    WSUIPA: Washington State University International Phonetic
          Alphabet fonts
          A basic WSUIPA font contains 128 phonetic characters and/or
          diacritics in five different point sizes (8, 9, 10, 11 and 12)
          and in three typefaces (roman, slanted and bold extended). Each
          size and typeface includes a TFM (TeX Font Metric) file and its
          related GF, PK or PXL file. A macro package and manual are
          provided. Reportedly compatible with LaTeX 2.09 but not
          LaTeX2e compliant.
          Available from ftp://ftp.wustl.edu/packages/TeX/fonts/wsuipa/
          or from any CTAN FTP archive, e.g.
          ftp://ftp.digital.com/pub/text/TeX/fonts/wsuipa/



Yamada Language Center

     * Platform: Apple Macintosh and Microsoft Windows
     * Description: The Yamada Language Center maintains an archive of
       fonts to assist users who wish to display or type non-English
       text on their computers. Their WWW and ftp sites include five
       International Phonetic Alphabet fonts (or near IPA). They also
       have fonts for over 40 languages (American Sign Language, Arabic,
       Armenian, Bengali, Burmese, Celtic, Cherokee and more).
     * Availability:

        WWW Font List
                http://babel.uoregon.edu/yamada/fonts.html

        Windows Fonts
                http://babel.uoregon.edu/yamada/winfonts.html

        IPA Fonts
                http://babel.uoregon.edu/yamada/fonts/phonetic.html

        ftp site
                ftp://yftp@www-vms.uoregon.edu/fonts/

     * Contact: Yamada Language Center, University of Oregon



The vOICe

     * Description: Peter Meijer's Java applet/application for sound
       analysis and synthesis.
          + Platform: All (where Java VM available)
          + Interactive spectrographic synthesis: draw your own sound
          + Image sonification
          + Mathematical function sonification
          + Spectrographic sound analysis (Fourier, spectrogram)
          + Vision substitution research
     * Contact: Peter Meijer
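     To illustrate the general idea behind spectrographic synthesis and
     image sonification, here is a minimal Python sketch. It treats an
     image as a spectrogram: columns are played left to right, each row
     drives one sine tone (higher rows at higher pitch) and pixel
     brightness sets that tone's loudness. This is an assumed, generic
     mapping, not Peter Meijer's implementation; the function name
     sonify_image, the 500-5000 Hz log-spaced frequency range, the
     1-second scan and the 16 kHz sample rate are all hypothetical choices.

```python
import math


def sonify_image(image, duration_s=1.0, sample_rate=16000,
                 f_min=500.0, f_max=5000.0):
    """Render a 2-D brightness grid as audio, spectrogram-style.

    Hypothetical sketch (not The vOICe's actual algorithm):
    image is a list of rows (top row first), each a list of brightness
    values in [0, 1]. Columns are scanned left to right; each row drives
    one sinusoid, with the top row at f_max and the bottom row at f_min
    (log-spaced in between); brightness scales that sinusoid's amplitude.
    """
    if f_max >= sample_rate / 2:
        raise ValueError("f_max must be below the Nyquist frequency")
    n_rows = len(image)
    n_cols = len(image[0])
    samples_per_col = max(1, int(duration_s * sample_rate / n_cols))

    # One frequency per row, log-spaced so equal row steps sound like
    # equal pitch steps; row 0 (top of the image) gets the highest pitch.
    freqs = []
    for r in range(n_rows):
        frac = (n_rows - 1 - r) / (n_rows - 1) if n_rows > 1 else 0.0
        freqs.append(f_min * (f_max / f_min) ** frac)

    out = []
    for c in range(n_cols):
        for i in range(samples_per_col):
            # A global time axis keeps each sinusoid's phase continuous
            # from one column to the next (avoids clicks at boundaries).
            t = (c * samples_per_col + i) / sample_rate
            s = sum(image[r][c] * math.sin(2 * math.pi * freqs[r] * t)
                    for r in range(n_rows))
            # Dividing by n_rows keeps |sample| <= 1 when brightness <= 1.
            out.append(s / n_rows)
    return out


# Usage: a dark 4x8 image with one bright pixel near the top gives a
# short high-pitched blip during the fourth column of the scan.
img = [[0.0] * 8 for _ in range(4)]
img[0][3] = 1.0
audio = sonify_image(img)
print(len(audio))  # 16000
```

     Drawing on the image and re-rendering it is the "draw your own
     sound" idea in its simplest form; a real tool would typically also
     add windowing or cross-fades between columns.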



The Learning Company's Language Training

     * Platform: Windows and Macintosh
     * Description: Foreign-language training software for Spanish,
       French, German, Italian, Japanese, and English. In the Windows
       version for English, speech-recognition technology is used to help
       users improve their accents.
     * Contact: The Learning Company
       Ph: (800) 852-2255
       Email: webmaster@learningco.com
       WWW: http://www.learningco.Inter.net/foreign.html



Wildfire - an Electronic Assistant

     * Platform: ?
     * Description: Wildfire is a phone-based electronic assistant.
       Functions include:
          + Screens, routes, and announces incoming calls.
          + Contact list with voice dialing.
          + Schedules and reminders for follow-up calls and action items.
          + Messaging and advanced voicemail features.
     * Contact: Wildfire Communications, Inc.
       20 Maguire Road, Lexington, MA 02173 USA
       Ph: +1-617-674-1500, Fax: +1-617-674-1501
       Demo line: 1-800-WILDFIRE
       Email: info@wildfire.com
       WWW: http://www.wildfire.com/


___________________________________________________________________________

   Copyright (c) 1993-6 by Andrew Hunt, all rights reserved.
   This FAQ may be posted to any USENET newsgroup, on-line service, or BBS as
   long as it is posted in its entirety and includes this copyright statement.
   This FAQ may not be distributed for financial gain.
   This FAQ may not be included in any collections or compilations
   without express permission from the author.



 ---

Andrew Hunt
Speech Applications Group
Sun Microsystems Laboratories       Ph:  (978) 442-2681
2 Elizabeth Drive, MS UCHL03-207    Fax: (978) 250-5067
Chelmsford, MA 01824, USA           Email: andrew.hunt@east.sun.com

Send corrections/additions to the FAQ Maintainer:
andrew.hunt@east.sun.com (Andrew Hunt)





Last Update March 27 2014 @ 02:11 PM