in terms of attenuation is demonstrated to be strongly dependent upon the forcing frequency and the location and nature of the error sensors. For example, the use of the accelerometers located close to the control actuators, while providing high local reduction, can lead to an increase of the global response. [Work supported by NASA Langley and DARPA/ONR.]
TUESDAY AFTERNOON, 30 APRIL 1991
INTERNATIONAL B, 1:00 TO 4:15 P.M.
Session 3SP
Speech Communication: Speech Processing
Astrid Schmidt-Nielsen, Chair
3SP1. Profiling vectors for speaker identification. Harry Hollien and Ming Jiang (Linguistics and Inst. for Adv. Study of the Commun. Processes, Univ. of Florida, Gainesville, FL 32611)
When speaker verification is the issue of interest, it is possible to focus on signal analysis irrespective of the speech-related features it contains. Such approaches are appropriate in this case because system distortions are minimal, noise is low, talkers are cooperative, and very sophisticated equipment is available. Not so for speaker identification. Here extensive channel and speaker distortions (including noise) can be expected; speech is noncontemporary and speakers are usually uncooperative. Hence, the signal is so distorted or masked that the usual processing techniques cannot be expected to be very useful. The approach to speaker identification demonstrated in this paper is threefold. First, it is assumed that the signal contains speech features that are robust (i.e., resistant to noise and distortion) and unique to the talker. These idiosyncrasies are based on the speaker's anatomy, physiology, and habitual communicative patterns. Second, it is postulated that, while there may be no single attribute within a person's speech/voice that would permit them to be differentiated from all other speakers under any set of conditions, the simultaneous use of a large series of feature analyses may permit identification. Finally, it has become possible to reduce bias among the vectors by the normalization of data. In turn, this approach leads to a very effective two-dimensional profile wherein the unknown speaker must first be identified and then comparisons made to known talkers. A system of this type has been structured and tested; it is based on four natural speech vectors, each containing 20-40 parameters. Data regarding this general approach and these vectors have been reported previously. This presentation will focus on the effects (on efficient speaker identification in the field) of normalizing the vector data and reducing it to a two-dimensional profile.
1:15
3SP2. Vowel formant tracking for speaker identification. Ming Jiang and Min Shi (Inst. for Adv. Study of the Commun. Proc., Univ. of Florida, Gainesville, FL 32611)
The first two or three spectral peaks, or formants, are crucial in determining the vowel quality. In turn, accurate determination of vowel formant quality is important to the speaker identification task. A vowel formant tracking vector (VFT) was developed for the speaker identification (SAUSI) profile. Specifically, the speech spectrum is obtained frame-by-frame by using an LPC algorithm, with the first three formant frequencies calculated for each frame. The underlying assumption was that the vowels will exhibit a contiguous formant frequency transition from frame to frame and, hence, can be separated from consonants for the cited formant measurements. In order to carry out this task, the frequency range 0-5000 Hz is divided into 34 semitone bins and three histograms are obtained for the first three vowel formants. In turn, these histograms provide an estimation of the general quality of the vowels spoken by each speaker being evaluated. The result is that the interspeaker differences are large enough to permit identification of the target speaker while the intraspeaker differences are fairly small even for text-independent speech. The algorithm utilized will be presented, as will data demonstrating that this VFT vector is robust enough to effectively perform the speaker identification task.
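The histogram step described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the lower band edge of 100 Hz (a logarithmic scale cannot start at 0 Hz), the 2-semitone contiguity threshold, and all function names are assumptions made for illustration. The formant tracks are taken as given, so the LPC analysis itself is not shown.

```python
import math

def log_bin_index(freq_hz, f_low=100.0, f_high=5000.0, n_bins=34):
    """Index of the log-spaced bin containing freq_hz, or None if out of range.

    f_low is an assumed lower edge; the abstract states only 0-5000 Hz and 34 bins.
    """
    if freq_hz < f_low or freq_hz >= f_high:
        return None
    position = math.log(freq_hz / f_low) / math.log(f_high / f_low)
    return min(int(n_bins * position), n_bins - 1)

def contiguous_frames(tracks, max_jump_semitones=2.0):
    """Keep frames in which every formant stays close to the previous frame.

    This stands in for the vowel/consonant separation; the threshold is hypothetical.
    """
    kept = []
    for prev, cur in zip(tracks, tracks[1:]):
        jumps = [abs(12.0 * math.log2(c / p)) for p, c in zip(prev, cur)]
        if max(jumps) <= max_jump_semitones:
            kept.append(cur)
    return kept

def formant_histograms(tracks, n_formants=3, n_bins=34):
    """Build one histogram per formant from the frames judged contiguous."""
    hists = [[0] * n_bins for _ in range(n_formants)]
    for frame in contiguous_frames(tracks):
        for k in range(n_formants):
            idx = log_bin_index(frame[k], n_bins=n_bins)
            if idx is not None:
                hists[k][idx] += 1
    return hists

# Each tuple is (F1, F2, F3) in Hz for one analysis frame (made-up values).
tracks = [(500.0, 1500.0, 2500.0), (510.0, 1490.0, 2520.0),
          (3000.0, 4000.0, 4500.0), (505.0, 1500.0, 2500.0)]
hists = formant_histograms(tracks)
```

In this toy input only the second frame follows its predecessor smoothly, so each histogram receives a single count. Comparing such histograms across speakers would require a distance measure that the abstract does not specify.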
1:30
3SP3. SAMREC0: A C30-based reference connected-word recognizer for the evaluation of speech databases. F. Capman and G. Chollet (C.N.R.S. URA 820, Telecom Paris, Dept. Signal, 46 rue Barrault, 75634 Paris Cedex 13, France)
One of the objectives of the ESPRIT-SAM project is the elaboration of speech databases for the evaluation of recognizers. In this framework, a reference system [G. Chollet and C. Cagnoulet, “On the evaluation of speech databases using a reference system,” ICASSP, 1982], based on a dynamic programming algorithm, was modified to accept connected words [G. Chollet and C. Montacie, “Evaluating speech recognizers and databases,” NATO-ASI, 1988]. This software, which is called SAMREC0 by the SAM speech input assessment group, is now implemented using a T.I. TMS320C30-based PC-board, so that it can be used efficiently on the SAM PC-AT workstation. Some results will be presented on the evaluation of the first SAM database EUROMO. This database was recorded in quiet conditions and very few classification errors are observed. Work is under development to simulate noisy conditions using the same database, in order that the limits of the reference or other systems could be measured.
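As an illustration of the kind of dynamic programming on which such reference recognizers are based, the following is a minimal dynamic time warping sketch. It is not SAMREC0: it handles isolated words only, uses scalar frames in place of spectral feature vectors, and every name and value in it is hypothetical.

```python
def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # frame-to-frame distance
            cost[i][j] = local + min(cost[i - 1][j],      # stay on the same frame of b
                                     cost[i][j - 1],      # stay on the same frame of a
                                     cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]

def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))

# Toy templates; real systems would store sequences of feature vectors.
templates = {"one": [1.0, 2.0, 2.0, 3.0], "two": [5.0, 5.0, 5.0]}
best = recognize([1.0, 2.0, 3.0], templates)
```

A connected-word recognizer extends this idea by letting the alignment path pass from the end of one template to the start of another, which the sketch leaves out.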
1:45
3SP4. Feature detection using a connectionist network. Gary Bradshaw (Dept. of Psych., Univ. of Colorado, Boulder, CO 80309) and Alan Bell (Univ. of Colorado, Boulder, CO 80309)