Speech timing problems associated with dysarthria often involve the presence of periods of extraneous silence and nonspeech sounds as well as inappropriately timed or misplaced speech gestures. This study evaluated the performance of neural networks in detecting the presence of inappropriate or nonspeech sounds and extraneous silence. The "opt" neural network program [E. Barnard and R. Cole, OGC Tech. Rep. No. CSE 89-014], which uses a conjugate gradient algorithm to adjust node weights, was trained to recognize breaths and silence in a reading of the rainbow passage by a single dysarthric (cerebral palsy) talker. Input to the network consisted of a sequence of frames of parameters derived from spectral analysis of the speech. The output was a binary (speech/nonspeech) decision for the segment of signal corresponding to the middle frame of the input sequence. Networks of various sizes and configurations were trained on half the available data and tested on the remaining data. The best network configurations correctly identified approximately 99% of the frames in the training set and about 97% of the frames in test datasets.
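The frame-classification setup described above (a context window of spectral frames in, a speech/nonspeech decision for the middle frame out) can be sketched as follows. This is a minimal illustration on synthetic data: the feature layout, window size, and the use of a simple logistic classifier trained by plain gradient descent (rather than the conjugate-gradient network of the abstract) are all assumptions for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for spectral frames: "speech" frames get an energy
# boost in the mid channels, "silence/breath" frames are low-level noise.
# (Dimensions and feature layout are illustrative, not from the abstract.)
n_frames, n_channels, context = 400, 8, 5      # 5-frame input window
labels = rng.integers(0, 2, n_frames)          # 1 = speech, 0 = nonspeech
frames = rng.normal(0.0, 0.3, (n_frames, n_channels))
frames[labels == 1, 2:6] += 2.0                # energy boost for speech frames

def windows(x, c):
    """Stack c consecutive frames; the label belongs to the middle frame."""
    h = c // 2
    return np.stack([x[i - h:i + h + 1].ravel()
                     for i in range(h, len(x) - h)])

X = windows(frames, context)
y = labels[context // 2 : n_frames - context // 2]

# Tiny logistic classifier trained by gradient descent, standing in for
# the conjugate-gradient-trained network of the study.
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))     # per-frame speech probability
    g = p - y                                  # gradient of the log loss
    w -= 0.1 * (X.T @ g) / len(y)
    b -= 0.1 * g.mean()

acc = ((1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5) == y).mean()
```

On this separable synthetic data the classifier reaches high frame accuracy; the point of the sketch is only the windowing-plus-middle-frame-decision structure.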

5SP16. Analog speech processor based on the auditory periphery.

Weimin Liu, Andreas G. Andreou, and Moise H. Goldstein, Jr. (Sensory Commun. Lab., Elec. and Comp. Eng. Dept., Johns Hopkins Univ., Baltimore, MD 21218)

Neurophysiological studies of the auditory periphery show that synchrony coding is a robust representation of speech signals. Recent research demonstrated that models of the auditory periphery give superior performance as the front-end signal processor of a speech recognition system, especially when speech inputs are corrupted by noise [H. M. Meng and V. W. Zue, International Conference on Spoken Language Processing, 1053-1056 (1990)]. Here, an auditory periphery model is realized in silicon with analog integrated semiconductor technology to provide a real-time, low-power-dissipation preprocessor for speech processing tasks such as speech recognition and aids for the deaf. The model, including the middle ear, the basilar membrane, and the hair cells and synapses, is presented along with the design features of the analog CMOS implementation. The pattern of the multichannel outputs resembles the neurogram of the auditory-nerve fibers, i.e., the time-varying instantaneous discharge rates in fibers of various characteristic frequencies, in response to speech signals such as consonant-vowel syllables [H. E. Secker-Walker and C. L. Searle, J. Acoust. Soc. Am. 88, 1427-1436 (1990)].

5SP17. The analysis of F0 reset in relation to phrase dependency structure. Yoshinori Sagisaka (ATR Interpreting Telephony Res. Labs., Japan)

In Japanese speech, the reset of phrasal F0 downtrends has mainly been analyzed with respect to the number of prosodic units (Ig) between the phrase preceding the boundary and the phrase it directly qualifies. This parameter reflects the local structure of the sentence following the boundary and corresponds to the forward limit of local F0 control. In the analysis, an additional parameter (Id) that reflects the left local structure of the sentence is introduced. This parameter is the number of units that modify the phrase preceding the boundary. To measure the F0 reset, the ratio (F0r) of the averaged F0 values of the phrases preceding and following the boundary is used. Here, F0 reset at right-branching boundaries is expressed as "F0r > 1 when Ig > 1." Using these two parameters Ig and Id and the average F0 ratio F0r, F0 resetting phenomena were analyzed quantitatively at about 2000 phrase boundaries in 500 sentences. The results show that (1) F0r increases in proportion to Ig, (2) at right-branching boundaries (Ig > 2), F0r increases in proportion to Id, and (3) at left-branching boundaries, there is no strong correlation between F0r and Id, and F0r is greater than 1 only at clause boundaries or when a following unit is a headlike one. Moreover, it is also observed that F0r is larger at coordinate phrase boundaries than at other boundaries. These facts support the usefulness of the new parameter Id and will contribute to the quantitative treatment of F0 control for speech synthesis.
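The boundary measure above is a simple ratio of mean F0 values across the boundary, which can be sketched directly. The contour values below are hypothetical, chosen only to illustrate a downtrending phrase followed by a reset.

```python
def f0_reset_ratio(f0_before, f0_after):
    """F0r: ratio of the average F0 of the phrase following the boundary
    to the average F0 of the phrase preceding it. F0r > 1 indicates a
    reset of the phrasal downtrend at that boundary."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(f0_after) / mean(f0_before)

# Hypothetical F0 contours (Hz): a declining phrase, then a reset phrase.
before = [180, 170, 160, 150]
after = [200, 190, 185]
f0r = f0_reset_ratio(before, after)
reset = f0r > 1.0          # True: the downtrend was reset at this boundary
```

In the study, this ratio was computed at each of the ~2000 phrase boundaries and related to the structural parameters Ig and Id.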

5SP18. Use of stress in an isolated word recognizer for French. Kathleen M. Bishop (The Inst. for Res. in Cognitive Sci., Univ. of Pennsylvania, 3401 Walnut St., Ste. 420C, Philadelphia, PA 19104 and Dragon Systems, Inc., 90 Bridge St., Newton, MA 02158) and Caroline B. Huang (Dragon Systems, Inc., Newton, MA 02158)

An isolated word recognition system for French is now being developed. This effort is part of an ongoing collaborative project between Dragon Systems, Inc. and Lernout and Hauspie Speechproducts n.v. to port and adapt Dragon's large vocabulary speech recognition system for American English to five European languages: French, Spanish, German, Italian, and Dutch. In English, unstressed syllables have been observed to be quite reduced compared to stressed syllables, while unstressed syllables in French have not been observed to be greatly reduced compared to their stressed counterparts. In the American English recognition system, performance is enhanced when the syllable stress is taken into account. Whether taking stress into account will enhance performance of the French recognizer is now being investigated. The preliminary French system recognizes 1000 words and is speaker adaptable. In developing the system, 3000 tokens from one native speaker of French have been collected. The capabilities of the recognizer will continue to expand and the collection of data will continue to take place. Preliminary results from the acoustic analysis of the speech data and from the experiments with the recognizer will be discussed. [This work is supported by a joint program between Dragon Systems, Inc. and Lernout and Hauspie Speechproducts n.v.]

5SP19. Speaker-independent speech recognition with word models generated from written text. E. L. Bocchieri and J. G. Wilpon (AT&T Bell Labs., Rm. 2C-543, 600 Mountain Ave., Murray Hill, NJ 07974)

Most current subword-based recognition systems require manual generation of the lexical transcription of vocabulary words. In addition, they generally use application-specific data to train the acoustic subword models. Hence, a new database is collected for every task. This study shows that the generation of application-specific word models can be completely automated by combining application-independent phonetically based subword models according to a lexicon provided by a text-to-phone transcriber. Continuous-density hidden Markov models of a set of phonetic units have been trained with the segmental k-means parameter estimation algorithm for both the TIMIT and the DARPA Naval Resource Management speech corpora. Each new (application-specific) vocabulary word is typed into a text-to-phone converter [M. D. Riley, Proc. ICASSP, 1991 (to appear)] that provides different transcriptions ranked in order of likelihood. The most likely transcription is used to combine phone models into a word model. System evaluation is performed on a 147-item vocabulary (e.g., digits, numerals, words, and two-word phrases) chosen independently from the training data and collected from ten speakers in an office environment (i.e., a different environment than was used to record the training data). The experimental evaluation made use of initial and final silence models, first- and second-order energy and cepstrum derivatives, and state duration models. Currently, the recognition accuracy on the entire vocabulary is 90.0% and 97.8% on a 20-word subset (of 2-4 syllable words). These accuracies are consistent across the two different training corpora.
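The word-model generation step above (take the most likely transcription from the text-to-phone converter, then chain the corresponding phone models into a word model) can be sketched at the level of state sequences. The phone inventory, state names, and the example transcription of "six" below are hypothetical, not from the corpora or the Riley converter.

```python
# Hypothetical phone models: each phone contributes a fixed state sequence.
# Real continuous-density HMMs also carry transition and output densities;
# only the concatenation structure is shown here.
phone_states = {
    "s":  ["s1", "s2", "s3"],
    "ih": ["ih1", "ih2", "ih3"],
    "k":  ["k1", "k2", "k3"],
}

def word_model(ranked_transcriptions):
    """Given transcriptions ranked by likelihood (most likely first),
    chain the phone models of the best transcription into a single
    word-level state sequence."""
    best = ranked_transcriptions[0]
    states = []
    for phone in best:
        states.extend(phone_states[phone])
    return states

# "six" -> ranked candidate transcriptions from a text-to-phone step
model = word_model([["s", "ih", "k", "s"], ["s", "ih", "k"]])
```

Because the phone models are application-independent, any new vocabulary word reduces to this lookup-and-concatenate step, with no new acoustic training data needed.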

5SP20. Loudness levels of three complex stimuli and model predictions. Patricia S. Jeng (Ctr. for Res. in Speech and Hearing Sci., City Univ. of New York, New York, NY 10036), Joseph L. Hall (Acoust. Res. Dept., AT&T Bell Labs., Murray Hill, NJ), and Harry Levitt (City Univ. of New York, New York, NY 10036)

Loudness levels were measured for three test stimuli (speech, narrow-band noise, and square wave) at three levels in three test conditions (test stimulus in quiet, test stimulus in presence of masker, and total loudness of test stimulus plus masker). Loudness levels were measured in the traditional way by matching loudness of the test stimulus to that of a 1-kHz tone. In addition, loudness levels were measured using

1937    J. Acoust. Soc. Am., Vol. 89, No. 4, Pt. 2, April 1991    121st Meeting: Acoustical Society of America    1937



