

dynamic aspect, discriminating emphasized and reduced stretches of speech. A more global aspect of intensity must be controlled when an attempt is made to model different speaking styles. Specifically, attempts have been made to model the continuum from soft to loud speech. Systematic variation in speech synthesis has been used as a tool to explore possible speaker dimensions, among them reduced and over-articulated speech. Listening experiments have been carried out to investigate whether synthesis samples can be described along different attitudinal and emotional dimensions.

3:15

3SP9. Improvement of synthetic speech quality through syntactic information. Tohru Shimizu, Seiichi Yamamoto, Norio Higuchi, and Hisashi Kawai (KDD R&D Labs., 2-1-15 Ohara, Kamifukuoka-shi, Saitama 356, Japan)

Many words in Japanese have identical written forms but different pronunciations. Natural synthetic speech therefore requires selecting the correct pronunciation for each word and optimizing prosodic features, including accent position and level, sentence intonation, and pause length, through the use of syntactic features. This paper describes (1) a new method of determining phrase accent level, based on accentual phrase boundary location and compound word structure, and (2) a newly proposed syntactic class of phrase boundaries. The results of automatic pronunciation determination and of opinion tests of intelligibility and naturalness are also described. About 10 000 words, representing about 20% of the total vocabulary, are assigned syntactic and semantic features that determine their correct pronunciation. Pronunciation was correct for 99% of the words in a Japanese economic daily, and the naturalness of the synthetic speech was rated 1.1 grades higher on a five-grade opinion test.
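As a toy illustration of feature-based pronunciation selection, the sketch below picks a reading for a homograph from a set of syntactic/semantic feature tags. The lexicon layout, the tag names (economic, place, presence_of_people), and the first-match rule are hypothetical and are not the authors' system; in a full system the tags would come from a parser rather than being passed in by hand. The two homographs themselves are real: 市場 is read shijou in economic contexts and ichiba for a physical marketplace, and 人気 is read ninki ("popularity") or hitoke ("sign of people").

```python
# Hypothetical lexicon: surface form -> ordered (required feature, reading) rules.
HOMOGRAPHS = {
    "市場": [("economic", "shijou"), ("place", "ichiba")],
    "人気": [("presence_of_people", "hitoke")],
}

# Hypothetical fallback readings used when no feature rule fires.
DEFAULT_READING = {"市場": "shijou", "人気": "ninki"}


def choose_reading(surface, features):
    """Return a romanized reading for `surface` given a set of feature tags.

    The first rule whose required feature is present wins; otherwise the
    default reading is returned (None for words not in the lexicon).
    """
    for required, reading in HOMOGRAPHS.get(surface, []):
        if required in features:
            return reading
    return DEFAULT_READING.get(surface)


if __name__ == "__main__":
    print(choose_reading("市場", {"place"}))  # a marketplace context
```

Keeping the rules as an ordered list makes the precedence between conflicting features explicit, which is one simple way to resolve words that carry several tags.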

3:30

3SP10. Source parameters for the fricative consonants /ʃ, ç, x/. Christine H. Shadle (Dept. of Electron. and Comput. Sci., Univ. of Southampton, Southampton SO9 5NH, U.K.)

A series of experiments with mechanical models of fricative consonant articulatory configurations has been conducted to determine where in the tract the turbulence noise is generated and what the spectral characteristics of that noise are. The latest models, based on a combination of x-ray, EPG, and photographic data, have the correct midsagittal profile and area function, and thus have the most realistic shape of any models in this work to date. Data obtained from /ʃ/ substantiate earlier results based on a different subject [C. H. Shadle, J. Acoust. Soc. Am. Suppl. 1 84, S34 (1988); C. H. Shadle, in Speech Production and Speech Modelling, Proc. of NATO-ASI, edited by W. Hardcastle and A. Marchal (Kluwer Academic, Amsterdam, 1990), pp. 127-219] and results from extremely idealized models [C. H. Shadle, Proc. 12th ICA, paper A3-4, Toronto (1986)]. Comparisons across a range of flow rates, with and without a sublingual cavity, between measured source and far-field spectra, and between speech and model data for /ʃ, ç, x/ lead to source parameters, to a distinction between two source types, and to the conclusion that the three-dimensional shape of the tract is crucial in determining source parameters; these parameters can be used in a model based on one-dimensional sound propagation. Three-way comparisons will be shown between (1) far-field sound measured for the models, (2) far-field sound measured for actual utterances, and (3) far-field sound predicted from measured source parameters used in a model based on one-dimensional sound propagation. [Work supported by SERC.]
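To make "a model based on one-dimensional sound propagation" concrete, the sketch below chains lossless plane-wave tube sections (ABCD matrices) from a series pressure source at the constriction exit to the lips, and returns the magnitude of the volume-velocity transfer function. It is a generic textbook construction, not the author's model: the air constants, the ideal open end at the lips (zero pressure), and the single source impedance standing in for the constriction are all assumptions, and no losses or radiation impedance are included.

```python
import math

RHO = 1.14  # kg/m^3, assumed density of warm, humid air in the tract
C = 350.0   # m/s, assumed speed of sound in the tract


def matmul2(a, b):
    """Multiply two 2x2 complex matrices given as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def tube_matrix(length, area, freq):
    """Lossless chain (ABCD) matrix of one uniform tube section.

    Maps (pressure, volume velocity) at the section's far end to its near end.
    """
    kl = 2.0 * math.pi * freq / C * length  # wavenumber times length
    z0 = RHO * C / area                     # characteristic acoustic impedance
    return (
        (complex(math.cos(kl)), 1j * z0 * math.sin(kl)),
        (1j * math.sin(kl) / z0, complex(math.cos(kl))),
    )


def front_cavity_response(sections, freq, source_impedance):
    """|U_lips / P_source| for a series pressure source at the constriction exit.

    sections: (length_m, area_m2) pairs ordered from the source to the lips.
    source_impedance: hypothetical internal impedance of the source
    (a stand-in for the constriction impedance seen by the source).
    """
    total = ((1 + 0j, 0j), (0j, 1 + 0j))
    for length, area in sections:
        total = matmul2(total, tube_matrix(length, area, freq))
    b, d = total[0][1], total[1][1]
    # Ideal open end, P_lips = 0: P_in = B*U_lips and U_in = D*U_lips, so
    # P_source = Zs*U_in + P_in = (Zs*D + B) * U_lips.
    return abs(1.0 / (source_impedance * d + b))


if __name__ == "__main__":
    front = [(0.02, 1e-4)]         # illustrative 2 cm, 1 cm^2 front cavity
    zs = 10 * RHO * C / 1e-4       # assumed source impedance, 10x the tube's
    for f in (2000.0, C / 0.08, 8750.0):
        print(f, front_cavity_response(front, f, zs))
```

With a source impedance much larger than the tube's characteristic impedance, the single 2 cm section peaks near its quarter-wave resonance c/4l = 4375 Hz, the behaviour usually expected of a front cavity that is largely decoupled from the back cavity by a narrow constriction.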

3:45

3SP11. Reliable glottal-closure-instant (GCI) estimation from short analysis frames. Krishna S. Nathan^a) and Harvey F. Silverman (LEMS, Div. Eng., Brown Univ., Providence, RI 02912)

It is well known that the first formant is maximally excited at the instant of glottal closure. It is therefore natural to use the energy in a frequency band containing the first formant as a cue to the GCI. In practice, however, the actual GCI lies a few samples before the point at which this energy signal attains a local maximum. Moreover, such an estimate makes no use of any period information regarding the GCIs. Consequently, secondary excitations within a period can lead to spurious GCIs. It is therefore proposed to augment the information contained in the first-formant energy with the linear prediction error. Although the prediction error has been widely used for pitch determination, it is not sufficient to locate the GCI reliably because of ambiguities arising from multiple peaks, especially for vowels like /u/ (as in foot). Interestingly, our experiments have shown that secondary excitations tend to produce peaks in the residual error signal at locations different from those in the formant energy signal. Furthermore, in the absence of spurious excitation, the residual error can contain valuable independent period information. The product of these two signals therefore yields accurate GCI estimates. Such an algorithm has been tested on all vowels in a variety of environments and has been found to be very robust. Analysis frames as short as 5-10 ms have been used.

^a)Current address: IBM Research, T. J. Watson Research Center, Yorktown Heights, NY 10598.
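The combination step described above can be sketched as follows, assuming the first-formant band energy and the linear-prediction residual have already been computed on the same sample grid. Only the idea of multiplying the two signals comes from the abstract; the peak-picking rule (local maximum, relative threshold, minimum spacing) and its parameter names are hypothetical choices for illustration.

```python
def gci_candidates(f1_energy, residual, min_gap, rel_threshold=0.5):
    """Pick GCI candidates from the product of F1-band energy and |LP residual|.

    Hypothetical peak picker: sample n is kept if the product has a local
    maximum there, the value is at least rel_threshold times the global
    maximum, and n is at least min_gap samples after the previous pick
    (earlier picks win; later peaks inside the gap are skipped).
    """
    if len(f1_energy) != len(residual):
        raise ValueError("signals must have equal length")
    prod = [e * abs(r) for e, r in zip(f1_energy, residual)]
    peak = max(prod) if prod else 0.0
    picks = []
    for n in range(1, len(prod) - 1):
        is_local_max = prod[n] >= prod[n - 1] and prod[n] > prod[n + 1]
        if is_local_max and prod[n] >= rel_threshold * peak:
            if not picks or n - picks[-1] >= min_gap:
                picks.append(n)
    return picks


if __name__ == "__main__":
    # Synthetic example: true GCIs at 10 and 30; a spurious formant-energy
    # peak at 20 and a spurious residual peak at 15 do not coincide.
    f1 = [0.1] * 40
    r = [0.1] * 40
    for n in (10, 20, 30):
        f1[n] = 1.0
    for n in (10, 15, 30):
        r[n] = 1.0
    print(gci_candidates(f1, r, min_gap=5))
```

Because each spurious peak appears in only one of the two signals, it is attenuated in the product, while peaks present in both survive the threshold.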

4:00

3SP12. Isolation and characterization of microevents in speech. David A. Berry and William J. Strong (Dept. Physics and Astron., Brigham Young Univ., Provo, UT 84602)

An event-synchronous technique has been designed in an attempt to optimize time and frequency resolution in speech analysis. The technique isolates “microevents” in the speech waveform and then analyzes them, thus differing from commonly used asynchronous methods that employ a fixed frame length stepped forward in constant time increments. A microevent (ME) is associated with a “packet of energy” in the waveform and is initiated by some underlying input or fluctuation of energy. There are four basic types of MEs: (1) a voiced ME is initiated by a pitch pulse; (2) a plosive ME is initiated by a plosive burst; (3) a noise ME is initiated by a positive fluctuation in energy; and (4) a mixture ME. An ME is terminated at the initiation of the next ME or when the energy of the speech signal falls below the background level. ME durations are constrained to lie within a range of 2-20 ms. The current algorithm, developed and tested with portions of the 1988 DARPA TIMIT acoustic-phonetic continuous speech database, isolates over 95% of the MEs correctly. Once isolated, MEs are characterized by their one-third-octave spectra. Results will be illustrated with various examples.
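The termination and duration rules above can be sketched as a segmentation step, assuming the ME initiation points (pitch pulses, bursts, energy rises) have already been detected and a per-sample energy envelope is available. How the 2-20 ms constraint is enforced is not stated in the abstract; here, as a labeled assumption, overly long MEs are truncated and MEs shorter than the minimum are discarded.

```python
def segment_microevents(onsets, energy, background, fs, min_ms=2.0, max_ms=20.0):
    """Split a signal into microevents (MEs) given detected initiation points.

    onsets: sorted sample indices where an ME is initiated.
    energy: per-sample energy envelope; background: background energy level.
    fs: sampling rate in Hz. Returns (start, end) sample ranges, end exclusive.
    """
    min_len = int(round(min_ms * fs / 1000.0))
    max_len = int(round(max_ms * fs / 1000.0))
    events = []
    for i, start in enumerate(onsets):
        stop = onsets[i + 1] if i + 1 < len(onsets) else len(energy)
        end = start
        # An ME runs until the next initiation or until energy drops below background.
        while end < stop and energy[end] >= background:
            end += 1
        end = min(end, start + max_len)  # assumed: truncate MEs longer than max_ms
        if end - start >= min_len:       # assumed: discard MEs shorter than min_ms
            events.append((start, end))
    return events


if __name__ == "__main__":
    fs = 1000  # 1 sample per ms keeps the example readable
    energy = [1.0] * 100
    energy[50:60] = [0.0] * 10  # a stretch below the background level
    print(segment_microevents([0, 10, 45, 59, 99], energy, 0.5, fs))
```

The example uses fs = 1000 Hz only for readability; TIMIT audio is sampled at 16 kHz, and the same millisecond limits would then correspond to 32-320 samples.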

1893 J. Acoust. Soc. Am., Vol. 89, No. 4, Pt. 2, April 1991

121st Meeting: Acoustical Society of America 1893


