WorldWideScience

Sample records for vocoded speech generalization

  1. Toddlers' recognition of noise-vocoded speech.

    Science.gov (United States)

    Newman, Rochelle; Chatterjee, Monita

    2013-01-01

    Despite the remarkable clinical success of cochlear implants, implant listeners today still receive spectrally degraded information. Much research has examined normal-hearing adult listeners' ability to interpret spectrally degraded signals, primarily using noise-vocoded speech to simulate cochlear implant processing. Far less research has explored infants' and toddlers' ability to interpret spectrally degraded signals, despite the fact that children in this age range are frequently implanted. This study examines 27-month-old typically developing toddlers' recognition of noise-vocoded speech in a language-guided looking study. Children saw two images on each trial and heard a voice instructing them to look at one item ("Find the cat!"). Full-spectrum sentences or their noise-vocoded versions were presented with varying numbers of spectral channels. Toddlers showed equivalent proportions of looking to the target object with full speech and 24- or 8-channel noise-vocoded speech; they failed to look appropriately with 2-channel noise-vocoded speech and showed variable performance with 4-channel noise-vocoded speech. Despite accurate looking performance for speech with at least eight channels, children were slower to respond appropriately as the number of channels decreased. These results indicate that 2-yr-olds have developed the ability to interpret vocoded speech, even without practice, but that doing so requires additional processing. These findings have important implications for pediatric cochlear implantation.
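
    Many of the records below rely on this same noise-vocoding manipulation. As a point of reference, here is a minimal noise-band vocoder sketch in Python (numpy and scipy assumed); the band edges, filter order, envelope extraction, and normalization are illustrative choices, not the parameters used in this or any other study listed here:

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert

      def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=7000.0):
          """Replace each band's carrier with envelope-modulated noise."""
          edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced band edges
          rng = np.random.default_rng(0)
          out = np.zeros(len(x))
          for lo, hi in zip(edges[:-1], edges[1:]):
              sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
              band = sosfiltfilt(sos, x)
              env = np.abs(hilbert(band))            # temporal envelope of the band
              carrier = rng.standard_normal(len(x))  # broadband noise carrier
              carrier = sosfiltfilt(sos, carrier)    # confine the noise to the band
              out += env * carrier
          return out * (np.max(np.abs(x)) / (np.max(np.abs(out)) + 1e-12))

    Lowering n_channels reproduces the 24/8/4/2-channel continuum studied above: the temporal envelope is kept while spectral detail is progressively discarded.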

  2. The effects of noise vocoding on speech quality perception.

    Science.gov (United States)

    Anderson, Melinda C; Arehart, Kathryn H; Kates, James M

    2014-03-01

    Speech perception depends on access to spectral and temporal acoustic cues. Temporal cues include slowly varying amplitude changes (i.e., temporal envelope, TE) and quickly varying amplitude changes associated with the center frequency of the auditory filter (i.e., temporal fine structure, TFS). This study quantifies the effects of TFS randomization through noise vocoding on the perception of speech quality by parametrically varying the amount of original TFS available above 1500 Hz. The two research aims were: 1) to establish the role of TFS in quality perception, and 2) to determine if the role of TFS in quality perception differs between subjects with normal hearing and subjects with sensorineural hearing loss. Ratings were obtained from 20 subjects (10 with normal hearing and 10 with hearing loss) using an 11-point quality scale. Stimuli were processed in three different ways: 1) a 32-channel noise-excited vocoder with random envelope fluctuations in the noise carrier, 2) a 32-channel noise-excited vocoder with the noise-carrier envelope smoothed, and 3) removal of high-frequency bands. Stimuli were presented in quiet and in babble noise at 18 dB and 12 dB signal-to-noise ratios. TFS randomization had a measurable detrimental effect on quality ratings for speech in quiet and a smaller effect for speech in background babble. Subjects with normal hearing and subjects with sensorineural hearing loss provided similar quality ratings for noise-vocoded speech.

  3. Phase Vocoder

    Directory of Open Access Journals (Sweden)

    J.L. Flanagan

    2013-08-01

    Full Text Available A vocoder technique is described in which speech signals are represented by their short-time phase and amplitude spectra. A complete transmission system utilizing this approach is simulated on a digital computer. The encoding method leads to an economy in transmission bandwidth and to a means for time compression and expansion of speech signals.
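
    The analysis-resynthesis idea can be sketched compactly. Below is an illustrative textbook-style phase vocoder used for time stretching, not Flanagan's original implementation; Python with numpy assumed, and the window, FFT size, and hop are arbitrary choices (no overlap-add normalization, so output level is approximate):

      import numpy as np

      def phase_vocoder_stretch(x, rate, n_fft=1024, hop=256):
          """Time-stretch x by factor 1/rate via STFT magnitude interpolation
          and cumulative phase advance (rate > 1 shortens, rate < 1 lengthens)."""
          win = np.hanning(n_fft)
          frames = np.array([np.fft.rfft(win * x[i:i + n_fft])
                             for i in range(0, len(x) - n_fft, hop)])
          omega = 2 * np.pi * np.arange(n_fft // 2 + 1) * hop / n_fft  # expected advance
          phase = np.angle(frames[0])
          out = np.zeros(int(len(x) / rate) + 2 * n_fft)
          t, pos = 0.0, 0
          while int(t) + 1 < len(frames):
              lo, hi = frames[int(t)], frames[int(t) + 1]
              frac = t - int(t)
              mag = (1 - frac) * np.abs(lo) + frac * np.abs(hi)   # interpolated magnitude
              dphi = np.angle(hi) - np.angle(lo) - omega          # phase deviation
              dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))    # wrap to [-pi, pi]
              phase = phase + omega + dphi                        # cumulative phase
              out[pos:pos + n_fft] += win * np.fft.irfft(mag * np.exp(1j * phase))
              t += rate
              pos += hop
          return out[:pos + n_fft]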

  4. The relative importance of temporal envelope information for intelligibility prediction: a study on cochlear-implant vocoded speech.

    Science.gov (United States)

    Chen, Fei

    2011-10-01

    Vocoder simulation has long been applied as an effective tool to assess factors influencing the intelligibility of cochlear implant listeners. Considering that the temporal envelope information contained in contiguous bands of vocoded speech is correlated and redundant, this study examined the hypothesis that an intelligibility measure evaluating the distortions of a small number of selected envelope cues is sufficient to predict intelligibility scores well. Speech intelligibility data from 80 conditions were collected from vocoder simulation experiments involving 22 normal-hearing listeners. The relative importance of temporal envelope information in cochlear-implant vocoded speech was modeled by correlating its speech-transmission indices (STIs) with the intelligibility scores. The relative-importance pattern was subsequently used to determine a binary weight vector over the STIs of all envelopes, yielding an index that predicts speech intelligibility. A high correlation (r = 0.95) was obtained when selecting a small number (e.g., 4 out of 20) of temporal envelope cues from disjoint bands to predict the intelligibility of cochlear-implant vocoded speech.
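
    A hedged sketch of the band-selection idea follows. It substitutes a plain envelope-correlation measure for the paper's STI computation and picks the bands with the highest clean/degraded envelope correlation, rather than deriving the binary weights from a measured relative-importance pattern, so it is a stand-in rather than a reproduction (Python, numpy/scipy assumed):

      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert

      def band_envelopes(x, fs, n_bands=20, f_lo=100.0, f_hi=7000.0, env_cut=32.0):
          """Low-passed temporal envelopes in log-spaced analysis bands."""
          edges = np.geomspace(f_lo, f_hi, n_bands + 1)
          lp = butter(2, env_cut, btype="low", fs=fs, output="sos")
          envs = []
          for lo, hi in zip(edges[:-1], edges[1:]):
              sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
              envs.append(sosfiltfilt(lp, np.abs(hilbert(sosfiltfilt(sos, x)))))
          return np.array(envs)

      def selected_band_index(clean, degraded, fs, n_keep=4):
          """Binary-weighted index over per-band envelope correlations."""
          ec, ed = band_envelopes(clean, fs), band_envelopes(degraded, fs)
          r = np.array([np.corrcoef(a, b)[0, 1] for a, b in zip(ec, ed)])
          w = np.zeros_like(r)
          w[np.argsort(r)[-n_keep:]] = 1.0  # keep n_keep of n_bands, as in "4 out of 20"
          return float(np.sum(w * r) / n_keep)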

  5. Acoustic Context Alters Vowel Categorization in Perception of Noise-Vocoded Speech.

    Science.gov (United States)

    Stilp, Christian E

    2017-06-01

    Normal-hearing listeners' speech perception is widely influenced by spectral contrast effects (SCEs), where perception of a given sound is biased away from stable spectral properties of preceding sounds. Despite this influence, it is not clear how these contrast effects affect speech perception for cochlear implant (CI) users whose spectral resolution is notoriously poor. This knowledge is important for understanding how CIs might better encode key spectral properties of the listening environment. Here, SCEs were measured in normal-hearing listeners using noise-vocoded speech to simulate poor spectral resolution. Listeners heard a noise-vocoded sentence where low-F1 (100-400 Hz) or high-F1 (550-850 Hz) frequency regions were amplified to encourage "eh" (/ɛ/) or "ih" (/ɪ/) responses to the following target vowel, respectively. This was done by filtering with +20 dB (experiment 1a) or +5 dB gain (experiment 1b) or filtering using 100 % of the difference between spectral envelopes of /ɛ/ and /ɪ/ endpoint vowels (experiment 2a) or only 25 % of this difference (experiment 2b). SCEs influenced identification of noise-vocoded vowels in each experiment at every level of spectral resolution. In every case but one, SCE magnitudes exceeded those reported for full-spectrum speech, particularly when spectral peaks in the preceding sentence were large (+20 dB gain, 100 % of the spectral envelope difference). Even when spectral resolution was insufficient for accurate vowel recognition, SCEs were still evident. Results are suggestive of SCEs influencing CI users' speech perception as well, encouraging further investigation of CI users' sensitivity to acoustic context.
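
    The filtering manipulation itself is straightforward to illustrate. Here is a hedged sketch that boosts one F1 region with a flat frequency-domain gain (Python, numpy assumed); the study's actual filters are not specified in this record, so this shows only the general shape of the manipulation:

      import numpy as np

      def amplify_band(x, fs, f_lo, f_hi, gain_db):
          """Boost one frequency region (e.g., 100-400 Hz by +20 dB) with an FFT gain."""
          X = np.fft.rfft(x)
          f = np.fft.rfftfreq(len(x), 1.0 / fs)
          gain = np.ones_like(f)
          gain[(f >= f_lo) & (f <= f_hi)] = 10 ** (gain_db / 20.0)
          return np.fft.irfft(X * gain, n=len(x))

      # Bias toward "ih" responses:  amplify_band(sentence, fs, 550.0, 850.0, 20.0)
      # Bias toward "eh" responses:  amplify_band(sentence, fs, 100.0, 400.0, 20.0)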

  6. Mandarin speech-in-noise and tone recognition using vocoder simulations of the temporal limits encoder for cochlear implants.

    Science.gov (United States)

    Meng, Qinglin; Zheng, Nengheng; Li, Xia

    2016-01-01

    Temporal envelope-based signal processing strategies are widely used in cochlear-implant (CI) systems. It is well recognized that the inability to convey temporal fine structure (TFS) in the stimuli limits CI users' performance, but it is still unclear how to deliver the TFS effectively. A recently proposed strategy known as the temporal limits encoder (TLE) derives an amplitude modulator that is used to generate stimuli coded in an interleaved-sampling strategy. The TLE modulator contains information related to the original temporal envelope and a slowly varying TFS from the band signal. In this paper, theoretical analyses are presented to demonstrate the superiority of TLE over two existing strategies, the clinically available continuous-interleaved-sampling (CIS) strategy and the experimental harmonic-single-sideband-encoder strategy. Perceptual experiments with vocoder simulations in normal-hearing listeners were conducted to compare the performance of TLE and CIS on two tasks (i.e., Mandarin speech reception in babble noise and tone recognition in quiet). The performance of the TLE modulator was mostly better than (for most tone-band vocoders) or comparable to (for noise-band vocoders) that of the CIS modulator on both tasks. This work implies that there is potential for improving the representation of TFS with CIs by using a TLE strategy.

  7. Lexico-semantic and acoustic-phonetic processes in the perception of noise-vocoded speech: implications for cochlear implantation.

    Directory of Open Access Journals (Sweden)

    Carolyn eMcGettigan

    2014-02-01

    Full Text Available Noise-vocoding is a transformation which, when applied to speech, severely reduces spectral resolution and eliminates periodicity, yielding a stimulus that sounds like a harsh whisper (Scott, Blank et al. 2000). This process simulates a cochlear implant, where the activity of many thousand hair cells in the inner ear is replaced by direct stimulation of the auditory nerve by a small number of tonotopically arranged electrodes. Although a cochlear implant offers a powerful means of restoring some degree of hearing to profoundly deaf individuals, the outcomes for spoken communication are highly variable (Moore and Shannon 2009). Some variability may arise from differences in peripheral representation (e.g., the degree of residual nerve survival), but some may reflect differences in higher-order linguistic processing. In order to explore this possibility, we used noise-vocoding to explore speech recognition and perceptual learning in normal-hearing listeners tested across several levels of the linguistic hierarchy: segments (consonants and vowels), single words, and sentences. Listeners improved significantly on all tasks across two test sessions. In the first session, individual differences analyses revealed two independently varying sources of variability: one lexico-semantic in nature and implicating the recognition of words and sentences, and the other an acoustic-phonetic factor associated with words and segments. However, consequent to learning, by the second session there was a more uniform covariance pattern concerning all stimulus types. A further analysis of phonetic feature recognition allowed greater insight into learning-related changes in perception and showed that, surprisingly, participants did not make full use of cues that were preserved in the stimuli (e.g., vowel duration). We discuss these findings in relation to cochlear implantation, and suggest auditory training strategies to maximise speech recognition performance in the absence of…

  8. Vocoders in mobile satellite communications

    Science.gov (United States)

    Kriedte, W.; Canavesio, F.; dal Degan, N.; Pirani, G.; Rusina, F.; Usai, P.

    Owing to the power constraints that characterize onboard transmission sections, low-bit-rate coders seem suitable for speech communications in mobile satellite systems. Vocoders that operate at rates below 4.8 kbit/s could therefore be a desirable solution for this application, while also providing the redundancy that must be added to cope with the channel error rate. After reviewing mobile-satellite-system aspects, the paper outlines the features of two different types of vocoders that are likely to be employed, and the relevant methods of assessing their performance. Finally, some results from computer simulations of the speech transmission systems are reported.

  9. Predicting the intelligibility of vocoded and wideband Mandarin Chinese.

    Science.gov (United States)

    Chen, Fei; Loizou, Philipos C

    2011-05-01

    Due to the limited number of cochlear implantees speaking Mandarin Chinese, it is extremely difficult to evaluate new speech coding algorithms designed for tonal languages. Access to an intelligibility index that could reliably predict the intelligibility of vocoded (and non-vocoded) Mandarin Chinese is a viable solution to address this challenge. The speech-transmission index (STI) and coherence-based intelligibility measures, among others, have been examined extensively for predicting the intelligibility of English speech but have not been evaluated for vocoded or wideband (non-vocoded) Mandarin speech despite the perceptual differences between the two languages. The results indicated that the coherence-based measures seem to be influenced by the characteristics of the spoken language. The highest correlation (r = 0.91-0.97) was obtained in Mandarin Chinese with a weighted coherence measure that included primarily information from high-intensity voiced segments (e.g., vowels) containing F0 information, known to be important for lexical tone recognition. In contrast, in English, the highest correlation was obtained with a coherence measure that included information from weak consonants and vowel/consonant transitions. A band-importance function was proposed that captured information about the amplitude envelope contour. A higher modulation rate (100 Hz) was found necessary for the STI-based measures for maximum correlation (r = 0.94-0.96) with vocoded Mandarin and English recognition.
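
    As a rough illustration of intensity weighting in a coherence-based measure (not the specific measure evaluated in the paper), one can weight framewise magnitude-squared coherence by frame energy so that high-intensity voiced segments dominate; a sketch in Python (scipy assumed; the frame length and the energy weighting are assumptions):

      import numpy as np
      from scipy.signal import coherence

      def weighted_coherence(clean, degraded, fs, frame_s=0.03):
          """Energy-weighted framewise magnitude-squared coherence between
          clean and degraded speech; louder (voiced) frames count more."""
          n = int(frame_s * fs)
          scores, weights = [], []
          for i in range(0, min(len(clean), len(degraded)) - n, n):
              c, d = clean[i:i + n], degraded[i:i + n]
              _, Cxy = coherence(c, d, fs=fs, nperseg=n // 4)
              scores.append(np.mean(Cxy))
              weights.append(np.mean(c ** 2))  # frame energy as the weight
          return float(np.average(scores, weights=weights))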

  10. Audiovisual Perception of Noise Vocoded Speech in Dyslexic and Non-Dyslexic Adults: The Role of Low-Frequency Visual Modulations

    Science.gov (United States)

    Megnin-Viggars, Odette; Goswami, Usha

    2013-01-01

    Visual speech inputs can enhance auditory speech information, particularly in noisy or degraded conditions. The natural statistics of audiovisual speech highlight the temporal correspondence between visual and auditory prosody, with lip, jaw, cheek and head movements conveying information about the speech envelope. Low-frequency spatial and…

  12. Phase vocoder and beyond

    Directory of Open Access Journals (Sweden)

    Marco Liuni

    2013-08-01

    Full Text Available For a broad range of sound transformations, quality is measured according to the common expectation about the result: if a male's voice has to be changed into a female's, there exists a common reference for the perceptual evaluation of the result; the same holds if an instrumental sound has to be made longer, or shorter. Following the argument in Röbel, “Between Physics and Perception: Signal Models for High Level Audio Processing”, a fundamental requirement for these transformation algorithms is the availability of signal models that are strongly linked to perceptually relevant physical properties of the sound source. This paper is a short survey of the phase vocoder technique, together with its extensions and improvements relying on appropriate sound models, which have led to high-level audio processing algorithms.

  13. Techniques of Very Low Bit-Rate Speech Coding

    Institute of Scientific and Technical Information of China (English)

    CUI Huijuan; TANG Kun; ZHAO Ming; ZHANG Xin

    2004-01-01

    Techniques of very low bit-rate speech coding, at rates below 800 bps, are presented in this paper. The techniques of multi-frame, multi-sub-band, multi-model, and vector quantization are effective in decreasing the bit rate of vocoders based on a linear prediction model. These techniques give the vocoder not only high reconstructed-speech quality but also robustness. Vocoders applying these techniques can synthesize clear and intelligible speech with some naturalness. The mean DRT (diagnostic rhyme test) score is 89.2% for an 800 bps vocoder and 86.3% for a 600 bps vocoder.

  14. Low Bandwidth Vocoding using EM Sensor and Acoustic Signal Processing

    Energy Technology Data Exchange (ETDEWEB)

    Ng, L C; Holzrichter, J F; Larson, P E

    2001-10-25

    Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real-time, without acoustic interference [1]. By combining these data with the corresponding acoustic signal, we've demonstrated an almost 10-fold bandwidth reduction in speech compression, compared to a standard 2.4 kbps LPC10 protocol used in the STU-III (Secure Terminal Unit, third generation) telephone. This paper describes a potential EM sensor/acoustic based vocoder implementation.

  15. Median-plane sound localization as a function of the number of spectral channels using a channel vocoder.

    Science.gov (United States)

    Goupell, Matthew J; Majdak, Piotr; Laback, Bernhard

    2010-02-01

    Using a vocoder, median-plane sound localization performance was measured in eight normal-hearing listeners as a function of the number of spectral channels. The channels were contiguous and logarithmically spaced in the range from 0.3 to 16 kHz. Acute testing with vocoded stimuli showed significantly worse localization compared to noises and 100-pulse click trains, both of which were tested after feedback training. However, localization for the vocoded stimuli was better than chance. A second experiment was performed using two different 12-channel spacings for the vocoded stimuli, now including feedback training. One spacing was from experiment 1. The second spacing (called the speech-localization spacing) assigned more channels to the frequency range associated with speech. There was no significant difference in localization between the two spacings. However, even with training, localizing 12-channel vocoded stimuli remained worse than localizing virtual wideband noises by 4.8 degrees in local root-mean-square error and 5.2% in quadrant error rate. Speech understanding for the speech-localization spacing was not significantly different from that for a typical spacing used by cochlear-implant users. These experiments suggest that current cochlear implants have a sufficient number of spectral channels for some vertical-plane sound localization capabilities, albeit worse than those of normal-hearing listeners, without loss of speech understanding.
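
    For reference, the contiguous logarithmic channel spacing described here reduces to computing log-spaced corner frequencies (Python, numpy assumed):

      import numpy as np

      def log_channel_edges(n_channels, f_lo=300.0, f_hi=16000.0):
          """Corner frequencies of contiguous, log-spaced vocoder channels (Hz)."""
          return np.geomspace(f_lo, f_hi, n_channels + 1)

      # log_channel_edges(12) -> 13 corner frequencies spanning 0.3-16 kHz;
      # a "speech-localization" spacing would instead concentrate edges in the speech range.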

  16. Rapid, generalized adaptation to asynchronous audiovisual speech.

    Science.gov (United States)

    Van der Burg, Erik; Goodbourn, Patrick T

    2015-04-01

    The brain is adaptive. The speed of propagation through air, and of low-level sensory processing, differs markedly between auditory and visual stimuli; yet the brain can adapt to compensate for the resulting cross-modal delays. Studies investigating temporal recalibration to audiovisual speech have used prolonged adaptation procedures, suggesting that adaptation is sluggish. Here, we show that adaptation to asynchronous audiovisual speech occurs rapidly. Participants viewed a brief clip of an actor pronouncing a single syllable. The voice was either advanced or delayed relative to the corresponding lip movements, and participants were asked to make a synchrony judgement. Although we did not use an explicit adaptation procedure, we demonstrate rapid recalibration based on a single audiovisual event. We find that the point of subjective simultaneity on each trial is highly contingent upon the modality order of the preceding trial. We find compelling evidence that rapid recalibration generalizes across different stimuli, and different actors. Finally, we demonstrate that rapid recalibration occurs even when auditory and visual events clearly belong to different actors. These results suggest that rapid temporal recalibration to audiovisual speech is primarily mediated by basic temporal factors, rather than higher-order factors such as perceived simultaneity and source identity.

  17. General-purpose isiZulu speech synthesiser

    CSIR Research Space (South Africa)

    Louw, A

    2005-07-01

    Full Text Available A general-purpose isiZulu text-to-speech (TTS) system was developed, based on the “Multisyn” unit-selection approach supported by the Festival TTS toolkit. The development involved a number of challenges related to the interface between speech...

  19. Visual Context Enhanced: The Joint Contribution of Iconic Gestures and Visible Speech to Degraded Speech Comprehension

    Science.gov (United States)

    Drijvers, Linda; Ozyurek, Asli

    2017-01-01

    Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech comprehension have only been performed separately. Method:…

  20. Comparing sound localization deficits in bilateral cochlear-implant users and vocoder simulations with normal-hearing listeners.

    Science.gov (United States)

    Jones, Heath; Kan, Alan; Litovsky, Ruth Y

    2014-11-10

    Bilateral cochlear-implant (BiCI) users are less accurate at localizing free-field (FF) sound sources than normal-hearing (NH) listeners. This performance gap is not well understood but is likely due to a combination of compromises in acoustic signal representation by the two independent speech processors and neural degradation of auditory pathways associated with a patient's hearing loss. To exclusively investigate the effect of CI speech encoding on horizontal-plane sound localization, the present study measured sound localization performance in NH subjects listening to vocoder processed and nonvocoded virtual acoustic space (VAS) stimuli. Various aspects of BiCI stimulation such as independently functioning devices, variable across-ear channel selection, and pulsatile stimulation were simulated using uncorrelated noise (Nu), correlated noise (N0), or Gaussian-enveloped tone (GET) carriers during vocoder processing. Additionally, FF sound localization in BiCI users was measured in the same testing environment for comparison. Distinct response patterns across azimuthal locations were evident for both listener groups and were analyzed using a multilevel regression analysis. Simulated implant speech encoding, regardless of carrier, was detrimental to NH localization and the GET vocoder best simulated BiCI FF performance in NH listeners. Overall, the detrimental effect of vocoder processing on NH performance suggests that sound localization deficits may persist even for BiCI patients who have minimal neural degradation associated with their hearing loss and indicates that CI speech encoding plays a significant role in the sound localization deficits experienced by BiCI users.

  1. Arbitrary Phase Vocoders by means of Warping

    Directory of Open Access Journals (Sweden)

    Gianpaolo Evangelista

    2013-08-01

    Full Text Available The Phase Vocoder plays a central role in sound analysis and synthesis, allowing us to represent a sound signal in both time and frequency, similar to a music score – but possibly at much finer time and frequency scales – describing the evolution of sound events. According to the uncertainty principle, time and frequency are not independent variables, so any time-frequency representation is the result of a compromise between time and frequency resolutions, the product of which cannot be smaller than a given constant. Therefore, finer frequency resolution can only be achieved with coarser time resolution and, similarly, finer time resolution results in coarser frequency resolution. While most of the conventional methods for time-frequency representations are based on uniform time and uniform frequency resolutions, perception and physical characteristics of sound signals suggest the need for nonuniform analysis and synthesis. As the results of psycho-acoustic research show, human hearing is naturally organized in nonuniform frequency bands. On the physical side, the sounds of percussive instruments, as well as piano in the low register, show partials whose frequencies are not uniformly spaced, as opposed to the uniformly spaced partial frequencies found in harmonic sounds. Moreover, the different characteristics of sound signals at the onset transients with respect to stationary segments suggest the need for nonuniform time resolution. In the effort to exploit the time-frequency resolution compromise at its best, a tight time-frequency suit should be tailored to snugly fit the sound body. In this paper we overview flexible design methods for phase vocoders with nonuniform resolutions. The methods are based on remapping the time or the frequency axis, or both, by employing suitable functions acting as warping maps, which locally change the characteristics of the time-frequency plane. As a result, the sliding windows may have time-dependent…

  2. General Systems Theory: Application To The Design Of Speech Communication Courses

    Science.gov (United States)

    Tucker, Raymond K.

    1971-01-01

    General systems theory can be applied to problems in the teaching of speech communication courses. The author describes general systems theory as it is applied to the designing, conducting and evaluation of speech communication courses. (Author/MS)

  3. Speech Compression and Synthesis

    Science.gov (United States)

    1980-10-01

    …phonological rules combined with diphones improved the algorithms used by the phonetic synthesis program for gain normalization and time… phonetic vocoder, spectral template. This report covers our work for the past two years on speech compression and synthesis. Since there was an… From Block 19: speech recognition, phoneme recognition… initial design for a phonetic recognition program. We also recorded and partially labeled a…

  4. Auditory skills and brain morphology predict individual differences in adaptation to degraded speech.

    Science.gov (United States)

    Erb, Julia; Henry, Molly J; Eisner, Frank; Obleser, Jonas

    2012-07-01

    Noise-vocoded speech is a spectrally highly degraded signal, but it preserves the temporal envelope of speech. Listeners vary considerably in their ability to adapt to this degraded speech signal. Here, we hypothesised that individual differences in adaptation to vocoded speech should be predictable by non-speech auditory, cognitive, and neuroanatomical factors. We tested 18 normal-hearing participants in a short-term vocoded speech-learning paradigm (listening to 100 4-band-vocoded sentences). Non-speech auditory skills were assessed using amplitude modulation (AM) rate discrimination, where modulation rates were centred on the speech-relevant rate of 4 Hz. Working memory capacities were evaluated (digit span and nonword repetition), and structural MRI scans were examined for anatomical predictors of vocoded speech learning using voxel-based morphometry. Listeners who learned faster to understand degraded speech also showed smaller thresholds in the AM discrimination task. This ability to adjust to degraded speech is furthermore reflected anatomically in increased grey matter volume in an area of the left thalamus (pulvinar) that is strongly connected to the auditory and prefrontal cortices. Thus, individual non-speech auditory skills and left thalamus grey matter volume can predict how quickly a listener adapts to degraded speech. Copyright © 2012 Elsevier Ltd. All rights reserved.

  5. Neural correlates of speech anticipatory anxiety in generalized social phobia.

    Science.gov (United States)

    Lorberbaum, Jeffrey P; Kose, Samet; Johnson, Michael R; Arana, George W; Sullivan, Lindsay K; Hamner, Mark B; Ballenger, James C; Lydiard, R Bruce; Brodrick, Peter S; Bohning, Daryl E; George, Mark S

    2004-12-22

    Patients with generalized social phobia fear embarrassment in most social situations. Little is known about its functional neuroanatomy. We studied BOLD-fMRI brain activity while generalized social phobics and healthy controls anticipated making public speeches. With anticipation minus rest, 8 phobics compared to 6 controls showed greater subcortical, limbic, and lateral paralimbic activity (pons, striatum, amygdala/uncus/anterior parahippocampus, insula, temporal pole)--regions important in automatic emotional processing--and less cortical activity (dorsal anterior cingulate/prefrontal cortex)--regions important in cognitive processing. Phobics may become so anxious that they cannot think clearly, or vice versa.

  6. Design and performance of an analysis-by-synthesis class of predictive speech coders

    Science.gov (United States)

    Rose, Richard C.; Barnwell, Thomas P., III

    1990-01-01

    The performance of a broad class of analysis-by-synthesis linear predictive speech coders is quantified experimentally. The class of coders includes a number of well-known techniques as well as a very large number of speech coders which have not been named or studied. A general formulation for deriving the parametric representation used in all of the coders in the class is presented. A new coder, named the self-excited vocoder, is discussed because of its good performance with low complexity, and because of the insight this coder gives to analysis-by-synthesis coders in general. The results of a study comparing the performances of different members of this class are presented. The study takes the form of a series of formal subjective and objective speech quality tests performed on selected coders. The results of this study lead to some interesting and important observations concerning the controlling parameters for analysis-by-synthesis speech coders.

  7. Speech Intelligibility Potential of General and Specialized Deep Neural Network Based Speech Enhancement Systems

    DEFF Research Database (Denmark)

    Kolbæk, Morten; Tan, Zheng-Hua; Jensen, Jesper

    2017-01-01

    In this paper, we study aspects of single microphone speech enhancement (SE) based on deep neural networks (DNNs). Specifically, we explore the generalizability capabilities of state-of-the-art DNN-based SE systems with respect to the background noise type, the gender of the target speaker… general. Finally, we compare how a DNN-based SE system trained to be noise type general, speaker general, and SNR general performs relative to a state-of-the-art short-time spectral amplitude minimum mean square error (STSA-MMSE) based SE algorithm. We show that DNN-based SE systems, when trained… a state-of-the-art STSA-MMSE based SE method, when tested using a range of unseen speakers and noise types. Finally, a listening test using several DNN-based SE systems tested in unseen speaker conditions shows that these systems can improve SI for some SNR and noise type configurations but degrade SI…

  8. The voice without a soul: origin and history of the vocoder [La voce senz’anima: origine e storia del Vocoder]

    Directory of Open Access Journals (Sweden)

    Paolo Zavagna

    2013-08-01

    Full Text Available A review of the cultural origins of speaking machines is proposed. These origins are identified in the separation between soul and body proposed by Descartes. Speaking machines, and later the vocoder, essentially comprise a controlled part and an automatic part; in the voder, the synthetic speaker is directly controlled by a human being, whereas in the vocoder an automatic process is implemented. Historical examples such as the ‘automata’ of Mical, Kratzenstein, von Kempelen, Wheatstone and Helmholtz, up to the invention of Dudley’s vocoder, illustrate the growth and stratification of devices and synthesizers applied to varied fields (signal representation, encrypted transmission, medical apparatus, cybernetics, music). The historical period treated ideally ends with Dolson’s 1986 article, “The Phase Vocoder: A Tutorial”. Musical examples illustrate the origin of control-data handling in composition. The musical application of the phase vocoder illustrates the distinction between control and synthesis better than other technologies. Both control (parameter values) and synthesis (instrument design) are compositional problems that involve all electroacoustic music composers.

  9. Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation

    Science.gov (United States)

    Dat, Tran Huy; Takeda, Kazuya; Itakura, Fumitada

    We present a multichannel speech enhancement method based on MAP speech spectral magnitude estimation using a generalized gamma model of the speech prior distribution, where the model parameters are adapted from actual noisy speech in a frame-by-frame manner. The utilization of a more general prior distribution with online adaptive estimation is shown to be effective for speech spectral estimation in noisy environments. Furthermore, the multichannel information, in terms of cross-channel statistics, is shown to be useful for better adapting the prior distribution parameters to the actual observation, resulting in better performance of the speech enhancement algorithm. We tested the proposed algorithm on an in-car speech database and obtained significant improvements in speech recognition performance, particularly under non-stationary noise conditions such as music, air-conditioner noise and open windows.

  10. Acquisition of a 250-word vocabulary through a tactile vocoder.

    Science.gov (United States)

    Brooks, P L; Frost, B J; Mason, J L; Chung, K

    1985-04-01

    In a previous experiment [P. L. Scilley, "Evaluation of a vibrotactile auditory prosthetic device for the profoundly deaf," unpublished Masters thesis, Queen's University, Kingston, Canada (1980)] two normal subjects learned to identify 70 and 150 words, respectively, using the Queen's Tactile Vocoder. In the present experiment, the most advanced subject continued word learning until a tactile vocabulary of 250 words was acquired. At this point randomized tests were given to obtain an indication of final performance level. From these data conditional probabilities of correct response for each stimulus word and significant confusions were obtained, which provides insight into the advantages and present limitations of the tactile vocoder.

  11. Start/End Delays of Voiced and Unvoiced Speech Signals

    Energy Technology Data Exchange (ETDEWEB)

    Herrnstein, A

    1999-09-24

    Recent experiments using low-power EM radar-like sensors (e.g., GEMs) have demonstrated a new method for measuring vocal fold activity and the onset times of voiced speech, as vocal fold contact begins to take place. Similarly, the end time of a voiced speech segment can be measured. Second, it appears that in most normal uses of American English speech, unvoiced-speech segments directly precede or directly follow voiced-speech segments. For many applications, it is useful to know typical duration times of these unvoiced speech segments. A corpus of spoken TIMIT words, phrases, and sentences, assembled earlier and recorded with simultaneously measured acoustic and EM-sensor glottal signals from 16 male speakers, was used for this study. By inspecting the onset (or end) of unvoiced speech using the acoustic signal, and the onset (or end) of voiced speech using the EM-sensor signal, the average durations for unvoiced segments preceding the onset of vocalization were found to be 300 ms, and for following segments, 500 ms. An unvoiced speech period is then defined in time: first, the onset of the EM-sensed glottal signal is used as the onset-time marker for the voiced speech segment and the end marker for the unvoiced segment; then, by subtracting 300 ms from the voicing onset time mark, the unvoiced speech segment start time is found. Similarly, the times for a following unvoiced speech segment can be found. While data of this nature have proven to be useful for work in our laboratory, a great deal of additional work remains to validate such data for use with general populations of users. These procedures have been useful for applying optimal processing algorithms over time segments of unvoiced, voiced, and non-speech acoustic signals. For example, these data appear to be of use in speaker validation, in vocoding, and in denoising algorithms.
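
    Using the average durations reported above, the bookkeeping reduces to simple arithmetic around the EM-sensed voicing marks; a minimal Python sketch (the 300 ms / 500 ms defaults come from this record, everything else is illustrative):

      def unvoiced_bounds(voiced_onset_s, voiced_end_s, pre_s=0.300, post_s=0.500):
          """Approximate unvoiced windows around a voiced segment (times in seconds)."""
          preceding = (max(0.0, voiced_onset_s - pre_s), voiced_onset_s)  # before voicing
          following = (voiced_end_s, voiced_end_s + post_s)               # after voicing
          return preceding, following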

  12. Relationship between perceptual learning in speech and statistical learning in younger and older adults

    NARCIS (Netherlands)

    Neger, T.M.; Rietveld, A.C.M.; Janse, E.

    2014-01-01

    Within a few sentences, listeners learn to understand severely degraded speech such as noise-vocoded speech. However, individuals vary in the amount of such perceptual learning and it is unclear what underlies these differences. The present study investigates whether perceptual learning in speech relates to statistical learning in younger and older adults…

  13. Communications by vocoder on a mobile satellite fading channel

    Science.gov (United States)

    dal Degan, N.; Perosino, F.; Rusina, F.

    The performance of an LPC vocoder system is analyzed under various bit rates and fading errors. The error generation model developed to estimate error probability and the length and distribution of error bursts is described. Two algorithms to mitigate the effect of error bursts are proposed. The evaluation of spectral distance measures for the voice coding system is examined. The intelligibility, quality, and acceptability of the system are assessed using the mean opinion score method.

  14. Speech and music shape the listening brain: Evidence for shared domain-general mechanisms

    NARCIS (Netherlands)

    Asaridou, S.S.; McQueen, J.M.

    2013-01-01

    Are there bi-directional influences between speech perception and music perception? An answer to this question is essential for understanding the extent to which the speech and music that we hear are processed by domain-general auditory processes and/or by distinct neural auditory mechanisms. This review summarizes a large body of behavioral and neuroscientific findings which suggest that the musical experience of trained musicians does modulate speech processing…

  17. Respiratory Dynamics and Speech Intelligibility in Speakers with Generalized Dystonia.

    Science.gov (United States)

    LaBlance, Gary R.; Rutherford, David R.

    1991-01-01

    This study compared respiratory function during quiet breathing and monologue, in six adult dystonic subjects and a control group of four neurologically intact adults. Dystonic subjects showed a faster breathing rate, less rhythmic breathing pattern, decreased lung volume, and apnea-like periods. Decreased speech intelligibility was related to…

  18. Pragmatic Analyses of President Goodluck Jonathan’s Concession Speech and General Muhammadu Buhari’s Acceptance Speech: A Comparative Appraisal

    Directory of Open Access Journals (Sweden)

    Léonard A. Koussouhon

    2016-07-01

    Full Text Available Drawing on Austin’s (1962) Speech Act Theory, this paper investigates President Goodluck Jonathan’s Concession Speech and General Muhammadu Buhari’s Acceptance Speech for the purpose of examining the impacts of context and evaluating their effects on Nigerians. The application of Speech Act Theory to these political discourses has revealed valuable findings. To mention but a few, this study has shown a high proportion of claiming assertive speech acts in Jonathan’s speech, indicating how the unity, stability and progress of Nigeria depend on Jonathan, who excellently proved this by conceding victory to his opponent Buhari. This has been confirmed by the very low proportion of these acts in Buhari’s speech. Furthermore, Jonathan’s acts of thanking, congratulating and praising indicate not only his high degree of recognition and attachment to peace and democracy but also his magnanimity, whereas those of Buhari indicate his degree of recognition. Through the use of directive speech acts, both Jonathan and Buhari have proved to be law-abiding and peaceful. Through the use of commissive speech acts, Jonathan has proved to be democratic and patriotic, whereas Buhari has proved to be open, cooperative and democratic. The thoughtful performance of the different speech acts has enabled both speakers, especially Jonathan, to maintain peace and stability in Nigeria.

  19. Bayesian STSA estimation using masking properties and generalized Gamma prior for speech enhancement

    Science.gov (United States)

    Parchami, Mahdi; Zhu, Wei-Ping; Champagne, Benoit; Plourde, Eric

    2015-12-01

    We consider the estimation of the speech short-time spectral amplitude (STSA) using a parametric Bayesian cost function and speech prior distribution. First, new schemes are proposed for the estimation of the cost function parameters, using an initial estimate of the speech STSA along with the noise masking feature of the human auditory system. This information is further employed to derive a new technique for the gain flooring of the STSA estimator. Next, to achieve better compliance with the noisy speech in the estimator's gain function, we take advantage of the generalized Gamma distribution in order to model the STSA prior and propose an SNR-based scheme for the estimation of its corresponding parameters. It is shown that in Bayesian STSA estimators, the exploitation of a rough STSA estimate in the parameter selection for the cost function and the speech prior leads to more efficient control on the gain function values. Performance evaluation in different noisy scenarios demonstrates the superiority of the proposed methods over the existing parametric STSA estimators in terms of the achieved noise reduction and introduced speech distortion.

  20. Audiovisual Integration in Children Listening to Spectrally Degraded Speech

    Science.gov (United States)

    Maidment, David W.; Kang, Hi Jee; Stewart, Hannah J.; Amitay, Sygal

    2015-01-01

    Purpose: The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Method: Children (n = 69) and adults (n = 15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in…

  1. Prediction of speech intelligibility based on an auditory preprocessing model

    DEFF Research Database (Denmark)

    Christiansen, Claus Forup Corlin; Pedersen, Michael Syskind; Dau, Torsten

    2010-01-01

    …in noise experiment was used for training and an ideal binary mask experiment was used for evaluation. All three models were able to capture the trends in the speech in noise training data well, but the proposed model provides a better prediction of the binary mask test data, particularly when the binary masks degenerate to a noise vocoder…

  3. General Auditory Processing, Speech Perception and Phonological Awareness Skills in Chinese-English Biliteracy

    Science.gov (United States)

    Chung, Kevin K. H.; McBride-Chang, Catherine; Cheung, Him; Wong, Simpson W. L.

    2013-01-01

    This study focused on the associations of general auditory processing, speech perception, phonological awareness and word reading in Cantonese-speaking children from Hong Kong learning to read both Chinese (first language [L1]) and English (second language [L2]). Children in Grades 2--4 ("N" = 133) participated and were administered…

  4. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation.

    Science.gov (United States)

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-05-01

    Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and lowpass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials.
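
    A hedged sketch of the simulated EAS condition: lowpass-filtered speech stands in for residual acoustic hearing and is combined with vocoded speech (Python, scipy assumed). The 500 Hz cutoff, the monaural summation, and the vocoder function are assumptions; the study's exact cutoffs and presentation mode are not reproduced here:

      import numpy as np
      from scipy.signal import butter, sosfiltfilt

      def simulate_eas(x, fs, vocode, cutoff=500.0):
          """Vocoder + LP condition: sum of lowpass speech and vocoded speech.
          `vocode` can be any vocoder, e.g. the noise_vocode sketch under record 1."""
          lp = butter(6, cutoff, btype="low", fs=fs, output="sos")
          acoustic = sosfiltfilt(lp, x)   # simulated residual low-frequency hearing
          electric = vocode(x, fs)        # simulated electric (CI) channel
          return acoustic + electric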

  5. Overnight consolidation promotes generalization across talkers in the identification of nonnative speech sounds.

    Science.gov (United States)

    Earle, F Sayako; Myers, Emily B

    2015-01-01

    This investigation explored the generalization of phonetic learning across talkers following training on a nonnative (Hindi dental and retroflex) contrast. Participants were trained in two groups, either in the morning or in the evening. Discrimination and identification performance was assessed in the trained talker and an untrained talker three times over 24 h following training. Results suggest that overnight consolidation promotes generalization across talkers in identification, but not necessarily discrimination, of nonnative speech sounds.

  6. Speech and music shape the listening brain: evidence for shared domain-general mechanisms

    Directory of Open Access Journals (Sweden)

    Salomi S. Asaridou

    2013-06-01

    Full Text Available Are there bi-directional influences between speech perception and music perception? An answer to this question is essential for understanding the extent to which the speech and music that we hear are processed by domain-general auditory processes and/or by distinct neural auditory mechanisms. This review summarizes a large body of behavioral and neuroscientific findings which suggest that the musical experience of trained musicians does modulate speech processing, and a sparser set of data, largely on pitch processing, which suggest in addition that linguistic experience, in particular learning a tone language, modulates music processing. Although research has focused mostly on music on speech effects, we argue that both directions of influence need to be studied, and conclude that the picture which thus emerges is one of mutual interaction across domains. In particular, it is not simply that experience with spoken language has some effects on music perception, and vice versa, but that because of shared domain-general subcortical and cortical networks, experiences in both domains influence behavior in both domains.

  8. New tests of the distal speech rate effect: examining cross-linguistic generalization.

    Science.gov (United States)

    Dilley, Laura C; Morrill, Tuuli H; Banzina, Elina

    2013-01-01

    Recent findings [Dilley and Pitt, 2010. Psych. Science. 21, 1664-1670] have shown that manipulating context speech rate in English can cause entire syllables to disappear or appear perceptually. The current studies tested two rate-based explanations of this phenomenon while attempting to replicate and extend these findings to another language, Russian. In Experiment 1, native Russian speakers listened to Russian sentences which had been subjected to rate manipulations and performed a lexical report task. Experiment 2 investigated speech rate effects in cross-language speech perception; non-native speakers of Russian of both high and low proficiency were tested on the same Russian sentences as in Experiment 1. They decided between two lexical interpretations of a critical portion of the sentence, where one choice contained more phonological material than the other (e.g., /stərʌ'na/ "side" vs. /strʌ'na/ "country"). In both experiments, with native and non-native speakers of Russian, context speech rate and the relative duration of the critical sentence portion were found to influence the amount of phonological material perceived. The results support the generalized rate normalization hypothesis, according to which the content perceived in a spectrally ambiguous stretch of speech depends on the duration of that content relative to the surrounding speech, while showing that the findings of Dilley and Pitt (2010) extend to a variety of morphosyntactic contexts and a new language, Russian. Findings indicate that relative timing cues across an utterance can be critical to accurate lexical perception by both native and non-native speakers.

  10. Theoretical Analysis of Amounts of Musical Noise and Speech Distortion in Structure-Generalized Parametric Blind Spatial Subtraction Array

    Science.gov (United States)

    Miyazaki, Ryoichi; Saruwatari, Hiroshi; Shikano, Kiyohiro

    We propose a structure-generalized blind spatial subtraction array (BSSA) and present a theoretical analysis of the amounts of musical noise and speech distortion it produces. The structure of the BSSA should be selected according to the application: a channelwise BSSA is recommended for listening, whereas the conventional BSSA is suitable for speech recognition.

  11. Speech segmentation by statistical learning is supported by domain-general processes within working memory.

    Science.gov (United States)

    Palmer, Shekeila D; Mattys, Sven L

    2016-12-01

    The purpose of this study was to examine the extent to which working memory resources are recruited during statistical learning (SL). Participants were asked to identify novel words in an artificial speech stream where the transitional probabilities between syllables provided the only segmentation cue. Experiments 1 and 2 demonstrated that segmentation performance improved when the speech rate was slowed down, suggesting that SL is supported by some form of active processing or maintenance mechanism that operates more effectively under slower presentation rates. In Experiment 3 we investigated the nature of this mechanism by asking participants to perform a two-back task while listening to the speech stream. Half of the participants performed a two-back rhyme task designed to engage phonological processing, whereas the other half performed a comparable two-back task on un-nameable visual shapes. It was hypothesized that if SL is dependent only upon domain-specific processes (i.e., phonological rehearsal), the rhyme task should impair speech segmentation performance more than the shape task. However, the two loads were equally disruptive to learning, as they both eradicated the benefit provided by the slow rate. These results suggest that SL is supported by working-memory processes that rely on domain-general resources.
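
    The segmentation cue itself is easy to make concrete. Here is a minimal sketch of computing syllable-to-syllable transitional probabilities, the only cue available in such streams (Python; any example stream is hypothetical, not the study's materials):

      from collections import Counter, defaultdict

      def transitional_probabilities(syllables):
          """P(next | current) over a syllable stream; in statistical-learning
          studies, word boundaries tend to fall where this probability dips."""
          pairs = Counter(zip(syllables[:-1], syllables[1:]))
          firsts = Counter(syllables[:-1])
          tp = defaultdict(dict)
          for (a, b), n in pairs.items():
              tp[a][b] = n / firsts[a]
          return tp

      # Streams in such studies concatenate nonsense words in random order, so
      # within-word transitional probabilities stay high while across-word ones drop.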

  12. Features of developmental dyspraxia in the general speech-impaired population?

    Science.gov (United States)

    McCabe, P; Rosenthal, J B; McLeod, S

    1998-01-01

    A typical clinical population with speech impairment was investigated to determine the extent to which features of developmental dyspraxia are present and how they interact with the severity of impairment. Thirty diagnostic features of developmental dyspraxia were identified from the post-1981 literature and two scales of severity were devised. First, the severity of these 30 features was measured (feature severity rating, FSR); second, the severity of speech impairment was measured as the percentage of consonants correct (PCC). Using these features and severity ratings, a retrospective file audit was conducted of 50 paediatric clients aged 2-8 years with impaired articulation or phonology. It was found that many characteristics regarded as diagnostic for developmental dyspraxia occur in the general speech-impaired population. The relationship between the variables was analysed, and support was found for the hypotheses that: (a) there is a relationship between the number of dyspraxic features expressed and the severity of impairment of speech production and (b) developmental dyspraxia is not characterized by severe impairment, but may occur in a range of severities from mild to severe.

  13. Contribution of envelope periodicity to release from speech-on-speech masking

    DEFF Research Database (Denmark)

    Christiansen, Claus; MacDonald, Ewen; Dau, Torsten

    2013-01-01

    Masking release (MR) is the improvement in speech intelligibility for a fluctuating interferer compared to stationary noise. Reduction in MR due to vocoder processing is usually linked to distortions in the temporal fine structure of the stimuli and a corresponding reduction in the fundamental frequency (F0) cues. However, it is unclear if envelope periodicity related to F0, produced by the interaction between unresolved harmonics, contributes to MR. In the present study, MR was determined from speech reception thresholds measured in the presence of stationary speech-shaped noise and a competing

  14. Speech Enhancement Algorithm Based on MMSE Short Time Spectral Amplitude in Whispered Speech

    Institute of Scientific and Technical Information of China (English)

    Zhi-Heng Lu; Huai-Zong Shao; Tai-Liang Ju

    2009-01-01

    An improved method based on minimum mean square error-short time spectral amplitude (MMSE-STSA) is proposed to cancel background noise in whispered speech. Using the acoustic characteristics of whispered speech, the algorithm can effectively track changes in non-stationary background noise. Compared with the original MMSE-STSA algorithm and the method in the selectable mode vocoder (SMV), the improved algorithm further suppresses residual noise at low signal-to-noise ratio (SNR) and avoids excessive suppression. Simulations show that in non-stationary noisy environments, the proposed algorithm not only achieves better enhancement performance, but also reduces speech distortion.
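
    For reference, the core of the baseline the paper improves on is the Ephraim-Malah MMSE-STSA spectral gain; a minimal sketch follows. The whispered-speech noise-tracking refinements are not reproduced, and the decision-directed smoothing constant is an illustrative choice.

```python
# Ephraim-Malah MMSE-STSA gain (sketch). gamma: a posteriori SNR per bin,
# xi: a priori SNR (e.g., from the decision-directed estimator below).
import numpy as np
from scipy.special import i0e, i1e  # exponentially scaled Bessel functions

def mmse_stsa_gain(gamma, xi):
    v = xi / (1.0 + xi) * gamma
    # i0e/i1e absorb the exp(-v/2) factor, avoiding overflow for large v
    g = (np.sqrt(np.pi) / 2.0) * (np.sqrt(v) / gamma) \
        * ((1.0 + v) * i0e(v / 2.0) + v * i1e(v / 2.0))
    return np.minimum(g, 1.0)

def decision_directed(prev_clean_amp2, noise_power, gamma, alpha=0.98):
    # Smoothed a priori SNR estimate (alpha = 0.98 is an illustrative constant)
    return alpha * prev_clean_amp2 / noise_power \
        + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)
```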

  15. Software Simulation in GSM Environment and Hardware Implementation of Improved Multi-band Excitation Vocoder

    Institute of Scientific and Technical Information of China (English)

    1998-01-01

    The algorithm of the Improved Multi-Band Excitation (IMBE) vocoder is thoroughly studied, designed and implemented, including software implementations on PC/DOS and SUN/UNIX workstation systems and a hardware real-time implementation on the TMS320C31 DSP. In order to explore the performance of the IMBE vocoder in a GSM environment, a GSM radio interface software simulation platform is built and a series of tests is run on four languages (Chinese, English, German, Swedish) and different channel models (urban, hilly and rural areas) with different SNRs. Finally, the simulation results are analyzed; they are useful for the performance analysis of IMBE and for the application of vocoders with bit rates on the order of 4 kbps in GSM environments.

  16. Sleep restores loss of generalized but not rote learning of synthetic speech.

    Science.gov (United States)

    Fenn, Kimberly M; Margoliash, Daniel; Nusbaum, Howard C

    2013-09-01

    Sleep-dependent consolidation has been demonstrated for declarative and procedural memory, but few theories of consolidation distinguish between rote and generalized learning, suggesting similar consolidation should occur for both. However, studies using rote and generalized learning have suggested that different patterns of consolidation may occur, although different tasks have been used across studies. Here we directly compared consolidation of rote and generalized learning using a single speech identification task. Training on a large set of novel stimuli resulted in substantial generalized learning, and sleep restored performance that had degraded after 12 waking hours. Training on a small set of repeated stimuli primarily resulted in rote learning; performance also degraded after 12 waking hours but was not restored by sleep. Moreover, performance was significantly worse 24 h after rote training. Our results suggest a functional dissociation between the mechanisms of consolidation for rote and generalized learning, which has broad implications for memory models.

  17. Recognition of temporally interrupted and spectrally degraded sentences with additional unprocessed low-frequency speech

    NARCIS (Netherlands)

    Baskent, Deniz; Chatterjee, Monita

    2010-01-01

    Recognition of periodically interrupted sentences (with an interruption rate of 1.5 Hz, 50% duty cycle) was investigated under conditions of spectral degradation, implemented with a noiseband vocoder, with and without additional unprocessed low-pass filtered speech (cutoff frequency 500 Hz).

  19. Design and Analysis of a BLPC Vocoder with Probe Noise for Adaptive Feedback Cancellation

    DEFF Research Database (Denmark)

    Kar, Asutosh; Swamy, M.N.S.; Anand, Ankita

    2017-01-01

    The band-limited linear predictive coding (BLPC) vocoder-based adaptive feedback cancellation (AFC) removes the high-frequency bias, while the low frequency bias persists between the desired input signal and the loudspeaker signal in the estimate of the feedback path. In this paper, we present a ...

  1. Classifier Subset Selection for the Stacked Generalization Method Applied to Emotion Recognition in Speech

    Science.gov (United States)

    Álvarez, Aitor; Sierra, Basilio; Arruti, Andoni; López-Gil, Juan-Miguel; Garay-Vitoria, Nestor

    2015-01-01

    In this paper, a new supervised classification paradigm, called classifier subset selection for stacked generalization (CSS stacking), is presented to deal with speech emotion recognition. The new approach consists of an improvement of a bi-level multi-classifier system known as stacking generalization by means of an integration of an estimation of distribution algorithm (EDA) in the first layer to select the optimal subset from the standard base classifiers. The good performance of the proposed new paradigm was demonstrated over different configurations and datasets. First, several CSS stacking classifiers were constructed on the RekEmozio dataset, using some specific standard base classifiers and a total of 123 spectral, quality and prosodic features computed using in-house feature extraction algorithms. These initial CSS stacking classifiers were compared to other multi-classifier systems and the employed standard classifiers built on the same set of speech features. Then, new CSS stacking classifiers were built on RekEmozio using a different set of both acoustic parameters (extended version of the Geneva Minimalistic Acoustic Parameter Set (eGeMAPS)) and standard classifiers and employing the best meta-classifier of the initial experiments. The performance of these two CSS stacking classifiers was evaluated and compared. Finally, the new paradigm was tested on the well-known Berlin Emotional Speech database. We compared the performance of single, standard stacking and CSS stacking systems using the same parametrization of the second phase. All of the classifications were performed at the categorical level, including the six primary emotions plus the neutral one. PMID:26712757

  2. High Performance Speech Compression System

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Since Pulse Code Modulation emerged in 1937, digitized speech has experienced rapid development due to its outstanding voice quality, reliability, robustness and security in communication. But how to reduce channel width without loss of speech quality remains a crucial problem in speech coding theory. A new full-duplex digital speech communication system based on the AMBE-1000 vocoder and the ATMEL 89C51 microcontroller is introduced. It shows higher voice quality than the current mobile phone system with only a quarter of the channel width needed for the latter. Prospective application areas include satellite communication, IP phone, virtual meetings and, most importantly, the defence industry.

  3. Recognition of temporally interrupted and spectrally degraded sentences with additional unprocessed low-frequency speech

    Science.gov (United States)

    Başkent, Deniz; Chatterjee, Monita

    2010-01-01

    Recognition of periodically interrupted sentences (with an interruption rate of 1.5 Hz, 50% duty cycle) was investigated under conditions of spectral degradation, implemented with a noiseband vocoder, with and without additional unprocessed low-pass filtered speech (cutoff frequency 500 Hz). Intelligibility of interrupted speech decreased with increasing spectral degradation. For all spectral-degradation conditions, however, adding the unprocessed low-pass filtered speech enhanced the intelligibility. The improvement at 4 and 8 channels was higher than the improvement at 16 and 32 channels: 19% and 8%, on average, respectively. The Articulation Index predicted an improvement of 0.09, in a scale from 0 to 1. Thus, the improvement at poorest spectral-degradation conditions was larger than what would be expected from additional speech information. Therefore, the results implied that the fine temporal cues from the unprocessed low-frequency speech, such as the additional voice pitch cues, helped perceptual integration of temporally interrupted and spectrally degraded speech, especially when the spectral degradations were severe. Considering the vocoder processing as a cochlear-implant simulation, where implant users’ performance is closest to 4 and 8-channel vocoder performance, the results support additional benefit of low-frequency acoustic input in combined electric-acoustic stimulation for perception of temporally degraded speech. PMID:20817081

  4. Recognition of temporally interrupted and spectrally degraded sentences with additional unprocessed low-frequency speech.

    Science.gov (United States)

    Başkent, Deniz; Chatterjee, Monita

    2010-12-01

    Recognition of periodically interrupted sentences (with an interruption rate of 1.5 Hz, 50% duty cycle) was investigated under conditions of spectral degradation, implemented with a noiseband vocoder, with and without additional unprocessed low-pass filtered speech (cutoff frequency 500 Hz). Intelligibility of interrupted speech decreased with increasing spectral degradation. For all spectral degradation conditions, however, adding the unprocessed low-pass filtered speech enhanced the intelligibility. The improvement at 4 and 8 channels was higher than the improvement at 16 and 32 channels: 19% and 8%, on average, respectively. The Articulation Index predicted an improvement of 0.09, in a scale from 0 to 1. Thus, the improvement at poorest spectral degradation conditions was larger than what would be expected from additional speech information. Therefore, the results implied that the fine temporal cues from the unprocessed low-frequency speech, such as the additional voice pitch cues, helped perceptual integration of temporally interrupted and spectrally degraded speech, especially when the spectral degradations were severe. Considering the vocoder processing as a cochlear implant simulation, where implant users' performance is closest to 4 and 8-channel vocoder performance, the results support additional benefit of low-frequency acoustic input in combined electric-acoustic stimulation for perception of temporally degraded speech. Copyright © 2010 Elsevier B.V. All rights reserved.
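
    A minimal noiseband-vocoder sketch of the kind used to implement such spectral degradation: band-pass analysis, envelope extraction, and envelope-modulated noise carriers summed across channels. Channel spacing, filter orders, and Hilbert-based envelope extraction are illustrative choices, not the authors' exact parameters.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocoder(x, fs, n_channels=8, f_lo=100.0, f_hi=6000.0, seed=0):
    """Replace each analysis band of x with envelope-modulated noise."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced channel edges
    rng = np.random.default_rng(seed)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, x)
        env = np.abs(hilbert(band))                          # temporal envelope
        carrier = sosfilt(sos, rng.standard_normal(len(x)))  # band-limited noise
        out += env * carrier
    return out / (np.max(np.abs(out)) + 1e-12)

# e.g., degraded = noise_vocoder(sentence, 16000, n_channels=8)
```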

  5. A general auditory bias for handling speaker variability in speech? Evidence in humans and songbirds

    NARCIS (Netherlands)

    Kriengwatana, B.; Escudero, P.; Kerkhoven, A.H.; ten Cate, C.

    2015-01-01

    Different speakers produce the same speech sound differently, yet listeners are still able to reliably identify the speech sound. How listeners can adjust their perception to compensate for speaker differences in speech, and whether these compensatory processes are unique only to humans, is still

  6. Cross-linguistic generalization in the treatment of two sequential Spanish-English bilingual children with speech sound disorders.

    Science.gov (United States)

    Gildersleeve-Neumann, Christina; Goldstein, Brian A

    2015-02-01

    The effect of bilingual service delivery on treatment of speech sound disorders (SSDs) in bilingual children is largely unknown. Bilingual children with SSDs are typically provided intervention in only one language, although research suggests dual-language instruction for language disorders is best practice for bilinguals. This study examined cross-linguistic generalization of bilingual intervention in treatment of two 5-year-old sequential bilingual boys with SSDs (one with Childhood Apraxia of Speech), hypothesizing that selecting and treating targets in both languages would result in significant overall change in their English and Spanish speech systems. A multiple baseline across behaviours design was used to measure treatment effectiveness for two targets per child. Children received treatment 2-3 times per week for 8 weeks and in Spanish for at least 2 of every 3 days. Ongoing treatment performance was measured in probes in both languages; overall speech skills were compared pre- and post-treatment. Both children's speech improved in both languages with similar magnitude; there was improvement in some non-treated errors. Treating both languages had an overall positive effect on these bilingual children's speech. Future bilingual intervention research should explore alternating treatments designs, efficiency of monolingual vs bilingual treatment, different language and bilingual backgrounds, and between-group comparisons.

  7. Individual differences in degraded speech perception

    Science.gov (United States)

    Carbonell, Kathy M.

    One of the lasting concerns in audiology is the unexplained individual differences in speech perception performance, even for individuals with similar audiograms. One proposal is that there are cognitive/perceptual individual differences underlying this vulnerability and that these differences are present in normal-hearing (NH) individuals but do not reveal themselves in studies that use clear speech produced in quiet (because of a ceiling effect). However, previous studies have failed to uncover cognitive/perceptual variables that explain much of the variance in NH performance on more challenging degraded speech tasks. This lack of strong correlations may be due either to examining the wrong measures (e.g., working memory capacity) or to there being no reliable differences in degraded speech performance in NH listeners (i.e., variability in performance is due to measurement noise). The proposed project has three aims: first, to establish whether there are reliable individual differences in degraded speech performance for NH listeners that are sustained both across degradation types (speech in noise, compressed speech, noise-vocoded speech) and across multiple testing sessions; second, to establish whether there are reliable differences in NH listeners' ability to adapt their phonetic categories based on short-term statistics, both across tasks and across sessions; and finally, to determine whether performance on degraded speech perception tasks is correlated with performance on phonetic adaptability tasks, thus establishing a possible explanatory variable for individual differences in speech perception for NH and hearing-impaired listeners.

  8. Speech-Language Pathologist and General Educator Collaboration: A Model for Tier 2 Service Delivery

    Science.gov (United States)

    Watson, Gina D.; Bellon-Harn, Monica L.

    2014-01-01

    Tier 2 supplemental instruction within a response to intervention framework provides a unique opportunity for developing partnerships between speech-language pathologists and classroom teachers. Speech-language pathologists may participate in Tier 2 instruction via a consultative or collaborative service delivery model depending on district needs.…

  9. Speech Perception Engages a General Timer: Evidence from a Divided Attention Word Identification Task

    Science.gov (United States)

    Casini, Laurence; Burle, Boris; Nguyen, Noel

    2009-01-01

    Time is essential to speech. The duration of speech segments plays a critical role in the perceptual identification of these segments, and therefore in that of spoken words. Here, using a French word identification task, we show that vowels are perceived as shorter when attention is divided between two tasks, as compared to a single task control…

  10. Formation of the intonational aspect of speech within writing remediation for younger schoolchildren with mildly expressed general speech underdevelopment and an erased form of dysarthria

    Directory of Open Access Journals (Sweden)

    Zoya Kyrbanova

    2015-05-01

    Full Text Available The article addresses reading and writing disorders in children with general speech underdevelopment and a dysarthric component. It presents a methodology for assessing the speech capabilities of this category of children and outlines directions for remedial and developmental work on forming the intonational aspect of speech in order to prevent reading and writing disorders.

  11. From Sound Morphing to the Synthesis of Starlight. Musical experiences with the Phase Vocoder over 25 years

    Directory of Open Access Journals (Sweden)

    Trevor Wishart

    2013-08-01

    Full Text Available The article reports the author’s experiences with the phase vocoder, starting from the first attempts during the years 1973-77 – in connection with a speculative project to morph the sounds of a speaking voice into sounds from the natural world, a project subsequently developed at Ircam in Paris between 1979 and 1986 – up to the most recent experiences in 2011-12 associated with the realization of Supernova, an 8-channel sound-surround piece, where the phase vocoder data format is used as a synthesis tool.
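
    For readers unfamiliar with the tool, the sketch below shows the core phase-vocoder operation behind such work: time-stretching by resampling STFT frames while propagating phase. The analysis parameters are illustrative, not those used in the pieces described.

```python
import numpy as np

def pv_time_stretch(x, rate, n_fft=1024, hop=256):
    """Phase-vocoder time stretch: rate < 1 lengthens, rate > 1 shortens."""
    win = np.hanning(n_fft)
    X = np.array([np.fft.rfft(win * x[i:i + n_fft])
                  for i in range(0, len(x) - n_fft, hop)])
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) * hop / n_fft  # expected phase/hop
    steps = np.arange(0, len(X) - 1, rate)
    phase = np.angle(X[0])
    out = np.zeros(len(steps) * hop + n_fft)
    for k, s in enumerate(steps):
        i, frac = int(s), s - int(s)
        mag = (1 - frac) * np.abs(X[i]) + frac * np.abs(X[i + 1])  # interpolate magnitude
        dphi = np.angle(X[i + 1]) - np.angle(X[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))  # wrap to [-pi, pi]
        out[k * hop:k * hop + n_fft] += win * np.fft.irfft(mag * np.exp(1j * phase))
        phase += omega + dphi  # propagate synthesis phase
    return out
```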

  12. Speech/Music Classification Enhancement for 3GPP2 SMV Codec Based on Support Vector Machine

    Science.gov (United States)

    Kim, Sang-Kyun; Chang, Joon-Hyuk

    In this letter, we propose a novel approach to speech/music classification based on the support vector machine (SVM) to improve the performance of the 3GPP2 selectable mode vocoder (SMV) codec. We first analyze the features and the classification method used in the real-time speech/music classification algorithm in SMV, and then apply the SVM for enhanced speech/music classification. To evaluate performance, we compare the proposed algorithm with the traditional algorithm of the SMV. The proposed system is evaluated under various environments and shows better performance than the original method in the SMV.
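
    A hedged sketch of the approach: frame-level features feeding an RBF-kernel SVM for the speech/music decision. The toy features below stand in for the SMV feature set, which is not reproduced here.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def frame_features(frame, fs):
    """Toy per-frame features: log energy, zero-crossing rate, spectral centroid."""
    energy = np.log(np.sum(frame ** 2) + 1e-12)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    spec = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    centroid = np.sum(freqs * spec) / (np.sum(spec) + 1e-12)
    return [energy, zcr, centroid]

# X: rows of frame_features over labelled frames; y: 0 = speech, 1 = music
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
# clf.fit(X_train, y_train); decisions = clf.predict(X_test)
```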

  13. A general auditory bias for handling speaker variability in speech? Evidence in humans and songbirds

    Directory of Open Access Journals (Sweden)

    Buddhamas Kriengwatana

    2015-08-01

    Full Text Available Different speakers produce the same speech sound differently, yet listeners are still able to reliably identify the speech sound. How listeners can adjust their perception to compensate for speaker differences in speech, and whether these compensatory processes are unique only to humans, is still not fully understood. In this study we compare the ability of humans and zebra finches to categorize vowels despite speaker variation in speech in order to test the hypothesis that accommodating speaker and gender differences in isolated vowels can be achieved without prior experience with speaker-related variability. Using a behavioural Go/No-go task and identical stimuli, we compared Australian English adults’ (naïve to Dutch) and zebra finches’ (naïve to human speech) ability to categorize /ɪ/ and /ɛ/ vowels of a novel Dutch speaker after learning to discriminate those vowels from only one other speaker. Experiments 1 and 2 presented vowels of two speakers interspersed or blocked, respectively. Results demonstrate that categorization of vowels is possible without prior exposure to speaker-related variability in speech for zebra finches, and in non-native vowel categories for humans. Therefore, this study is the first to provide evidence for what might be a species-shared auditory bias that may supersede speaker-related information during vowel categorization. It additionally provides behavioural evidence contradicting a prior hypothesis that accommodation of speaker differences is achieved via the use of formant ratios. Therefore, investigations of alternative accounts of vowel normalization that incorporate the possibility of an auditory bias for disregarding inter-speaker variability are warranted.

  14. Improved vocabulary production after naming therapy in aphasia: can gains in picture naming generalize to connected speech?

    Science.gov (United States)

    Conroy, Paul; Sage, Karen; Ralph, Matt Lambon

    2009-01-01

    Naming accuracy for nouns and verbs in aphasia can vary across different elicitation contexts, for example, simple picture naming, composite picture description, narratives, and conversation. For some people with aphasia, naming may be more accurate to simple pictures as opposed to naming in spontaneous, connected speech; for others, the opposite pattern may be evident. These differences have, in some instances, been related to word class (for example, noun or verb) as well as aphasia subtype. Given that the aim of picture-naming therapies is to improve word-finding in general, these differences in naming accuracy across contexts may have important implications for the potential functional benefits of picture-naming therapies. This study aimed to explore single-word therapy for both nouns and verbs, and to answer the following questions. (1) To what extent does an increase in naming accuracy after picture-naming therapy (for both nouns and verbs) predict accurate naming of the same items in less constrained spontaneous connected speech tasks such as composite picture description and retelling of a narrative? (2) Does the word class targeted in therapy (verb or noun) dictate whether there is 'carry-over' of the therapy item to connected speech tasks? (3) Does the speed at which the picture is named after therapy predict whether it will also be used appropriately in connected speech tasks? Seven participants with aphasia of varying degrees of severity and subtype took part in ten therapy sessions over five weeks. A set of potentially useful items was collected from control participant accounts of the Cookie Theft Picture Description and the Cinderella Story from the Quantitative Production Analysis. Twenty-four of these words (twelve verbs and twelve nouns) were collated for each participant, on the basis that they had failed to name them in either simple picture naming or connected speech tasks (picture-supported narrative and unsupported retelling of a narrative

  15. Talker-identification training using simulations of binaurally combined electric and acoustic hearing: generalization to speech and emotion recognition.

    Science.gov (United States)

    Krull, Vidya; Luo, Xin; Iler Kirk, Karen

    2012-04-01

    Understanding speech in background noise, talker identification, and vocal emotion recognition are challenging for cochlear implant (CI) users due to poor spectral resolution and limited pitch cues with the CI. Recent studies have shown that bimodal CI users, that is, those CI users who wear a hearing aid (HA) in their non-implanted ear, receive benefit for understanding speech both in quiet and in noise. This study compared the efficacy of talker-identification training in two groups of young normal-hearing adults, listening to either acoustic simulations of unilateral CI or bimodal (CI+HA) hearing. Training resulted in improved identification of talkers for both groups with better overall performance for simulated bimodal hearing. Generalization of learning to sentence and emotion recognition also was assessed in both subject groups. Sentence recognition in quiet and in noise improved for both groups, no matter if the talkers had been heard during training or not. Generalization to improvements in emotion recognition for two unfamiliar talkers also was noted for both groups with the simulated bimodal-hearing group showing better overall emotion-recognition performance. Improvements in sentence recognition were retained a month after training in both groups. These results have potential implications for aural rehabilitation of conventional and bimodal CI users.

  16. Design of a general-purpose Chinese dialect speech database

    Institute of Scientific and Technical Information of China (English)

    高原; 顾明亮; 孙平; 王侠; 张长水

    2012-01-01

    This paper describes a general-purpose Chinese dialect speech database, which can be applied to research on speaker information processing, dialect feature-word recognition, speech recognition and related fields. The database covers seven major Chinese dialect regions and comprises 106 hours of speech collected in multi-channel recording mode, and the data have been preprocessed. On this basis, design criteria and an implementation scheme for a Chinese dialect speech database are proposed, which should help promote the construction of Chinese speech corpora, and dialect speech corpora in particular.

  17. Speech Correction in the Schools.

    Science.gov (United States)

    Eisenson, Jon; Ogilvie, Mardel

    An introduction to the problems and therapeutic needs of school age children whose speech requires remedial attention, the text is intended for both the classroom teacher and the speech correctionist. General considerations include classification and incidence of speech defects, speech correction services, the teacher as a speaker, the mechanism…

  18. Commercial speech and off-label drug uses: what role for wide acceptance, general recognition and research incentives?

    Science.gov (United States)

    Gilhooley, Margaret

    2011-01-01

    This article provides an overview of how the constitutional protections for commercial speech affect the Food and Drug Administration's (FDA) regulation of drugs, and the emerging issues about the scope of these protections. A federal district court has already found that commercial speech allows manufacturers to distribute reprints of medical articles about a new off-label use of a drug as long as it contains disclosures to prevent deception and to inform readers about the lack of FDA review. This paper summarizes the current agency guidance that accepts the manufacturer's distribution of reprints with disclosures. Allergan, the maker of Botox, recently maintained in a lawsuit that the First Amendment permits drug companies to provide "truthful information" to doctors about "widely accepted" off-label uses of a drug. While the case was settled as part of a fraud and abuse case on other grounds, extending constitutional protections generally to "widely accepted" uses is not warranted, especially if it covers the use of a drug for a new purpose that needs more proof of efficacy, and that can involve substantial risks. A health law academic pointed out in an article examining a fraud and abuse case that off-label use of drugs is common, and that practitioners may lack adequate dosage information about the off-label uses. Drug companies may obtain approval of a drug for a narrow use, such as for a specific type of pain, but practitioners use the drug for similar uses based on their experience. The writer maintained that a controlled study may not be necessary to establish efficacy for an expanded use of a drug for pain. Even if this is the case, as discussed below in this paper, added safety risks may exist if the expansion covers a longer period of time and use by a wider number of patients. The protections for commercial speech should not be extended to allow manufacturers to distribute information about practitioner use with a disclosure about the lack of FDA

  19. Effect of speech degradation on top-down repair: phonemic restoration with simulations of cochlear implants and combined electric-acoustic stimulation.

    Science.gov (United States)

    Başkent, Deniz

    2012-10-01

    The brain, using expectations, linguistic knowledge, and context, can perceptually restore inaudible portions of speech. Such top-down repair is thought to enhance speech intelligibility in noisy environments. Hearing-impaired listeners with cochlear implants commonly complain about not understanding speech in noise. We hypothesized that the degradations in the bottom-up speech signals due to the implant signal processing may have a negative effect on the top-down repair mechanisms, which could partially be responsible for this complaint. To test the hypothesis, phonemic restoration of interrupted sentences was measured with young normal-hearing listeners using a noise-band vocoder simulation of implant processing. Decreasing the spectral resolution (by reducing the number of vocoder processing channels from 32 to 4) systematically degraded the speech stimuli. Supporting the hypothesis, the size of the restoration benefit varied as a function of spectral resolution. A significant benefit was observed only at the highest spectral resolution of 32 channels. With eight channels, which resembles the resolution available to most implant users, there was no significant restoration effect. Combined electric-acoustic hearing has been previously shown to provide better intelligibility of speech in adverse listening environments. In a second configuration, combined electric-acoustic hearing was simulated by adding low-pass-filtered acoustic speech to the vocoder processing. There was a slight improvement in phonemic restoration compared to the first configuration; the restoration benefit was observed at spectral resolutions of both 16 and 32 channels. However, the restoration was not observed at lower spectral resolutions (four or eight channels). Overall, the findings imply that the degradations in the bottom-up signals alone (such as occurs in cochlear implants) may reduce the top-down restoration of speech.

  20. Evaluation of psycho-social training for speech therapists in oncology. Impact on general communication skills and empathy. A qualitative pilot study.

    Science.gov (United States)

    Ullrich, Peter; Wollbrück, Dorit; Danker, Helge; Singer, Susanne

    2011-06-01

    The aim of this study was to evaluate the impact of a psychosocial training programme for speech therapists on their performance in patient-therapist communication in general and empathy in particular. Twenty-three speech therapists were interviewed in a pseudo-randomised controlled trial. Communication skills were tested using questionnaires with open questions. Respondents were asked to find adequate replies to clinical vignettes. The vignettes briefly described a patient's physical state and contained a statement from the patient expressing some distress. Answers were coded with qualitative content analysis. Communication skills improved considerably in terms of the frequency of conducive communication (especially empathy) and the breadth of the conducive communicative repertoire. Negative communication preferences were reduced. Psychosocial training for speech therapists can markedly improve communication skills and is therefore recommended for further use.

  1. Auditory Perceptual Learning for Speech Perception Can be Enhanced by Audiovisual Training

    Science.gov (United States)

    Bernstein, Lynne E.; Auer, Edward T.; Eberhardt, Silvio P.; Jiang, Jintao

    2013-01-01

    Speech perception under audiovisual (AV) conditions is well known to confer benefits to perception such as increased speed and accuracy. Here, we investigated how AV training might benefit or impede auditory perceptual learning of speech degraded by vocoding. In Experiments 1 and 3, participants learned paired associations between vocoded spoken nonsense words and nonsense pictures. In Experiment 1, paired-associates (PA) AV training of one group of participants was compared with audio-only (AO) training of another group. When tested under AO conditions, the AV-trained group was significantly more accurate than the AO-trained group. In addition, pre- and post-training AO forced-choice consonant identification with untrained nonsense words showed that AV-trained participants had learned significantly more than AO participants. The pattern of results pointed to their having learned at the level of the auditory phonetic features of the vocoded stimuli. Experiment 2, a no-training control with testing and re-testing on the AO consonant identification, showed that the controls were as accurate as the AO-trained participants in Experiment 1 but less accurate than the AV-trained participants. In Experiment 3, PA training alternated AV and AO conditions on a list-by-list basis within participants, and training was to criterion (92% correct). PA training with AO stimuli was reliably more effective than training with AV stimuli. We explain these discrepant results in terms of the so-called “reverse hierarchy theory” of perceptual learning and in terms of the diverse multisensory and unisensory processing resources available to speech perception. We propose that early AV speech integration can potentially impede auditory perceptual learning; but visual top-down access to relevant auditory features can promote auditory perceptual learning. PMID:23515520

  2. Vocoder Excitation Model Based on Voicing Cut-Off Frequency

    Institute of Scientific and Technical Information of China (English)

    徐静云; 赵晓群; 李荣芸; 王峤

    2015-01-01

    In a low-bit-rate vocoder, the description of the excitation signal directly affects the quality of the reconstructed speech. To improve sound quality, a vocoder excitation model based on voicing cut-off frequency (VCO) is presented. In the encoding part, the excitation spectrum is divided into two distinct bands at the VCO: a harmonic sub-band and a noise sub-band. A variable-dimension discrete cosine transform model is used to express the excitation spectral amplitudes of the harmonic sub-band, and the VCO is quantized with 4-bit nonlinear scalar quantization. In the decoding part, the recovered excitation spectral amplitudes of the harmonic sub-band are inverse Fourier transformed, the noise sub-band is obtained by passing white noise through a high-pass filter whose stop-band cut-off frequency is the VCO, and the harmonic and noise sub-bands are finally superimposed to form the excitation. The model improves the description precision of the full-band excitation spectral amplitudes and of the harmonic and noise components, and the reconstructed speech is clearly more natural, with better subjective and objective indicators, especially for male speech.
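
    The time-domain sketch below illustrates the harmonic-plus-noise split at the VCO: harmonics of F0 below the cut-off, high-pass filtered white noise above it. Frame length, F0, and VCO values are illustrative; the paper's spectral-domain DCT modeling and quantization steps are not reproduced.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def vco_excitation(f0, vco, fs=8000, n=160, seed=0):
    """Harmonics of f0 below the VCO plus high-pass filtered noise above it."""
    t = np.arange(n) / fs
    k = np.arange(1, int(vco // f0) + 1)                 # harmonic numbers below VCO
    harmonic = np.sin(2 * np.pi * np.outer(k * f0, t)).sum(axis=0)
    sos = butter(4, vco, btype="highpass", fs=fs, output="sos")
    noise = sosfilt(sos, np.random.default_rng(seed).standard_normal(n))
    return harmonic + noise                              # superimposed excitation

frame = vco_excitation(f0=120.0, vco=2500.0)  # one 20-ms frame at 8 kHz
```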

  3. Do obesity beliefs matter? Analysis of general practitioners’ speech

    OpenAIRE

    Teixeira, Filipa Valente; Ribeiro, José Luis Pais; Maia, Ângela

    2011-01-01

    Background: Health physicians’ beliefs about obesity have been considered one of the factors compromising the success of obese patients’ treatment. Quantitative research has been criticized for not being able to clarify how health physicians’ practices in the management of obesity are affected by the way they perceive obesity and obese people. Method: Semi-structured interviews about beliefs, attitudes and practices regarding obesity were conducted with Portuguese general practitioners. Dat...

  4. Adaptive V/UV Speech Detection Based on Characterization of Background Noise

    Directory of Open Access Journals (Sweden)

    F. Beritelli

    2009-01-01

    Full Text Available The paper presents an adaptive system for Voiced/Unvoiced (V/UV) speech detection in the presence of background noise. Genetic algorithms were used to select the features that offer the best V/UV detection according to the output of a background Noise Classifier (NC) and a Signal-to-Noise Ratio Estimation (SNRE) system. The system was implemented, and the tests were performed using the TIMIT speech corpus and its phonetic classification. The results were compared with a nonadaptive classification system and the V/UV detectors adopted by two important speech coding standards: the V/UV detection system in ETSI ES 202 212 v1.1.2 and the speech classification in the Selectable Mode Vocoder (SMV) algorithm. In all cases the proposed adaptive V/UV classifier outperforms the traditional solutions, giving an improvement of 25% in very noisy environments.
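
    A minimal sketch of a noise-adaptive V/UV decision in the same spirit: frame energy compared against an estimated noise floor, plus a zero-crossing-rate ceiling. The features and thresholds here are illustrative, not the GA-selected set of the paper.

```python
import numpy as np

def vuv_decision(frame, noise_floor_db, zcr_max=0.25, snr_margin_db=6.0):
    """True for voiced: energy clears the tracked noise floor and ZCR stays low."""
    energy_db = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    return energy_db > noise_floor_db + snr_margin_db and zcr < zcr_max
```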

  5. Medium-rate speech coding simulator for mobile satellite systems

    Science.gov (United States)

    Copperi, Maurizio; Perosino, F.; Rusina, F.; Albertengo, G.; Biglieri, E.

    1986-01-01

    Channel modeling and error protection schemes for speech coding are described. A residual excited linear predictive (RELP) coder for bit rates of 4.8, 7.2, and 9.6 kbit/sec is outlined. The coder at 9.6 kbit/sec incorporates a number of channel error protection techniques, such as bit interleaving, error correction codes, and parameter repetition. Results of formal subjective experiments (DRT and DAM tests) under various channel conditions reveal that the proposed coder outperforms conventional LPC-10 vocoders by two subjective categories, thus confirming the suitability of the RELP coder at 9.6 kbit/sec for good-quality speech transmission in mobile satellite systems.

  6. Public Speech.

    Science.gov (United States)

    Green, Thomas F.

    1994-01-01

    Discusses the importance of public speech in society, noting the power of public speech to create a world and a public. The paper offers a theory of public speech, identifies types of public speech, and types of public speech fallacies. Two ways of speaking of the public and of public life are distinguished. (SM)

  7. Visual speech form influences the speed of auditory speech processing.

    Science.gov (United States)

    Paris, Tim; Kim, Jeesun; Davis, Chris

    2013-09-01

    An important property of visual speech (movements of the lips and mouth) is that it generally begins before auditory speech. Research using brain-based paradigms has demonstrated that seeing visual speech speeds up the activation of the listener's auditory cortex but it is not clear whether these observed neural processes link to behaviour. It was hypothesized that the very early portion of visual speech (occurring before auditory speech) will allow listeners to predict the following auditory event and so facilitate the speed of speech perception. This was tested in the current behavioural experiments. Further, we tested whether the salience of the visual speech played a role in this speech facilitation effect (Experiment 1). We also determined the relative contributions that visual form (what) and temporal (when) cues made (Experiment 2). The results showed that visual speech cues facilitated response times and that this was based on form rather than temporal cues. Copyright © 2013 Elsevier Inc. All rights reserved.

  8. Speech Problems

    Science.gov (United States)

    ... of your treatment plan may include seeing a speech therapist, a person who is trained to treat speech disorders. How often you have to see the speech therapist will vary — you'll probably start out seeing ...

  9. Notionally steady background noise acts primarily as a modulation masker of speech.

    Science.gov (United States)

    Stone, Michael A; Füllgrabe, Christian; Moore, Brian C J

    2012-07-01

    Stone et al. [J. Acoust. Soc. Am. 130, 2874-2881 (2011)], using vocoder processing, showed that the envelope modulations of a notionally steady noise were more effective than the envelope energy as a masker of speech. Here the same effect is demonstrated using non-vocoded signals. Speech was filtered into 28 channels. A masker centered on each channel was added to the channel signal at a target-to-background ratio of -5 or -10 dB. Maskers were sinusoids or noise bands with bandwidth 1/3 or 1 ERB(N) (ERB(N) being the bandwidth of "normal" auditory filters), synthesized with Gaussian (GN) or low-noise (LNN) statistics. To minimize peripheral interactions between maskers, odd-numbered channels were presented to one ear and even-numbered channels to the other. Speech intelligibility was assessed in the presence of each "steady" masker and in the presence of that masker 100% sinusoidally amplitude modulated (SAM) at 8 Hz. Intelligibility decreased with increasing envelope fluctuation of the maskers. Masking release, the difference in intelligibility between the SAM and its "steady" counterpart, increased with bandwidth from near zero to around 50 percentage points for the 1-ERB(N) GN. It is concluded that the sinusoidal and GN maskers behaved primarily as energetic and modulation maskers, respectively.
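
    Generating the two masker types contrasted above is straightforward; the sketch below builds a Gaussian noise band and its 100% SAM version at 8 Hz. The band edges, duration, and sampling rate are illustrative, not the study's exact values.

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs, dur = 16000, 2.0
t = np.arange(int(fs * dur)) / fs
# "Steady" Gaussian noise band (illustrative 900-1100 Hz passband)
sos = butter(4, [900, 1100], btype="bandpass", fs=fs, output="sos")
steady = sosfilt(sos, np.random.default_rng(0).standard_normal(len(t)))
# Same band, 100% sinusoidally amplitude modulated at 8 Hz
sam = steady * (1.0 + np.sin(2 * np.pi * 8.0 * t))
```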

  10. CERN 50th Anniversary Official Celebration: keynote speech from Professor Federico Mayor Zaragoza, Professor of Molecular Biology at the Universidad Autónoma de Madrid, former Director-General of UNESCO

    CERN Multimedia

    Michel Blanc

    2004-01-01

    CERN 50th Anniversary Official Celebration: keynote speech from Professor Federico Mayor Zaragoza, Professor of Molecular Biology at the Universidad Autónoma de Madrid, former Director-General of UNESCO

  11. Indirect Speech Acts

    Institute of Scientific and Technical Information of China (English)

    李威

    2001-01-01

    Indirect speech acts are frequently used in verbal communication, and their interpretation is of great importance for developing students' communicative competence. This paper therefore presents Searle's account of indirect speech acts and explores how they are interpreted according to two influential theories. It consists of four parts. Part one gives a general introduction to speech act theory. Part two elaborates on the conception of indirect speech acts proposed by Searle and on his supplement to and development of the theory of illocutionary acts. Part three deals with the interpretation of indirect speech acts. Part four draws implications from the preceding study and serves as the conclusion of the paper.

  12. Understanding the effect of noise on electrical stimulation sequences in cochlear implants and its impact on speech intelligibility.

    Science.gov (United States)

    Qazi, Obaid Ur Rehman; van Dijk, Bas; Moonen, Marc; Wouters, Jan

    2013-05-01

    The present study investigates the most important factors that limit the intelligibility of cochlear implant (CI) processed speech in noisy environments. The electrical stimulation sequences provided in CIs are affected by noise in three ways. First, the natural gaps in the speech are filled, which distorts the low-frequency ON/OFF modulations of the speech signal. Second, speech envelopes are distorted to include modulations of both speech and noise. Lastly, N-of-M speech coding strategies may select noise-dominated channels instead of the dominant speech channels at low signal-to-noise ratios (SNRs). Different stimulation sequences are tested with CI subjects to study how these three noise effects individually limit the intelligibility of CI-processed speech. Tests are also conducted with normal-hearing (NH) subjects using vocoded speech to identify any significant differences in the noise reduction requirements and speech distortion limitations between the two subject groups. Results indicate that compared to NH subjects, CI subjects can tolerate significantly lower levels of steady-state speech-shaped noise in the speech gaps but can tolerate comparable levels of distortion in the speech segments. Furthermore, modulations in the stimulus current level have no effect on speech intelligibility as long as the channel selection remains ideal. Finally, wrong maxima selection together with the introduction of noise in the speech gaps significantly degrades intelligibility. At low SNRs, wrong maxima selection introduces interruptions in the speech and makes it difficult to fuse noisy and interrupted speech signals into a coherent speech stream. Copyright © 2013 Elsevier B.V. All rights reserved.
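
    A minimal sketch of the N-of-M maxima selection at issue: per analysis frame, only the N channels with the largest envelopes are retained, so at low SNRs noise-dominated channels can displace speech-dominated ones. This is a generic illustration of the selection rule, not the exact strategy tested in the study.

```python
import numpy as np

def n_of_m(envelopes, n):
    """Keep the n largest channel envelopes per frame; zero the rest.
    envelopes: (m_channels, n_frames) array of channel envelope values."""
    out = np.zeros_like(envelopes)
    for j in range(envelopes.shape[1]):
        keep = np.argsort(envelopes[:, j])[-n:]  # indices of the n maxima
        out[keep, j] = envelopes[keep, j]
    return out
```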

  13. Feature extraction and models for speech: An overview

    Science.gov (United States)

    Schroeder, Manfred

    2002-11-01

    Modeling of speech has a long history, beginning with Count von Kempelen's 1770 mechanical speaking machine. Even then, human vowel production was seen as resulting from a source (the vocal cords) driving a physically separate resonator (the vocal tract). Homer Dudley's 1928 frequency-channel vocoder and many of its descendants are based on the same successful source-filter paradigm. For linguistic studies as well as practical applications in speech recognition, compression, and synthesis (see M. R. Schroeder, Computer Speech), the extant models require the (often difficult) extraction of numerous parameters such as the fundamental and formant frequencies and various linguistic distinctive features. Some of these difficulties were obviated by the introduction of linear predictive coding (LPC) in 1967, in which the filter part is an all-pole filter, reflecting the fact that for non-nasalized vowels the vocal tract is well approximated by an all-pole transfer function. In the now ubiquitous code-excited linear prediction (CELP), the source part is replaced by a code book which (together with a perceptual error criterion) permits speech compression to very low bit rates at high speech quality for the Internet and cell phones.
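
    The source-filter idea is compact in code: the sketch below fits an all-pole LPC filter to a frame by the autocorrelation method and inverse-filters to estimate the source. The random frame is a stand-in for a real speech frame, and the order is an illustrative choice.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order=10):
    """All-pole coefficients A(z) = 1 - sum a_k z^-k (autocorrelation method)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))

frame = np.hamming(320) * np.random.default_rng(0).standard_normal(320)  # stand-in
a = lpc(frame)
residual = lfilter(a, [1.0], frame)  # inverse filtering ~ source (excitation) estimate
```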

  14. Effects of Adaptation Rate and Noise Suppression on the Intelligibility of Compressed-Envelope Based Speech.

    Directory of Open Access Journals (Sweden)

    Ying-Hui Lai

    Full Text Available Temporal envelope is the primary acoustic cue used in most cochlear implant (CI) speech processors to elicit speech perception for patients fitted with CI devices. Envelope compression narrows down envelope dynamic range and accordingly degrades speech understanding abilities of CI users, especially under challenging listening conditions (e.g., in noise). A new adaptive envelope compression (AEC) strategy was proposed recently, which in contrast to the traditional static envelope compression, is effective at enhancing the modulation depth of envelope waveform by making best use of its dynamic range and thus improving the intelligibility of envelope-based speech. The present study further explored the effect of adaptation rate in envelope compression on the intelligibility of compressed-envelope based speech. Moreover, since noise reduction is another essential unit in modern CI systems, the compatibility of AEC and noise reduction was also investigated. In this study, listening experiments were carried out by presenting vocoded sentences to normal hearing listeners for recognition. Experimental results demonstrated that the adaptation rate in envelope compression had a notable effect on the speech intelligibility performance of the AEC strategy. By specifying a suitable adaptation rate, speech intelligibility could be enhanced significantly in noise compared to when using static envelope compression. Moreover, results confirmed that the AEC strategy was suitable for combining with noise reduction to improve the intelligibility of envelope-based speech in noise.

  15. Effects of Adaptation Rate and Noise Suppression on the Intelligibility of Compressed-Envelope Based Speech.

    Science.gov (United States)

    Lai, Ying-Hui; Tsao, Yu; Chen, Fei

    2015-01-01

    Temporal envelope is the primary acoustic cue used in most cochlear implant (CI) speech processors to elicit speech perception for patients fitted with CI devices. Envelope compression narrows down envelope dynamic range and accordingly degrades speech understanding abilities of CI users, especially under challenging listening conditions (e.g., in noise). A new adaptive envelope compression (AEC) strategy was proposed recently, which in contrast to the traditional static envelope compression, is effective at enhancing the modulation depth of envelope waveform by making best use of its dynamic range and thus improving the intelligibility of envelope-based speech. The present study further explored the effect of adaptation rate in envelope compression on the intelligibility of compressed-envelope based speech. Moreover, since noise reduction is another essential unit in modern CI systems, the compatibility of AEC and noise reduction was also investigated. In this study, listening experiments were carried out by presenting vocoded sentences to normal hearing listeners for recognition. Experimental results demonstrated that the adaptation rate in envelope compression had a notable effect on the speech intelligibility performance of the AEC strategy. By specifying a suitable adaptation rate, speech intelligibility could be enhanced significantly in noise compared to when using static envelope compression. Moreover, results confirmed that the AEC strategy was suitable for combining with noise reduction to improve the intelligibility of envelope-based speech in noise.
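
    A hedged sketch of the static-versus-adaptive contrast: both apply a power-law map, but the adaptive version rescales each sample to the envelope's recent dynamic range, with the window length playing the role of the adaptation rate studied above. The exponent and window length are illustrative, not the AEC settings.

```python
import numpy as np

def static_compress(env, c=0.3):
    """Fixed power-law envelope compression."""
    return env ** c

def adaptive_compress(env, c=0.3, win=1600):
    """Rescale each sample to the envelope's recent min/max before compressing.
    win (in samples) acts as the adaptation-rate knob: shorter = faster."""
    out = np.empty_like(env)
    for i in range(len(env)):
        seg = env[max(0, i - win):i + 1]
        lo, hi = seg.min(), seg.max()
        out[i] = ((env[i] - lo) / (hi - lo + 1e-12)) ** c  # full local range used
    return out
```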

  16. Contribution of supra-threshold processing to speech masking release

    DEFF Research Database (Denmark)

    Christiansen, Claus Forup Corlin; Dau, Torsten

    2011-01-01

    Normal-hearing (NH) listeners can typically better understand speech in the presence of a fluctuating noise or a competing talker compared to a stationary noise interferer. However, for hearing-impaired (HI) listeners, this masking release (MR) is strongly reduced or completely absent. Traditionally, this has been attributed to the ability of NH listeners to utilize the speech in the low-amplitude periods of the masker, an ability that is supposed to be reduced for HI listeners due to reduced temporal and spectral resolution. However, [1] proposed that the reduced MR experienced by HI listeners is due to their higher speech reception threshold (SRT) in stationary noise. In the present study, this hypothesis was investigated by presenting noise-band vocoded as well as low-pass and high-pass filtered stimuli to the NH listeners. In this way, the SRTs of the NH listeners were similar

  17. Preventive measures in speech and language therapy

    OpenAIRE

    Slokar, Polona

    2014-01-01

    Preventive care plays an important role in speech and language therapy. Through training, a speech and language therapist informs experts and the general public about his or her efforts in the field of feeding and of speech and language development, as well as about the deficits that may appear in relation to communication and feeding. A speech and language therapist is also responsible for early detection of irregularities and of those factors which affect speech and language development. To a...

  18. Speech Matters

    DEFF Research Database (Denmark)

    Hasse Jørgensen, Stina

    2011-01-01

    About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavilion, the Venice Biennale 2011.

  19. Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces

    Science.gov (United States)

    Bocquelet, Florent; Hueber, Thomas; Girin, Laurent; Savariaux, Christophe; Yvert, Blaise

    2016-01-01

    Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open to future speech BCI applications using such articulatory-based speech synthesizer. PMID:27880768

  20. Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces.

    Directory of Open Access Journals (Sweden)

    Florent Bocquelet

    2016-11-01

    Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open the way to future speech BCI applications using such an articulatory-based speech synthesizer.
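
    The core of such a synthesizer is the articulatory-to-acoustic mapping. Below is a minimal sketch of a DNN regressor from EMA sensor coordinates to frame-wise vocoder parameters; the input/output dimensions, layer sizes, and the random training data are placeholders, not the authors' configuration.

        import torch
        import torch.nn as nn

        class ArticulatoryToAcoustic(nn.Module):
            """Frame-wise regression from EMA coordinates to vocoder parameters."""
            def __init__(self, n_ema=12, n_acoustic=25, hidden=256):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(n_ema, hidden), nn.Tanh(),
                    nn.Linear(hidden, hidden), nn.Tanh(),
                    nn.Linear(hidden, n_acoustic),
                )

            def forward(self, ema):
                return self.net(ema)

        model = ArticulatoryToAcoustic()
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        ema_frames = torch.randn(1000, 12)         # placeholder EMA trajectories
        acoustic_targets = torch.randn(1000, 25)   # placeholder vocoder parameters
        for _ in range(100):                       # minimize frame-wise MSE
            optimizer.zero_grad()
            loss = nn.functional.mse_loss(model(ema_frames), acoustic_targets)
            loss.backward()
            optimizer.step()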

  1. Speech Development

    Science.gov (United States)

    ... The speech-language pathologist should consistently assess your child’s speech and language development, as well as screen for hearing problems (with ... and caregivers play a vital role in a child’s speech and language development. It is important that you talk to your ...

  2. Upregulation of cognitive control networks in older adults’ speech comprehension

    Directory of Open Access Journals (Sweden)

    Julia eErb

    2013-12-01

    Speech comprehension abilities decline with age and with age-related hearing loss, but it is unclear how this decline expresses in terms of central neural mechanisms. The current study examined neural speech processing in a group of older adults (aged 56–77, n=16) with varying degrees of sensorineural hearing loss, and compared them to a cohort of young adults (aged 22–31, n=30) with self-reported normal hearing. In an fMRI experiment, listeners heard and repeated back degraded sentences (4-band vocoding), which preserves the temporal envelope of the acoustic signal while substantially degrading spectral information. Behaviourally, older adults adapted to degraded speech at the same rate as young listeners, although their overall comprehension of degraded speech was lower. Neurally, both older and young adults relied on the left anterior insula for degraded more than clear speech perception. However, anterior insula engagement in older adults was dependent on hearing acuity. Young adults additionally employed the anterior cingulate cortex (ACC). Interestingly, this age group × degradation interaction was driven by a reduced dynamic range in older adults, who displayed elevated levels of ACC activity in both conditions, consistent with a persistent upregulation in cognitive control irrespective of task difficulty. For correct speech comprehension, older adults recruited the middle frontal gyrus in addition to a core speech comprehension network on which young adults relied, suggestive of a compensatory mechanism. Taken together, the results indicate that older adults increasingly recruit cognitive control networks, even under optimal listening conditions, at the expense of these systems’ dynamic range.

  3. New graduates’ perceptions of preparedness to provide speech-language therapy services in general and dysphagia services in particular

    Directory of Open Access Journals (Sweden)

    Shajila Singh

    2015-02-01

    Background: Upon graduation, newly qualified speech-language therapists are expected to provide services independently. This study describes new graduates’ perceptions of their preparedness to provide services across the scope of the profession and explores associations between perceptions of dysphagia theory and clinical learning curricula with preparedness for adult and paediatric dysphagia service delivery. Methods: New graduates of six South African universities were recruited to participate in a survey by completing an electronic questionnaire exploring their perceptions of the dysphagia curricula and their preparedness to practise across the scope of the profession of speech-language therapy. Results: Eighty graduates participated in the study, yielding a response rate of 63.49%. Participants perceived themselves to be well prepared in some areas (e.g. child language: 100%; articulation and phonology: 97.26%), but less prepared in other areas (e.g. adult dysphagia: 50.70%; paediatric dysarthria: 46.58%; paediatric dysphagia: 38.36%) and most unprepared to provide services requiring sign language (23.61%) and African languages (20.55%). There was a significant relationship between perceptions of adequate theory and clinical learning opportunities with assessment and management of dysphagia and perceptions of preparedness to provide dysphagia services. Conclusion: There is a need for review of existing curricula and consideration of developing a standard speech-language therapy curriculum across universities, particularly in service provision to a multilingual population, and in both the theory and clinical learning of the assessment and management of adult and paediatric dysphagia, to better equip graduates for practice.

  4. Multisensory training can promote or impede visual perceptual learning of speech stimuli: visual-tactile vs. visual-auditory training.

    Science.gov (United States)

    Eberhardt, Silvio P; Auer, Edward T; Bernstein, Lynne E

    2014-01-01

    In a series of studies we have been investigating how multisensory training affects unisensory perceptual learning with speech stimuli. Previously, we reported that audiovisual (AV) training with speech stimuli can promote auditory-only (AO) perceptual learning in normal-hearing adults but can impede learning in congenitally deaf adults with late-acquired cochlear implants. Here, impeder and promoter effects were sought in normal-hearing adults who participated in lipreading training. In Experiment 1, visual-only (VO) training on paired associations between CVCVC nonsense word videos and nonsense pictures demonstrated that VO words could be learned to a high level of accuracy even by poor lipreaders. In Experiment 2, visual-auditory (VA) training in the same paradigm but with the addition of synchronous vocoded acoustic speech impeded VO learning of the stimuli in the paired-associates paradigm. In Experiment 3, the vocoded AO stimuli were shown to be less informative than the VO speech. Experiment 4 combined vibrotactile speech stimuli with the visual stimuli during training. Vibrotactile stimuli were shown to promote visual perceptual learning. In Experiment 5, no-training controls were used to show that training with visual speech carried over to consonant identification of untrained CVCVC stimuli but not to lipreading words in sentences. Across this and previous studies, multisensory training effects depended on the functional relationship between pathways engaged during training. Two principles are proposed to account for stimulus effects: (1) Stimuli presented to the trainee's primary perceptual pathway will impede learning by a lower-rank pathway. (2) Stimuli presented to the trainee's lower rank perceptual pathway will promote learning by a higher-rank pathway. The mechanisms supporting these principles are discussed in light of multisensory reverse hierarchy theory (RHT).

  5. Multisensory Training can Promote or Impede Visual Perceptual Learning of Speech Stimuli: Visual-Tactile versus Visual-Auditory Training

    Directory of Open Access Journals (Sweden)

    Silvio P Eberhardt

    2014-10-01

    In a series of studies we have been investigating how multisensory training affects unisensory perceptual learning with speech stimuli. Previously, we reported that audiovisual (AV) training with speech stimuli can promote auditory-only (AO) perceptual learning in normal-hearing adults but can impede learning in congenitally deaf adults with late-acquired cochlear implants. Here, impeder and promoter effects were sought in normal-hearing adults who participated in lipreading training. In Experiment 1, visual-only (VO) training on paired associations between CVCVC nonsense word videos and nonsense pictures demonstrated that VO words could be learned to a high level of accuracy even by poor lipreaders. In Experiment 2, visual-auditory (VA) training in the same paradigm but with the addition of synchronous vocoded acoustic speech impeded VO learning of the stimuli in the paired-associates paradigm. In Experiment 3, the vocoded AO stimuli were shown to be less informative than the VO speech. Experiment 4 combined vibrotactile speech stimuli with the visual stimuli during training. Vibrotactile stimuli were shown to promote visual perceptual learning in participants whose training scores were similar. In Experiment 5, no-training controls were used to show that training with visual speech carried over to consonant identification of untrained CVCVC stimuli but not to lipreading words in sentences. Across this and previous studies, multisensory training effects depended on the functional relationship between pathways engaged during training. Two principles are proposed to account for stimulus effects: (1) Stimuli presented to the trainee’s primary perceptual pathway will impede learning by a lower-rank pathway. (2) Stimuli presented to the trainee’s lower rank perceptual pathway will promote learning by a higher-rank pathway. The mechanisms supporting these principles are discussed in light of multisensory reverse hierarchy theory.

  6. The use of auditory and visual context in speech perception by listeners with normal hearing and listeners with cochlear implants.

    Science.gov (United States)

    Winn, Matthew B; Rhone, Ariane E; Chatterjee, Monita; Idsardi, William J

    2013-01-01

    There is a wide range of acoustic and visual variability across different talkers and different speaking contexts. Listeners with normal hearing (NH) accommodate that variability in ways that facilitate efficient perception, but it is not known whether listeners with cochlear implants (CIs) can do the same. In this study, listeners with NH and listeners with CIs were tested for accommodation to auditory and visual phonetic contexts created by gender-driven speech differences as well as vowel coarticulation and lip rounding in both consonants and vowels. Accommodation was measured as the shifting of perceptual boundaries between /s/ and /∫/ sounds in various contexts, as modeled by mixed-effects logistic regression. Owing to the spectral contrasts thought to underlie these context effects, CI listeners were predicted to perform poorly, but showed considerable success. Listeners with CIs not only showed sensitivity to auditory cues to gender, they were also able to use visual cues to gender (i.e., faces) as a supplement or proxy for information in the acoustic domain, in a pattern that was not observed for listeners with NH. Spectrally-degraded stimuli heard by listeners with NH generally did not elicit strong context effects, underscoring the limitations of noise vocoders and/or the importance of experience with electric hearing. Visual cues for consonant lip rounding and vowel lip rounding were perceived in a manner consistent with coarticulation and were generally used more heavily by listeners with CIs. Results suggest that listeners with CIs are able to accommodate various sources of acoustic variability either by attending to appropriate acoustic cues or by inferring them via the visual signal.

  7. The use of auditory and visual context in speech perception by listeners with normal hearing and listeners with cochlear implants

    Directory of Open Access Journals (Sweden)

    Matthew eWinn

    2013-11-01

    There is a wide range of acoustic and visual variability across different talkers and different speaking contexts. Listeners with normal hearing accommodate that variability in ways that facilitate efficient perception, but it is not known whether listeners with cochlear implants can do the same. In this study, listeners with normal hearing (NH) and listeners with cochlear implants (CIs) were tested for accommodation to auditory and visual phonetic contexts created by gender-driven speech differences as well as vowel coarticulation and lip rounding in both consonants and vowels. Accommodation was measured as the shifting of perceptual boundaries between /s/ and /ʃ/ sounds in various contexts, as modeled by mixed-effects logistic regression. Owing to the spectral contrasts thought to underlie these context effects, CI listeners were predicted to perform poorly, but showed considerable success. Listeners with cochlear implants not only showed sensitivity to auditory cues to gender, they were also able to use visual cues to gender (i.e. faces) as a supplement or proxy for information in the acoustic domain, in a pattern that was not observed for listeners with normal hearing. Spectrally-degraded stimuli heard by listeners with normal hearing generally did not elicit strong context effects, underscoring the limitations of noise vocoders and/or the importance of experience with electric hearing. Visual cues for consonant lip rounding and vowel lip rounding were perceived in a manner consistent with coarticulation and were generally used more heavily by listeners with CIs. Results suggest that listeners with cochlear implants are able to accommodate various sources of acoustic variability either by attending to appropriate acoustic cues or by inferring them via the visual signal.
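
    A perceptual boundary of the kind measured above is the continuum step at which listeners are equally likely to report /s/ or /ʃ/. The sketch below fits a logistic psychometric function per context condition and compares the 50% crossover points; the paper used mixed-effects logistic regression, so this per-condition fit on synthetic data is a simplified stand-in.

        import numpy as np
        from scipy.optimize import curve_fit

        def logistic(x, x0, k):
            """Psychometric function; x0 is the category boundary."""
            return 1.0 / (1.0 + np.exp(-k * (x - x0)))

        steps = np.linspace(0.0, 1.0, 9)          # /s/-to-/sh/ continuum steps
        rng = np.random.default_rng(1)
        # Synthetic response proportions in two hypothetical visual contexts.
        p_context_a = logistic(steps, 0.45, 12.0) + rng.normal(0, 0.03, steps.size)
        p_context_b = logistic(steps, 0.55, 12.0) + rng.normal(0, 0.03, steps.size)

        (b_a, _), _ = curve_fit(logistic, steps, p_context_a, p0=[0.5, 10.0])
        (b_b, _), _ = curve_fit(logistic, steps, p_context_b, p0=[0.5, 10.0])
        print(f"boundary shift between contexts: {b_b - b_a:.3f} continuum steps")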

  8. Speech Correction in the Schools. Third Edition.

    Science.gov (United States)

    Eisenson, Jon; Ogilvie, Mardel

    The volume, intended to introduce readers to the problems and therapeutic needs of speech impaired school children, first presents general considerations and background knowledge necessary for basic insights of the classroom teacher and the school speech clinician in relation to the speech handicapped child. Discussed are the classification and…

  9. Speech Indexing

    NARCIS (Netherlands)

    Ordelman, R.J.F.; Jong, de F.M.G.; Leeuwen, van D.A.; Blanken, H.M.; de Vries, A.P.; Blok, H.E.; Feng, L.

    2007-01-01

    This chapter will focus on the automatic extraction of information from the speech in multimedia documents. This approach is often referred to as speech indexing and it can be regarded as a subfield of audio indexing that also incorporates for example the analysis of music and sounds. If the objecti

  10. Plowing Speech

    OpenAIRE

    Zla ba sgrol ma

    2009-01-01

    This file contains a plowing speech and a discussion about the speech. This collection presents forty-nine audio files including several folk song genres, folktales, and local history from the Sman shad Valley of Sde dge county. World Oral Literature Project.

  11. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. From a transmission point of view, digital transmission has therefore been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques and often used interchangeably with speech coding is the term voice coding. This term is more generic in the sense that the...
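
    The simplest waveform coders transmit a compressed version of the signal itself. As a concrete illustration, the sketch below implements G.711-style mu-law companding, which quantizes each sample to 8 bits on a logarithmic scale; parametric coders (vocoders) instead transmit analysis parameters from which speech is resynthesized. The test tone and constants here are illustrative.

        import numpy as np

        MU = 255.0

        def mulaw_encode(x):
            """x in [-1, 1] -> 8-bit code via mu-law compression."""
            y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
            return np.round((y + 1) / 2 * 255).astype(np.uint8)

        def mulaw_decode(code):
            """Inverse companding: 8-bit code -> sample in [-1, 1]."""
            y = code.astype(float) / 255 * 2 - 1
            return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

        x = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)  # 1 s test tone
        x_hat = mulaw_decode(mulaw_encode(x))
        print("max reconstruction error:", np.max(np.abs(x - x_hat)))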

  12. A chimpanzee recognizes synthetic speech with significantly reduced acoustic cues to phonetic content.

    Science.gov (United States)

    Heimbauer, Lisa A; Beran, Michael J; Owren, Michael J

    2011-07-26

    A long-standing debate concerns whether humans are specialized for speech perception, which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words, asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuographic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users. Experiment 2 tested "impossibly unspeechlike" sine-wave (SW) synthesis, which reduces speech to just three moving tones. Although receiving only intermittent and noncontingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate but improved in experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human. Copyright © 2011 Elsevier Ltd. All rights reserved.

  13. Relative Contributions of Spectral and Temporal Cues for Speech Recognition in Patients with Sensorineural Hearing Loss

    Institute of Scientific and Technical Information of China (English)

    XU Li; ZHOU Ning; Rebecca Brashears; Katherine Rife

    2008-01-01

    The present study was designed to examine speech recognition in patients with sensorineural hearing loss when the temporal and spectral information in the speech signals were co-varied. Four subjects with mild to moderate sensorineural hearing loss were recruited to participate in consonant and vowel recognition tests that used speech stimuli processed through a noise-excited vocoder. The number of channels was varied between 2 and 32, which defined spectral information. The lowpass cutoff frequency of the temporal envelope extractor was varied from 1 to 512 Hz, which defined temporal information. Results indicate that performance of subjects with sensorineural hearing loss varied tremendously among the subjects. For consonant recognition, patterns of relative contributions of spectral and temporal information were similar to those in normal-hearing subjects. The utility of temporal envelope information appeared to be normal in the hearing-impaired listeners. For vowel recognition, which depended predominately on spectral information, the performance plateau was achieved with numbers of channels as high as 16-24, much higher than expected, given that the frequency selectivity in patients with sensorineural hearing loss might be compromised. In order to understand the mechanisms of how hearing-impaired listeners utilize spectral and temporal cues for speech recognition, future studies that involve a large sample of patients with sensorineural hearing loss will be necessary to elucidate the relationship between frequency selectivity as well as central processing capability and speech recognition performance using vocoded signals.
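
    In vocoder studies like this one, the temporal envelope extractor's lowpass cutoff directly controls how much temporal detail survives processing. A common implementation, sketched below with illustrative parameters, is half-wave rectification followed by a lowpass filter at the chosen cutoff.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt

        def temporal_envelope(band, fs, cutoff_hz):
            """Half-wave rectify a band signal, then lowpass at cutoff_hz."""
            rectified = np.maximum(band, 0.0)
            sos = butter(2, cutoff_hz, btype="lowpass", fs=fs, output="sos")
            return sosfiltfilt(sos, rectified)

        fs = 16000
        t = np.arange(fs) / fs
        # A 1 kHz band signal with a 4 Hz amplitude modulation, for illustration.
        band = np.sin(2 * np.pi * 1000 * t) * (1 + 0.8 * np.sin(2 * np.pi * 4 * t))
        for cutoff in (1, 16, 512):               # coarser to finer envelopes
            env = temporal_envelope(band, fs, cutoff)
            print(f"cutoff {cutoff:>3} Hz -> envelope std {env.std():.3f}")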

  14. Neural pathways for visual speech perception

    Directory of Open Access Journals (Sweden)

    Lynne E Bernstein

    2014-12-01

    This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.

  15. Neural pathways for visual speech perception.

    Science.gov (United States)

    Bernstein, Lynne E; Liebenthal, Einat

    2014-01-01

    This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.

  16. Speech Therapy Prevention in Kindergarten

    Directory of Open Access Journals (Sweden)

    Vašíková Jana

    2017-08-01

    Introduction: This contribution presents the results of a research study focused on speech therapy in kindergartens, realized in the Zlín Region. It explains how speech therapy prevention is carried out in kindergartens, determines the educational qualifications of teachers for this activity and verifies the quality of the applied methodologies in the daily program of kindergartens. Methods: The empirical part of the study was conducted through qualitative research. For data collection, we used participant observation. We analyzed the research data and presented them verbally, using frequency tables and graphs, which were subsequently interpreted. Results: In this research, 71% of the teachers completed a course of speech therapy prevention, 28% of the teachers received pedagogical training and just 1% of the teachers are clinical speech pathologists. In spite of this, the research data show that, in most kindergartens, speech therapy prevention is aimed at correcting deficiencies in speech and voice, and its content is implemented in this direction. Discussion: The teachers’ and parents’ awareness regarding speech therapy prevention in kindergartens. Limitations: This research was implemented in autumn of 2016 in the Zlín Region. The research data cannot be generalized to the entire population. We have the ambition to expand this research to other regions next year. Conclusions: The results show that both forms of speech therapy prevention - individual and group - are used, often in combination. The aim of the individual form is, in most cases, to prepare a child for cooperation during voice correction. The research also confirmed that most teachers do not have sufficient education in speech therapy. Most of them completed a course of speech therapy as primary prevention educators. The results also show that teachers spend a lot of time on speech therapy prevention in...

  17. Exploring the roles of spectral detail and intonation contour in speech intelligibility: an FMRI study.

    Science.gov (United States)

    Kyong, Jeong S; Scott, Sophie K; Rosen, Stuart; Howe, Timothy B; Agnew, Zarinah K; McGettigan, Carolyn

    2014-08-01

    The melodic contour of speech forms an important perceptual aspect of tonal and nontonal languages and an important limiting factor on the intelligibility of speech heard through a cochlear implant. Previous work exploring the neural correlates of speech comprehension identified a left-dominant pathway in the temporal lobes supporting the extraction of an intelligible linguistic message, whereas the right anterior temporal lobe showed an overall preference for signals clearly conveying dynamic pitch information [Johnsrude, I. S., Penhune, V. B., & Zatorre, R. J. Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain, 123, 155-163, 2000; Scott, S. K., Blank, C. C., Rosen, S., & Wise, R. J. Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 123, 2400-2406, 2000]. The current study combined modulations of overall intelligibility (through vocoding and spectral inversion) with a manipulation of pitch contour (normal vs. falling) to investigate the processing of spoken sentences in functional MRI. Our overall findings replicate and extend those of Scott et al. [Scott, S. K., Blank, C. C., Rosen, S., & Wise, R. J. Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 123, 2400-2406, 2000], where greater sentence intelligibility was predominately associated with increased activity in the left STS, and the greatest response to normal sentence melody was found in right superior temporal gyrus. These data suggest a spatial distinction between brain areas associated with intelligibility and those involved in the processing of dynamic pitch information in speech. By including a set of complexity-matched unintelligible conditions created by spectral inversion, this is additionally the first study reporting a fully factorial exploration of spectrotemporal complexity and spectral inversion as they relate to the neural processing of speech intelligibility. Perhaps
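
    Spectral inversion of the kind mentioned above rotates the spectrum so that the signal stays acoustically complex but becomes unintelligible. One standard recipe, sketched here with an assumed 4 kHz band edge, lowpass-filters the signal, multiplies it by a cosine at the cutoff frequency (mapping each component at frequency f to fc - f), and lowpass-filters again.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt

        def spectrally_invert(x, fs, fc=4000.0):
            """Rotate the spectrum of x around fc/2 within the band [0, fc]."""
            sos = butter(8, fc, btype="lowpass", fs=fs, output="sos")
            band_limited = sosfiltfilt(sos, x)
            t = np.arange(len(x)) / fs
            rotated = band_limited * np.cos(2 * np.pi * fc * t)  # shift by +/- fc
            return sosfiltfilt(sos, rotated)     # keep only the mirrored image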

  18. Amharic Speech Recognition for Speech Translation

    OpenAIRE

    Melese, Michael; Besacier, Laurent; Meshesha, Million

    2016-01-01

    The state-of-the-art speech translation can be seen as a cascade of Automatic Speech Recognition, Statistical Machine Translation and Text-To-Speech synthesis. In this study an attempt is made to experiment on Amharic speech recognition for Amharic-English speech translation in the tourism domain. Since there is no Amharic speech corpus, we developed a read-speech corpus of 7.43 hr in the tourism domain. The Amharic speech corpus has been recorded after translating standard Bas...

  19. Functional speech disorders: clinical manifestations, diagnosis, and management.

    Science.gov (United States)

    Duffy, J R

    2016-01-01

    Acquired psychogenic or functional speech disorders are a subtype of functional neurologic disorders. They can mimic organic speech disorders and, although any aspect of speech production can be affected, they manifest most often as dysphonia, stuttering, or prosodic abnormalities. This chapter reviews the prevalence of functional speech disorders, the spectrum of their primary clinical characteristics, and the clues that help distinguish them from organic neurologic diseases affecting the sensorimotor networks involved in speech production. Diagnosis of a speech disorder as functional can be supported by sometimes rapidly achieved positive outcomes of symptomatic speech therapy. The general principles of such therapy are reviewed. © 2016 Elsevier B.V. All rights reserved.

  20. Speech-specificity of two audiovisual integration effects

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2010-01-01

    Seeing the talker’s articulatory mouth movements can influence the auditory speech percept both in speech identification and detection tasks. Here we show that these audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers often fail to perceive as speech. While audiovisual integration in the identification task only occurred when observers were informed of the speech-like nature of SWS, integration occurred in the detection task both for informed and naïve observers. This shows that both speech-specific and general...

  1. Hate speech

    Directory of Open Access Journals (Sweden)

    Anne Birgitta Nilsen

    2014-12-01

    The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance. The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience. The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the...

  2. Speech enhancement

    CERN Document Server

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red...

  3. Speech rate effects on the processing of conversational speech across the adult life span.

    Science.gov (United States)

    Koch, Xaver; Janse, Esther

    2016-04-01

    This study investigates the effect of speech rate on spoken word recognition across the adult life span. Contrary to previous studies, conversational materials with a natural variation in speech rate were used rather than lab-recorded stimuli that are subsequently artificially time-compressed. It was investigated whether older adults' speech recognition is more adversely affected by increased speech rate compared to younger and middle-aged adults, and which individual listener characteristics (e.g., hearing, fluid cognitive processing ability) predict the size of the speech rate effect on recognition performance. In an eye-tracking experiment, participants indicated with a mouse-click which visually presented words they recognized in a conversational fragment. Click response times, gaze, and pupil size data were analyzed. As expected, click response times and gaze behavior were affected by speech rate, indicating that word recognition is more difficult if speech rate is faster. Contrary to earlier findings, increased speech rate affected the age groups to the same extent. Fluid cognitive processing ability predicted general recognition performance, but did not modulate the speech rate effect. These findings emphasize that earlier results of age by speech rate interactions mainly obtained with artificially speeded materials may not generalize to speech rate variation as encountered in conversational speech.

  4. Speech Intelligibility

    Science.gov (United States)

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena like the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, the benefit of using hearing aids, or combinations of these.

  5. Speech dynamics

    NARCIS (Netherlands)

    Pols, L.C.W.

    2011-01-01

    In order for speech to be informative and communicative, segmental and suprasegmental variation is mandatory. Only this leads to meaningful words and sentences. The building blocks are no stable entities put next to each other (like beads on a string or like printed text), but there are gradual tran

  6. Speech communications in noise

    Science.gov (United States)

    1984-07-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

  7. Production Rate and Weapon System Cost: Research Review, Case Studies, and Planning Model.

    Science.gov (United States)

    1980-11-01

    The number of diphones is now 2733. Generalized the phonetic synthesis program to allow the use of phonological rules combined with diphone templates... Projects Agency under ARPA Order No. 3515. Keywords: speech synthesis, phonetic vocoder, real-time vocoder, mixed-source excitation model, adaptive lattice, spectral template...

  8. Perceptual organization of speech signals by children with and without dyslexia

    Science.gov (United States)

    Nittrouer, Susan; Lowenstein, Joanna H.

    2013-01-01

    Developmental dyslexia is a condition in which children encounter difficulty learning to read in spite of adequate instruction. Although considerable effort has been expended trying to identify the source of the problem, no single solution has been agreed upon. The current study explored a new hypothesis, that developmental dyslexia may be due to faulty perceptual organization of linguistically relevant sensory input. To test that idea, sentence-length speech signals were processed to create either sine-wave or noise-vocoded analogs. Seventy children between 8 and 11 years of age, with and without dyslexia participated. Children with dyslexia were selected to have phonological awareness deficits, although those without such deficits were retained in the study. The processed sentences were presented for recognition, and measures of reading, phonological awareness, and expressive vocabulary were collected. Results showed that children with dyslexia, regardless of phonological subtype, had poorer recognition scores than children without dyslexia for both kinds of degraded sentences. Older children with dyslexia recognized the sine-wave sentences better than younger children with dyslexia, but no such effect of age was found for the vocoded materials. Recognition scores were used as predictor variables in regression analyses with reading, phonological awareness, and vocabulary measures used as dependent variables. Scores for both sorts of sentence materials were strong predictors of performance on all three dependent measures when all children were included, but only performance for the sine-wave materials explained significant proportions of variance when only children with dyslexia were included. Finally, matching young, typical readers with older children with dyslexia on reading abilities did not mitigate the group difference in recognition of vocoded sentences. Conclusions were that children with dyslexia have difficulty organizing linguistically relevant sensory
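
    Sine-wave analogs like those used above replace the speech signal with a few time-varying sinusoids that track the formants. The sketch below synthesizes three such tones from frame-wise frequency and amplitude tracks; in practice the tracks come from formant analysis of the recording, whereas the glides here are synthetic placeholders.

        import numpy as np

        def sine_wave_speech(freq_tracks, amp_tracks, fs=16000):
            """freq_tracks, amp_tracks: (3, n_frames) Hz / linear amplitude,
            one row per formant, one column per 10 ms frame."""
            n_frames = freq_tracks.shape[1]
            hop = fs // 100                                  # 10 ms frame hop
            n = n_frames * hop
            frame_centers = np.arange(n_frames) * hop
            out = np.zeros(n)
            for f_track, a_track in zip(freq_tracks, amp_tracks):
                f = np.interp(np.arange(n), frame_centers, f_track)  # per-sample Hz
                a = np.interp(np.arange(n), frame_centers, a_track)
                phase = 2.0 * np.pi * np.cumsum(f) / fs      # integrate frequency
                out += a * np.sin(phase)
            return out / (np.max(np.abs(out)) + 1e-12)

        # Placeholder tracks: three roughly vowel-like formant glides over 1 s.
        frames = 100
        f_tracks = np.stack([np.linspace(500, 700, frames),
                             np.linspace(1500, 1200, frames),
                             np.linspace(2500, 2400, frames)])
        y = sine_wave_speech(f_tracks, np.ones((3, frames)))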

  9. Specialization in audiovisual speech perception: a replication study

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual as evidenced by bimodal integration in the McGurk effect. This integration effect may be specific to speech or be applied to all stimuli in general. To investigate this, Tuomainen et al. (2005) used sine-wave speech, which naïve observers may perceive as non-speech, but hear as speech once informed of the linguistic origin of the signal. Combinations of sine-wave speech and incongruent video of the talker elicited a McGurk effect only for informed observers. This indicates that the audiovisual integration effect is specific to speech perception. However, observers ... of the speaker. Observers were required to report this after primary target categorization. We found a significant McGurk effect only in the natural speech and speech mode conditions supporting the finding of Tuomainen et al. Performance in the secondary task was similar in all conditions indicating...

  10. Specialization in audiovisual speech perception: a replication study

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual as evidenced by bimodal integration in the McGurk effect. This integration effect may be specific to speech or be applied to all stimuli in general. To investigate this, Tuomainen et al. (2005) used sine-wave speech, which naïve observers may perceive as non-speech, but hear as speech once informed of the linguistic origin of the signal. Combinations of sine-wave speech and incongruent video of the talker elicited a McGurk effect only for informed observers. This indicates that the audiovisual integration effect is specific to speech perception. However, observers ... of the speaker. Observers were required to report this after primary target categorization. We found a significant McGurk effect only in the natural speech and speech mode conditions supporting the finding of Tuomainen et al. Performance in the secondary task was similar in all conditions indicating...

  11. Going to a Speech Therapist

    Science.gov (United States)

    Speech therapists (also called speech-language pathologists) help people of all ...

  12. Philosophy of Research in Motor Speech Disorders

    Science.gov (United States)

    Weismer, Gary

    2006-01-01

    The primary objective of this position paper is to assess the theoretical and empirical support that exists for the Mayo Clinic view of motor speech disorders in general, and for oromotor, nonverbal tasks as a window to speech production processes in particular. Literature both in support of and against the Mayo clinic view and the associated use…

  13. Analog Acoustic Expression in Speech Communication

    Science.gov (United States)

    Shintel, Hadas; Nusbaum, Howard C.; Okrent, Arika

    2006-01-01

    We present the first experimental evidence of a phenomenon in speech communication we call "analog acoustic expression." Speech is generally thought of as conveying information in two distinct ways: discrete linguistic-symbolic units such as words and sentences represent linguistic meaning, and continuous prosodic forms convey information about…

  14. The Effects of TV on Speech Education

    Science.gov (United States)

    Gocen, Gokcen; Okur, Alpaslan

    2013-01-01

    Generally, the speaking aspect is not properly debated when discussing the positive and negative effects of television (TV), especially on children. So, to highlight this point, this study was initiated first by asking the question: "What are the effects of TV on speech?" and secondly, to transform the effects that TV has on speech in…

  15. Speech research

    Science.gov (United States)

    1992-06-01

    Phonology is traditionally seen as the discipline that concerns itself with the building blocks of linguistic messages. It is the study of the structure of sound inventories of languages and of the participation of sounds in rules or processes. Phonetics, in contrast, concerns speech sounds as produced and perceived. Two extreme positions on the relationship between phonological messages and phonetic realizations are represented in the literature. One holds that the primary home for linguistic symbols, including phonological ones, is the human mind, itself housed in the human brain. The second holds that their primary home is the human vocal tract.

  16. Speech on the general states of enterprises and the sustainable development; Discours devant les Etats generaux des entreprises et du developpement durable

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2006-07-01

    In this speech the author points out two main recommendations. The first message concerns the necessity of mobilizing everyone in favor of sustainable development, from government policy and enterprise management to individual behavior. He then presents three main axes for raising enterprises' awareness (reinforcing information on the environmental and social impact of economic activities, developing sustainable investments, and developing environmental sponsorship). The second message concerns the necessity of placing the environment at the heart of economic growth through the development of ecology and eco-technology. (A.L.B.)

  17. Speech encoding strategies for multielectrode cochlear implants: a digital signal processor approach.

    Science.gov (United States)

    Dillier, N; Bögli, H; Spillmann, T

    1993-01-01

    The following processing strategies have been implemented on an experimental laboratory system of a cochlear implant digital speech processor (CIDSP) for the Nucleus 22-channel cochlear prosthesis. The first approach (PES, Pitch Excited Sampler) is based on the maximum peak channel vocoder concept whereby the time-varying spectral energy of a number of frequency bands is transformed into electrical stimulation parameters for up to 22 electrodes. The pulse rate at any electrode is controlled by the voice pitch of the input speech signal. The second approach (CIS, Continuous Interleaved Sampler) uses a stimulation pulse rate which is independent of the input signal. The algorithm continuously scans all specified frequency bands (typically between four and 22) and samples their energy levels. As only one electrode can be stimulated at any instance of time, the maximally achievable rate of stimulation is limited by the required stimulus pulse widths (determined individually for each subject) and some additional constraints and parameters. A number of variations of the CIS approach have, therefore, been implemented which either maximize the number of quasi-simultaneous stimulation channels or the pulse rate on a reduced number of electrodes. Evaluation experiments with five experienced cochlear implant users showed significantly better performance in consonant identification tests with the new processing strategies than with the subjects' own wearable speech processors; improvements in vowel identification tasks were rarely observed. Modifications of the basic PES- and CIS strategies resulted in large variations of identification scores. Information transmission analysis of confusion matrices revealed a rather complex pattern across conditions and speech features. Optimization and fine-tuning of processing parameters for these coding strategies will require more data both from speech identification and discrimination evaluations and from psychophysical experiments.
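
    The defining property of CIS is that stimulation pulses are strictly interleaved: only one electrode fires at a time, at a rate set by the pulse width rather than by the input signal. The sketch below schedules pulse amplitudes from per-channel envelopes in a fixed round-robin cycle; the channel count, envelope rate, and pulse width are illustrative, not the per-subject settings described above.

        import numpy as np

        def cis_schedule(envelopes, fs_env, pulse_width_s=50e-6):
            """envelopes: (n_channels, n_frames) band energies sampled at fs_env.
            Returns a list of (time_s, channel, amplitude) pulses."""
            n_ch, n_frames = envelopes.shape
            slot = 2.0 * pulse_width_s        # a biphasic pulse has two phases
            frame_dur = 1.0 / fs_env
            pulses, t = [], 0.0
            while t < n_frames * frame_dur:
                for ch in range(n_ch):        # strictly non-simultaneous pulses
                    frame = min(int(t / frame_dur), n_frames - 1)
                    pulses.append((t, ch, float(envelopes[ch, frame])))
                    t += slot
            return pulses

        env = np.abs(np.random.default_rng(2).standard_normal((4, 100)))
        print(len(cis_schedule(env, fs_env=1000.0)), "pulses scheduled")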

  18. Predictive top-down integration of prior knowledge during speech perception.

    Science.gov (United States)

    Sohoglu, Ediz; Peelle, Jonathan E; Carlyon, Robert P; Davis, Matthew H

    2012-06-20

    A striking feature of human perception is that our subjective experience depends not only on sensory information from the environment but also on our prior knowledge or expectations. The precise mechanisms by which sensory information and prior knowledge are integrated remain unclear, with longstanding disagreement concerning whether integration is strictly feedforward or whether higher-level knowledge influences sensory processing through feedback connections. Here we used concurrent EEG and MEG recordings to determine how sensory information and prior knowledge are integrated in the brain during speech perception. We manipulated listeners' prior knowledge of speech content by presenting matching, mismatching, or neutral written text before a degraded (noise-vocoded) spoken word. When speech conformed to prior knowledge, subjective perceptual clarity was enhanced. This enhancement in clarity was associated with a spatiotemporal profile of brain activity uniquely consistent with a feedback process: activity in the inferior frontal gyrus was modulated by prior knowledge before activity in lower-level sensory regions of the superior temporal gyrus. In parallel, we parametrically varied the level of speech degradation, and therefore the amount of sensory detail, so that changes in neural responses attributable to sensory information and prior knowledge could be directly compared. Although sensory detail and prior knowledge both enhanced speech clarity, they had an opposite influence on the evoked response in the superior temporal gyrus. We argue that these data are best explained within the framework of predictive coding in which sensory activity is compared with top-down predictions and only unexplained activity propagated through the cortical hierarchy.

  19. Individual differences in the perception of melodic contours and pitch-accent timing in speech: Support for domain-generality of pitch processing.

    Science.gov (United States)

    Morrill, Tuuli H; McAuley, J Devin; Dilley, Laura C; Hambrick, David Z

    2015-08-01

    Do the same mechanisms underlie processing of music and language? Recent investigations of this question have yielded inconsistent results. Likely factors contributing to discrepant findings are use of small samples and failure to control for individual differences in cognitive ability. We investigated the relationship between music and speech prosody processing, while controlling for cognitive ability. Participants (n = 179) completed a battery of cognitive ability tests, the Montreal Battery of Evaluation of Amusia (MBEA) to assess music perception, and a prosody test of pitch peak timing discrimination (early, as in insight vs. late, incite). Structural equation modeling revealed that only music perception was a significant predictor of prosody test performance. Music perception accounted for 34.5% of variance on prosody test performance; cognitive abilities and music training added only about 8%. These results indicate musical pitch and temporal processing are highly predictive of pitch discrimination in speech processing, even after controlling for other possible predictors of this aspect of language processing. (c) 2015 APA, all rights reserved.

  20. Speech production, Psychology of

    NARCIS (Netherlands)

    Schriefers, H.J.; Vigliocco, G.

    2015-01-01

    Research on speech production investigates the cognitive processes involved in transforming thoughts into speech. This article starts with a discussion of the methodological issues inherent to research in speech production that illustrates how empirical approaches to speech production must differ fr

  1. 78 FR 49717 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    ... reasons that STS has not been more widely utilized. Are people with speech disabilities not connected to... COMMISSION 47 CFR Part 64 Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With...

  2. Normal and Time-Compressed Speech

    Science.gov (United States)

    Lemke, Ulrike; Kollmeier, Birger; Holube, Inga

    2016-01-01

    Short-term and long-term learning effects were investigated for the German Oldenburg sentence test (OLSA) using original and time-compressed fast speech in noise. Normal-hearing and hearing-impaired participants completed six lists of the OLSA in five sessions. Two groups of normal-hearing listeners (24 and 12 listeners) and two groups of hearing-impaired listeners (9 listeners each) performed the test with original or time-compressed speech. In general, original speech resulted in better speech recognition thresholds than time-compressed speech. Thresholds decreased with repetition for both speech materials. Confirming earlier results, the largest improvements were observed within the first measurements of the first session, indicating a rapid initial adaptation phase. The improvements were larger for time-compressed than for original speech. The novel results on long-term learning effects when using the OLSA indicate a longer phase of ongoing learning, especially for time-compressed speech, which seems to be limited by a floor effect. In addition, for normal-hearing participants, no complete transfer of learning benefits from time-compressed to original speech was observed. These effects should be borne in mind when inviting listeners repeatedly, for example, in research settings.
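
    Time-compressed speech like that used in the OLSA conditions above can be produced by overlap-add resynthesis, which shortens the signal without shifting its pitch. The sketch below is a naive windowed overlap-add; published studies typically use higher-quality PSOLA or phase-vocoder tools, and the frame and hop sizes here are arbitrary choices.

        import numpy as np

        def time_compress(x, rate=1.5, frame=1024, hop=256):
            """Shorten x by factor `rate` via windowed overlap-add."""
            window = np.hanning(frame)
            n_out = int(len(x) / rate)
            out = np.zeros(n_out + frame)
            norm = np.zeros_like(out)
            for out_pos in range(0, n_out, hop):
                in_pos = int(out_pos * rate)      # read faster than we write
                seg = x[in_pos:in_pos + frame]
                if len(seg) < frame:
                    break
                out[out_pos:out_pos + frame] += seg * window
                norm[out_pos:out_pos + frame] += window
            return out[:n_out] / np.maximum(norm[:n_out], 1e-8)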

  3. Speech Enhancement

    DEFF Research Database (Denmark)

    Benesty, Jacob; Jensen, Jesper Rindom; Christensen, Mads Græsbøll;

    Speech enhancement is a classical problem in signal processing, yet still largely unsolved. Two of the conventional approaches for solving this problem are linear filtering, like the classical Wiener filter, and subspace methods. These approaches have traditionally been treated as different classes of methods and have been introduced in somewhat different contexts. Linear filtering methods originate in stochastic processes, while subspace methods have largely been based on developments in numerical linear algebra and matrix approximation theory. This book bridges the gap between these two classes of methods by showing how the ideas behind subspace methods can be incorporated into traditional linear filtering. In the context of subspace methods, the enhancement problem can then be seen as a classical linear filter design problem. This means that various solutions can more easily be compared...

  4. Speech therapy with obturator.

    Science.gov (United States)

    Shyammohan, A; Sreenivasulu, D

    2010-12-01

    Rehabilitation of speech is tantamount to closure of the defect in cases of velopharyngeal insufficiency. Often the importance of speech therapy is sidelined during the fabrication of obturators: the speech part is taken up only at a later stage and is relegated entirely to a speech therapist without the active involvement of the prosthodontist. The article suggests a protocol for speech therapy in such cases to be done in unison with a prosthodontist.

  5. Speaking of Race, Speaking of Sex: Hate Speech, Civil Rights, and Civil Liberties.

    Science.gov (United States)

    Gates, Henry Louis, Jr.; And Others

    The essays of this collection explore the restriction of speech and the hate speech codes that attempt to restrict bigoted or offensive speech and punish those who engage in it. These essays generally argue that speech restrictions are dangerous and counterproductive, but they acknowledge that it is very difficult to distinguish between…

  6. 42 CFR 409.17 - Physical therapy, occupational therapy, and speech-language pathology services.

    Science.gov (United States)

    2010-10-01

    ... therapist furnishing the physical therapy services. (4) A speech-language pathologist furnishing the speech... Physical therapy, occupational therapy, and speech-language pathology services. (a) General rules. (1) Except as specified in this...

  7. Speech comprehension aided by multiple modalities: behavioural and neural interactions

    Science.gov (United States)

    McGettigan, Carolyn; Faulkner, Andrew; Altarelli, Irene; Obleser, Jonas; Baverstock, Harriet; Scott, Sophie K.

    2014-01-01

    Speech comprehension is a complex human skill, the performance of which requires the perceiver to combine information from several sources – e.g. voice, face, gesture, linguistic context – to achieve an intelligible and interpretable percept. We describe a functional imaging investigation of how auditory, visual and linguistic information interact to facilitate comprehension. Our specific aims were to investigate the neural responses to these different information sources, alone and in interaction, and further to use behavioural speech comprehension scores to address sites of intelligibility-related activation in multifactorial speech comprehension. In fMRI, participants passively watched videos of spoken sentences, in which we varied Auditory Clarity (with noise-vocoding), Visual Clarity (with Gaussian blurring) and Linguistic Predictability. Main effects of enhanced signal with increased auditory and visual clarity were observed in overlapping regions of posterior STS. Two-way interactions of the factors (auditory × visual, auditory × predictability) in the neural data were observed outside temporal cortex, where positive signal change in response to clearer facial information and greater semantic predictability was greatest at intermediate levels of auditory clarity. Overall changes in stimulus intelligibility by condition (as determined using an independent behavioural experiment) were reflected in the neural data by increased activation predominantly in bilateral dorsolateral temporal cortex, as well as inferior frontal cortex and left fusiform gyrus. Specific investigation of intelligibility changes at intermediate auditory clarity revealed a set of regions, including posterior STS and fusiform gyrus, showing enhanced responses to both visual and linguistic information. Finally, an individual differences analysis showed that greater comprehension performance in the scanning participants (measured in a post-scan behavioural test) were associated with

  8. Delayed Speech or Language Development

    Science.gov (United States)

    A KidsHealth guide for parents on delayed speech or language development, covering normal speech and language development and how to tell whether a child is on schedule.

  9. Effects of Age, Acoustic Challenge, and Verbal Working Memory on Recall of Narrative Speech.

    Science.gov (United States)

    Ward, Caitlin M; Rogers, Chad S; Van Engen, Kristin J; Peelle, Jonathan E

    2016-01-01

    A common goal during speech comprehension is to remember what we have heard. Encoding speech into long-term memory frequently requires processes such as verbal working memory that may also be involved in processing degraded speech. Here the authors tested whether young and older adult listeners' memory for short stories was worse when the stories were acoustically degraded, or whether the additional contextual support provided by a narrative would protect against these effects. The authors tested 30 young adults (aged 18-28 years) and 30 older adults (aged 65-79 years) with good self-reported hearing. Participants heard short stories that were presented as normal (unprocessed) speech or acoustically degraded using a noise vocoding algorithm with 24 or 16 channels. The degraded stories were still fully intelligible. Following each story, participants were asked to repeat the story in as much detail as possible. Recall was scored using a modified idea unit scoring approach, which included separately scoring hierarchical levels of narrative detail. Memory for acoustically degraded stories was significantly worse than for normal stories at some levels of narrative detail. Older adults' memory for the stories was significantly worse overall, but there was no interaction between age and acoustic clarity or level of narrative detail. Verbal working memory (assessed by reading span) significantly correlated with recall accuracy for both young and older adults, whereas hearing ability (better ear pure tone average) did not. The present findings are consistent with a framework in which the additional cognitive demands caused by a degraded acoustic signal use resources that would otherwise be available for memory encoding for both young and older adults. Verbal working memory is a likely candidate for supporting both of these processes.
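
    As background for the noise-vocoding manipulation used in this and several other records, a minimal channel vocoder can be sketched as follows. All parameters here (number of channels, band edges, filter order, envelope extraction) are illustrative assumptions, not those of the study above.

        # Minimal noise vocoder sketch: keep the per-band temporal envelope,
        # replace the temporal fine structure with band-limited noise.
        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def noise_vocode(x, fs, n_channels=16, f_lo=100.0, f_hi=7000.0):
            # Log-spaced channel edges (assumes fs comfortably above 2*f_hi).
            edges = np.geomspace(f_lo, f_hi, n_channels + 1)
            carrier = np.random.randn(len(x))        # broadband noise carrier
            out = np.zeros(len(x))
            for lo, hi in zip(edges[:-1], edges[1:]):
                sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
                band = sosfiltfilt(sos, x)
                env = np.abs(hilbert(band))          # temporal envelope
                out += env * sosfiltfilt(sos, carrier)
            return out / (np.max(np.abs(out)) + 1e-12)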

  10. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    Federal Register document, 47 CFR Part 64: Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities. In this document, the Commission amends the telecommunications relay services (TRS) mandatory...

  11. Surgical improvement of speech disorder caused by amyotrophic lateral sclerosis.

    Science.gov (United States)

    Saigusa, Hideto; Yamaguchi, Satoshi; Nakamura, Tsuyoshi; Komachi, Taro; Kadosono, Osamu; Ito, Hiroyuki; Saigusa, Makoto; Niimi, Seiji

    2012-12-01

    Amyotrophic lateral sclerosis (ALS) is a progressive debilitating neurological disease. ALS disturbs the quality of life by affecting speech, swallowing and free mobility of the arms without affecting intellectual function. It is therefore of significance to improve intelligibility and quality of speech sounds, especially for ALS patients with slowly progressive courses. Currently, however, there is no effective or established approach to improve speech disorder caused by ALS. We investigated a surgical procedure to improve speech disorder for some patients with neuromuscular diseases with velopharyngeal closure incompetence. In this study, we performed the surgical procedure for two patients suffering from severe speech disorder caused by slowly progressing ALS. The patients suffered from speech disorder with hypernasality and imprecise and weak articulation during a 6-year course (patient 1) and a 3-year course (patient 2) of slowly progressing ALS. We narrowed bilateral lateral palatopharyngeal wall at velopharyngeal port, and performed this surgery under general anesthesia without muscle relaxant for the two patients. Postoperatively, intelligibility and quality of their speech sounds were greatly improved within one month without any speech therapy. The patients were also able to generate longer speech phrases after the surgery. Importantly, there was no serious complication during or after the surgery. In summary, we performed bilateral narrowing of lateral palatopharyngeal wall as a speech surgery for two patients suffering from severe speech disorder associated with ALS. With this technique, improved intelligibility and quality of speech can be maintained for longer duration for the patients with slowly progressing ALS.

  12. Speech and Language Impairments

    Science.gov (United States)

    ... impairment. Many children are identified as having a speech or language impairment after they enter the public school system. A teacher may notice difficulties in a child’s speech or communication skills and refer the child for ...

  13. Auditory free classification of nonnative speech

    Science.gov (United States)

    Atagi, Eriko; Bent, Tessa

    2013-01-01

    Through experience with speech variability, listeners build categories of indexical speech characteristics including categories for talker, gender, and dialect. The auditory free classification task—a task in which listeners freely group talkers based on audio samples—has been a useful tool for examining listeners’ representations of some of these characteristics including regional dialects and different languages. The free classification task was employed in the current study to examine the perceptual representation of nonnative speech. The category structure and salient perceptual dimensions of nonnative speech were investigated from two perspectives: general similarity and perceived native language background. Talker intelligibility and whether native talkers were included were manipulated to test stimulus set effects. Results showed that degree of accent was a highly salient feature of nonnative speech for classification based on general similarity and on perceived native language background. This salience, however, was attenuated when listeners were listening to highly intelligible stimuli and attending to the talkers’ native language backgrounds. These results suggest that the context in which nonnative speech stimuli are presented—such as the listeners’ attention to the talkers’ native language and the variability of stimulus intelligibility—can influence listeners’ perceptual organization of nonnative speech. PMID:24363470

  14. Auditory free classification of nonnative speech.

    Science.gov (United States)

    Atagi, Eriko; Bent, Tessa

    2013-11-01

    Through experience with speech variability, listeners build categories of indexical speech characteristics including categories for talker, gender, and dialect. The auditory free classification task-a task in which listeners freely group talkers based on audio samples-has been a useful tool for examining listeners' representations of some of these characteristics including regional dialects and different languages. The free classification task was employed in the current study to examine the perceptual representation of nonnative speech. The category structure and salient perceptual dimensions of nonnative speech were investigated from two perspectives: general similarity and perceived native language background. Talker intelligibility and whether native talkers were included were manipulated to test stimulus set effects. Results showed that degree of accent was a highly salient feature of nonnative speech for classification based on general similarity and on perceived native language background. This salience, however, was attenuated when listeners were listening to highly intelligible stimuli and attending to the talkers' native language backgrounds. These results suggest that the context in which nonnative speech stimuli are presented-such as the listeners' attention to the talkers' native language and the variability of stimulus intelligibility-can influence listeners' perceptual organization of nonnative speech.

  15. Relations between affective music and speech: evidence from dynamics of affective piano performance and speech production.

    Science.gov (United States)

    Liu, Xiaoluan; Xu, Yi

    2015-01-01

    This study compares affective piano performance with speech production from the perspective of dynamics: unlike previous research, this study uses finger force and articulatory effort as indexes reflecting the dynamics of affective piano performance and speech production respectively. Moreover, for the first time physical constraints such as piano fingerings and speech articulatory constraints are included due to their potential contribution to different patterns of dynamics. A piano performance experiment and speech production experiment were conducted in four emotions: anger, fear, happiness and sadness. The results show that in both piano performance and speech production, anger and happiness generally have high dynamics while sadness has the lowest dynamics. Fingerings interact with fear in the piano experiment and articulatory constraints interact with anger in the speech experiment, i.e., large physical constraints produce significantly higher dynamics than small physical constraints in piano performance under the condition of fear and in speech production under the condition of anger. Using production experiments, this study firstly supports previous perception studies on relations between affective music and speech. Moreover, this is the first study to show quantitative evidence for the importance of considering motor aspects such as dynamics in comparing music performance and speech production in which motor mechanisms play a crucial role.

  16. Relations between affective music and speech: Evidence from dynamics of affective piano performance and speech production

    Directory of Open Access Journals (Sweden)

    Xiaoluan eLiu

    2015-07-01

    Full Text Available This study compares affective piano performance with speech production from the perspective of dynamics: unlike previous research, this study uses finger force and articulatory effort as indexes reflecting the dynamics of affective piano performance and speech production respectively. Moreover, for the first time physical constraints such as piano fingerings and speech articulatory distance are included due to their potential contribution to different patterns of dynamics. A piano performance experiment and speech production experiment were conducted in four emotions: anger, fear, happiness and sadness. The results show that in both piano performance and speech production, anger and happiness generally have high dynamics while sadness has the lowest dynamics, with fear in the middle. Fingerings interact with fear in the piano experiment and articulatory distance interacts with anger in the speech experiment, i.e., large physical constraints produce significantly higher dynamics than small physical constraints in piano performance under the condition of fear and in speech production under the condition of anger. Using production experiments, this study firstly supports previous perception studies on relations between affective music and speech. Moreover, this is the first study to show quantitative evidence for the importance of considering motor aspects such as dynamics in comparing music performance and speech production in which motor mechanisms play a crucial role.

  17. Application of a short-time version of the Equalization-Cancellation model to speech intelligibility experiments with speech maskers.

    Science.gov (United States)

    Wan, Rui; Durlach, Nathaniel I; Colburn, H Steven

    2014-08-01

    A short-time-processing version of the Equalization-Cancellation (EC) model of binaural processing is described and applied to speech intelligibility tasks in the presence of multiple maskers, including multiple speech maskers. This short-time EC model, called the STEC model, extends the model described by Wan et al. [J. Acoust. Soc. Am. 128, 3678-3690 (2010)] to allow the EC model's equalization parameters τ and α to be adjusted as a function of time, resulting in improved masker cancellation when the dominant masker location varies in time. Using the Speech Intelligibility Index, the STEC model is applied to speech intelligibility with maskers that vary in number, type, and spatial arrangements. Most notably, when maskers are located on opposite sides of the target, this STEC model predicts improved thresholds when the maskers are modulated independently with speech-envelope modulators; this includes the most relevant case of independent speech maskers. The STEC model describes the spatial dependence of the speech reception threshold with speech maskers better than the steady-state model. Predictions are also improved for independently speech-modulated noise maskers but are poorer for reversed-speech maskers. In general, short-term processing is useful, but much remains to be done in the complex task of understanding speech in speech maskers.
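
    The equalization-cancellation step at the core of the model can be illustrated with a toy grid search. This is only a sketch of the E and C operations, not the STEC model itself: in the STEC model the parameters τ and α are re-estimated in short time frames, whereas here they are chosen once for the whole signal.

        # Toy equalization-cancellation: delay and scale one ear, subtract,
        # and keep the (tau, alpha) pair that minimizes the residual power.
        import numpy as np

        def ec_cancel(left, right, fs, tau_grid, alpha_grid):
            best = (None, None, np.inf)
            for tau in tau_grid:                     # tau in seconds
                shift = int(round(tau * fs))
                r = np.roll(right, shift)            # crude integer-sample delay
                for alpha in alpha_grid:
                    residual = left - alpha * r      # equalize, then cancel
                    p = np.mean(residual ** 2)
                    if p < best[2]:
                        best = (tau, alpha, p)
            return best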

  18. Speech 7 through 12.

    Science.gov (United States)

    Nederland Independent School District, TX.

    GRADES OR AGES: Grades 7 through 12. SUBJECT MATTER: Speech. ORGANIZATION AND PHYSICAL APPEARANCE: Following the foreword, philosophy, and objectives, this guide presents a speech curriculum. The curriculum covers junior high and Speech I, II, III (senior high). Thirteen units of study are presented for junior high; each unit is divided into…

  19. From Gesture to Speech

    Directory of Open Access Journals (Sweden)

    Maurizio Gentilucci

    2012-11-01

    Full Text Available One of the major problems concerning the evolution of human language is to understand how sounds became associated with meaningful gestures. It has been proposed that the circuit controlling gestures and speech evolved from a circuit involved in the control of arm and mouth movements related to ingestion. This circuit contributed to the evolution of spoken language, moving from a system of communication based on arm gestures. The discovery of the mirror neurons has provided strong support for the gestural theory of speech origin because they offer a natural substrate for the embodiment of language and create a direct link between sender and receiver of a message. Behavioural studies indicate that manual gestures are linked to mouth movements used for syllable emission. Grasping with the hand selectively affected movement of inner or outer parts of the mouth according to syllable pronunciation, and hand postures, in addition to hand actions, influenced the control of mouth grasp and vocalization. Gestures and words are also related to each other. It was found that when producing communicative gestures (emblems), the intention to interact directly with a conspecific was transferred from gestures to words, inducing modification in voice parameters. Transfer effects of the meaning of representational gestures were found on both vocalizations and meaningful words. In conclusion, the results of our studies suggest the existence of a system relating gesture to vocalization that was the precursor of a more general system reciprocally relating gesture to word.

  20. Robust digital processing of speech signals

    CERN Document Server

    Kovacevic, Branko; Veinović, Mladen; Marković, Milan

    2017-01-01

    This book focuses on speech signal phenomena, presenting a robustification of the usual speech generation models with regard to the presumed types of excitation signals, which is equivalent to the introduction of a class of nonlinear models and the corresponding criterion functions for parameter estimation. Compared to the general class of nonlinear models, such as various neural networks, these models possess good properties of controlled complexity, the option of working in “online” mode, as well as a low information volume for efficient speech encoding and transmission. Providing comprehensive insights, the book is based on the authors’ research, which has already been published, supplemented by additional texts discussing general considerations of speech modeling, linear predictive analysis and robust parameter estimation.

  1. Representational Similarity Analysis Reveals Heterogeneous Networks Supporting Speech Motor Control

    DEFF Research Database (Denmark)

    Zheng, Zane; Cusack, Rhodri; Johnsrude, Ingrid

    The everyday act of speaking involves the complex processes of speech motor control. One important feature of such control is regulation of articulation when auditory concomitants of speech do not correspond to the intended motor gesture. While theoretical accounts of speech monitoring posit multiple functional components required for detection of errors in speech planning (e.g., Levelt, 1983), neuroimaging studies generally indicate either single brain regions sensitive to speech production errors, or small, discrete networks. Here we demonstrate that the complex system controlling speech is supported by a complex neural network that is involved in linguistic, motoric and sensory processing. With the aid of novel real-time acoustic analyses and representational similarity analyses of fMRI signals, our data show functionally differentiated networks underlying auditory feedback control of speech.

  2. Speech Databases of Typical Children and Children with SLI.

    Science.gov (United States)

    Grill, Pavel; Tučková, Jana

    2016-01-01

    The extent of research on children's speech in general and on disordered speech specifically is very limited. In this article, we describe the process of creating databases of children's speech and the possibilities for using such databases, which have been created by the LANNA research group in the Faculty of Electrical Engineering at Czech Technical University in Prague. These databases have been principally compiled for medical research but also for use in other areas, such as linguistics. Two databases were recorded: one for healthy children's speech (recorded in kindergarten and in the first level of elementary school) and the other for pathological speech of children with a Specific Language Impairment (recorded at a surgery of speech and language therapists and at the hospital). Both databases were sub-divided according to specific demands of medical research. Their utilization can be exoteric, specifically for linguistic research and pedagogical use as well as for studies of speech-signal processing.

  3. SII-Based Speech Prepocessing for Intelligibility Improvement in Noise

    DEFF Research Database (Denmark)

    Taal, Cees H.; Jensen, Jesper

    2013-01-01

    A linear time-invariant filter is designed in order to improve speech understanding when the speech is played back in a noisy environment. To accomplish this, the speech intelligibility index (SII) is maximized under the constraint that the speech energy is held constant. A nonlinear approximation is used for the SII such that a closed-form solution exists to the constrained optimization problem. The resulting filter is dependent both on the long-term average noise and speech spectrum and the global SNR and, in general, has a high-pass characteristic. In contrast to existing methods, the proposed filter sets certain frequency bands to zero when they do not contribute to intelligibility anymore. Experiments show large intelligibility improvements with the proposed method when used in stationary speech-shaped noise. However, it was also found that the method does not perform well for speech...
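
    As a rough illustration of the quantity being maximized, a strongly simplified SII-style score can be computed as a band-importance-weighted average of clipped band SNRs. This sketch assumes uniform band weights and the standard -15 to +15 dB audibility range; the full SII of ANSI S3.5 additionally models masking, level distortion and hearing thresholds.

        # Simplified SII-style score from per-band speech and noise levels.
        import numpy as np

        def simple_sii(speech_db, noise_db, weights=None):
            snr = np.asarray(speech_db) - np.asarray(noise_db)
            # Map band SNR from [-15, +15] dB to an audibility of [0, 1].
            audibility = np.clip((snr + 15.0) / 30.0, 0.0, 1.0)
            if weights is None:
                weights = np.full(len(audibility), 1.0 / len(audibility))
            return float(np.dot(weights, audibility))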

  4. Speech Databases of Typical Children and Children with SLI.

    Directory of Open Access Journals (Sweden)

    Pavel Grill

    Full Text Available The extent of research on children's speech in general and on disordered speech specifically is very limited. In this article, we describe the process of creating databases of children's speech and the possibilities for using such databases, which have been created by the LANNA research group in the Faculty of Electrical Engineering at Czech Technical University in Prague. These databases have been principally compiled for medical research but also for use in other areas, such as linguistics. Two databases were recorded: one for healthy children's speech (recorded in kindergarten and in the first level of elementary school) and the other for pathological speech of children with a Specific Language Impairment (recorded at a surgery of speech and language therapists and at the hospital). Both databases were sub-divided according to specific demands of medical research. Their utilization can be exoteric, specifically for linguistic research and pedagogical use as well as for studies of speech-signal processing.

  5. Brain activity underlying the recovery of meaning from degraded speech: A functional near-infrared spectroscopy (fNIRS) study.

    Science.gov (United States)

    Wijayasiri, Pramudi; Hartley, Douglas E H; Wiggins, Ian M

    2017-08-01

    The purpose of this study was to establish whether functional near-infrared spectroscopy (fNIRS), an emerging brain-imaging technique based on optical principles, is suitable for studying the brain activity that underlies effortful listening. In an event-related fNIRS experiment, normally-hearing adults listened to sentences that were either clear or degraded (noise vocoded). These sentences were presented simultaneously with a non-speech distractor, and on each trial participants were instructed to attend either to the speech or to the distractor. The primary region of interest for the fNIRS measurements was the left inferior frontal gyrus (LIFG), a cortical region involved in higher-order language processing. The fNIRS results confirmed findings previously reported in the functional magnetic resonance imaging (fMRI) literature. Firstly, the LIFG exhibited an elevated response to degraded versus clear speech, but only when attention was directed towards the speech. This attention-dependent increase in frontal brain activation may be a neural marker for effortful listening. Secondly, during attentive listening to degraded speech, the haemodynamic response peaked significantly later in the LIFG than in superior temporal cortex, possibly reflecting the engagement of working memory to help reconstruct the meaning of degraded sentences. The homologous region in the right hemisphere may play an equivalent role to the LIFG in some left-handed individuals. In conclusion, fNIRS holds promise as a flexible tool to examine the neural signature of effortful listening.

  6. Speech in spinocerebellar ataxia.

    Science.gov (United States)

    Schalling, Ellika; Hartelius, Lena

    2013-12-01

    Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria, but symptoms related to phonation may be more prominent. One study to date has shown an association between genotype and differences in speech and voice symptoms. More studies of speech and voice phenotypes are warranted, as they could aid clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for the management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia.

  7. Digital speech processing using Matlab

    CERN Document Server

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.

  8. Speech-to-Speech Relay Service

    Science.gov (United States)

    ... to make an STS call. You are then connected to an STS CA who will repeat your spoken words, making the spoken words clear to the other party. Persons with speech disabilities may also receive STS calls. The calling ...

  9. Apraxia of speech and cerebellar mutism syndrome: a case report

    OpenAIRE

    De Witte, E.; Wilssens, I.; De Surgeloose, D.; Dua, G.; Moens, M.; Verhoeven, J.; Manto, M; Marien, P.

    2017-01-01

    Background\\ud Cerebellar mutism syndrome (CMS) or posterior fossa syndrome (PFS) consists of a constellation of neuropsychiatric, neuropsychological and neurogenic speech and language deficits. It is most commonly observed in children after posterior fossa tumor surgery. The most prominent feature of CMS is mutism, which generally starts after a few days after the operation, has a limited duration and is typically followed by motor speech deficits. However, the core speech disorder subserving...

  10. Emil Kraepelin's dream speech: a psychoanalytic interpretation.

    Science.gov (United States)

    Engels, Huub; Heynick, Frank; van der Staak, Cees

    2003-10-01

    Freud's contemporary fellow psychiatrist Emil Kraepelin collected over the course of several decades some 700 specimens of speech in dreams, mostly his own, along with various concomitant data. These generally exhibit far more obvious primary-process influence than do the dream speech specimens found in Freud's corpus; but Kraepelin eschewed any depth-psychology interpretation. In this paper the authors first explore the respective orientations of Freud and Kraepelin to mind and brain, and normal and pathological phenomena, particularly as these relate to speech and dreaming. They then proceed, with the help of biographical sources, to analyze a selection of Kraepelin's deviant dream speech in the manner that was pioneered by Freud, most notably in his 'Autodidasker' dream. They find that Kraepelin's particular concern with the preservation of his rather uncommon family name--and with the preservation of his medical nomenclature, which lent prestige to that name--appears to provide a key link in a chain of associations for elucidating his dream speech specimens. They further suggest, more generally, that one's proper name, as a minimal characteristic of the ego during sleep, may prove to be a key in interpreting the dream speech of others as well.

  11. Predicting masking release of lateralized speech

    DEFF Research Database (Denmark)

    Chabot-Leclerc, Alexandre; MacDonald, Ewen; Dau, Torsten

    2016-01-01

    Locsei et al. (2015) [Speech in Noise Workshop, Copenhagen, 46] measured speech reception thresholds (SRTs) in anechoic conditions where the target speech and the maskers were lateralized using interaural time delays. The maskers were speech-shaped noise (SSN) and reversed babble with 2, 4, or 8 talkers. For a given interferer type, the number of maskers presented on the target's side was varied, such that none, some, or all maskers were presented on the same side as the target. In general, SRTs did not vary significantly when at least one masker was presented on the same side as the target... The data were compared to predictions from a binaural speech intelligibility model [... et al., 2013, J. Acoust. Soc. Am. 130], which uses a short-term equalization-cancellation process to model binaural unmasking. In the conditions where informational masking (IM) was involved, the predicted SRTs were lower than the measured values because the model is blind to confusions experienced...

  12. Exploration of Speech Planning and Producing by Speech Error Analysis

    Institute of Scientific and Technical Information of China (English)

    冷卉

    2012-01-01

    Speech error analysis is an indirect way to investigate the processes of speech planning and production. From the speech errors that people make in daily life, linguists and learners can uncover these planning and production processes more easily and clearly.

  13. Esophageal speeches modified by the Speech Enhancer Program®

    OpenAIRE

    Manochiopinig, Sriwimon; Boonpramuk, Panuthat

    2014-01-01

    Esophageal speech appears to be the first choice of speech treatment after a laryngectomy. However, many laryngectomized people are unable to speak well. The aim of this study was to evaluate the post-modified speech quality of Thai esophageal speakers using the Speech Enhancer Program®. The method adopted was to approach five speech–language pathologists to assess the speech accuracy and intelligibility of the words and continuing speech of the seven laryngectomized speakers. A comparison study was conduc...

  14. Principles of speech coding

    CERN Document Server

    Ogunfunmi, Tokunbo

    2010-01-01

    It is becoming increasingly apparent that all forms of communication, including voice, will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. Outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networks. Offering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the...

  15. Ear, Hearing and Speech

    DEFF Research Database (Denmark)

    Poulsen, Torben

    2000-01-01

    An introduction is given to the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001).

  16. Advances in Speech Recognition

    CERN Document Server

    Neustein, Amy

    2010-01-01

    This volume is comprised of contributions from eminent leaders in the speech industry, and presents a comprehensive and in-depth analysis of the progress of speech technology in the topical areas of mobile settings, healthcare and call centers. The material addresses the technical aspects of voice technology within the framework of societal needs, such as the use of speech recognition software to produce up-to-date electronic health records, notwithstanding patients making changes to health plans and physicians. Included will be discussion of speech engineering, linguistics, human factors analysis...

  17. Simulating the dual-peak excitation pattern produced by bipolar stimulation of a cochlear implant: effects on speech intelligibility.

    Science.gov (United States)

    Mesnildrey, Quentin; Macherey, Olivier

    2015-01-01

    Several electrophysiological and psychophysical studies have shown that the spatial excitation pattern produced by bipolar stimulation of a cochlear implant (CI) can have a dual-peak shape. The perceptual effects of this dual-peak shape were investigated using noise-vocoded CI simulations in which synthesis filters were designed to simulate the spread of neural activity produced by various electrode configurations, as predicted by a simple cochlear model. Experiments 1 and 2 tested speech recognition in the presence of a concurrent speech masker for various sets of single-peak and dual-peak synthesis filters and different numbers of channels. Similar to results obtained in real CIs, both monopolar (MP, single-peak) and bipolar (BP + 1, dual-peak) simulations showed a plateau of performance above 8 channels. The benefit of increasing the number of channels was also lower for BP + 1 than for MP. This shows that channel interactions in BP + 1 become especially deleterious for speech intelligibility when a simulated electrode acts both as an active and as a return electrode for different channels, because envelope information from two different analysis bands is being conveyed to the same spectral location. Experiment 3 shows that these channel interactions are even stronger in the wide BP configuration (BP + 5), likely because the interfering speech envelopes are less correlated than in narrow BP + 1. Although the exact effects of dual- or multi-peak excitation in real CIs remain to be determined, this series of experiments suggests that multipolar stimulation strategies, such as bipolar or tripolar, should be controlled to avoid neural excitation in the vicinity of the return electrodes.

  18. A neural mechanism for recognizing speech spoken by different speakers.

    Science.gov (United States)

    Kreitewolf, Jens; Gaudrain, Etienne; von Kriegstein, Katharina

    2014-05-01

    Understanding speech from different speakers is a sophisticated process, particularly because the same acoustic parameters convey important information about both the speech message and the person speaking. How the human brain accomplishes speech recognition under such conditions is unknown. One view is that speaker information is discarded at early processing stages and not used for understanding the speech message. An alternative view is that speaker information is exploited to improve speech recognition. Consistent with the latter view, previous research identified functional interactions between the left- and the right-hemispheric superior temporal sulcus/gyrus, which process speech- and speaker-specific vocal tract parameters, respectively. Vocal tract parameters are one of the two major acoustic features that determine both speaker identity and speech message (phonemes). Here, using functional magnetic resonance imaging (fMRI), we show that a similar interaction exists for glottal fold parameters between the left and right Heschl's gyri. Glottal fold parameters are the other main acoustic feature that determines speaker identity and speech message (linguistic prosody). The findings suggest that interactions between left- and right-hemispheric areas are specific to the processing of different acoustic features of speech and speaker, and that they represent a general neural mechanism when understanding speech from different speakers.

  19. Integrating HMM-Based Speech Recognition With Direct Manipulation In A Multimodal Korean Natural Language Interface

    CERN Document Server

    Lee, G; Kim, S; Lee, Geunbae; Lee, Jong-Hyeok; Kim, Sangeok

    1996-01-01

    This paper presents an HMM-based speech recognition engine and its integration into direct manipulation interfaces for a Korean document editor. Speech recognition can reduce the tedious and repetitive actions that are inevitable in standard GUIs (graphical user interfaces). Our system consists of a general speech recognition engine called ABrain (Auditory Brain) and a speech-commandable document editor called SHE (Simple Hearing Editor). ABrain is a phoneme-based speech recognition engine which achieves a discrete-command recognition rate of up to 97%. SHE is a EuroBridge widget-based document editor that supports speech commands as well as direct manipulation interfaces.

  20. A brief overview of speech enhancement with linear filtering

    DEFF Research Database (Denmark)

    Benesty, Jacob; Christensen, Mads Græsbøll; Jensen, Jesper Rindom;

    2014-01-01

    In this paper, we provide an overview of some recently introduced principles and ideas for speech enhancement with linear filtering and explore how these are related and how they can be used in various applications. This is done in a general framework where the speech enhancement problem is stated... Maximum signal-to-noise ratio (SNR) and Wiener filters are derived from the conventional speech enhancement approach and the recently introduced orthogonal decomposition approach. For each of the filters, we derive their properties in terms of output SNR and speech distortion. We then demonstrate how the ideas can be applied...

  1. Speech-Language Therapy (For Parents)

    Science.gov (United States)

    A KidsHealth guide for parents on speech-language therapy for children with speech and/or language disorders, explaining speech disorders, language disorders, and feeding disorders.

  2. Liberalism, feminism and republicanism on freedom of speech: the cases of pornography and racist hate speech

    OpenAIRE

    Power Febres, C.

    2011-01-01

    The central issue tackled in this thesis is whether there is room for legitimate restrictions upon pornography and extreme right political organisations' racist hate speech; whether such restrictions can be made without breaching generally accepted liberal rights and within a democratic context. Both these forms of speech, identified as 'hard cases' in the literature, are presented as problems that political theorists should be concerned with. This concern stems from the increase in these ...

  3. Measuring speech sound development : An item response model approach

    NARCIS (Netherlands)

    Priester, Gertrude H.; Goorhuis - Brouwer, Siena

    2013-01-01

    Research aim: The primary aim of our study is to investigate if there is an ordering in the speech sound development of children aged 3-6, similar to the ordering in general language development. Method: The speech sound development of 1035 children was tested with a revised version of Logo-Articula

  4. Start-Up Rhetoric in Eight Speeches of Barack Obama

    Science.gov (United States)

    O'Connell, Daniel C.; Kowal, Sabine; Sabin, Edward J.; Lamia, John F.; Dannevik, Margaret

    2010-01-01

    Our purpose in the following was to investigate the start-up rhetoric employed by U.S. President Barack Obama in his speeches. The initial 5 min from eight of his speeches from May to September of 2009 were selected for their variety of setting, audience, theme, and purpose. It was generally hypothesized that Barack Obama, widely recognized for…

  5. A System for Detecting Miscues in Dyslexic Read Speech

    DEFF Research Database (Denmark)

    Rasmussen, Morten Højfeldt; Tan, Zheng-Hua; Lindberg, Børge

    2009-01-01

    While miscue detection in general is a well-explored research field, little attention has so far been paid to miscue detection in dyslexic read speech. This domain differs substantially from the domains that are commonly researched, as, for example, dyslexic read speech includes frequent regressions...

  6. The effects of bilingualism on children's perception of speech sounds

    NARCIS (Netherlands)

    Brasileiro, I.

    2009-01-01

    The general topic addressed by this dissertation is that of bilingualism, and more specifically, the topic of bilingual acquisition of speech sounds. The central question in this study is the following: does bilingualism affect children’s perceptual development of speech sounds? The term bilingual i

  7. Speech Compression for Noise-Corrupted Thai Expressive Speech

    Directory of Open Access Journals (Sweden)

    Suphattharachai Chomphan

    2011-01-01

    Full Text Available Problem statement: In speech communication, speech coding aims at preserving the speech quality with a lower coding bitrate. When considering the communication environment, various types of noise deteriorate the speech quality. Expressive speech with different speaking styles may yield different speech quality with the same coding method. Approach: This research proposed a study of speech compression for noise-corrupted Thai expressive speech by using two coding methods, CS-ACELP and MP-CELP. The speech material included a hundred male speech utterances and a hundred female speech utterances. Four speaking styles were included: enjoyable, sad, angry and reading styles. Five sentences of Thai speech were chosen. Three types of noise were included (train, car and air conditioner). Five levels of each type of noise were varied from 0-20 dB. The subjective test of mean opinion score was used in the evaluation process. Results: The experimental results showed that CS-ACELP gave better speech quality than MP-CELP at all three bitrates of 6000, 8600 and 12600 bps. When considering the levels of noise, the 20-dB noise gave the best speech quality, while 0-dB noise gave the worst speech quality. When considering speech gender, female speech gave better results than male speech. When considering the types of noise, the air-conditioner noise gave the best speech quality, while the train noise gave the worst speech quality. Conclusion: From the study, it can be seen that coding methods, types of noise, levels of noise and speech gender influence the coded speech quality.

  8. Speech breathing in speakers who use an electrolarynx.

    Science.gov (United States)

    Bohnenkamp, Todd A; Stowell, Talena; Hesse, Joy; Wright, Simon

    2010-01-01

    Speakers who use an electrolarynx following a total laryngectomy no longer require pulmonary support for speech. Consequently, chest wall movements may be affected; however, chest wall movements in these speakers are not well defined. The purpose of this investigation was to evaluate speech breathing in speakers who use an electrolarynx during speech and reading tasks. Six speakers who use an electrolarynx underwent an evaluation of chest wall kinematics (e.g., chest wall movements, temporal characteristics of chest wall movement), lung volumes, temporal measures of speech, and the interaction of linguistic influences on ventilation. Results of the present study were compared to previous reports in speakers who use an electrolarynx, as well as to previous reports in typical speakers. There were no significant differences in lung volumes used and the general movement of the chest wall by task; however, there were differences of note in the temporal aspects of chest wall configuration when compared to previous reports in both typical speakers and speakers who use an electrolarynx. These differences were related to timing and posturing of the chest wall. The lack of differences in lung volumes and chest wall movements by task indicates that neither reading nor spontaneous speech exerts a greater influence on speech breathing; however, the temporal and posturing results suggest the possibility of a decoupling of the respiratory system from speech following a total laryngectomy and subsequent alaryngeal speech rehabilitation. The reader will be able to understand and describe: (1) the primary differences in speech breathing across alaryngeal speech options; (2) how speech breathing specifically differs (i.e., lung volumes and chest wall movements) in speakers who use an electrolarynx; (3) how the coupling of speech and respiration is altered when pulmonary air is no longer used for speech.

  9. Language disorders in young children: when is speech therapy recommended?

    Science.gov (United States)

    Goorhuis-Brouwer, Siena M; Knijff, Wilma A

    2003-05-01

    Analysis of treatment recommendations given by speech therapists, evaluation of the language abilities in the examined children, and re-examination of those abilities after 12 months. Thirty-four children, aged between 2.0 and 5.3 years, referred to speech therapists by their General Practitioners because of possible language problems were included in a prospective study. The number of children receiving speech therapy, the number of speech therapy sessions received during 1 year, and the therapy effect on three quantitative language measures were compiled. In 97% of the children referred to a speech therapist, speech therapy was recommended. Most of these children showed average to above-average language scores on standardised tests for sentence development (61%) and language comprehension (79%). In addition, for most children spontaneous speech, as screened by the Groningen Diagnostic Speech Norms, was age-adequate (76%). The children's problems consisted of pronunciation difficulties or periods of stammering. After 12 months, speech therapy was still being continued for 50% of these children, which means that the articulation problems were still present. The mean number of speech therapy sessions was 26.7. The language scores on the three language tests remained relatively stable over the 12-month interval. In young children, pronunciation difficulties often lead to the recommendation of speech therapy. For a large number of children, therapy takes more than a year, indicating that speech therapy cannot influence these problems to a great extent. In addition, language scores remained relatively stable. Therefore, language problems and especially articulation problems in young children should be reconsidered in light of maturation and normal variations in speech motor development. A 'watchful waiting' approach should be taken more often.

  10. Internet images of the speech pathology profession.

    Science.gov (United States)

    Byrne, Nicole

    2017-06-05

    Objective: The Internet provides the general public with information about speech pathology services, including client groups and service delivery models, as well as the professionals providing the services. Although this information assists the general public and other professionals to both access and understand speech pathology services, it also potentially provides information about speech pathology as a prospective career, including the types of people who are speech pathologists (i.e. demographics). The aim of the present study was to collect baseline data on how the speech pathology profession is presented via images on the Internet. Methods: A pilot prospective observational study using content analysis methodology was conducted to analyse publicly available Internet images related to the speech pathology profession. The terms 'Speech Pathology' and 'speech pathologist' were used to represent the profession and the professional, resulting in the identification of 200 images. These images were considered across a range of areas, including who was in the image (e.g. professional, client, significant other), the technology used and the types of intervention. Results: The majority of images showed both a client and a professional (i.e. speech pathologist). While the professional was predominantly presented as female, the gender of the client was more evenly distributed. The clients were more likely to be preschool or school aged; however, male speech pathologists were presented as providing therapy to selected age groups (i.e. school aged and younger adults). Images were predominantly of individual therapy, and the few group images that were presented were all paediatric. Conclusion: Current images of speech pathology continue to portray narrow professional demographics and client groups (e.g. paediatrics). Promoting images of wider scope to fully represent the depth and breadth of speech pathology professional practice may assist in attracting a more diverse group...

  11. Computer-Assisted Analysis of Spontaneous Speech: Quantification of Basic Parameters in Aphasic and Unimpaired Language

    Science.gov (United States)

    Hussmann, Katja; Grande, Marion; Meffert, Elisabeth; Christoph, Swetlana; Piefke, Martina; Willmes, Klaus; Huber, Walter

    2012-01-01

    Although generally accepted as an important part of aphasia assessment, detailed analysis of spontaneous speech is rarely carried out in clinical practice mostly due to time limitations. The Aachener Sprachanalyse (ASPA; Aachen Speech Analysis) is a computer-assisted method for the quantitative analysis of German spontaneous speech that allows for…

  12. Tracking Speech Sound Acquisition

    Science.gov (United States)

    Powell, Thomas W.

    2011-01-01

    This article describes a procedure to aid in the clinical appraisal of child speech. The approach, based on the work by Dinnsen, Chin, Elbert, and Powell (1990; Some constraints on functionally disordered phonologies: Phonetic inventories and phonotactics. "Journal of Speech and Hearing Research", 33, 28-37), uses a railway idiom to track gains in…

  13. Preschool Connected Speech Inventory.

    Science.gov (United States)

    DiJohnson, Albert; And Others

    This speech inventory developed for a study of aurally handicapped preschool children (see TM 001 129) provides information on intonation patterns in connected speech. The inventory consists of a list of phrases and simple sentences accompanied by pictorial clues. The test is individually administered by a teacher-examiner who presents the spoken…

  14. Illustrated Speech Anatomy.

    Science.gov (United States)

    Shearer, William M.

    Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…

  15. Private Speech in Ballet

    Science.gov (United States)

    Johnston, Dale

    2006-01-01

    Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…

  16. Private Speech in Ballet

    Science.gov (United States)

    Johnston, Dale

    2006-01-01

    Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…

  17. Tracking Speech Sound Acquisition

    Science.gov (United States)

    Powell, Thomas W.

    2011-01-01

    This article describes a procedure to aid in the clinical appraisal of child speech. The approach, based on the work by Dinnsen, Chin, Elbert, and Powell (1990; Some constraints on functionally disordered phonologies: Phonetic inventories and phonotactics. "Journal of Speech and Hearing Research", 33, 28-37), uses a railway idiom to track gains in…

  18. Free Speech Yearbook 1976.

    Science.gov (United States)

    Phifer, Gregg, Ed.

    The articles collected in this annual address several aspects of First Amendment Law. The following titles are included: "Freedom of Speech As an Academic Discipline" (Franklyn S. Haiman), "Free Speech and Foreign-Policy Decision Making" (Douglas N. Freeman), "The Supreme Court and the First Amendment: 1975-1976"…

  19. Preschool Connected Speech Inventory.

    Science.gov (United States)

    DiJohnson, Albert; And Others

    This speech inventory developed for a study of aurally handicapped preschool children (see TM 001 129) provides information on intonation patterns in connected speech. The inventory consists of a list of phrases and simple sentences accompanied by pictorial clues. The test is individually administered by a teacher-examiner who presents the spoken…

  20. Advertising and Free Speech.

    Science.gov (United States)

    Hyman, Allen, Ed.; Johnson, M. Bruce, Ed.

    The articles collected in this book originated at a conference at which legal and economic scholars discussed the issue of First Amendment protection for commercial speech. The first article, in arguing for freedom for commercial speech, finds inconsistent and untenable the arguments of those who advocate freedom from regulation for political…

  1. Free Speech. No. 38.

    Science.gov (United States)

    Kane, Peter E., Ed.

    This issue of "Free Speech" contains the following articles: "Daniel Schoor Relieved of Reporting Duties" by Laurence Stern, "The Sellout at CBS" by Michael Harrington, "Defending Dan Schorr" by Tome Wicker, "Speech to the Washington Press Club, February 25, 1976" by Daniel Schorr, "Funds Voted For Schorr Inquiry" by Richard Lyons, "Erosion of the…

  2. Charisma in business speeches

    DEFF Research Database (Denmark)

    Niebuhr, Oliver; Brem, Alexander; Novák-Tót, Eszter

    2016-01-01

    Charisma is a key component of spoken language interaction; and it is probably for this reason that charismatic speech has been the subject of intensive research for centuries. However, what is still largely missing is a quantitative and objective line of research that, firstly, involves analyses of the acoustic-prosodic signal, secondly, focuses on business speeches like product presentations, and, thirdly, in doing so, advances the still fairly fragmentary evidence on the prosodic correlates of charismatic speech. We show that the prosodic features of charisma in political speeches also apply to business speeches. Consistent with the public opinion, our findings are indicative of Steve Jobs being a more charismatic speaker than Mark Zuckerberg. Beyond previous studies, our data suggest that rhythm and emphatic accentuation are also involved in conveying charisma. Furthermore, the differences...

  3. Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech, and music.

    Science.gov (United States)

    Lee, Hweeling; Noppeney, Uta

    2014-01-01

    This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech, or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogs of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past 3 years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and to a marginally significant degree to natural speech.

  4. Noise Estimation and Noise Removal Techniques for Speech Recognition in Adverse Environment

    OpenAIRE

    Shrawankar, Urmila; Thakare, Vilas

    2010-01-01

    Noise is ubiquitous in almost all acoustic environments. The speech signal, that is recorded by a microphone is generally infected by noise originating from various sources. Such contamination can change the characteristics of the speech signals and degrade the speech quality and intelligibility, thereby causing significant harm to human-to-machine communication systems. Noise detection and reduction for speech applications is often formulated as a digital filtering pr...

  5. Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv), which was demonstrated to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating … from computational auditory scene analysis and further support the hypothesis that the SNRenv is a powerful metric for speech intelligibility prediction.
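
    A rough numerical sketch of the SNRenv idea described above, assuming the processed noisy speech and the noise alone are available as numpy arrays. The single broadband channel and the 31.5 Hz modulation cutoff are simplifying assumptions for illustration, not the published model, which uses gammatone and modulation filterbanks.

        import numpy as np
        from scipy.signal import hilbert, butter, sosfilt

        def envelope_power(x, fs, mod_cutoff=31.5):
            """Normalized AC power of the low-pass filtered Hilbert envelope."""
            env = np.abs(hilbert(x))                          # temporal envelope
            sos = butter(2, mod_cutoff, btype="low", fs=fs, output="sos")
            env = sosfilt(sos, env)
            ac = env - np.mean(env)                           # keep modulations only
            return np.mean(ac ** 2) / (np.mean(env) ** 2 + 1e-12)

        def snr_env_db(noisy_speech, noise, fs):
            """Envelope-domain SNR: mixture envelope power relative to noise alone."""
            p_mix = envelope_power(noisy_speech, fs)
            p_noise = envelope_power(noise, fs)
            return 10 * np.log10(max(p_mix - p_noise, 1e-12) / (p_noise + 1e-12))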

  6. Sperry Univac speech communications technology

    Science.gov (United States)

    Medress, Mark F.

    1977-01-01

    Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.

  7. Voice and Speech after Laryngectomy

    Science.gov (United States)

    Stajner-Katusic, Smiljka; Horga, Damir; Musura, Maja; Globlek, Dubravka

    2006-01-01

    The aim of the investigation is to compare voice and speech quality in alaryngeal patients using esophageal speech (ESOP, eight subjects), electroacoustical speech aid (EACA, six subjects) and tracheoesophageal voice prosthesis (TEVP, three subjects). The subjects reading a short story were recorded in the sound-proof booth and the speech samples…

  8. Environmental Contamination of Normal Speech.

    Science.gov (United States)

    Harley, Trevor A.

    1990-01-01

    Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…

  9. Dysfluencies in the speech of adults with intellectual disabilities and reported speech difficulties

    NARCIS (Netherlands)

    Coppens-Hofman, Marjolein C.; Terband, Hayo R.; Maassen, Ben A. M.; Lantman-De Valk, Henny M. J. van Schrojenstein; Hof, Yvonne Van Zaalen-op't; Snik, Ad F. M.

    2013-01-01

    Background: In individuals with an intellectual disability, speech dysfluencies are more common than in the general population. In clinical practice, these fluency disorders are generally diagnosed and treated as stuttering rather than cluttering. Purpose: To characterise the type of dysfluencies in

  10. Speech processing in mobile environments

    CERN Document Server

    Rao, K Sreenivasa

    2014-01-01

    This book focuses on speech processing in the presence of low-bit rate coding and varying background environments. The methods presented in the book exploit the speech events which are robust in noisy environments. Accurate estimation of these crucial events will be useful for carrying out various speech tasks such as speech recognition, speaker recognition and speech rate modification in mobile environments. The authors provide insights into designing and developing robust methods to process the speech in mobile environments. The book covers temporal and spectral enhancement methods that minimize the effect of noise, and examines methods and models for speech and speaker recognition applications in mobile environments.

  11. Global Freedom of Speech

    DEFF Research Database (Denmark)

    Binderup, Lars Grassme

    2007-01-01

    …, as opposed to a legal norm, that curbs exercises of the right to free speech that offend the feelings or beliefs of members from other cultural groups. The paper rejects the suggestion that acceptance of such a norm is in line with liberal egalitarian thinking. Following a review of the classical liberal egalitarian reasons for free speech - reasons from overall welfare, from autonomy and from respect for the equality of citizens - it is argued that these reasons outweigh the proposed reasons for curbing culturally offensive speech. Currently controversial cases such as that of the Danish Cartoon Controversy…

  12. Neural Oscillations Carry Speech Rhythm through to Comprehension.

    Science.gov (United States)

    Peelle, Jonathan E; Davis, Matthew H

    2012-01-01

    A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners' processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging - particularly electroencephalography (EEG) and magnetoencephalography (MEG) - point to phase locking by ongoing cortical oscillations to low-frequency information (~4-8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain.
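
    The envelope and phase computation the review centers on can be sketched in a few lines. This is a minimal illustration assuming a mono speech array x at sampling rate fs; the filter orders are arbitrary choices rather than any published analysis pipeline.

        import numpy as np
        from scipy.signal import hilbert, butter, sosfiltfilt

        def syllable_rate_envelope(x, fs):
            """Slow amplitude envelope and its ~4-8 Hz phase (syllable-rhythm band)."""
            env = np.abs(hilbert(x))                    # broadband amplitude envelope
            sos = butter(4, [4.0, 8.0], btype="bandpass", fs=fs, output="sos")
            theta_env = sosfiltfilt(sos, env)           # the band cortical oscillations track
            phase = np.angle(hilbert(theta_env))        # phase a neural oscillator could lock to
            return theta_env, phase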

  13. Neural oscillations carry speech rhythm through to comprehension

    Directory of Open Access Journals (Sweden)

    Jonathan E Peelle

    2012-09-01

    Full Text Available A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging—particularly electroencephalography (EEG) and magnetoencephalography (MEG)—point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in additional recruitment of left hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain.

  14. Apraxia of speech and cerebellar mutism syndrome: a case report.

    Science.gov (United States)

    De Witte, E; Wilssens, I; De Surgeloose, D; Dua, G; Moens, M; Verhoeven, J; Manto, M; Mariën, P

    2017-01-01

    Cerebellar mutism syndrome (CMS) or posterior fossa syndrome (PFS) consists of a constellation of neuropsychiatric, neuropsychological and neurogenic speech and language deficits. It is most commonly observed in children after posterior fossa tumor surgery. The most prominent feature of CMS is mutism, which generally starts a few days after the operation, has a limited duration and is typically followed by motor speech deficits. However, the core speech disorder subserving CMS is still unclear. This study investigates the speech and language symptoms following posterior fossa medulloblastoma surgery in a 12-year-old right-handed boy. An extensive battery of formal speech (DIAS, Diagnostic Instrument for Apraxia of Speech) and language tests was administered during a follow-up of 6 weeks after surgery. Although the neurological and neuropsychological (affective, cognitive) symptoms of this patient are consistent with Schmahmann's syndrome, the speech and language symptoms were markedly different from what is typically described in the literature. In-depth analyses of speech production revealed features consistent with a diagnosis of apraxia of speech (AoS), while ataxic dysarthria was completely absent. In addition, language assessments showed genuine aphasic deficits as reflected by distorted language production and perception, word-finding difficulties, grammatical disturbances and verbal fluency deficits. To the best of our knowledge this case might be the first example that clearly demonstrates that a higher-level motor planning disorder (apraxia) may be the origin of disrupted speech in CMS. In addition, identification of non-motor linguistic disturbances during follow-up adds to the view that the cerebellum not only plays a crucial role in the planning and execution of speech but also in linguistic processing. Whether the cerebellum has a direct or indirect role in motor speech planning needs to be further investigated.

  15. The Rhetoric in English Speech

    Institute of Scientific and Technical Information of China (English)

    马鑫

    2014-01-01

    English speech has a very long history and has always been highly valued. People give speeches in economic activities, political forums and academic reports to express their opinions and to inform or persuade others. Speech plays a rather important role in English literature, and the distinct theme of a speech owes much to its rhetoric. This paper discusses parallelism, repetition and the rhetorical question in English speeches, aiming to help people better appreciate their charm.

  16. Acoustic Evidence for Phonologically Mismatched Speech Errors

    Science.gov (United States)

    Gormley, Andrea

    2015-01-01

    Speech errors are generally said to accommodate to their new phonological context. This accommodation has been validated by several transcription studies. The transcription methodology is not the best choice for detecting errors at this level, however, as this type of error can be difficult to perceive. This paper presents an acoustic analysis of…

  17. College Students' Preference for Compressed Speech Lectures.

    Science.gov (United States)

    Primrose, Robert A.

    To test student reactions to compressed-speech lectures, tapes for a general education course in oral communication were compressed to 49 to 77 percent of original time. Students were permitted to check them out via a dial access retrieval system. Checkouts and use of tapes were compared with student grades at semester's end. No significant…

  18. Hereditary Rolandic Epilepsy and Speech Dyspraxia

    OpenAIRE

    J Gordon Millichap

    1995-01-01

    A syndrome of nocturnal oro-facio-brachial partial seizures, secondarily generalized partial seizures, centro-temporal epileptiform discharges, associated with oral and speech dyspraxia and cognitive impairment, is described in a family of 9 affected members in three generations reported from Austin Hospital, Heidelberg (Melbourne), the University of Melbourne, and the Royal Children’s Hospital, Melbourne, Australia.

  19. Freedom of Speech: A Selected, Annotated Basic Bibliography.

    Science.gov (United States)

    Tedford, Thomas L.

    This bibliography lists 36 books related to problems of freedom of speech. General sources (history, analyses, texts, and anthologies) are listed separately from those dealing with censorship of obscenity and pornography. Each entry is briefly annotated. (AA)

  20. Speech disturbances and gaze behavior during public speaking in subtypes of social phobia.

    Science.gov (United States)

    Hofmann, S G; Gerlach, A L; Wender, A; Roth, W T

    1997-01-01

    Twenty-four social phobics with public speaking anxiety and 25 nonphobic individuals (controls) gave a speech in front of two people. Subjective anxiety, gaze behavior, and speech disturbances were assessed. Based on subjects' fear ratings of social situations, phobics and controls were divided into the generalized and nongeneralized subtypes. Results showed that generalized phobics reported the most, and nongeneralized controls the least, anxiety during public speaking. All subjects had longer and more frequent eye contact when delivering a speech than when talking with an experimenter or sitting in front of an audience. Phobics showed more filled pauses, had longer silent pauses, paused more frequently, and spent more time pausing than controls when giving a speech. Generalized phobics spent more time pausing during their speech than the other subgroups (nongeneralized controls, generalized controls, and nongeneralized phobics). These results suggest that generalized phobics tended to shift attentional resources from speech production to other cognitive tasks.

  1. An introduction to silent speech interfaces

    CERN Document Server

    Freitas, João; Dias, Miguel Sales; Silva, Samuel

    2017-01-01

    This book provides a broad and comprehensive overview of the existing technical approaches in the area of silent speech interfaces (SSI), both in theory and in application. Each technique is described in the context of the human speech production process, allowing the reader to clearly understand the principles behind SSI in general and across different methods. Additionally, the book explores the combined use of different data sources, collected from various sensors, in order to tackle the limitations of simpler SSI approaches, addressing current challenges of this field. The book also provides information about existing SSI applications, resources and a simple tutorial on how to build an SSI.

  2. Anxiety and ritualized speech

    Science.gov (United States)

    Lalljee, Mansur; Cook, Mark

    1975-01-01

    The experiment examines the effects of anxiety on the use of a number of words that seem irrelevant to semantic communication. The Units of Ritualized Speech (URSs) considered are: 'I mean', 'in fact', 'really', 'sort of', 'well' and 'you know'. (Editor)

  4. HATE SPEECH AS COMMUNICATION

    National Research Council Canada - National Science Library

    Gladilin Aleksey Vladimirovich

    2012-01-01

    The purpose of the paper is a theoretical comprehension of hate speech from communication point of view, on the one hand, and from the point of view of prejudice, stereotypes and discrimination on the other...

  5. Speech intelligibility in hospitals.

    Science.gov (United States)

    Ryherd, Erica E; Moeller, Michael; Hsu, Timothy

    2013-07-01

    Effective communication between staff members is key to patient safety in hospitals. A variety of patient care activities including admittance, evaluation, and treatment rely on oral communication. Surprisingly, published information on speech intelligibility in hospitals is extremely limited. In this study, speech intelligibility measurements and occupant evaluations were conducted in 20 units of five different U.S. hospitals. A variety of unit types and locations were studied. Results show that overall, no unit had "good" intelligibility based on the speech intelligibility index (SII > 0.75), and several locations were found to have "poor" intelligibility. The study documents speech intelligibility across a variety of hospitals and unit types, offers some evidence of the positive impact of absorption on intelligibility, and identifies areas for future research.
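
    The SII figures quoted above come from a band-audibility calculation that is easy to sketch (ANSI S3.5 style: band audibility weighted by band importance). The band SNRs and importance weights below are illustrative placeholders, not values measured in the study.

        import numpy as np

        def sii(band_snr_db, band_importance):
            """SII = sum_i I_i * A_i, with audibility A_i = clip((SNR_i + 15) / 30, 0, 1)."""
            audibility = np.clip((np.asarray(band_snr_db, float) + 15.0) / 30.0, 0.0, 1.0)
            weights = np.asarray(band_importance, float)
            return float(np.sum(weights / weights.sum() * audibility))

        # Example: a unit where mid-frequency bands are masked by equipment noise.
        print(sii(band_snr_db=[12, 6, 0, -6, 3], band_importance=[1, 2, 3, 2, 1]))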

  6. Speech disorders - children

    Science.gov (United States)

    MedlinePlus medical encyclopedia entry: //medlineplus.gov/ency/article/001430.htm

  7. Speech impairment (adult)

    Science.gov (United States)

    MedlinePlus medical encyclopedia entry: //medlineplus.gov/ency/article/003204.htm

  8. Recognizing GSM Digital Speech

    OpenAIRE

    Gallardo-Antolín, Ascensión; Peláez-Moreno, Carmen; Díaz-de-María, Fernando

    2005-01-01

    The Global System for Mobile (GSM) environment encompasses three main problems for automatic speech recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The first one has already received much attention; however, source coding distortion and transmission errors must be explicitly addressed. In this paper, we propose an alternative front-end for speech recognition over GSM networks. This front-end is specially conceived to be effective against source c...

  10. Neural overlap in processing music and speech.

    Science.gov (United States)

    Peretz, Isabelle; Vuvan, Dominique; Lagrois, Marie-Élaine; Armony, Jorge L

    2015-03-19

    Neural overlap in processing music and speech, as measured by the co-activation of brain regions in neuroimaging studies, may suggest that parts of the neural circuitries established for language may have been recycled during evolution for musicality, or vice versa that musicality served as a springboard for language emergence. Such a perspective has important implications for several topics of general interest besides evolutionary origins. For instance, neural overlap is an important premise for the possibility of music training to influence language acquisition and literacy. However, neural overlap in processing music and speech does not entail sharing neural circuitries. Neural separability between music and speech may occur in overlapping brain regions. In this paper, we review the evidence and outline the issues faced in interpreting such neural data, and argue that converging evidence from several methodologies is needed before neural overlap is taken as evidence of sharing.

  11. Speech perception as complex auditory categorization

    Science.gov (United States)

    Holt, Lori L.

    2002-05-01

    Despite a long and rich history of categorization research in cognitive psychology, very little work has addressed the issue of complex auditory category formation. This is especially unfortunate because the general underlying cognitive and perceptual mechanisms that guide auditory category formation are of great importance to understanding speech perception. I will discuss a new methodological approach to examining complex auditory category formation that specifically addresses issues relevant to speech perception. This approach utilizes novel nonspeech sound stimuli to gain full experimental control over listeners' history of experience. As such, the course of learning is readily measurable. Results from this methodology indicate that the structure and formation of auditory categories are a function of the statistical input distributions of sound that listeners hear, aspects of the operating characteristics of the auditory system, and characteristics of the perceptual categorization system. These results have important implications for phonetic acquisition and speech perception.
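
    The distributional-learning claim above can be made concrete with a toy simulation: two "categories" defined only by overlapping acoustic distributions, recovered by a Gaussian classifier fit to the exposure set. The two stimulus dimensions and all parameter values are invented for illustration.

        import numpy as np

        rng = np.random.default_rng(0)
        # Exposure set: two nonspeech categories differing only in their statistics
        # (e.g., center frequency in Hz and modulation rate in Hz).
        cat_a = rng.normal(loc=[1000.0, 20.0], scale=[150.0, 4.0], size=(200, 2))
        cat_b = rng.normal(loc=[1400.0, 30.0], scale=[150.0, 4.0], size=(200, 2))

        def fit_gaussian(samples):
            return samples.mean(axis=0), np.cov(samples.T)

        def log_likelihood(x, mean, cov):
            d = x - mean
            return -0.5 * (d @ np.linalg.solve(cov, d) + np.log(np.linalg.det(cov)))

        mu_a, cov_a = fit_gaussian(cat_a)
        mu_b, cov_b = fit_gaussian(cat_b)
        probe = np.array([1200.0, 25.0])        # an ambiguous test stimulus
        print("A" if log_likelihood(probe, mu_a, cov_a) > log_likelihood(probe, mu_b, cov_b) else "B")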

  12. Neural overlap in processing music and speech

    Science.gov (United States)

    Peretz, Isabelle; Vuvan, Dominique; Lagrois, Marie-Élaine; Armony, Jorge L.

    2015-01-01

    Neural overlap in processing music and speech, as measured by the co-activation of brain regions in neuroimaging studies, may suggest that parts of the neural circuitries established for language may have been recycled during evolution for musicality, or vice versa that musicality served as a springboard for language emergence. Such a perspective has important implications for several topics of general interest besides evolutionary origins. For instance, neural overlap is an important premise for the possibility of music training to influence language acquisition and literacy. However, neural overlap in processing music and speech does not entail sharing neural circuitries. Neural separability between music and speech may occur in overlapping brain regions. In this paper, we review the evidence and outline the issues faced in interpreting such neural data, and argue that converging evidence from several methodologies is needed before neural overlap is taken as evidence of sharing. PMID:25646513

  13. Objective voice and speech analysis of persons with chronic hoarseness by prosodic analysis of speech samples.

    Science.gov (United States)

    Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar

    2016-10-01

    Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
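
    The regression setup described above (features extracted automatically, fit to the experts' mean ratings, then correlated with ratings on a held-out speaker set) can be sketched as follows. The synthetic features and ratings are placeholders for the prosodic and recognition-accuracy measures used in the study; only the speaker counts mirror the record.

        import numpy as np
        from scipy.stats import pearsonr
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(1)
        n_feat = 10                                   # ten features, as in the record
        w = rng.normal(size=n_feat)                   # hidden "true" relation
        X_train = rng.normal(size=(83, n_feat))       # 83 training speakers
        X_test = rng.normal(size=(73, n_feat))        # 73 test speakers
        y_train = X_train @ w + rng.normal(scale=0.5, size=83)
        y_test = X_test @ w + rng.normal(scale=0.5, size=73)

        model = LinearRegression().fit(X_train, y_train)
        r, _ = pearsonr(model.predict(X_test), y_test)
        print(f"human-machine correlation: r = {r:.2f}")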

  14. Computer-based speech therapy for childhood speech sound disorders.

    Science.gov (United States)

    Furlong, Lisa; Erickson, Shane; Morris, Meg E

    2017-07-01

    With the current worldwide workforce shortage of Speech-Language Pathologists, new and innovative ways of delivering therapy to children with speech sound disorders are needed. Computer-based speech therapy may be an effective and viable means of addressing service access issues for children with speech sound disorders. To evaluate the efficacy of computer-based speech therapy programs for children with speech sound disorders. Studies reporting the efficacy of computer-based speech therapy programs were identified via a systematic, computerised database search. Key study characteristics, results, main findings and details of computer-based speech therapy programs were extracted. The methodological quality was evaluated using a structured critical appraisal tool. 14 studies were identified and a total of 11 computer-based speech therapy programs were evaluated. The results showed that computer-based speech therapy is associated with positive clinical changes for some children with speech sound disorders. There is a need for collaborative research between computer engineers and clinicians, particularly during the design and development of computer-based speech therapy programs. Evaluation using rigorous experimental designs is required to understand the benefits of computer-based speech therapy. The reader will be able to 1) discuss how computer-based speech therapy has the potential to improve service access for children with speech sound disorders, 2) explain the ways in which computer-based speech therapy programs may enhance traditional tabletop therapy and 3) compare the features of computer-based speech therapy programs designed for different client populations. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. SPEECH DISORDERS ENCOUNTERED DURING SPEECH THERAPY AND THERAPY TECHNIQUES

    Directory of Open Access Journals (Sweden)

    İlhan ERDEM

    2013-06-01

    Full Text Available Speech, a physical and mental process, uses agreed-upon signs and sounds to turn a sense in the mind into a message. To identify the sounds of speech, it is essential to know the structure and function of the various organs that make conversation possible. Because speech is both a physical and a mental process, many factors can lead to speech disorders. A speech disorder can stem from language acquisition as well as from many medical and psychological factors. Speaking is the collective work of many organs, like an orchestra. Speech is a very complex skill with a mental dimension, so it must be established which obstacles inhibit a person's conversation. A speech disorder is a defect in the flow, rhythm, pitch, stress, composition or vocalization of speech. This study treats speech disorders such as articulation disorders, stuttering, aphasia, dysarthria, local dialect speech, tongue and lip laziness, and overly rapid speech as defects of language skill. The causes of these speech disorders were investigated and suggestions for their remedy are presented.

  16. Infant rule learning: advantage language, or advantage speech?

    Directory of Open Access Journals (Sweden)

    Hugh Rabagliati

    Full Text Available Infants appear to learn abstract rule-like regularities (e.g., la la da follows an AAB pattern) more easily from speech than from a variety of other auditory and visual stimuli (Marcus et al., 2007). We test if that facilitation reflects a specialization to learn from speech alone, or from modality-independent communicative stimuli more generally, by measuring 7.5-month-old infants' ability to learn abstract rules from sign language-like gestures. Whereas infants appear to easily learn many different rules from speech, we found that with sign-like stimuli, and under circumstances comparable to those of Marcus et al. (1999), hearing infants were able to learn an ABB rule, but not an AAB rule. This is consistent with results of studies that demonstrate lower levels of infant rule learning from a variety of other non-speech stimuli, and we discuss implications for accounts of speech-facilitation.

  17. Learning Fault-tolerant Speech Parsing with SCREEN

    CERN Document Server

    Wermter, S; Wermter, Stefan; Weber, Volker

    1994-01-01

    This paper describes a new approach and a system SCREEN for fault-tolerant speech parsing. SCREEN stands for Symbolic Connectionist Robust EnterprisE for Natural language. Speech parsing describes the syntactic and semantic analysis of spontaneous spoken language. The general approach is based on incremental immediate flat analysis, learning of syntactic and semantic speech parsing, parallel integration of current hypotheses, and the consideration of various forms of speech related errors. The goal for this approach is to explore the parallel interactions between various knowledge sources for learning incremental fault-tolerant speech parsing. This approach is examined in a system SCREEN using various hybrid connectionist techniques. Hybrid connectionist techniques are examined because of their promising properties of inherent fault tolerance, learning, gradedness and parallel constraint integration. The input for SCREEN is hypotheses about recognized words of a spoken utterance potentially analyzed by a spe...

  18. Practical speech user interface design

    CERN Document Server

    Lewis, James R

    2010-01-01

    Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application

  19. Speech-Language Therapy (For Parents)

    Science.gov (United States)

    ... Speech-language pathologists (SLPs), often informally known as speech therapists, are professionals educated in the study of human ...

  20. Speech processing using maximum likelihood continuity mapping

    Energy Technology Data Exchange (ETDEWEB)

    Hogden, John E. (Santa Fe, NM)

    2000-01-01

    Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
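
    The core idea (a probabilistic sound-to-position mapping plus a continuity constraint yields smooth pseudo-articulator paths) can be illustrated in one dimension. The per-frame Gaussian mapping and the penalty weight below are invented stand-ins for the mapping the patent describes learning from speech sounds alone.

        import numpy as np
        from scipy.optimize import minimize

        rng = np.random.default_rng(2)
        T = 50
        mu = np.cumsum(rng.normal(scale=0.3, size=T))   # most likely position per frame
        sigma = 0.5 + rng.random(T)                     # per-frame mapping uncertainty

        def neg_log_posterior(pos, smooth=5.0):
            data = np.sum(((pos - mu) / sigma) ** 2)    # -log likelihood (up to a constant)
            cont = smooth * np.sum(np.diff(pos) ** 2)   # continuity (smoothness) penalty
            return data + cont

        path = minimize(neg_log_posterior, x0=mu).x     # smooth pseudo-articulator trajectory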

  2. Managing the reaction effects of speech disorders on speech ...

    African Journals Online (AJOL)

    Speech disorders are responsible for defective speaking. It is usually ... They occur as a result of persistent frustrations which speech defectives usually encounter when speaking defectively. This paper ...

  3. Under-resourced speech recognition based on the speech manifold

    CSIR Research Space (South Africa)

    Sahraeian, R

    2015-09-01

    Full Text Available Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a...

  4. Perception of words and pitch patterns in song and speech

    Directory of Open Access Journals (Sweden)

    Julia eMerrill

    2012-03-01

    Full Text Available This fMRI study examines shared and distinct cortical areas involved in the auditory perception of song and speech at the level of their underlying constituents: words, pitch and rhythm. Univariate and multivariate analyses were performed on the brain activity patterns of six conditions, arranged in a subtractive hierarchy: sung sentences including words, pitch and rhythm; hummed speech prosody and song melody containing only pitch patterns and rhythm; as well as the pure musical or speech rhythm. Systematic contrasts between these balanced conditions following their hierarchical organization showed a great overlap between song and speech at all levels in the bilateral temporal lobe, but suggested a differential role of the inferior frontal gyrus (IFG) and intraparietal sulcus (IPS) in processing song and speech. The left IFG was involved in word- and pitch-related processing in speech, the right IFG in processing pitch in song. Furthermore, the IPS showed sensitivity to discrete pitch relations in song as opposed to the gliding pitch in speech. Finally, the superior temporal gyrus and premotor cortex coded for general differences between words and pitch patterns, irrespective of whether they were sung or spoken. Thus, song and speech share many features which are reflected in a fundamental similarity of brain areas involved in their perception. However, fine-grained acoustic differences on word and pitch level are reflected in the activity of IFG and IPS.

  5. Effects of pitch, level, and tactile cues on speech segregation

    Science.gov (United States)

    Drullman, Rob; Bronkhorst, Adelbert W.

    2003-04-01

    Sentence intelligibility for interfering speech was investigated as a function of level difference, pitch difference, and presence of tactile support. A previous study by the present authors [J. Acoust. Soc. Am. 111, 2432-2433 (2002)] had shown a small benefit of tactile support in the speech-reception threshold measured against a background of one to eight competing talkers. The present experiment focused on the effects of informational and energetic masking for one competing talker. Competing speech was obtained by manipulating the speech of the male target talker (different sentences). The PSOLA technique was used to increase the average pitch of competing speech by 2, 4, 8, or 12 semitones. Level differences between target and competing speech ranged from -16 to +4 dB. Tactile support (B&K 4810 shaker) was given to the index finger by presenting the temporal envelope of the low-pass-filtered speech (0-200 Hz). Sentences were presented diotically and the percentage of correctly perceived words was measured. Results show a significant overall increase in intelligibility score from 71% to 77% due to tactile support. Performance improves monotonically with increasing pitch difference. Louder target speech generally helps perception, but results for level differences are considerably dependent on pitch differences.
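
    A minimal sketch of how such stimuli could be generated, assuming two mono recordings on disk. librosa's phase-vocoder-based pitch_shift stands in for the PSOLA processing used in the study, and the file names, shift and level values are placeholders.

        import numpy as np
        import librosa

        target, sr = librosa.load("target_sentence.wav", sr=None, mono=True)
        masker, _ = librosa.load("masker_sentence.wav", sr=sr, mono=True)

        def mix(target, masker, sr, semitones, level_diff_db):
            """Pitch-shift the masker, then mix at a target-re-masker level difference."""
            masker = librosa.effects.pitch_shift(masker, sr=sr, n_steps=semitones)
            n = min(len(target), len(masker))
            t = target[:n] / np.sqrt(np.mean(target[:n] ** 2))   # equalize RMS first
            m = masker[:n] / np.sqrt(np.mean(masker[:n] ** 2))
            return 10 ** (level_diff_db / 20.0) * t + m

        stimulus = mix(target, masker, sr, semitones=8, level_diff_db=-8)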

  6. Evaluating airborne sound insulation in terms of speech intelligibility.

    Science.gov (United States)

    Park, H K; Bradley, J S; Gover, B N

    2008-03-01

    This paper reports on an evaluation of ratings of the sound insulation of simulated walls in terms of the intelligibility of speech transmitted through the walls. Subjects listened to speech modified to simulate transmission through 20 different walls with a wide range of sound insulation ratings, with constant ambient noise. The subjects' mean speech intelligibility scores were compared with various physical measures to test the success of the measures as sound insulation ratings. The standard Sound Transmission Class (STC) and Weighted Sound Reduction Index ratings were only moderately successful predictors of intelligibility scores, and eliminating the 8 dB rule from STC led to very modest improvements. Various previously established speech intelligibility measures (e.g., Articulation Index or Speech Intelligibility Index) and measures derived from them, such as the Articulation Class, were all relatively strongly related to speech intelligibility scores. In general, measures that involved arithmetic averages or summations of decibel values over frequency bands important for speech were most strongly related to intelligibility scores. The two most accurate predictors of the intelligibility of transmitted speech were an arithmetic average transmission loss over the frequencies from 200 to 2.5 kHz and the addition of a new spectrum weighting term to R(w) that included frequencies from 400 to 2.5 kHz.
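
    The best predictor reported above is simple enough to state in code: the arithmetic average of transmission loss over the 1/3-octave bands from 200 Hz to 2.5 kHz. The TL values below are invented wall data for illustration.

        import numpy as np

        bands_hz = [200, 250, 315, 400, 500, 630, 800, 1000, 1250, 1600, 2000, 2500]
        tl_db = [28, 30, 33, 35, 38, 40, 42, 44, 45, 47, 48, 49]   # placeholder wall data

        avg_tl = float(np.mean(tl_db))   # higher average TL -> less intelligible transmitted speech
        print(f"average TL (200-2500 Hz): {avg_tl:.1f} dB")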

  7. Intelligibility of speech of children with speech and sound disorders

    OpenAIRE

    Ivetac, Tina

    2014-01-01

    The purpose of this study is to examine speech intelligibility of children with primary speech and sound disorders aged 3 to 6 years in everyday life. The research problem is based on the degree to which parents or guardians, immediate family members (sister, brother, grandparents), extended family members (aunt, uncle, cousin), child's friends, other acquaintances, child's teachers and strangers understand the speech of children with speech sound disorders. We examined whether the level ...

  8. Automatic speech recognition An evaluation of Google Speech

    OpenAIRE

    Stenman, Magnus

    2015-01-01

    The use of speech recognition is increasing rapidly and is now available in smart TVs, desktop computers, every new smart phone, etc. allowing us to talk to computers naturally. With the use in home appliances, education and even in surgical procedures accuracy and speed becomes very important. This thesis aims to give an introduction to speech recognition and discuss its use in robotics. An evaluation of Google Speech, using Google’s speech API, in regards to word error rate and translation ...

  9. Differential Diagnosis of Severe Speech Disorders Using Speech Gestures

    Science.gov (United States)

    Bahr, Ruth Huntley

    2005-01-01

    The differentiation of childhood apraxia of speech from severe phonological disorder is a common clinical problem. This article reports on an attempt to describe speech errors in children with childhood apraxia of speech on the basis of gesture use and acoustic analyses of articulatory gestures. The focus was on the movement of articulators and…

  10. Ordinal models of audiovisual speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias

    2011-01-01

    Audiovisual information is integrated in speech perception. One manifestation of this is the McGurk illusion, in which watching the articulating face alters the auditory phonetic percept. Understanding this phenomenon fully requires a computational model with predictive power. Here, we describe ordinal models that can account for the McGurk illusion. We compare this type of model to the Fuzzy Logical Model of Perception (FLMP), in which the response categories are not ordered. While the FLMP generally fit the data better than the ordinal model, it also employs more free parameters in complex experiments when the number of response categories is high, as it is for speech perception in general. Testing the predictive power of the models using a form of cross-validation, we found that ordinal models perform better than the FLMP. Based on these findings we suggest that ordinal models generally have…

  11. Frequent word section extraction in a presentation speech by an effective dynamic programming algorithm.

    Science.gov (United States)

    Itoh, Yoshiaki; Tanaka, Kazuyo

    2004-08-01

    Word frequency in a document has often been utilized in text searching and summarization. Similarly, identifying frequent words or phrases in a speech data set for searching and summarization would also be meaningful. However, obtaining word frequency in a speech data set is difficult, because frequent words are often special terms in the speech and cannot be recognized by a general speech recognizer. This paper proposes another approach that is effective for automatic extraction of such frequent word sections in a speech data set. The proposed method is applicable to any domain of monologue speech, because no language models or specific terms are required in advance. The extracted sections can be regarded as speech labels of some kind or a digest of the speech presentation. The frequent word sections are determined by detecting similar sections, which are sections of audio data that represent the same word or phrase. The similar sections are detected by an efficient algorithm, called Shift Continuous Dynamic Programming (Shift CDP), which realizes fast matching between arbitrary sections in the reference speech pattern and those in the input speech, and enables frame-synchronous extraction of similar sections. In experiments, the algorithm is applied to extract the repeated sections in oral presentation speeches recorded in academic conferences in Japan. The results show that Shift CDP successfully detects similar sections and identifies the frequent word sections in individual presentation speeches, without prior domain knowledge, such as language models and terms.
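
    Shift CDP itself is a frame-synchronous algorithm, but the task it solves can be sketched with a brute-force stand-in: slide candidate sections over the recording and score each pair with DTW over MFCCs, flagging low-cost pairs as repeats of the same word or phrase. The file name, window sizes and threshold are invented, and this naive version is far slower than the published method.

        import numpy as np
        import librosa

        y, sr = librosa.load("presentation.wav", sr=16000)          # placeholder file
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T        # frames x coefficients

        def find_repeats(mfcc, win=80, hop=40, threshold=50.0):
            """Return pairs of frame offsets whose sections align with low DTW cost."""
            hits = []
            starts = range(0, len(mfcc) - win, hop)
            for i in starts:
                for j in starts:
                    if j <= i + win:                     # skip self/overlapping comparisons
                        continue
                    D, _ = librosa.sequence.dtw(X=mfcc[i:i + win].T, Y=mfcc[j:j + win].T)
                    cost = D[-1, -1] / (2 * win)         # length-normalized alignment cost
                    if cost < threshold:
                        hits.append((i, j, cost))
            return hits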

  12. Tackling the complexity in speech

    DEFF Research Database (Denmark)

    …section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore, we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations…

  13. Denial Denied: Freedom of Speech

    Directory of Open Access Journals (Sweden)

    Glen Newey

    2009-12-01

    Full Text Available Free speech is a widely held principle. This is in some ways surprising, since formal and informal censorship of speech is widespread, and rather different issues seem to arise depending on whether the censorship concerns who speaks, what content is spoken or how it is spoken. I argue that despite these facts, free speech can indeed be seen as a unitary principle. On my analysis, the core of the free speech principle is the denial of the denial of speech, whether to a speaker, to a proposition, or to a mode of expression. Underlying free speech is the principle of freedom of association, according to which speech is both a precondition of future association (e.g. as a medium for negotiation) and a mode of association in its own right. I conclude by applying this account briefly to two contentious issues: hate speech and pornography.

  15. Speech spectrogram expert

    Energy Technology Data Exchange (ETDEWEB)

    Johannsen, J.; Macallister, J.; Michalek, T.; Ross, S.

    1983-01-01

    Various authors have pointed out that humans can become quite adept at deriving phonetic transcriptions from speech spectrograms (as good as 90 percent accuracy at the phoneme level). The authors describe an expert system which attempts to simulate this performance. The speech spectrogram expert (spex) is actually a society made up of three experts: a 2-dimensional vision expert, an acoustic-phonetic expert, and a phonetics expert. The visual reasoning expert finds important visual features of the spectrogram. The acoustic-phonetic expert reasons about how visual features relate to phonemes, and about how phonemes change visually in different contexts. The phonetics expert reasons about allowable phoneme sequences and transformations, and deduces an English spelling for phoneme strings. The speech spectrogram expert is highly interactive, allowing users to investigate hypotheses and edit rules. 10 references.

  16. RECOGNISING SPEECH ACTS

    Directory of Open Access Journals (Sweden)

    Phyllis Kaburise

    2012-09-01

    Full Text Available Speech Act Theory (SAT), a theory in pragmatics, is an attempt to describe what happens during linguistic interactions. Inherent within SAT is the idea that language forms and intentions are relatively formulaic and that there is a direct correspondence between sentence forms (for example, in terms of structure and lexicon) and the function or meaning of an utterance. The contention offered in this paper is that when such a correspondence does not exist, as in indirect speech utterances, this creates challenges for English second language speakers and may result in miscommunication. This arises because indirect speech acts allow speakers to employ various pragmatic devices such as inference, implicature, presuppositions and context clues to transmit their messages. Such devices, operating within the non-literal level of language competence, may pose challenges for ESL learners.

  17. Protection limits on free speech

    Institute of Scientific and Technical Information of China (English)

    李敏

    2014-01-01

    Freedom of speech is one of the basic rights of citizens and should receive broad protection, but in the actual context of China it is worth considering what kinds of speech can be protected and which are restricted, and how to draw the line between state power and free speech. People tend to ignore freedom of speech and its function, so that some arguments cannot be aired in open debate.

  18. The University and Free Speech

    OpenAIRE

    Grcic, Joseph

    2014-01-01

    Free speech is a necessary condition for the growth of knowledge and the implementation of real and rational democracy. Educational institutions play a central role in socializing individuals to function within their society. Academic freedom is the right to free speech in the context of the university and tenure, properly interpreted, is a necessary component of protecting academic freedom and free speech.

  19. Designing speech for a recipient

    DEFF Research Database (Denmark)

    Fischer, Kerstin

    …is investigated on three candidates for so-called ‘simplified registers’: speech to children (also called motherese or baby talk), speech to foreigners (also called foreigner talk) and speech to robots. The volume integrates research from various disciplines, such as psychology, sociolinguistics…

  20. ADMINISTRATIVE GUIDE IN SPEECH CORRECTION.

    Science.gov (United States)

    HEALEY, WILLIAM C.

    WRITTEN PRIMARILY FOR SCHOOL SUPERINTENDENTS, PRINCIPALS, SPEECH CLINICIANS, AND SUPERVISORS, THIS GUIDE OUTLINES THE MECHANICS OF ORGANIZING AND CONDUCTING SPEECH CORRECTION ACTIVITIES IN THE PUBLIC SCHOOLS. IT INCLUDES THE REQUIREMENTS FOR CERTIFICATION OF A SPEECH CLINICIAN IN MISSOURI AND DESCRIBES ESSENTIAL STEPS FOR THE DEVELOPMENT OF A…

  1. Scrambling-based speech encryption via compressed sensing

    Science.gov (United States)

    Zeng, Li; Zhang, Xiongwei; Chen, Liang; Fan, Zhangjun; Wang, Yonggang

    2012-12-01

    Conventional speech scramblers have three disadvantages, including heavy communication overhead, signal features underexploitation, and low attack resistance. In this study, we propose a scrambling-based speech encryption scheme via compressed sensing (CS). Distinguished from conventional scramblers, the above problems are solved in a unified framework by utilizing the advantages of CS. The presented encryption idea is general and easily applies to speech communication systems. Compared with the state-of-the-art methods, the proposed scheme provides lower residual intelligibility and greater cryptanalytic efforts. Meanwhile, it ensures desirable channel usage and notable resistibility to hostile attack. Extensive experimental results also confirm the effectiveness of the proposed scheme.
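
    A minimal sketch of the scrambling-by-CS idea: frames that are sparse in a transform domain are "encrypted" by projection with a secret random measurement matrix, and a receiver holding the same matrix recovers the frame by sparse reconstruction. The sizes, the DCT basis, the seed-as-key convention and the OMP recovery are my illustrative choices, not the authors' exact construction.

        import numpy as np
        from scipy.fft import idct
        from sklearn.linear_model import OrthogonalMatchingPursuit

        n, m, k = 256, 128, 10                        # frame length, measurements, sparsity
        rng = np.random.default_rng(42)               # the seed acts as the shared key
        Phi = rng.normal(size=(m, n)) / np.sqrt(m)    # secret measurement matrix
        Psi = idct(np.eye(n), norm="ortho", axis=0)   # DCT synthesis basis

        coeffs = np.zeros(n)
        coeffs[rng.choice(n, k, replace=False)] = rng.normal(size=k)
        frame = Psi @ coeffs                          # a synthetic sparse "speech" frame

        ciphertext = Phi @ frame                      # transmitted, unintelligible samples

        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
        omp.fit(Phi @ Psi, ciphertext)                # receiver-side sparse recovery
        recovered = Psi @ omp.coef_
        print("max reconstruction error:", np.abs(recovered - frame).max())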

  2. Digitized Ethnic Hate Speech: Understanding Effects of Digital Media Hate Speech on Citizen Journalism in Kenya

    Science.gov (United States)

    Kimotho, Stephen Gichuhi; Nyaga, Rahab Njeri

    2016-01-01

    Ethnicity in Kenya permeates all spheres of life. However, it is in politics that ethnicity is most visible. Election time in Kenya often leads to ethnic competition and hatred, often expressed through various media. Ethnic hate speech characterized the 2007 general elections in party rallies and through text messages, emails, posters and…

  3. SPEECH DISORDERS ENCOUNTERED DURING SPEECH THERAPY AND THERAPY TECHNIQUES

    OpenAIRE

    2013-01-01

    Speech, a physical and mental process, uses agreed-upon signs and sounds to turn a sense in the mind into a message. To identify the sounds of speech, it is essential to know the structure and function of the various organs that make conversation possible. Because speech is both a physical and a mental process, many factors can lead to speech disorders. A speech disorder can stem from language acquisition as well as from many medical and psychological factors. Disordered sp...

  4. Freedom of Speech and Hate Speech: an analysis of possible limits for freedom of speech

    National Research Council Canada - National Science Library

    Riva Sobrado de Freitas; Matheus Felipe de Castro

    2013-01-01

    With a view to determining the outlines of freedom of speech and specifying its contents, we examine hate speech as an offensive and repulsive manifestation, particularly directed at minority groups…

  5. The Speech Act Theory between Linguistics and Language Philosophy

    Directory of Open Access Journals (Sweden)

    Liviu-Mihail MARINESCU

    2006-10-01

    Full Text Available Of all the issues in the general theory of language usage, speech act theory has probably aroused the widest interest. Psychologists, for example, have suggested that the acquisition of the concepts underlying speech acts may be a prerequisite for the acquisition of language in general, literary critics have looked to speech act theory for an illumination of textual subtleties or for an understanding of the nature of literary genres, anthropologists have hoped to find in the theory some account of the nature of magical incantations, philosophers have seen potential applications to, amongst other things, the status of ethical statements, while linguists have seen the notions of speech act theory as variously applicable to problems in syntax, semantics, second language learning, and elsewhere.

  6. Digitized Ethnic Hate Speech: Understanding Effects of Digital Media Hate Speech on Citizen Journalism in Kenya

    Directory of Open Access Journals (Sweden)

    Stephen Gichuhi Kimotho

    2016-06-01

    Full Text Available Ethnicity in Kenya permeates all spheres of life. However, it is in politics that ethnicity is most visible. Election time in Kenya often leads to ethnic competition and hatred, often expressed through various media. Ethnic hate speech characterized the 2007 general elections in party rallies and through text messages, emails, posters and leaflets. This resulted in widespread skirmishes that left over 1200 people dead, and many displaced (KNHRC, 2008). In 2013, however, the new battle zone was the war of words on social media platforms. More than any other time in Kenyan history, Kenyans poured vitriolic ethnic hate speech through digital media like Facebook, Twitter and blogs. Although scholars have studied the role and effects of the mainstream media like television and radio in proliferating the ethnic hate speech in Kenya (Michael Chege, 2008; Goldstein & Rotich, 2008a; Ismail & Deane, 2008; Jacqueline Klopp & Prisca Kamungi, 2007), little has been done in regard to social media. This paper investigated the nature of digitized hate speech by: describing the forms of ethnic hate speech on social media in Kenya; the effects of ethnic hate speech on Kenyans' perception of ethnic entities; ethnic conflict and ethics of citizen journalism. This study adopted a descriptive interpretive design, and utilized Austin's Speech Act Theory, which explains use of language to achieve desired purposes and direct behaviour (Tarhom & Miracle, 2013). Content published between January and April 2013 from six purposefully identified blogs was analysed. Questionnaires were used to collect data from university students, as they form a good sample of the Kenyan population, are most active on social media and are drawn from all parts of the country. Qualitative data were analysed using NVIVO 10 software, while responses from the questionnaire were analysed using IBM SPSS version 21. The findings indicated that Facebook and Twitter were the main platforms used to…

  7. Speech transmission index from running speech: A neural network approach

    Science.gov (United States)

    Li, F. F.; Cox, T. J.

    2003-04-01

    Speech transmission index (STI) is an important objective parameter concerning speech intelligibility for sound transmission channels. It is normally measured with specific test signals to ensure high accuracy and good repeatability. Measurement with running speech was previously proposed, but accuracy is compromised and hence applications limited. A new approach that uses artificial neural networks to accurately extract the STI from received running speech is developed in this paper. Neural networks are trained on a large set of transmitted speech examples with prior knowledge of the transmission channels' STIs. The networks perform complicated nonlinear function mappings and spectral feature memorization to enable accurate objective parameter extraction from transmitted speech. Validations via simulations demonstrate the feasibility of this new method on a one-net-one-speech extract basis. In this case, accuracy is comparable with normal measurement methods. This provides an alternative to standard measurement techniques, and it is intended that the neural network method can facilitate occupied room acoustic measurements.
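
    A rough illustration of the approach described above, training a regressor that maps features of received running speech to the channel's measured STI, is sketched below in Python with scikit-learn. The feature vectors, training data, and network size are hypothetical stand-ins, not the paper's actual choices.

```python
# Hypothetical sketch: regress STI from features of received running speech.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Stand-in training set: each row is a feature vector (e.g. log band energies)
# computed from speech received through some channel; y is that channel's
# measured STI, obtained beforehand with a standard test-signal method.
X_train = rng.random((500, 64))
y_train = rng.random(500)            # STI values in [0, 1]

net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
net.fit(X_train, y_train)

# Estimate the STI of a new channel from received running speech alone.
x_new = rng.random((1, 64))
print("estimated STI:", float(net.predict(x_new)[0]))
```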

  8. Relations among questionnaire and experience sampling measures of inner speech: A smartphone app study

    Directory of Open Access Journals (Sweden)

    Ben Alderson-Day

    2015-04-01

    Inner speech is often reported to be a common and central part of inner experience, but its true prevalence is unclear. Many questionnaire-based measures appear to lack convergent validity and it has been claimed that they overestimate inner speech in comparison to experience sampling methods (which involve collecting data at random timepoints). The present study compared self-reporting of inner speech collected via a general questionnaire and experience sampling, using data from a custom-made smartphone app (Inner Life). Fifty-one university students completed a generalized self-report measure of inner speech (the Varieties of Inner Speech Questionnaire, or VISQ) and responded to at least 7 random alerts to report on incidences of inner speech over a 2-week period. Correlations and pairwise comparisons were used to compare generalized endorsements and randomly-sampled scores for each VISQ subscale. Significant correlations were observed between general and randomly sampled measures for only 2 of the 4 VISQ subscales, and endorsements of inner speech with evaluative or motivational characteristics did not correlate at all across different measures. Endorsement of inner speech items was significantly lower for random sampling compared to generalized self-report, for all VISQ subscales. Exploratory analysis indicated that specific inner speech characteristics were also related to anxiety and future-oriented thinking.

  9. The influence of speech rate and accent on access and use of semantic information.

    Science.gov (United States)

    Sajin, Stanislav M; Connine, Cynthia M

    2017-04-01

    Circumstances in which the speech input is presented in sub-optimal conditions generally lead to processing costs affecting spoken word recognition. The current study indicates that some processing demands imposed by listening to difficult speech can be mitigated by feedback from semantic knowledge. A set of lexical decision experiments examined how foreign accented speech and word duration impact access to semantic knowledge in spoken word recognition. Results indicate that when listeners process accented speech, the reliance on semantic information increases. Speech rate was not observed to influence semantic access, except in the setting in which unusually slow accented speech was presented. These findings support interactive activation models of spoken word recognition in which attention is modulated based on speech demands.

  10. Analytic study of the Tadoma method: effects of hand position on segmental speech perception.

    Science.gov (United States)

    Reed, C M; Durlach, N I; Braida, L D; Schultz, M C

    1989-12-01

    In the Tadoma method of communication, deaf-blind individuals receive speech by placing a hand on the face and neck of the talker and monitoring actions associated with speech production. Previous research has documented the speech perception, speech production, and linguistic abilities of highly experienced users of the Tadoma method. The current study was performed to gain further insight into the cues involved in the perception of speech segments through Tadoma. Small-set segmental identification experiments were conducted in which the subjects' access to various types of articulatory information was systematically varied by imposing limitations on the contact of the hand with the face. Results obtained on 3 deaf-blind, highly experienced users of Tadoma were examined in terms of percent-correct scores, information transfer, and reception of speech features for each of sixteen experimental conditions. The results were generally consistent with expectations based on the speech cues assumed to be available in the various hand positions.

  11. Corticomuscular coherence is tuned to the spontaneous rhythmicity of speech at 2-3 Hz.

    Science.gov (United States)

    Ruspantini, Irene; Saarinen, Timo; Belardinelli, Paolo; Jalava, Antti; Parviainen, Tiina; Kujala, Jan; Salmelin, Riitta

    2012-03-14

    Human speech features rhythmicity that frames distinctive, fine-grained speech patterns. Speech can thus be counted among rhythmic motor behaviors that generally manifest characteristic spontaneous rates. However, the critical neural evidence for tuning of articulatory control to a spontaneous rate of speech has not been uncovered. The present study examined the spontaneous rhythmicity in speech production and its relationship to cortex-muscle neurocommunication, which is essential for speech control. Our MEG results show that, during articulation, coherent oscillatory coupling between the mouth sensorimotor cortex and the mouth muscles is strongest at the frequency of spontaneous rhythmicity of speech at 2-3 Hz, which is also the typical rate of word production. Corticomuscular coherence, a measure of efficient cortex-muscle neurocommunication, thus reveals behaviorally relevant oscillatory tuning for spoken language.
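
    The coherence measure behind this finding, magnitude-squared coherence between a cortical (MEG) signal and a muscle (EMG) signal, can be sketched as follows with SciPy. The synthetic signals and the shared 2.5 Hz rhythm are illustrative stand-ins for real recordings.

```python
# Synthetic corticomuscular coherence: two noisy signals share a 2.5 Hz drive.
import numpy as np
from scipy.signal import coherence

fs = 200.0                                   # sample rate in Hz (assumed)
t = np.arange(0, 60, 1 / fs)                 # 60 s of data
rng = np.random.default_rng(1)

rhythm = np.sin(2 * np.pi * 2.5 * t)         # shared slow rhythm
meg = rhythm + 0.8 * rng.standard_normal(t.size)   # stand-in cortical signal
emg = rhythm + 0.8 * rng.standard_normal(t.size)   # stand-in muscle signal

f, cxy = coherence(meg, emg, fs=fs, nperseg=2048)
band = (f >= 2.0) & (f <= 3.0)
print("peak coherence in the 2-3 Hz band:", cxy[band].max())
```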

  12. Low-Rank and Sparsity Analysis Applied to Speech Enhancement Via Online Estimated Dictionary

    Science.gov (United States)

    Sun, Pengfei; Qin, Jun

    2016-12-01

    We propose an online estimated dictionary based single channel speech enhancement algorithm, which focuses on low rank and sparse matrix decomposition. In this proposed algorithm, a noisy speech spectral matrix is considered as the summation of low rank background noise components and an activation of the online speech dictionary, on which both low rank and sparsity constraints are imposed. This decomposition takes advantage of the high expressiveness of the locally estimated dictionary for speech components. The local dictionary can be obtained by estimating the speech presence probability with the Expectation-Maximization algorithm, in which a generalized Gamma prior for the speech magnitude spectrum is used. The evaluation results show that the proposed algorithm achieves significant improvements when compared to four other speech enhancement algorithms.
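
    As a rough sketch of the low-rank plus sparse split this algorithm builds on (not the paper's method itself: the online speech dictionary and the EM-based speech-presence estimation with a generalized Gamma prior are omitted), a plain robust-PCA-style alternation on a magnitude spectrogram might look like the following, with NumPy assumed and all parameter values illustrative.

```python
# Robust-PCA-style split of a spectrogram into low-rank noise and sparse speech.
import numpy as np

def lowrank_sparse_split(S, tau=1.0, lam=0.1, iters=50):
    """Alternate singular-value thresholding (low-rank part) and element-wise
    soft thresholding (sparse part) on a magnitude spectrogram S."""
    L = np.zeros_like(S)   # low-rank estimate (background noise)
    P = np.zeros_like(S)   # sparse estimate (speech activity)
    for _ in range(iters):
        # Low-rank update: shrink the singular values of the residual.
        U, s, Vt = np.linalg.svd(S - P, full_matrices=False)
        L = (U * np.maximum(s - tau, 0.0)) @ Vt
        # Sparse update: soft-threshold what the low-rank part cannot explain.
        R = S - L
        P = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    return L, P

rng = np.random.default_rng(2)
S = np.abs(rng.standard_normal((257, 200)))      # stand-in noisy spectrogram
noise_part, speech_part = lowrank_sparse_split(S)
```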

  13. Optimization of Algorithm in Selectable Mode Vocoder (SMV)

    Institute of Scientific and Technical Information of China (English)

    夏晓峰

    2014-01-01

      To further improve speech quality, improve practicality, and lower the code rate, this paper simplifies the mode/code-rate selection, frame size and sample rate in SMV and, within the range of short-term stationarity of speech, moderately increases the coding frame length, raising the sample rate from 8000 Hz to 11025 Hz. The encoding and decoding of the improved SMV algorithm are then programmed and simulated on a simulation platform. Based on the simulation results, speech quality and code rate before and after optimization are analysed and compared, showing that the optimized algorithm reduces the complexity of the decoding procedure, which eases implementation on terminals and lowers cost; it also reduces the coding rate, enriches the high-frequency components, and improves speech quality.
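
    A small illustration of the sample-rate change described above, using SciPy's polyphase resampler; the ratio 11025/8000 reduces to 441/320. This shows only the resampling step, not the SMV codec modifications themselves.

```python
# Raise speech sampled at 8000 Hz to 11025 Hz (ratio 441/320).
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 8000, 11025          # 11025/8000 = 441/320 exactly
t = np.arange(0, 0.02, 1 / fs_in)
x = np.sin(2 * np.pi * 440 * t)      # stand-in 20 ms speech frame

y = resample_poly(x, 441, 320)       # polyphase resampling to 11025 Hz
print(len(x), "samples at 8 kHz ->", len(y), "samples at 11.025 kHz")
```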

  14. Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference

    OpenAIRE

    Byeongwook Lee; Kwang-Hyun Cho

    2016-01-01

    Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regular structured speech and maintai...
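
    A minimal sketch of the envelope-as-temporal-reference idea (not the authors' actual algorithm) might compute the Hilbert envelope, smooth it down to the slow syllabic rate, and place segment boundaries at envelope minima rather than at fixed frame intervals. SciPy is assumed and all parameter values are illustrative.

```python
# Envelope-guided segmentation: cut speech at minima of the smoothed envelope.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt, argrelmin

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
# Stand-in signal: a 220 Hz carrier with a 4 Hz (syllable-like) modulation.
x = np.sin(2 * np.pi * 220 * t) * (0.6 + 0.4 * np.sin(2 * np.pi * 4 * t))

envelope = np.abs(hilbert(x))            # instantaneous amplitude
b, a = butter(2, 10 / (fs / 2))          # keep fluctuations below ~10 Hz
smooth_env = filtfilt(b, a, envelope)

# Candidate segment boundaries: local minima of the smoothed envelope.
boundaries = argrelmin(smooth_env, order=int(0.05 * fs))[0]
print("segment boundaries (s):", boundaries / fs)
```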

  15. Global Freedom of Speech

    DEFF Research Database (Denmark)

    Binderup, Lars Grassme

    2007-01-01

    , as opposed to a legal norm, that curbs exercises of the right to free speech that offend the feelings or beliefs of members from other cultural groups. The paper rejects the suggestion that acceptance of such a norm is in line with liberal egalitarian thinking. Following a review of the classical liberal...

  16. Speech and Hearing Therapy.

    Science.gov (United States)

    Sakata, Reiko; Sakata, Robert

    1978-01-01

    In the public school, the speech and hearing therapist attempts to foster child growth and development through the provision of services basic to awareness of self and others, management of personal and social interactions, and development of strategies for coping with the handicap. (MM)

  17. Perceptual learning in speech

    NARCIS (Netherlands)

    Norris, D.; McQueen, J.M.; Cutler, A.

    2003-01-01

    This study demonstrates that listeners use lexical knowledge in perceptual learning of speech sounds. Dutch listeners first made lexical decisions on Dutch words and nonwords. The final fricative of 20 critical words had been replaced by an ambiguous sound, between [f] and [s]. One group of listener

  18. Speech and Language Delay

    Science.gov (United States)

    ... home affect my child’s language and speech? The brain has to work harder to interpret and use 2 languages, so it may take longer for children to start using either one or both of the languages they’re learning. It’s not unusual for a bilingual child to ...

  19. Mandarin Visual Speech Information

    Science.gov (United States)

    Chen, Trevor H.

    2010-01-01

    While the auditory-only aspects of Mandarin speech are heavily-researched and well-known in the field, this dissertation addresses its lesser-known aspects: The visual and audio-visual perception of Mandarin segmental information and lexical-tone information. Chapter II of this dissertation focuses on the audiovisual perception of Mandarin…

  20. Speech After Banquet

    Science.gov (United States)

    Yang, Chen Ning

    2013-05-01

    I am usually not so short of words, but the previous speeches have rendered me really speechless. I have known and admired the eloquence of Freeman Dyson, but I did not know that there is a hidden eloquence in my colleague George Sterman...

  1. Speech disfluency in centenarians.

    Science.gov (United States)

    Searl, Jeffrey P; Gabel, Rodney M; Fulks, J Steven

    2002-01-01

    Other than a single case presentation of a 105-year-old female, no other studies have addressed the speech fluency characteristics of centenarians. The purpose of this study was to provide descriptive information on the fluency characteristics of speakers between the ages of 100-103 years. Conversational speech samples from seven speakers were evaluated for the frequency and types of disfluencies and speech rate. The centenarian speakers had a disfluency rate similar to that reported for 70-, 80-, and early 90-year-olds. The types of disfluencies observed also were similar to those reported for younger elderly speakers (primarily whole word/phrase, or formulative fluency breaks). Finally, the speech rate data for the current group of speakers supports prior literature reports of a slower rate with advancing age, but extends the finding to centenarians. As a result of this activity, participants will be able to: (1) describe the frequency of disfluency breaks and the types of disfluencies exhibited by centenarian speakers, (2) describe the mean and range of speaking rates in centenarians, and (3) compare the present findings for centenarians to the fluency and speaking rate characteristics reported in the literature.

  3. The Commercial Speech Doctrine.

    Science.gov (United States)

    Luebke, Barbara F.

    In its 1942 ruling in the "Valentine vs. Christensen" case, the Supreme Court established the doctrine that commercial speech is not protected by the First Amendment. In 1975, in the "Bigelow vs. Virginia" case, the Supreme Court took a decisive step toward abrogating that doctrine, by ruling that advertising is not stripped of…

  4. Ingressive speech errors: a service evaluation of speech-sound therapy in a child aged 4;6.

    Science.gov (United States)

    Hrastelj, Laura; Knight, Rachael-Anne

    2017-07-01

    A pattern of ingressive substitutions for word-final sibilants can be identified in a small number of cases in child speech disorder, with growing evidence suggesting it is a phonological difficulty, despite the unusual surface form. Phonological difficulty implies a problem with the cognitive process of organizing speech into sound contrasts. The aim was to evaluate phonological therapy approaches in the remediation of non-pulmonic speech errors, thus adding to evidence concerning the nature of ingressive substitutions and their remediation, while highlighting their occurrence within the child speech disorder population for practising and training speech and language therapists. Child KO, a boy aged 4;6, was identified through a screening of speech, language and communication needs at his school. Word-final, non-pulmonic-egressive substitutes for fricatives and plosives were identified using the Diagnostic Evaluation of Articulation and Phonology (DEAP). Treatment took place in five weekly school-based sessions with a care-giver present, and targeted two phonemes /f/ and /ʃ/ in word-final position. Word-final /s/ was monitored throughout to capture any change in other word-final fricatives. Phonemes /ɡ/ and /p/ were used as controls, as no change was expected in word-final plosives as a result of therapy targeting fricatives. Productions of single words in the DEAP, pre- and post-therapy, were transcribed by two independent therapists (transcription agreement was 86.6% (pre) and 83.7% (post), with all 140 consonants within the DEAP transcribed), and change in consonants correct was analysed using a Wilcoxon test. Picture description tasks and telling of familiar stories were videoed post-therapy to analyse use of word-final fricative egression in connected speech. Percentage consonants correct at single-word level was significantly higher post-treatment than pre-treatment. Generalization of target fricatives into connected speech and modest generalization of

  5. SPEECH VISUALIZATION SYSTEM AS A BASIS FOR SPEECH TRAINING AND COMMUNICATION AIDS

    Directory of Open Access Journals (Sweden)

    Oliana KRSTEVA

    1997-09-01

    One receives much more information through the visual sense than through the tactile one. However, most visual aids for hearing-impaired persons are not wearable, because it is difficult to make them compact and it is not desirable to always occupy the user's vision. Generally it is difficult to obtain integrated patterns by a single mathematical transform of signals, such as a Fourier transform. In order to obtain an integrated pattern, speech parameters should be carefully extracted by an analysis suited to each parameter, and a visual pattern, which can intuitively be understood by anyone, must be synthesized from them. Successful integration of speech parameters will never disturb understanding of individual features, so that the system can be used for speech training and communication.

  6. Conversation, speech acts, and memory.

    Science.gov (United States)

    Holtgraves, Thomas

    2008-03-01

    Speakers frequently have specific intentions that they want others to recognize (Grice, 1957). These specific intentions can be viewed as speech acts (Searle, 1969), and I argue that they play a role in long-term memory for conversation utterances. Five experiments were conducted to examine this idea. Participants in all experiments read scenarios ending with either a target utterance that performed a specific speech act (brag, beg, etc.) or a carefully matched control. Participants were more likely to falsely recall and recognize speech act verbs after having read the speech act version than after having read the control version, and the speech act verbs served as better recall cues for the speech act utterances than for the controls. Experiment 5 documented individual differences in the encoding of speech act verbs. The results suggest that people recognize and retain the actions that people perform with their utterances and that this is one of the organizing principles of conversation memory.

  7. Metaheuristic applications to speech enhancement

    CERN Document Server

    Kunche, Prajna

    2016-01-01

    This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that aid in the quality and intelligibility of degraded speech. They present powerful optimization methods to speech enhancement that can help to solve the noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how the speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.

  8. Relationship between speech motor control and speech intelligibility in children with speech sound disorders.

    Science.gov (United States)

    Namasivayam, Aravind Kumar; Pukonen, Margit; Goshulak, Debra; Yu, Vickie Y; Kadis, Darren S; Kroll, Robert; Pang, Elizabeth W; De Nil, Luc F

    2013-01-01

    The current study was undertaken to investigate the impact of speech motor issues on the speech intelligibility of children with moderate to severe speech sound disorders (SSD) within the context of the PROMPT intervention approach. The word-level Children's Speech Intelligibility Measure (CSIM), the sentence-level Beginner's Intelligibility Test (BIT) and tests of speech motor control and articulation proficiency were administered to 12 children (3;11 to 6;7 years) before and after PROMPT therapy. PROMPT treatment was provided for 45 min twice a week for 8 weeks. Twenty-four naïve adult listeners aged 22-46 years judged the intelligibility of the words and sentences. For CSIM, each time a recorded word was played, the listeners were asked to look at a list of 12 words (multiple-choice format) and circle the word they heard; for BIT sentences, the listeners were asked to write down everything they heard. Words correctly circled (CSIM) or transcribed (BIT) were averaged across three naïve judges to calculate percentage speech intelligibility. Speech intelligibility at both the word and sentence level was significantly correlated with speech motor control, but not articulatory proficiency. Further, the severity of speech motor planning and sequencing issues may potentially be a limiting factor in connected speech intelligibility, which highlights the need to target these issues early and directly in treatment. The reader will be able to: (1) outline the advantages and disadvantages of using word- and sentence-level speech intelligibility tests; (2) describe the impact of speech motor control and articulatory proficiency on speech intelligibility; and (3) describe how speech motor control and speech intelligibility data may provide critical information to aid treatment planning. Copyright © 2013 Elsevier Inc. All rights reserved.

  9. A Mobile Phone based Speech Therapist

    OpenAIRE

    Pandey, Vinod K.; Pande, Arun; Kopparapu, Sunil Kumar

    2016-01-01

    Patients with articulatory disorders often have difficulty in speaking. These patients need several speech therapy sessions to enable them speak normally. These therapy sessions are conducted by a specialized speech therapist. The goal of speech therapy is to develop good speech habits as well as to teach how to articulate sounds the right way. Speech therapy is critical for continuous improvement to regain normal speech. Speech therapy sessions require a patient to travel to a hospital or a ...

  10. The impact of voice on speech realization

    Directory of Open Access Journals (Sweden)

    Jelka Breznik

    2014-12-01

    The study discusses spoken literary language and the impact of voice on speech realization. The voice consists of sound made by a human being using the vocal folds for talking, singing, laughing, crying, screaming… The human voice is specifically the part of human sound production in which the vocal folds (vocal cords) are the primary sound source. Our voice is our instrument and identity card. How does the voice (voice tone) affect others, and how do they respond, positively or negatively? How important is voice (voice tone) in the communication process? The study presents how certain individuals perceive voice. The results of research on the relationships between the spoken word, excellent speakers, voice, and the description/definition/identification of specific voices, carried out with experts in the field of speech and voice as well as with non-professionals, are presented. The study encompasses two focus groups. One consists of amateurs (non-specialists in the field of speech or voice) who have no knowledge in this field, and the other consists of professionals who work with speech, language or voice. The questions progressed from general to specific, directly related to the topic. The purpose of such a method of questioning was to create a relaxed atmosphere, promote discussion, allow participants to interact and complement one another, and to encourage self-listening and additional comments.

  11. On Training Targets for Supervised Speech Separation

    Science.gov (United States)

    Wang, Yuxuan; Narayanan, Arun; Wang, DeLiang

    2014-01-01

    Formulation of speech separation as a supervised learning problem has shown considerable promise. In its simplest form, a supervised learning algorithm, typically a deep neural network, is trained to learn a mapping from noisy features to a time-frequency representation of the target of interest. Traditionally, the ideal binary mask (IBM) is used as the target because of its simplicity and large speech intelligibility gains. The supervised learning framework, however, is not restricted to the use of binary targets. In this study, we evaluate and compare separation results using different training targets, including the IBM, the target binary mask, the ideal ratio mask (IRM), the short-time Fourier transform spectral magnitude and its corresponding mask (FFT-MASK), and the Gammatone frequency power spectrum. Our results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics. In addition, we find that masking-based targets are, in general, significantly better than spectral-envelope-based targets. We also present comparisons with recent methods in non-negative matrix factorization and speech enhancement, which show clear performance advantages of supervised speech separation. PMID:25599083
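
    Two of the training targets compared in the study, the ideal binary mask (IBM) and the ideal ratio mask (IRM), are simple to state given parallel clean-speech and noise spectrograms. The sketch below uses the conventional definitions, with a 0 dB local SNR criterion for the IBM; the array shapes and data are stand-ins.

```python
# Ideal binary mask (IBM) and ideal ratio mask (IRM) from parallel spectrograms.
import numpy as np

rng = np.random.default_rng(3)
speech_pow = rng.random((64, 100))   # stand-in T-F power of the clean speech
noise_pow = rng.random((64, 100))    # stand-in T-F power of the noise

# IBM: 1 where the local SNR exceeds the criterion (here 0 dB), else 0.
ibm = (speech_pow > noise_pow).astype(float)

# IRM: soft ratio of speech energy to total energy in each T-F unit.
irm = np.sqrt(speech_pow / (speech_pow + noise_pow))

# Either mask can serve as the regression target for a network mapping noisy
# features to a T-F mask; applying the mask to the noisy spectrogram and
# resynthesizing yields the separated speech.
```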

  12. Speech level shift in Japanese and Slovene

    Directory of Open Access Journals (Sweden)

    Jasmina BAJRAMI

    2016-12-01

    In verbal communication, we always aim to establish and maintain harmonious relations with others. Proper use of expressions and the choice of the way we speak are closely connected with politeness. In Japanese, speech level is a level of formality or politeness in conversation, which is expressed by the use of linguistic forms (formal vs. informal) within and at the end of an utterance and by the use of honorific expressions. In Slovene, the level of formality or politeness in conversation is mainly expressed by the use of formal language and general colloquial language. Speech level shift is a shift from one speech level to another, e.g. from a formal style to an informal one. According to previous research, these shifts express the speaker's psychological distance and a change of attitude towards a hearer. In this paper I will first briefly present the theoretical framework of politeness and an outline of speech levels in Japanese and Slovene. I will then present the data and the method used in this study. Finally, I will present and discuss the results of the analysis of both Japanese and Slovene conversation.

  13. Pupils with social anxiety and speech disorders

    OpenAIRE

    Podlogar, Petra

    2013-01-01

    The diploma thesis reviews the relevant literature on socially anxious children who have speech disorders. The term social anxiety is defined on the basis of anxiety in general. There are different interpretations of both terms, but in essence we speak of social anxiety when a person experiences fear or tension in social situations. He has the feeling that others are evaluating and judging him and that he will not be able to meet their standards. He is afraid of and expects criticism, reprima...

  14. Multilingual Vocabularies in Automatic Speech Recognition

    Science.gov (United States)

    2000-08-01

    UNCLASSIFIED Defense Technical Information Center Compilation Part Notice ADP010389. TITLE: Multilingual Vocabularies in Automatic Speech Recognition. Recoverable fragments of the abstract: the monolingual case (vocabularies of a few thousand words) is an obstacle to a full generalization of the inventories, and the work then moved to the multilingual case ... the performance of the multilingual models differed from that of the monolingual models, as was specifically observed in the test with Spanish utterances ...

  15. Speech Intelligibility and Hearing Protector Selection

    Science.gov (United States)

    2016-08-29

    Suter, 1989b). Speech sounds are called phonemes. These phonemes can be divided into two groups, vowels and consonants. Vowel sounds tend to ... which consist of words that contain all the phonetic elements of connected English discourse in their normal proportion to one another, are common in ... and are generally of the form consonant-vowel-consonant (CVC). The lists were generated to form 50 related ensembles, each ensemble consisting of 6

  16. Rhetorical Flaws in Brutus’ Forum Speech in Julius Caesar: A Carefully Controlled Weakness?

    Directory of Open Access Journals (Sweden)

    Dominic Cheetham

    2017-06-01

    In Julius Caesar Shakespeare reproduces one of the pivotal moments in European history. Brutus and Mark Antony, through the medium of their forum speeches, compete for the support of the people of Rome. In the play, as in history, Mark Antony wins this contest of language. Critics generally agree that Antony has the better speech, but also that Brutus’ speech is still exceptionally good. Traditionally, the question of how Antony’s speech is superior has been argued by examining differences between the two speeches; however, this approach has not resulted in any critical consensus. This paper takes the opening lines of the speeches as the only point of direct convergence between the content and the rhetorical forms used by Brutus and Antony, and argues that Brutus’ opening tricolon is structurally inferior to Mark Antony’s. Analysis of the following rhetorical schemes in Brutus’ speech reveals further structural weaknesses. Shakespeare gives Brutus a speech rich in perceptually salient rhetorical schemes but introduces small, less salient, structural weaknesses into those schemes. The tightly structured linguistic patterns which make up the majority of Brutus’ speech give an impression of great rhetorical skill. This skilful impression obscures the minor faults or weaknesses that quietly and subtly reduce the overall power of the speech. By identifying the weaknesses in Brutus’ forms, we add an extra element to the discussion of these speeches and at the same time display how subtly and effectively Shakespeare uses rhetorical forms to control audience response and appreciation.

  17. Detecting self-produced speech errors before and after articulation: An ERP investigation

    Directory of Open Access Journals (Sweden)

    Kevin Michael Trewartha

    2013-11-01

    It has been argued that speech production errors are monitored by the same neural system involved in monitoring other types of action errors. Behavioral evidence has shown that speech errors can be detected and corrected prior to articulation, yet the neural basis for such pre-articulatory speech error monitoring is poorly understood. The current study investigated speech error monitoring using a phoneme-substitution task known to elicit speech errors. Stimulus-locked event-related potential (ERP) analyses comparing correct and incorrect utterances were used to assess pre-articulatory error monitoring, and response-locked ERP analyses were used to assess post-articulatory monitoring. Our novel finding in the stimulus-locked analysis revealed that words that ultimately led to a speech error were associated with a larger P2 component at midline sites (FCz, Cz, and CPz). This early positivity may reflect the detection of an error in speech formulation, or a predictive mechanism to signal the potential for an upcoming speech error. The data also revealed that general conflict monitoring mechanisms are involved during this task, as both correct and incorrect responses elicited an anterior N2 component typically associated with conflict monitoring. The response-locked analyses corroborated previous observations that self-produced speech errors led to a fronto-central ERN. These results demonstrate that speech errors can be detected prior to articulation, and that speech error monitoring relies on a central error monitoring mechanism.

  18. Attention mechanisms and the mosaic evolution of speech

    Directory of Open Access Journals (Sweden)

    Pedro Tiago Martins

    2014-12-01

    There is still no categorical answer for why humans, and no other species, have speech, or why speech is the way it is. Several purely anatomical arguments have been put forward, but they have been shown to be false, biologically implausible, or of limited scope. This perspective paper supports the idea that evolutionary theories of speech could benefit from a focus on the cognitive mechanisms that make speech possible, for which antecedents in evolutionary history and brain correlates can be found. This type of approach is part of a very recent, but rapidly growing tradition, which has provided crucial insights on the nature of human speech by focusing on the biological bases of vocal learning. Here, we call attention to what might be an important ingredient for speech. We contend that a general mechanism of attention, which manifests itself not only in the visual but also in the auditory (and possibly other) modalities, might be one of the key pieces of human speech, in addition to the mechanisms underlying vocal learning and the pairing of facial gestures with vocalic units.

  19. Phonological awareness intervention for children with childhood apraxia of speech.

    Science.gov (United States)

    Moriarty, Brigid C; Gillon, Gail T

    2006-01-01

    To investigate the effectiveness of an integrated phonological awareness intervention to improve the speech production, phonological awareness and printed word decoding skills of three children with childhood apraxia of speech (CAS), aged 7;3, 6;3 and 6;10. The three children presented with severely delayed phonological awareness skills before intervention. In consideration of the heterogeneity of the population with CAS, the study employed a multiple single-subject design with repeated measures. Baseline and post-intervention measures for speech, phonological awareness and decoding were compared. Each child received intervention for three 45-min sessions per week for 3 weeks (approximately 7 h of individual treatment). Sessions focused on developing phoneme awareness, linking graphemes to phonemes and providing opportunities for targeted speech production practice. Phonological awareness activities were linked with each child's speech production goals. Two participants significantly improved target speech and phonological awareness skills during intervention. These participants also generalized the phonological awareness skills from trained to untrained items and were able to transfer newly acquired knowledge to improved performance on a non-word reading task. The results suggest that integrated phonological awareness intervention may be an effective method to simultaneously treat speech production, phonological awareness and decoding skills in some children with CAS. The findings are discussed within the context of the phonological representational theory of CAS.

  20. Behavioural, computational, and neuroimaging studies of acquired apraxia of speech

    Directory of Open Access Journals (Sweden)

    Kirrie J Ballard

    2014-11-01

    A critical examination of speech motor control depends on an in-depth understanding of network connectivity associated with Brodmann areas 44 and 45 and surrounding cortices. Damage to these areas has been associated with two conditions: the speech motor programming disorder apraxia of speech (AOS) and the linguistic/grammatical disorder of Broca’s aphasia. Here we focus on AOS, which is most commonly associated with damage to posterior Broca's area and adjacent cortex. We provide an overview of our own studies into the nature of AOS, including behavioral and neuroimaging methods, to explore components of the speech motor network that are associated with normal and disordered speech motor programming in AOS. Behavioral, neuroimaging, and computational modeling studies indicate that AOS is associated with impairment in learning feedforward models and/or implementing feedback mechanisms, and with the functional contribution of BA6. While functional connectivity methods are not yet routinely applied to the study of AOS, we highlight the need for focusing on the functional impact of localised lesions throughout the speech network, as well as larger scale comparative studies to distinguish the unique behavioral and neurological signature of AOS. By coupling these methods with neural network models, we have a powerful set of tools to improve our understanding of the neural mechanisms that underlie AOS, and speech production generally.

  1. Social and linguistic factors influencing adaptation in children's speech.

    Science.gov (United States)

    Street, R L; Cappella, J N

    1989-09-01

    The ability to appropriately reciprocate or compensate a partner's communicative response represents an essential element of communicative competence. Previous research indicates that as children grow older, their speech levels reflect greater adaptation relative to their partner's speech. In this study, we argue that patterns of adaptation are related to specific linguistic and pragmatic abilities, such as verbal responsiveness, involvement in the interaction, and the production of relatively complex syntactic structures. Thirty-seven children (3-6 years of age) individually interacted with an adult for 20 to 30 minutes. Adaptation between child and adult was examined among conversational floortime, response latency, and speech rate. Three conclusions were drawn from the results of this investigation. First, by applying time-series analysis to the interactants' speech behaviors within each dyad, individual measures of the child's adaptations to the adult's speech can be generated. Second, consistent with findings in the adult domain, these children generally reciprocated changes in the adult's speech rate and response latency. Third, there were differences in degree and type of adaptation within specific dyads. Chronological age was not useful in accounting for this individual variation, but specific linguistic and social abilities were. Implications of these findings for the development of communicative competence and for the study of normal versus language-delayed speech were discussed.

  2. Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

    Science.gov (United States)

    van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

    2007-01-01

    Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…

  3. Sensorimotor Interactions in Speech Learning

    Directory of Open Access Journals (Sweden)

    Douglas M Shiller

    2011-10-01

    Auditory input is essential for normal speech development and plays a key role in speech production throughout the life span. In traditional models, auditory input plays two critical roles: (1) establishing the acoustic correlates of speech sounds that serve, in part, as the targets of speech production, and (2) serving as a source of feedback about a talker's own speech outcomes. This talk will focus on both of these roles, describing a series of studies that examine the capacity of children and adults to adapt to real-time manipulations of auditory feedback during speech production. In one study, we examined sensory and motor adaptation to a manipulation of auditory feedback during production of the fricative “s”. In contrast to prior accounts, adaptive changes were observed not only in speech motor output but also in subjects' perception of the sound. In a second study, speech adaptation was examined following a period of auditory-perceptual training targeting the perception of vowels. The perceptual training was found to systematically improve subjects' motor adaptation response to altered auditory feedback during speech production. The results of both studies support the idea that perceptual and motor processes are tightly coupled in speech production learning, and that the degree and nature of this coupling may change with development.

  4. On model architecture for a children's speech recognition interactive dialog system

    OpenAIRE

    Kraleva, Radoslava; Kralev, Velin

    2016-01-01

    This report presents a general model of the architecture of information systems for children's speech recognition, together with a model of the speech data stream and how it works. The results of these studies and the presented architectural model show that research needs to be focused on acoustic-phonetic modeling in order to improve the quality of children's speech recognition and the robustness of the systems to noise and changes in the transmission environment. Another important aspe...

  5. Multilevel Analysis in Analyzing Speech Data

    Science.gov (United States)

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  6. Hate Speech: Power in the Marketplace.

    Science.gov (United States)

    Harrison, Jack B.

    1994-01-01

    A discussion of hate speech and freedom of speech on college campuses examines how hate speech differs from normal, objectionable interpersonal comments, and looks at Supreme Court decisions on the limits of student free speech. Two cases specifically concerning the regulation of hate speech on campus are considered: Chaplinsky v. New…

  7. Negative blood oxygen level dependent signals during speech comprehension.

    Science.gov (United States)

    Rodriguez Moreno, Diana; Schiff, Nicholas D; Hirsch, Joy

    2015-05-01

    Speech comprehension studies have generally focused on the isolation and function of regions with positive blood oxygen level dependent (BOLD) signals with respect to a resting baseline. Although regions with negative BOLD signals in comparison to a resting baseline have been reported in language-related tasks, their relationship to regions of positive signals is not fully appreciated. Based on the emerging notion that the negative signals may represent an active function in language tasks, the authors test the hypothesis that negative BOLD signals during receptive language are more associated with comprehension than content-free versions of the same stimuli. Regions associated with comprehension of speech were isolated by comparing responses to passive listening to natural speech to two incomprehensible versions of the same speech: one that was digitally time reversed and one that was muffled by removal of high frequencies. The signal polarity was determined by comparing the BOLD signal during each speech condition to the BOLD signal during a resting baseline. As expected, stimulation-induced positive signals relative to resting baseline were observed in the canonical language areas with varying signal amplitudes for each condition. Negative BOLD responses relative to resting baseline were observed primarily in frontoparietal regions and were specific to the natural speech condition. However, the BOLD signal remained indistinguishable from baseline for the unintelligible speech conditions. Variations in connectivity between brain regions with positive and negative signals were also specifically related to the comprehension of natural speech. These observations of anticorrelated signals related to speech comprehension are consistent with emerging models of cooperative roles represented by BOLD signals of opposite polarity.

  8. On pre-image iterations for speech enhancement.

    Science.gov (United States)

    Leitner, Christina; Pernkopf, Franz

    2015-01-01

    In this paper, we apply kernel PCA for speech enhancement and derive pre-image iterations for this task. Both methods make use of a Gaussian kernel. The kernel variance serves as a tuning parameter that has to be adapted according to the SNR and the desired degree of de-noising. We develop a method to derive a suitable value for the kernel variance from a noise estimate, to adapt pre-image iterations to arbitrary SNRs. In experiments, we compare the performance of kernel PCA and pre-image iterations in terms of objective speech quality measures and automatic speech recognition. The speech data are corrupted by white and colored noise at 0, 5, 10, and 15 dB SNR. As a benchmark, we provide results of the generalized subspace method, of spectral subtraction, and of the minimum mean-square error log-spectral amplitude estimator. In terms of the scores of the PEASS (Perceptual Evaluation Methods for Audio Source Separation) toolbox, the proposed methods achieve a similar performance as the reference methods. The speech recognition experiments show that the utterances processed by pre-image iterations achieve consistently better word recognition accuracy than the unprocessed noisy utterances and than the utterances processed by the generalized subspace method.
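
    A simplified sketch of the underlying machinery, kernel PCA with a Gaussian kernel followed by the classic fixed-point pre-image iteration of Mika et al., is given below. It omits kernel centering and the paper's noise-estimate-based choice of kernel variance; the data and parameters are stand-ins.

```python
# Kernel-PCA denoising with the fixed-point pre-image iteration (uncentered).
import numpy as np

def gauss_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 8))          # training frames (e.g. spectra)
z = X[0] + 0.5 * rng.standard_normal(8)    # noisy frame to denoise
sigma, n_comp = 2.0, 5

# Uncentered kernel PCA: leading eigenvectors of the kernel matrix, scaled so
# the corresponding feature-space components have unit norm.
K = gauss_kernel(X, X, sigma)
lam, V = np.linalg.eigh(K)                 # ascending eigenvalues
alpha = V[:, -n_comp:] / np.sqrt(lam[-n_comp:])

# Project the noisy frame onto the leading components; gamma holds the
# per-sample weights of its feature-space reconstruction.
kz = gauss_kernel(z[None, :], X, sigma)[0]
gamma = alpha @ (alpha.T @ kz)

# Fixed-point pre-image iteration: z becomes a kernel-weighted mean of the
# training frames, pulled towards the principal subspace.
for _ in range(50):
    w = gamma * gauss_kernel(z[None, :], X, sigma)[0]
    z = (w @ X) / w.sum()
print("denoised frame:", z)
```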

  9. Investigating unscripted speech: implications for phonetics and phonology.

    Science.gov (United States)

    Kohler, K J

    2000-01-01

    This paper looks at patterns of reduction and elaboration in speech production, taking the phenomenon of plosive-related glottalization in German spontaneous speech, on the basis of the 'Kiel Corpus', as its point of departure, and proposes general principles of human speech to explain them. This is followed by an enquiry into the nature of a production-perception link, based on complementary data from perceptual experiments. A hypothesis is put forward as to how listeners cope with the enormous phonetic variability of spoken language and how this ability may be acquired. Finally, the need for a new paradigm of phonetic analysis and phonological systematization is stressed, as a prerequisite to dealing adequately and in an insightful way with the production and perception of spontaneous speech. Copyright 2000 S. Karger AG, Basel

  10. Robust Speech Recognition Method Based on Discriminative Environment Feature Extraction

    Institute of Scientific and Technical Information of China (English)

    HAN Jiqing; GAO Wen

    2001-01-01

      It is an effective approach to learn the influence of environmental parameters, such as additive noise and channel distortions, from training data for robust speech recognition. Most of the previous methods are based on the maximum likelihood estimation criterion. However, these methods do not lead to a minimum error rate result. In this paper, a novel discriminative learning method for environmental parameters, based on the Minimum Classification Error (MCE) criterion, is proposed. In the method, a simple classifier and the Generalized Probabilistic Descent (GPD) algorithm are adopted to iteratively learn the environmental parameters. Consequently, the clean speech features are estimated from the noisy speech features with the estimated environmental parameters, and the estimates of the clean speech features are then utilized in the back-end HMM classifier. Experiments show that a best error rate reduction of 32.1% is obtained, tested on a task of 18 isolated confusable Korean words, relative to a conventional HMM system.
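
    To make the MCE/GPD idea concrete, the toy sketch below applies the smoothed classification-error criterion with a probabilistic-descent update to a plain linear classifier; in the paper the same criterion is applied to environmental (noise and channel) parameters rather than classifier weights. All data and hyperparameters are illustrative.

```python
# Toy MCE training with a GPD-style update on a linear classifier.
import numpy as np

rng = np.random.default_rng(5)
n_class, dim, n_per = 3, 10, 100
means = 2.0 * rng.standard_normal((n_class, dim))
y = np.repeat(np.arange(n_class), n_per)
X = means[y] + rng.standard_normal((n_class * n_per, dim))
W = np.zeros((n_class, dim))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

lr, gamma = 0.1, 2.0
for epoch in range(20):
    for x, c in zip(X, y):
        g = W @ x                               # class discriminant scores
        rival = int(np.argmax(np.delete(g, c))) # best competing class
        rival += rival >= c                     # map back to original index
        d = g[rival] - g[c]                     # misclassification measure
        l = sigmoid(gamma * d)                  # smoothed 0/1 loss
        step = lr * gamma * l * (1.0 - l)       # GPD step for this token
        W[c] += step * x                        # pull the correct class up
        W[rival] -= step * x                    # push the best rival down

acc = (np.argmax(X @ W.T, axis=1) == y).mean()
print("training accuracy:", acc)
```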

  11. Variation and Synthetic Speech

    CERN Document Server

    Miller, Corey; Karaali, Orhan; Massey, Noel

    1997-01-01

    We describe the approach to linguistic variation taken by the Motorola speech synthesizer. A pan-dialectal pronunciation dictionary is described, which serves as the training data for a neural network based letter-to-sound converter. Subsequent to dictionary retrieval or letter-to-sound generation, pronunciations are submitted to a neural network based postlexical module. The postlexical module has been trained on aligned dictionary pronunciations and hand-labeled narrow phonetic transcriptions. This architecture permits the learning of individual postlexical variation, and can be retrained for each speaker whose voice is being modeled for synthesis. Learning variation in this way can result in greater naturalness for the synthetic speech that is produced by the system.
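
    A toy sketch of a neural letter-to-sound converter of the kind described above follows: a sliding window of letters predicts the phoneme for the centre letter. The miniature lexicon, alphabet, and phone set are hypothetical stand-ins; a real system trains on a large pronunciation dictionary.

```python
# Toy letter-to-sound conversion with a windowed letter classifier.
from sklearn.neural_network import MLPClassifier
import numpy as np

lexicon = [("cat", "k @ t"), ("cab", "k @ b"), ("bat", "b @ t")]
letters = sorted({c for w, _ in lexicon for c in w} | {"_"})
lidx = {c: i for i, c in enumerate(letters)}

def window_features(word, pos, width=1):
    """One-hot encode the letters in a window around position pos."""
    padded = "_" * width + word + "_" * width
    vec = np.zeros(len(letters) * (2 * width + 1))
    for k, c in enumerate(padded[pos : pos + 2 * width + 1]):
        vec[k * len(letters) + lidx[c]] = 1.0
    return vec

X, y = [], []
for word, pron in lexicon:                 # assumes one phone per letter
    for pos, phone in enumerate(pron.split()):
        X.append(window_features(word, pos))
        y.append(phone)

net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=3000, random_state=0)
net.fit(np.array(X), y)
print(net.predict([window_features("bab", 2)]))  # phoneme for the final letter
```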

  12. Speech is Golden

    DEFF Research Database (Denmark)

    Juel Henrichsen, Peter

    2014-01-01

    Most of the Danish municipalities are ready to begin to adopt automatic speech recognition, but at the same time remain nervous following a long series of bad business cases in the recent past. Complaints are voiced over costly licences and low service levels, typical effects of a de facto monopoly...... on the supply side. The present article reports on a new public action strategy which has taken shape in the course of 2013-14. While Denmark is a small language area, our public sector is well organised and has considerable purchasing power. Across this past year, Danish local authorities have organised around...... of the present article, in the role of economically neutral advisers. The aim of the initiative is to pave the way for the first profitable contract in the field - which we hope to see in 2014 - an event which would precisely break the present deadlock and open up a billion EUR market for speech technology...

  13. [Improving speech comprehension using a new cochlear implant speech processor].

    Science.gov (United States)

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise. In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg

  14. Neurophysiology of speech differences in childhood apraxia of speech.

    Science.gov (United States)

    Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole

    2014-01-01

    Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.

  15. Hiding Information under Speech

    Science.gov (United States)

    2005-12-12

    as it arrives in real time, and it disappears as fast as it arrives. Furthermore, our cognitive process for translating audio sounds to the meaning... steganography, whose goal is to make the embedded data completely undetectable. In addition, we must dismiss the idea of hiding data by using any... therefore, an image has more room to hide data; and (2) speech steganography has not led to many money-making commercial businesses. For these two

  16. Speech Quality Measurement

    Science.gov (United States)

    1977-06-10

    ... noise test, t=2 for the low-pass filter test, and t=3 for the ADPCM coding test; s is the subject number ... a separate speech quality laboratory controlled by the NOVA 830 computer. Each of the stations has a CRT, 15 response buttons, a ...

  17. Binary Masking & Speech Intelligibility

    DEFF Research Database (Denmark)

    Boldt, Jesper

    The purpose of this thesis is to examine how binary masking can be used to increase intelligibility in situations where hearing-impaired listeners have difficulties understanding what is being said. The major part of the experiments carried out in this thesis can be categorized as either experime... mask using a directional system and a method for correcting errors in the target binary mask. The last part of the thesis proposes a new method for objective evaluation of speech intelligibility.

  18. The relationship of speech intelligibility with hearing sensitivity, cognition, and perceived hearing difficulties varies for different speech perception tests

    Directory of Open Access Journals (Sweden)

    Antje Heinrich

    2015-06-01

    Listeners vary in their ability to understand speech in noisy environments. Hearing sensitivity, as measured by pure-tone audiometry, can only partly explain these results, and cognition has emerged as another key concept. Although cognition relates to speech perception, the exact nature of the relationship remains to be fully understood. This study investigates how different aspects of cognition, particularly working memory and attention, relate to speech intelligibility for various tests. Perceptual accuracy of speech perception represents just one aspect of functioning in a listening environment. Activity and participation limits imposed by hearing loss, in addition to the demands of a listening environment, are also important and may be better captured by self-report questionnaires. Understanding how speech perception relates to self-reported aspects of listening forms the second focus of the study. Forty-four listeners aged between 50-74 years with mild SNHL were tested on speech perception tests differing in complexity from low (phoneme discrimination in quiet), to medium (digit triplet perception in speech-shaped noise), to high (sentence perception in modulated noise); cognitive tests of attention, memory, and nonverbal IQ; and self-report questionnaires of general health-related and hearing-specific quality of life. Hearing sensitivity and cognition related to intelligibility differently depending on the speech test: neither was important for phoneme discrimination, hearing sensitivity alone was important for digit triplet perception, and hearing and cognition together played a role in sentence perception. Self-reported aspects of auditory functioning were correlated with speech intelligibility to different degrees, with digit triplets in noise showing the richest pattern. The results suggest that intelligibility tests can vary in their auditory and cognitive demands and their sensitivity to the challenges that auditory environments pose on

  19. Representation of speech in human auditory cortex: is it special?

    Science.gov (United States)

    Steinschneider, Mitchell; Nourski, Kirill V; Fishman, Yonatan I

    2013-11-01

    Successful categorization of phonemes in speech requires that the brain analyze the acoustic signal along both spectral and temporal dimensions. Neural encoding of the stimulus amplitude envelope is critical for parsing the speech stream into syllabic units. Encoding of voice onset time (VOT) and place of articulation (POA), cues necessary for determining phonemic identity, occurs within shorter time frames. An unresolved question is whether the neural representation of speech is based on processing mechanisms that are unique to humans and shaped by learning and experience, or is based on rules governing general auditory processing that are also present in non-human animals. This question was examined by comparing the neural activity elicited by speech and other complex vocalizations in primary auditory cortex of macaques, who are limited vocal learners, with that in Heschl's gyrus, the putative location of primary auditory cortex in humans. Entrainment to the amplitude envelope is neither specific to humans nor to human speech. VOT is represented by responses time-locked to consonant release and voicing onset in both humans and monkeys. Temporal representation of VOT is observed both for isolated syllables and for syllables embedded in the more naturalistic context of running speech. The fundamental frequency of male speakers is represented by more rapid neural activity phase-locked to the glottal pulsation rate in both humans and monkeys. In both species, the differential representation of stop consonants varying in their POA can be predicted by the relationship between the frequency selectivity of neurons and the onset spectra of the speech sounds. These findings indicate that the neurophysiology of primary auditory cortex is similar in monkeys and humans despite their vastly different experience with human speech, and that Heschl's gyrus is engaged in general auditory, and not language-specific, processing. This article is part of a Special Issue entitled

  20. Abortion and compelled physician speech.

    Science.gov (United States)

    Orentlicher, David

    2015-01-01

    Informed consent mandates for abortion providers may infringe the First Amendment's freedom of speech. On the other hand, they may reinforce the physician's duty to obtain informed consent. Courts can promote both doctrines by ensuring that compelled physician speech pertains to medical facts about abortion rather than abortion ideology and that compelled speech is truthful and not misleading. © 2015 American Society of Law, Medicine & Ethics, Inc.

  1. Noise Reduction in Car Speech

    OpenAIRE

    V. Bolom

    2009-01-01

    This paper presents properties of chosen multichannel algorithms for speech enhancement in a noisy environment. These methods are suitable for hands-free communication in a car cabin. Criteria for evaluation of these systems are also presented. The criteria consider both the level of noise suppression and the level of speech distortion. The performance of multichannel algorithms is investigated for a mixed model of speech signals and car noise and for real signals recorded in a car. 

  2. Speech recognition in university classrooms

    OpenAIRE

    Wald, Mike; Bain, Keith; Basson, Sara H

    2002-01-01

    The LIBERATED LEARNING PROJECT (LLP) is an applied research project studying two core questions: 1) Can speech recognition (SR) technology successfully digitize lectures to display spoken words as text in university classrooms? 2) Can speech recognition technology be used successfully as an alternative to traditional classroom notetaking for persons with disabilities? This paper addresses these intriguing questions and explores the underlying complex relationship between speech recognition te...

  3. Noise Reduction in Car Speech

    Directory of Open Access Journals (Sweden)

    V. Bolom

    2009-01-01

    Full Text Available This paper presents properties of chosen multichannel algorithms for speech enhancement in a noisy environment. These methods are suitable for hands-free communication in a car cabin. Criteria for evaluation of these systems are also presented. The criteria consider both the level of noise suppression and the level of speech distortion. The performance of multichannel algorithms is investigated for a mixed model of speech signals and car noise and for real signals recorded in a car. 

  4. Speech and language intervention in bilinguals

    Directory of Open Access Journals (Sweden)

    Eliane Ramos

    2011-12-01

    Full Text Available Increasingly, speech and language pathologists (SLPs) around the world are faced with the unique set of issues presented by their bilingual clients. Some professional associations in different countries have presented recommendations for assessing and treating bilingual populations. In children, most of the studies have focused on intervention for language and phonology/articulation impairments, and very few focus on stuttering. In general, studies of language intervention tend to agree that intervention in the first language (L1) either increases performance in the second language (L2) or does not hinder it. In bilingual adults, the choice between monolingual and bilingual intervention is especially relevant in cases of aphasia; dysarthria in bilinguals has barely been addressed. Most studies of cross-linguistic effects in bilingual aphasics have focused on lexical retrieval training. It has been noted that even though a majority of studies have disclosed cross-linguistic generalization from one language to the other, some methodological weaknesses are evident. It is concluded that even though speech and language intervention in bilinguals represents a most important clinical area in speech-language pathology, much more research using larger samples and controlling for potentially confounding variables is required.

  5. Speech Recognition on Mobile Devices

    DEFF Research Database (Denmark)

    Tan, Zheng-Hua; Lindberg, Børge

    2010-01-01

    The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR...... in the mobile context covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within...... command and control, text entry and search are presented with an emphasis on mobile text entry....

  6. Speech Recognition: How Do We Teach It?

    Science.gov (United States)

    Barksdale, Karl

    2002-01-01

    States that growing use of speech recognition software has made voice writing an essential computer skill. Describes how to present the topic, develop basic speech recognition skills, and teach speech recognition outlining, writing, proofreading, and editing. (Contains 14 references.) (SK)

  7. Huntington's Disease: Speech, Language and Swallowing

    Science.gov (United States)

  8. Are there interactive processes in speech perception?

    Science.gov (United States)

    McClelland, James L.; Mirman, Daniel; Holt, Lori L.

    2012-01-01

    Lexical information facilitates speech perception, especially when sounds are ambiguous or degraded. The interactive approach to understanding this effect posits that this facilitation is accomplished through bi-directional flow of information, allowing lexical knowledge to influence pre-lexical processes. Alternative autonomous theories posit feed-forward processing with lexical influence restricted to post-perceptual decision processes. We review evidence supporting the prediction of interactive models that lexical influences can affect pre-lexical mechanisms, triggering compensation, adaptation and retuning of phonological processes generally taken to be pre-lexical. We argue that these and other findings point to interactive processing as a fundamental principle for perception of speech and other modalities. PMID:16843037

  9. Speech emotion recognition with unsupervised feature learning

    Institute of Scientific and Technical Information of China (English)

    Zheng-wei HUANG; Wen-tao XUE; Qi-rong MAO

    2015-01-01

    Emotion-based features are critical for achieving high performance in a speech emotion recognition (SER) system. In general, it is difficult to develop these features due to the ambiguity of the ground truth. In this paper, we apply several unsupervised feature learning algorithms (including K-means clustering, the sparse auto-encoder, and sparse restricted Boltzmann machines), which show promise for learning task-related features from unlabeled data, to speech emotion recognition. We then evaluate the performance of the proposed approach and present a detailed analysis of the effect of two important factors in the model setup: the content window size and the number of hidden-layer nodes. Experimental results show that larger content windows and more hidden nodes contribute to higher performance. We also show that a two-layer network does not clearly improve performance compared to a single-layer network.
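
    The clustering route described above can be made concrete in a few lines. The following is a minimal sketch, not the authors' implementation: it learns a K-means codebook from unlabeled frames and pools soft activations into one utterance-level feature vector. The frame dimensionality, cluster count, and "triangle" encoding are illustrative assumptions.

        # Sketch: unsupervised feature learning for SER with K-means.
        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        frames = rng.standard_normal((5000, 39))   # stand-in: 39-dim MFCC+delta frames

        kmeans = KMeans(n_clusters=64, n_init=10, random_state=0).fit(frames)

        def encode(utterance_frames, centers):
            # Soft "triangle" encoding: activation = mean distance minus distance,
            # clipped at zero, then mean-pooled over the utterance.
            d = np.linalg.norm(utterance_frames[:, None, :] - centers[None, :, :], axis=2)
            act = np.maximum(0.0, d.mean(axis=1, keepdims=True) - d)
            return act.mean(axis=0)                # one 64-dim vector per utterance

        feat = encode(frames[:200], kmeans.cluster_centers_)  # features for one utterance

    The pooled vector would then feed a standard classifier; a larger content window corresponds to stacking more neighboring frames before clustering.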

  10. The Enlightenment of General Secretary Xi's Important Speeches on Propaganda Work for College Ideological and Political Education

    Institute of Scientific and Technical Information of China (English)

    毛时玉

    2016-01-01

    Xi Jinping's new party and government leadership has made a great contribution to constructing a socialist value system with Chinese characteristics for a new era. In particular, the important speeches General Secretary Xi Jinping has delivered on propaganda work have corrected and innovated the official discourse system, and they offer important insights for improving ideological and political education in colleges and universities. At present, ideological and political education in higher education faces difficulties such as a rapidly changing educational environment, students' ideological confusion, and rigid, outdated classroom teaching. To overcome these difficulties, educators should draw on the concepts and metaphors in Xi Jinping's speeches, learn from their language style, and integrate the value system they embody.

  11. On a Supposed Dogma of Speech Perception Research: a Response to Appelbaum (1999)

    Directory of Open Access Journals (Sweden)

    Fernando Orphão de Carvalho

    2009-04-01

    Full Text Available In this paper we qualify the claim, advanced by Appelbaum (1999), that speech perception research over the last 70 years or so has endorsed a view on the nature of speech for which no evidence can be adduced and which has resisted falsification through active ad hoc "theoretical repair" carried out by speech scientists. We show that the author's qualms about the putative dogmatic status of speech research are utterly unwarranted, if not misconstrued as a whole. On more general grounds, the present article can be understood as a work in the rather underdeveloped area of the philosophy and history of Linguistics.

  12. Alternative Speech Communication System for Persons with Severe Speech Disorders

    Science.gov (United States)

    Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

    2009-12-01

    Assistive speech-enabled systems are proposed to help both French- and English-speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech, making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement in the Perceptual Evaluation of Speech Quality (PESQ) value of 5% and of more than 20% is achieved by the speech synthesis systems that deal with SSDs and dysarthria, respectively.

  13. Measurement of speech parameters in casual speech of dementia patients

    NARCIS (Netherlands)

    Ossewaarde, Roelant; Jonkers, Roel; Jalvingh, Fedor; Bastiaanse, Yvonne

    Measurement of speech parameters in casual speech of dementia patients. Roelant Adriaan Ossewaarde (1,2), Roel Jonkers (1), Fedor Jalvingh (1,3), Roelien Bastiaanse (1). 1: CLCG, University of Groningen (NL); 2: HU University of Applied Sciences Utrecht (NL); 3: St. Marienhospital Vechta, Geriatric Clinic Vechta

  14. Discourse, Statement and Speech Act

    Directory of Open Access Journals (Sweden)

    Елена Александровна Красина

    2016-12-01

    Full Text Available Being a component of socio-cultural interaction, discourse constitutes a sophisticated cohesion of language form, meaning and performance, i.e. a communicative event or act. This cohesion with event and performance lets us treat discourse as a certain lifeform, appealing both to communicative interaction and to the pragmatic environment, following the methodology of E. Benveniste, M. Foucault, I. Kecskes, J.R. Searle et al. In linguistics and other fields of humanitarian knowledge, the notion of discourse facilitates the integration of studies in the humanities. Principles of integration and incorporation into a broad humanitarian context reveal aspects of the discourse-speech act-utterance interaction that lead to substantive solutions of a number of linguistic topics, in particular that of the utterance. Logicians define the utterance through the proposition; linguists, through the sentence; while speech act theory does so by means of the illocutionary act. Integrated into a discourse or its part, utterances make up their integral constituents, although not the only ones. In relation to speech acts, the utterance happens to be the unique definitional domain synchronically modelling and denoting the speech act by means of propositional content. The goal of the research is to show the conditions of interaction and correlation of discourse, speech act and utterance as linguistic constructions, to reveal similarities and differences in their characteristics, and to prove the importance of the constructive role of the utterance as a minimal unit of speech production. The discourse-speech act-utterance correlation supports the utterance's role as a discrete unit within the syntactic continuum, facing both language and speech: still, it belongs exclusively neither to language nor to speech, but specifies their interaction in the course of speech activity, exposing simultaneously its nature as an 'atom of discourse' and creating the definitional domain of a speech act.

  15. A Pragma-Stylistic Analysis of President Goodluck Ebele Jonathan Inaugural Speech

    Science.gov (United States)

    Abuya, Eromosele John

    2012-01-01

    The study examined, through a pragma-stylistic approach to meaning, the linguistic acts that manifest in the Inaugural Speech of Goodluck Ebele Jonathan as the democratically elected president in the May 2011 General Elections in Nigeria. Hence, the study focused on the speech act types of locution, illocution and perlocution in the…

  16. Pragmatic Difficulties in the Production of the Speech Act of Apology by Iraqi EFL Learners

    Science.gov (United States)

    Al-Ghazalli, Mehdi Falih; Al-Shammary, Mohanad A. Amert

    2014-01-01

    The purpose of this paper is to investigate the pragmatic difficulties encountered by Iraqi EFL university students in producing the speech act of apology. Although the act of apology is easy to recognize or use by native speakers of English, non-native speakers generally encounter difficulties in discriminating one speech act from another. The…

  17. Compensation for Coarticulation: Disentangling Auditory and Gestural Theories of Perception of Coarticulatory Effects in Speech

    Science.gov (United States)

    Viswanathan, Navin; Magnuson, James S.; Fowler, Carol A.

    2010-01-01

    According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different…

  18. Quantifying the intelligibility of speech in noise for non-native listeners

    NARCIS (Netherlands)

    Wijngaarden, S.J. van; Steeneken, H.J.M.; Houtgast, T.

    2002-01-01

    When listening to languages learned at a later age, speech intelligibility is generally lower than when listening to one's native language. The main purpose of this study is to quantify speech intelligibility in noise for specific populations of non-native listeners, only broadly addressing the unde

  19. VARIATIONS OF DIRECTIVE SPEECH ACT IN TEMBANG DOLANAN

    Directory of Open Access Journals (Sweden)

    Daru Winarti

    2015-10-01

    Full Text Available This article discusses the directive speech acts contained in tembang dolanan. Using a pragmatic approach, particularly the framework of speech act theory, this article analyzes the different types of directive speech acts, the contexts in which they are embodied, and their level of politeness. The data used in this research consisted of various tembang dolanan that contain directive statements. These data were analyzed using interpretation and inference and are presented in the form of descriptive analysis, which systematically illustrates and elaborates the facts and the relationships between phenomena. In tembang dolanan, directive speech acts can be expressed directly or indirectly. Direct expression is conventionally used to command, invite, and urge, while indirect expression is used when, instead of a command sentence, the intention to command is conveyed by statement sentences, obligation-stating sentences, and questions. The use of direct speech acts generally lacks politeness value because such acts tend to contain elements of coercion, make no effort to soften the form of an order, and display the superiority of the speaker. On the other hand, the use of indirect speech acts seems to be an attempt to soften commands so as to be more polite, in the hope that listeners will willingly respond to them.

  20. Acoustic assessment of speech privacy curtains in two nursing units.

    Science.gov (United States)

    Pope, Diana S; Miller-Klein, Erik T

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption, and compact and more fragmented nursing unit floor plate shapes should be considered.

  1. Investigation of Persian Speech Interaural Attenuation in Adults

    Directory of Open Access Journals (Sweden)

    Fahimeh Hajiabolhassan

    2010-06-01

    Full Text Available Background and Aim: As clinical audiometric assessment of each ear requires knowledge of interaural attenuation (IA), the aim of this study was to investigate Persian speech IA in adults. Methods: This cross-sectional, analytic study was performed on 50 normal-hearing students (25 males, 25 females), aged 18-25 years, in the Faculty of Rehabilitation, Tehran University of Medical Sciences. Speech reception threshold (SRT) was determined with a descending method, with and without noise. Then speech IA for Persian spondaic words was calculated with TDH-39 earphones. Results: Mean speech IA was 53.06±3.25 dB. There was no significant difference between mean IA in males (53.88±2.93 dB) and females (52.24±3.40 dB) (p>0.05). The lowest IA was in females (45 dB) and the highest IA was in males (60 dB). Mother's language had no significant effect on speech IA. Conclusion: We may consider 45 dB as the lowest IA for Persian speech assessment; however, generalization needs more study on a larger sample.

  2. Acoustic assessment of speech privacy curtains in two nursing units

    Directory of Open Access Journals (Sweden)

    Diana S Pope

    2016-01-01

    Full Text Available Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption, and compact and more fragmented nursing unit floor plate shapes should be considered.

  3. Listening for the norm: adaptive coding in speech categorization

    Directory of Open Access Journals (Sweden)

    Jingyuan eHuang

    2012-02-01

    Full Text Available Perceptual aftereffects have been referred to as the psychologist's microelectrode because they can expose dimensions of representation through the residual effect of a context stimulus upon perception of a subsequent target. The present study uses such context dependence to examine the dimensions of representation involved in a classic demonstration of talker normalization in speech perception. Whereas most accounts of talker normalization have emphasized the significance of talker-, speech- or articulatory-specific dimensions, the present work tests an alternative hypothesis: that the long-term average spectrum (LTAS) of the speech context is responsible for patterns of context-dependent perception considered to be evidence for talker normalization. In support of this hypothesis, listeners' vowel categorization was equivalently influenced by speech contexts manipulated to sound as though they were spoken by different talkers and by nonspeech analogs matched in LTAS to the speech contexts. Since the nonspeech contexts did not possess talker, speech or articulatory information, general perceptual mechanisms are implicated. Results are described in terms of adaptive perceptual coding.
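
    The LTAS quantity at the center of this hypothesis is straightforward to compute. Below is a minimal sketch using Welch's method; the sampling rate, window length, and stand-in signal are illustrative assumptions, not the study's settings.

        # Sketch: long-term average spectrum (LTAS) of a speech context.
        import numpy as np
        from scipy.signal import welch

        fs = 16000
        speech = np.random.randn(fs * 3)           # stand-in for 3 s of context audio

        freqs, ltas = welch(speech, fs=fs, nperseg=512, noverlap=256)
        ltas_db = 10 * np.log10(ltas + 1e-12)      # average power per frequency bin, in dB

        # A nonspeech analog "matched in LTAS" would be noise shaped so that its
        # welch() spectrum matches ltas_db within tolerance.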

  4. Age-Related Differences in Lexical Access Relate to Speech Recognition in Noise

    Science.gov (United States)

    Carroll, Rebecca; Warzybok, Anna; Kollmeier, Birger; Ruigendijk, Esther

    2016-01-01

    Vocabulary size has been suggested as a useful measure of “verbal abilities” that correlates with speech recognition scores. Knowing more words is linked to better speech recognition. How vocabulary knowledge translates to general speech recognition mechanisms, how these mechanisms relate to offline speech recognition scores, and how they may be modulated by acoustical distortion or age, is less clear. Age-related differences in linguistic measures may predict age-related differences in speech recognition in noise performance. We hypothesized that speech recognition performance can be predicted by the efficiency of lexical access, which refers to the speed with which a given word can be searched and accessed relative to the size of the mental lexicon. We tested speech recognition in a clinical German sentence-in-noise test at two signal-to-noise ratios (SNRs), in 22 younger (18–35 years) and 22 older (60–78 years) listeners with normal hearing. We also assessed receptive vocabulary, lexical access time, verbal working memory, and hearing thresholds as measures of individual differences. Age group, SNR level, vocabulary size, and lexical access time were significant predictors of individual speech recognition scores, but working memory and hearing threshold were not. Interestingly, longer accessing times were correlated with better speech recognition scores. Hierarchical regression models for each subset of age group and SNR showed very similar patterns: the combination of vocabulary size and lexical access time contributed most to speech recognition performance; only for the younger group at the better SNR (yielding about 85% correct speech recognition) did vocabulary size alone predict performance. Our data suggest that successful speech recognition in noise is mainly modulated by the efficiency of lexical access. This suggests that older adults’ poorer performance in the speech recognition task may have arisen from reduced efficiency in lexical access
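
    The hierarchical regression logic described above amounts to entering predictors in steps and comparing explained variance. A minimal sketch with synthetic stand-in data (the variable names, effect sizes, and sample size are illustrative assumptions, not the study's measurements):

        # Sketch: stepwise (hierarchical) regression on speech-in-noise scores.
        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(4)
        n = 44
        vocab = rng.normal(50, 10, n)               # receptive vocabulary size
        access = rng.normal(1.0, 0.2, n)            # lexical access time (s)
        score = 0.5 * vocab + 8.0 * access + rng.normal(0, 3, n)  # synthetic scores

        step1 = LinearRegression().fit(vocab[:, None], score)
        step2 = LinearRegression().fit(np.column_stack([vocab, access]), score)
        r2_gain = (step2.score(np.column_stack([vocab, access]), score)
                   - step1.score(vocab[:, None], score))
        print(r2_gain)                              # variance added by lexical access time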

  5. Teaching Speech Acts

    Directory of Open Access Journals (Sweden)

    2007-01-01

    Full Text Available In this paper I argue that pragmatic ability must become part of what we teach in the classroom if we are to realize the goals of communicative competence for our students. I review the research on pragmatics, especially those articles that point to the effectiveness of teaching pragmatics in an explicit manner, and those that posit methods for teaching. I also note two areas of scholarship that address classroom needs—the use of authentic data and appropriate assessment tools. The essay concludes with a summary of my own experience teaching speech acts in an advanced-level Portuguese class.

  6. Speech Understanding Systems

    Science.gov (United States)

    1976-02-01

    kHz that is a fixed number of decibels below the maximum value in the spectrum. A value of zero, however, is not recommended. (c) Speech for the... probability distributions for [t,p,k,d,n] should be evaluated using the observed parameters. But the scores on each of the vowels are all bad, so... plosives [p,t,k] is to examine the burst frequency and the voice-onset-time (VOT) when the plosive is followed by a vowel or semivowel. However, if

  7. Separating Underdetermined Convolutive Speech Mixtures

    DEFF Research Database (Denmark)

    Pedersen, Michael Syskind; Wang, DeLiang; Larsen, Jan

    2006-01-01

    a method for underdetermined blind source separation of convolutive mixtures. The proposed framework is applicable for separation of instantaneous as well as convolutive speech mixtures. It is possible to iteratively extract each speech signal from the mixture by combining blind source separation...

  8. Methods of Teaching Speech Recognition

    Science.gov (United States)

    Rader, Martha H.; Bailey, Glenn A.

    2010-01-01

    Objective: This article introduces the history and development of speech recognition, addresses its role in the business curriculum, outlines related national and state standards, describes instructional strategies, and discusses the assessment of student achievement in speech recognition classes. Methods: Research methods included a synthesis of…

  9. PESQ Based Speech Intelligibility Measurement

    NARCIS (Netherlands)

    Beerends, J.G.; Buuren, R.A. van; Vugt, J.M. van; Verhave, J.A.

    2009-01-01

    Several measurement techniques exist to quantify the intelligibility of a speech transmission chain. In the objective domain, the Articulation Index [1] and the Speech Transmission Index STI [2], [3], [4], [5] have been standardized for predicting intelligibility. The STI uses a signal that contains

  10. Perceptual Learning of Interrupted Speech

    NARCIS (Netherlands)

    Benard, Michel Ruben; Başkent, Deniz

    2013-01-01

    The intelligibility of periodically interrupted speech improves once the silent gaps are filled with noise bursts. This improvement has been attributed to phonemic restoration, a top-down repair mechanism that helps intelligibility of degraded speech in daily life. Two hypotheses were investigated u

  11. High-frequency energy in singing and speech

    Science.gov (United States)

    Monson, Brian Bruce

    While human speech and the human voice generate acoustical energy up to (and beyond) 20 kHz, the energy above approximately 5 kHz has been largely neglected. Evidence is accruing that this high-frequency energy contains perceptual information relevant to speech and voice, including percepts of quality, localization, and intelligibility. The present research was an initial step in the long-range goal of characterizing high-frequency energy in singing voice and speech, with particular regard for its perceptual role and its potential for modification during voice and speech production. In this study, a database of high-fidelity recordings of talkers was created and used for a broad acoustical analysis and general characterization of high-frequency energy, as well as specific characterization of phoneme category, voice and speech intensity level, and mode of production (speech versus singing) by high-frequency energy content. Directionality of radiation of high-frequency energy from the mouth was also examined. The recordings were used for perceptual experiments wherein listeners were asked to discriminate between speech and voice samples that differed only in high-frequency energy content. Listeners were also subjected to gender discrimination tasks, mode-of-production discrimination tasks, and transcription tasks with samples of speech and singing that contained only high-frequency content. The combination of these experiments has revealed that (1) human listeners are able to detect very subtle level changes in high-frequency energy, and (2) human listeners are able to extract significant perceptual information from high-frequency energy.
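
    The band boundary this work emphasizes (roughly 5 kHz) suggests a simple acoustical characterization: the fraction of signal energy above the cutoff. A minimal sketch follows; the sampling rate, cutoff, and stand-in signal are illustrative assumptions.

        # Sketch: fraction of energy above 5 kHz in a voice recording.
        import numpy as np

        def high_freq_energy_ratio(x, fs, cutoff_hz=5000.0):
            spec = np.abs(np.fft.rfft(x)) ** 2
            freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
            total = spec.sum()
            return spec[freqs >= cutoff_hz].sum() / total if total > 0 else 0.0

        fs = 44100                              # high-fidelity rate: content up to ~22 kHz
        x = np.random.randn(fs)                 # stand-in for 1 s of recorded voice
        print(high_freq_energy_ratio(x, fs))    # ~0.77 for white noise; lower for speech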

  12. Studies on bilateral cochlear implants at the University of Wisconsin's Binaural Hearing and Speech Laboratory.

    Science.gov (United States)

    Litovsky, Ruth Y; Goupell, Matthew J; Godar, Shelly; Grieco-Calub, Tina; Jones, Gary L; Garadat, Soha N; Agrawal, Smita; Kan, Alan; Todd, Ann; Hess, Christi; Misurelli, Sara

    2012-06-01

    This report highlights research projects relevant to binaural and spatial hearing in adults and children. In the past decade we have made progress in understanding the impact of bilateral cochlear implants (BiCIs) on performance in adults and children. However, BiCI users typically do not perform as well as normal hearing (NH) listeners. In this article we describe the benefits from BiCIs compared with a single cochlear implant (CI), focusing on measures of spatial hearing and speech understanding in noise. We highlight the fact that in BiCI listening the devices in the two ears are not coordinated; thus binaural spatial cues that are available to NH listeners are not available to BiCI users. Through the use of research processors that carefully control the stimulus delivered to each electrode in each ear, we are able to preserve binaural cues and deliver them with fidelity to BiCI users. Results from those studies are discussed as well, with a focus on the effect of age at onset of deafness and plasticity of binaural sensitivity. Our work with children has expanded both in number of subjects tested and age range included. We have now tested dozens of children ranging in age from 2 to 14 yr. Our findings suggest that spatial hearing abilities emerge with bilateral experience. While we originally focused on studying performance in free field, where real world listening experiments are conducted, more recently we have begun to conduct studies under carefully controlled binaural stimulation conditions with children as well. We have also studied language acquisition and speech perception and production in young CI users. Finally, a running theme of this research program is the systematic investigation of the numerous factors that contribute to spatial and binaural hearing in BiCI users. By using CI simulations (with vocoders) and studying NH listeners under degraded listening conditions, we are able to tease apart limitations due to the hardware/software of the CI
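
    The CI simulations mentioned above typically use a noise (channel) vocoder: band-pass analysis, envelope extraction, and envelope-modulated noise carriers. The following is a minimal sketch of that technique, not the laboratory's processor; band edges, filter orders, and the Hilbert-envelope choice are illustrative assumptions.

        # Sketch: noise vocoder of the kind used for CI simulations.
        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def noise_vocode(x, fs, n_channels=8, lo=100.0, hi=7000.0):
            edges = np.geomspace(lo, hi, n_channels + 1)   # log-spaced channel edges
            out = np.zeros_like(x, dtype=float)
            for k in range(n_channels):
                sos = butter(4, [edges[k], edges[k + 1]], btype="band",
                             fs=fs, output="sos")
                band = sosfiltfilt(sos, x)
                env = np.abs(hilbert(band))                # temporal envelope of the band
                carrier = sosfiltfilt(sos, np.random.randn(len(x)))  # band-limited noise
                out += env * carrier
            return out / (np.max(np.abs(out)) + 1e-12)

        fs = 16000
        speech = np.random.randn(fs)                       # stand-in for 1 s of speech
        vocoded = noise_vocode(speech, fs, n_channels=8)

    Reducing n_channels coarsens the spectral resolution delivered to the listener, which is how such simulations degrade the signal in a CI-like way.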

  13. Visualizing structures of speech expressiveness

    DEFF Research Database (Denmark)

    Herbelin, Bruno; Jensen, Karl Kristoffer; Graugaard, Lars

    2008-01-01

    Speech is both beautiful and informative. In this work, a conceptual study of speech, through investigation of the tower of Babel, the archetypal phonemes, and a study of the reasons for the uses of language, is undertaken in order to create an artistic work investigating the nature of speech. The Babel myth speaks about the distance created when aspiring to the heavens as the reason for language division. Meanwhile, Locquin states through thorough investigations that only a few phonemes are present throughout history. Our interpretation is that a system able to recognize archetypal phonemes through vowels and consonants, and which converts the speech energy into visual particles that form complex visual structures, provides us with a means to present the expressiveness of speech in a visual mode. This system is presented in an artwork whose scenario is inspired by the reasons of language.

  14. Speech Compression Using Multecirculerletet Transform

    Directory of Open Access Journals (Sweden)

    Sulaiman Murtadha

    2012-01-01

    Full Text Available Compressing speech reduces data storage requirements, which in turn reduces the time needed to transmit digitized speech over long-haul links like the Internet. To obtain the best performance in speech compression, wavelet transforms require filters that combine a number of desirable properties, such as orthogonality and symmetry. The MCT basis functions are derived from the GHM basis functions using 2D linear convolution. The fast computation algorithms introduced here add desirable features to the current transform. We further assess the performance of the MCT in a speech compression application. This paper discusses the effect of using the DWT and the MCT (in one and two dimensions) on speech compression. DWT and MCT performances are assessed in terms of compression ratio (CR), mean square error (MSE) and peak signal-to-noise ratio (PSNR). Computer simulation results indicate that the two-dimensional MCT offers a better compression ratio, MSE and PSNR than the DWT.
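
    The three figures of merit used above (CR, MSE, PSNR) are easy to state precisely. A minimal sketch, with illustrative bit counts and a synthetic signal standing in for an actual codec's output:

        # Sketch: CR, MSE and PSNR for a compressed/reconstructed speech signal.
        import numpy as np

        def compression_ratio(original_bits, compressed_bits):
            return original_bits / compressed_bits

        def mse(x, x_hat):
            return float(np.mean((x - x_hat) ** 2))

        def psnr(x, x_hat):
            peak = np.max(np.abs(x))
            return 10 * np.log10(peak ** 2 / (mse(x, x_hat) + 1e-12))

        x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # stand-in signal
        x_hat = x + 0.01 * np.random.randn(len(x))              # stand-in reconstruction
        print(compression_ratio(16 * len(x), 4 * len(x)),       # e.g. 16-bit vs 4-bit
              mse(x, x_hat), psnr(x, x_hat))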

  15. Segmental duration in Parkinsonian French speech.

    Science.gov (United States)

    Duez, Danielle

    2009-01-01

    The present study had 2 main objectives: (1) examine the effect of Parkinson's disease (PD) on vowel and consonant duration in French read speech and (2) investigate whether the durational contrasts of consonants and vowels are maintained or compromised. The data indicated that the consonant durations were shortened in Parkinsonian speech (PS), compared to control speech (CS). However, this shortening was consonant dependent: unvoiced occlusives and fricatives were significantly shortened compared to other consonant categories. All vowels were slightly longer in PS than in CS, however, the observed differences were below the level of significance. Despite significant shortening of some consonant categories, the general pattern of intrinsic duration was maintained in PS. There was slightly less agreement for vowels with the normal contrast of intrinsic durations, possibly because vowel durational contrasts are more sensitive to PD disorders. Most PD patients tended to maintain the intrinsic duration contrasts of both vowels and consonants, suggesting that low-level articulatory constraints operate in a similar way and with the same weight in PS and CS. Copyright 2009 S. Karger AG, Basel.

  16. Speech Impairment in Down Syndrome: A Review

    Science.gov (United States)

    Kent, Ray D.; Vorperian, Houri K.

    2012-01-01

    Purpose This review summarizes research on disorders of speech production in Down Syndrome (DS) for the purposes of informing clinical services and guiding future research. Method Review of the literature was based on searches using Medline, Google Scholar, Psychinfo, and HighWire Press, as well as consideration of reference lists in retrieved documents (including online sources). Search terms emphasized functions related to voice, articulation, phonology, prosody, fluency and intelligibility. Conclusions The following conclusions pertain to four major areas of review: (a) Voice. Although a number of studies have been reported on vocal abnormalities in DS, major questions remain about the nature and frequency of the phonatory disorder. Results of perceptual and acoustic studies have been mixed, making it difficult to draw firm conclusions or even to identify sensitive measures for future study. (b) Speech sounds. Articulatory and phonological studies show that speech patterns in DS are a combination of delayed development and errors not seen in typical development. Delayed (i.e., developmental) and disordered (i.e., nondevelopmental) patterns are evident by the age of about 3 years, although DS-related abnormalities possibly appear earlier, even in infant babbling. (c) Fluency and prosody. Stuttering and/or cluttering occur in DS at rates of 10 to 45%, compared to about 1% in the general population. Research also points to significant disturbances in prosody. (d) Intelligibility. Studies consistently show marked limitations in this area but it is only recently that research goes beyond simple rating scales. PMID:23275397

  17. Dichotic speech tests.

    Science.gov (United States)

    Hällgren, M; Johansson, M; Larsby, B; Arlinger, S

    1998-01-01

    When central auditory dysfunction is present, the ability to understand speech in difficult listening situations can be affected. To study this phenomenon, dichotic speech tests were performed with test material in the Swedish language. Digits, spondees, sentences and consonant-vowel syllables were used as stimuli and the reporting was free or directed. The test material was recorded on CD. The study includes a normal group of 30 people in three different age categories: 11 years, 23-27 years and 67-70 years. It also includes two groups of subjects with suspected central auditory lesions: 11 children with reading and writing difficulties and 4 adults earlier exposed to organic solvents. The results from the normal group do not show any differences in performance due to age. The children with reading and writing difficulties show a significant deviation for one test with digits and one test with syllables. Three of the four adults exposed to solvents show a significant deviation from the normal group.

  18. Interactions between distal speech rate, linguistic knowledge, and speech environment.

    Science.gov (United States)

    Morrill, Tuuli; Baese-Berk, Melissa; Heffner, Christopher; Dilley, Laura

    2015-10-01

    During lexical access, listeners use both signal-based and knowledge-based cues, and information from the linguistic context can affect the perception of acoustic speech information. Recent findings suggest that the various cues used in lexical access are implemented with flexibility and may be affected by information from the larger speech context. We conducted 2 experiments to examine effects of a signal-based cue (distal speech rate) and a knowledge-based cue (linguistic structure) on lexical perception. In Experiment 1, we manipulated distal speech rate in utterances where an acoustically ambiguous critical word was either obligatory for the utterance to be syntactically well formed (e.g., Conner knew that bread and butter (are) both in the pantry) or optional (e.g., Don must see the harbor (or) boats). In Experiment 2, we examined identical target utterances as in Experiment 1 but changed the distribution of linguistic structures in the fillers. The results of the 2 experiments demonstrate that speech rate and linguistic knowledge about critical word obligatoriness can both influence speech perception. In addition, it is possible to alter the strength of a signal-based cue by changing information in the speech environment. These results provide support for models of word segmentation that include flexible weighting of signal-based and knowledge-based cues.

  19. PCA-Based Speech Enhancement for Distorted Speech Recognition

    Directory of Open Access Journals (Sweden)

    Tetsuya Takiguchi

    2007-09-01

    Full Text Available We investigated a robust speech feature extraction method using kernel PCA (Principal Component Analysis) for distorted speech recognition. Kernel PCA has been suggested for various image processing tasks requiring an image model, such as denoising, where a noise-free image is constructed from a noisy input image. Much research for robust speech feature extraction has been done, but it remains difficult to completely remove additive or convolution noise (distortion). The most commonly used noise-removal techniques are based on the spectral-domain operation, and then for speech recognition, the MFCC (Mel Frequency Cepstral Coefficient) is computed, where DCT (Discrete Cosine Transform) is applied to the mel-scale filter bank output. This paper describes a new PCA-based speech enhancement algorithm using kernel PCA instead of DCT, where the main speech element is projected onto low-order features, while the noise or distortion element is projected onto high-order features. Its effectiveness is confirmed by word recognition experiments on distorted speech.
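
    The substitution described above (kernel PCA in place of the DCT step of MFCC) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the kernel, its gamma parameter, the number of components, and the filter-bank dimensions are illustrative assumptions.

        # Sketch: kernel-PCA features from log mel filter-bank outputs.
        import numpy as np
        from sklearn.decomposition import KernelPCA

        rng = np.random.default_rng(1)
        log_mel = rng.standard_normal((1000, 24))   # stand-in: 24-band log mel outputs

        kpca = KernelPCA(n_components=13, kernel="rbf", gamma=0.05)
        features = kpca.fit_transform(log_mel)      # low-order components keep the
                                                    # speech element; high-order ones
                                                    # carry more of the distortion
        print(features.shape)                       # (1000, 13): MFCC-like dimensionality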

  20. Modeling the Development of Audiovisual Cue Integration in Speech Perception

    Science.gov (United States)

    Getz, Laura M.; Nordeen, Elke R.; Vrabic, Sarah C.; Toscano, Joseph C.

    2017-01-01

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues. PMID:28335558
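
    The GMM approach described above can be illustrated with two synthetic cue dimensions. The following sketch is not the authors' simulation; the cue values (a VOT-like auditory cue paired with a visual lip cue) and the mixture settings are illustrative assumptions.

        # Sketch: learning categories from joint audiovisual cues with a GMM.
        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(2)
        voiced = np.column_stack([rng.normal(10, 5, 500), rng.normal(0.2, 0.1, 500)])
        voiceless = np.column_stack([rng.normal(60, 10, 500), rng.normal(0.8, 0.1, 500)])
        cues = np.vstack([voiced, voiceless])       # columns: auditory cue, visual cue

        gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
        gmm.fit(cues)                               # unsupervised: no category labels
        post = gmm.predict_proba([[35.0, 0.5]])     # ambiguous token, both cues mid-range
        print(post)                                 # graded category membership

    Fitting on one cue dimension at a time versus jointly gives a simple handle on the developmental question of when the two modalities become integrated.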

  1. Hate Speech/Free Speech: Using Feminist Perspectives To Foster On-Campus Dialogue.

    Science.gov (United States)

    Cornwell, Nancy; Orbe, Mark P.; Warren, Kiesha

    1999-01-01

    Explores the complex issues inherent in the tension between hate speech and free speech, focusing on the phenomenon of hate speech on college campuses. Describes the challenges to hate speech made by critical race theorists and explains how a feminist critique can reorient the parameters of hate speech. (SLD)

  3. Speech and language therapy intervention with a group of persistent and prolific young offenders in a non-custodial setting with previously undiagnosed speech, language and communication difficulties.

    Science.gov (United States)

    Gregory, Juliette; Bryan, Karen

    2011-01-01

    Increasing numbers of children with behaviour and school problems (related to both academic achievement and social participation) are recognized as having undiagnosed speech, language and communication difficulties. Both speech, language and communication difficulties and school failure are risk factors for offending. To investigate the prevalence of speech, language and communication difficulties in a group of persistent and prolific young offenders sentenced to the Intensive Supervision and Surveillance Programme (ISSP), and to provide a preliminary evaluation of the impact of speech and language therapy (SLT) intervention. Seventy-two entrants to ISSP over 12 months were screened by the speech and language therapist. Those showing difficulties then had a detailed language assessment followed by intervention delivered jointly by the speech and language therapist and the youth offending team staff. Reassessment occurred at programme completion. A total of 65% of those screened had profiles indicating that they had language difficulties and might benefit from speech and language therapy intervention. As a cohort, their language skills were lower than those of the general population, and 20% scored at the 'severely delayed' level on standardized assessment. This is the first study of speech and language therapy within community services for young offenders, and is the first to demonstrate language improvement detectable on standardized language tests. However, further research is needed to determine the precise role of speech and language therapy within the intervention programme. Children and young people with behavioural or school difficulties coming into contact with criminal justice, mental health, psychiatric, and social care services need to be systematically assessed for undiagnosed speech, language and communication difficulties. Appropriate interventions can then enable the young person to engage with verbally mediated interventions. © 2011 Royal College

  4. Connected Speech Processes in Australian English.

    Science.gov (United States)

    Ingram, J. C. L.

    1989-01-01

    Explores the role of Connected Speech Processes (CSP) in accounting for sociolinguistically significant dimensions of speech variation, and presents initial findings on the distribution of CSPs in the speech of Australian adolescents. The data were gathered as part of a wider survey of speech of Brisbane school children. (Contains 26 references.)…

  5. Linguistic Units and Speech Production Theory.

    Science.gov (United States)

    MacNeilage, Peter F.

    This paper examines the validity of the concept of linguistic units in a theory of speech production. Substantiating data are drawn from the study of the speech production process itself. Secondarily, an attempt is made to reconcile the postulation of linguistic units in speech production theory with their apparent absence in the speech signal.…

  6. Automated Speech Rate Measurement in Dysarthria

    Science.gov (United States)

    Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

    2015-01-01

    Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…

  7. Coevolution of Human Speech and Trade

    NARCIS (Netherlands)

    Horan, R.D.; Bulte, E.H.; Shogren, J.F.

    2008-01-01

    We propose a paleoeconomic coevolutionary explanation for the origin of speech in modern humans. The coevolutionary process, in which trade facilitates speech and speech facilitates trade, gives rise to multiple stable trajectories. While a 'trade-speech' equilibrium is not an inevitable outcome for

  8. The Stylistic Analysis of Public Speech

    Institute of Scientific and Technical Information of China (English)

    李龙

    2011-01-01

    Public speech is a very important part of our daily life. The ability to deliver a good public speech is something we need to learn and to have, especially in the service sector. This paper attempts to analyze the style of public speech, in the hope of providing inspiration for whenever we deliver such a speech.

  10. Speech recognition from spectral dynamics

    Indian Academy of Sciences (India)

    Hynek Hermansky

    2011-10-01

    Information is carried in changes of a signal. The paper starts by revisiting Dudley's concept of the carrier nature of speech. It points to its close connection to modulation spectra of speech and argues against short-term spectral envelopes as dominant carriers of the linguistic information in speech. The history of spectral representations of speech is briefly discussed. Some of the history of the gradual infusion of the modulation spectrum concept into automatic speech recognition (ASR) comes next, pointing to the relationship of modulation spectrum processing to well-accepted ASR techniques such as dynamic speech features or RelAtive SpecTrAl (RASTA) filtering. Next, the frequency-domain perceptual linear prediction technique for deriving autoregressive models of temporal trajectories of spectral power in individual frequency bands is reviewed. Finally, posterior-based features, which allow for straightforward application of modulation frequency domain information, are described. The paper is tutorial in nature, aims at a historical global overview of attempts to use spectral dynamics in machine recognition of speech, and does not always provide enough detail of the described techniques. However, extensive references to earlier work are provided to compensate for the lack of detail in the paper.
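
    RASTA filtering, one of the modulation-domain techniques named above, is essentially a band-pass filter applied to each log spectral trajectory over time. A minimal sketch follows; the coefficients are the classic RASTA values from the literature, though the pole value (here 0.94) varies across implementations.

        # Sketch: RASTA-style filtering of log filter-bank trajectories.
        import numpy as np
        from scipy.signal import lfilter

        def rasta_filter(log_spec):
            # log_spec: (n_frames, n_bands) log filter-bank energies
            numer = np.array([0.2, 0.1, 0.0, -0.1, -0.2])  # ~ smoothed differentiator
            denom = np.array([1.0, -0.94])                 # leaky integrator
            return lfilter(numer, denom, log_spec, axis=0) # filter each band over time

        frames = np.log(np.random.rand(300, 20) + 1e-6)    # stand-in trajectories
        smoothed = rasta_filter(frames)                    # keeps syllable-rate modulations,
                                                           # suppresses slow channel effects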

  11. ARMA Modelling for Whispered Speech

    Institute of Scientific and Technical Information of China (English)

    Xue-li LI; Wei-dong ZHOU

    2010-01-01

    The Autoregressive Moving Average (ARMA) model for whispered speech is proposed. Compared with normal speech, whispered speech has no fundamental frequency because the glottis is semi-opened and turbulent flow is created, and formant shifting exists in the lower frequency region due to the narrowing of the tract in the false vocal fold region and weak acoustic coupling with the subglottal system. Analysis shows that the effect of the subglottal system is to introduce additional pole-zero pairs into the vocal tract transfer function. Theoretically, a method based on an ARMA process is superior to one based on an AR process in the spectral analysis of whispered speech. Two methods, the least squared modified Yule-Walker likelihood estimate (LSMY) algorithm and the frequency-domain Steiglitz-McBride (FDSM) algorithm, are applied to the ARMA model for whispered speech. The performance evaluation shows that the ARMA model is much more appropriate for representing whispered speech than the AR model, and the FDSM algorithm provides a more accurate estimation of the whispered speech spectral envelope than the LSMY algorithm, at a higher computational complexity.
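
    The pole-zero point above can be illustrated by contrasting an all-pole (AR) fit with a pole-zero (ARMA) fit on the same frame. The sketch below uses a generic ARMA estimator from statsmodels, not the LSMY or FDSM algorithms of the paper; the model orders and stand-in frame are illustrative assumptions.

        # Sketch: AR vs ARMA modelling of a whispered-speech analysis frame.
        import numpy as np
        from statsmodels.tsa.arima.model import ARIMA

        rng = np.random.default_rng(3)
        frame = rng.standard_normal(400)                 # stand-in analysis frame

        ar_fit = ARIMA(frame, order=(10, 0, 0)).fit()    # all-pole (AR) model
        arma_fit = ARIMA(frame, order=(10, 0, 4)).fit()  # pole-zero (ARMA) model

        # Zeros in the ARMA model can capture the spectral notches introduced by
        # subglottal coupling; an AR model must approximate them with extra poles.
        print(ar_fit.aic, arma_fit.aic)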

  12. INTEGRATING MACHINE TRANSLATION AND SPEECH SYNTHESIS COMPONENT FOR ENGLISH TO DRAVIDIAN LANGUAGE SPEECH TO SPEECH TRANSLATION SYSTEM

    Directory of Open Access Journals (Sweden)

    J. SANGEETHA

    2015-02-01

    Full Text Available This paper provides an interface between the machine translation and speech synthesis system for converting English speech to Tamil text in English to Tamil speech to speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation and text to speech synthesis. Many procedures for incorporation of speech recognition and machine translation have been projected. Still speech synthesis system has not yet been measured. In this paper, we focus on integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of speech synthesis, machine translation and the integration of machine translation and speech synthesis components. Here we implement a hybrid machine translation (combination of rule based and statistical machine translation and concatenative syllable based speech synthesis technique. In order to retain the naturalness and intelligibility of synthesized speech Auto Associative Neural Network (AANN prosody prediction is used in this work. The results of this system investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.

  13. Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference

    Science.gov (United States)

    Lee, Byeongwook; Cho, Kwang-Hyun

    2016-11-01

    Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regular structured speech and maintain high recognition performance under any circumstance? Recent neurophysiological studies have suggested that the phase of neuronal oscillations in the auditory cortex contributes to accurate speech recognition by guiding speech segmentation into smaller units at different timescales. A phase-locked relationship between neuronal oscillation and the speech envelope has recently been obtained, which suggests that the speech envelope provides a foundation for multi-timescale speech segmental information. In this study, we quantitatively investigated the role of the speech envelope as a potential temporal reference to segment speech using its instantaneous phase information. We evaluated the proposed approach by the achieved information gain and recognition performance in various noisy environments. The results indicate that the proposed segmentation scheme not only extracts more information from speech but also provides greater robustness in a recognition test.
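    To make the envelope-as-reference idea concrete, the sketch below derives candidate segment boundaries from the instantaneous phase of a band-limited speech envelope; the 4-8 Hz band and the phase-wrap rule are illustrative assumptions, not the authors' exact procedure.

```python
# Segment speech where the instantaneous phase of its slow (theta-band)
# envelope modulation wraps, instead of using a fixed frame size.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def envelope_phase_boundaries(x, sr, band=(4.0, 8.0)):
    """Return sample indices of candidate segment boundaries in waveform x."""
    env = np.abs(hilbert(x))                        # broadband amplitude envelope
    b, a = butter(2, [band[0] / (sr / 2), band[1] / (sr / 2)], btype="band")
    slow = filtfilt(b, a, env)                      # band-limited envelope modulation
    phase = np.angle(hilbert(slow))                 # instantaneous phase of envelope
    return np.where(np.diff(phase) < -np.pi)[0]     # -pi..pi wrap = one cycle ends

# Framing between successive wraps yields variable-length segments that track
# the quasi-regular syllabic structure of the utterance.
```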

  14. A Survey on Speech Enhancement Methodologies

    Directory of Open Access Journals (Sweden)

    Ravi Kumar. K

    2016-12-01

    Speech enhancement is a technique which processes noisy speech signals. The aim of speech enhancement is to improve the perceived quality of speech and/or to improve its intelligibility. Due to its vast applications in mobile telephony, VoIP, hearing aids, Skype and speaker recognition, the challenges in speech enhancement have grown over the years. It is especially challenging to suppress the background noise that affects human communication in noisy environments such as airports, road works, traffic, and cars. The objective of this survey paper is to outline the single-channel speech enhancement methodologies used for enhancing speech corrupted by additive background noise, and to discuss the challenges and opportunities of single-channel speech enhancement. The paper mainly focuses on transform-domain techniques and supervised (NMF, HMM) speech enhancement techniques, and gives a framework for developments in speech enhancement methodologies.
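    As a concrete instance of the transform-domain family surveyed above, the sketch below implements plain magnitude spectral subtraction; estimating the noise from the leading frames and clamping to a spectral floor are illustrative simplifications.

```python
# Single-channel spectral subtraction: estimate the noise magnitude spectrum
# from the first few (assumed speech-free) frames and subtract it.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, sr, noise_frames=10, floor=0.02):
    f, t, X = stft(noisy, fs=sr, nperseg=512)
    mag, phase = np.abs(X), np.angle(X)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)  # noise estimate
    clean_mag = np.maximum(mag - noise_mag, floor * mag)           # keep a spectral floor
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=sr, nperseg=512)
    return enhanced
```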

  15. Crosslinguistic application of English-centric rhythm descriptors in motor speech disorders.

    Science.gov (United States)

    Liss, Julie M; Utianski, Rene; Lansford, Kaitlin

    2013-01-01

    Rhythmic disturbances are a hallmark of motor speech disorders, in which the motor control deficits interfere with the outward flow of speech and by extension speech understanding. As the functions of rhythm are language-specific, breakdowns in rhythm should have language-specific consequences for communication. The goals of this paper are to (i) provide a review of the cognitive-linguistic role of rhythm in speech perception in a general sense and crosslinguistically; (ii) present new results of lexical segmentation challenges posed by different types of dysarthria in American English, and (iii) offer a framework for crosslinguistic considerations for speech rhythm disturbances in the diagnosis and treatment of communication disorders associated with motor speech disorders. This review presents theoretical and empirical reasons for considering speech rhythm as a critical component of communication deficits in motor speech disorders, and addresses the need for crosslinguistic research to explore language-universal versus language-specific aspects of motor speech disorders. Copyright © 2013 S. Karger AG, Basel.

  16. Speech perception and talker segregation: Effects of level, pitch, and tactile support with multiple simultaneous talkers

    Science.gov (United States)

    Drullman, Rob; Bronkhorst, Adelbert W.

    2004-11-01

    Speech intelligibility was investigated by varying the number of interfering talkers, level, and mean pitch differences between target and interfering speech, and the presence of tactile support. In a first experiment the speech-reception threshold (SRT) for sentences was measured for a male talker against a background of one to eight interfering male talkers or speech noise. Speech was presented diotically and vibro-tactile support was given by presenting the low-pass-filtered signal (0-200 Hz) to the index finger. The benefit in the SRT resulting from tactile support ranged from 0 to 2.4 dB and was largest for one or two interfering talkers. A second experiment focused on masking effects of one interfering talker. The interference was the target talker's own voice with an increased mean pitch by 2, 4, 8, or 12 semitones. Level differences between target and interfering speech ranged from -16 to +4 dB. Results from measurements of correctly perceived words in sentences show an intelligibility increase of up to 27% due to tactile support. Performance gradually improves with increasing pitch difference. Louder target speech generally helps perception, but results for level differences are considerably dependent on pitch differences. Differences in performance between noise and speech maskers and between speech maskers with various mean pitches are explained by the effect of informational masking.

  17. Quantifying the intelligibility of speech in noise for non-native talkers

    Science.gov (United States)

    van Wijngaarden, Sander J.; Steeneken, Herman J. M.; Houtgast, Tammo

    2002-12-01

    The intelligibility of speech pronounced by non-native talkers is generally lower than speech pronounced by native talkers, especially under adverse conditions, such as high levels of background noise. The effect of foreign accent on speech intelligibility was investigated quantitatively through a series of experiments involving voices of 15 talkers, differing in language background, age of second-language (L2) acquisition and experience with the target language (Dutch). Overall speech intelligibility of L2 talkers in noise is predicted with a reasonable accuracy from accent ratings by native listeners, as well as from the self-ratings for proficiency of L2 talkers. For non-native speech, unlike native speech, the intelligibility of short messages (sentences) cannot be fully predicted by phoneme-based intelligibility tests. Although incorrect recognition of specific phonemes certainly occurs as a result of foreign accent, the effect of reduced phoneme recognition on the intelligibility of sentences may range from severe to virtually absent, depending on (for instance) the speech-to-noise ratio. Objective acoustic-phonetic analyses of accented speech were also carried out, but satisfactory overall predictions of speech intelligibility could not be obtained with relatively simple acoustic-phonetic measures.

  18. Computational neuroanatomy of speech production.

    Science.gov (United States)

    Hickok, Gregory

    2012-01-05

    Speech production has been studied predominantly from within two traditions, psycholinguistics and motor control. These traditions have rarely interacted, and the resulting chasm between these approaches seems to reflect a level of analysis difference: whereas motor control is concerned with lower-level articulatory control, psycholinguistics focuses on higher-level linguistic processing. However, closer examination of both approaches reveals a substantial convergence of ideas. The goal of this article is to integrate psycholinguistic and motor control approaches to speech production. The result of this synthesis is a neuroanatomically grounded, hierarchical state feedback control model of speech production.

  19. Speech enhancement theory and practice

    CERN Document Server

    Loizou, Philipos C

    2013-01-01

    With the proliferation of mobile devices and hearing devices, including hearing aids and cochlear implants, there is a growing and pressing need to design algorithms that can improve speech intelligibility without sacrificing quality. Responding to this need, Speech Enhancement: Theory and Practice, Second Edition introduces readers to the basic problems of speech enhancement and the various algorithms proposed to solve these problems. Updated and expanded, this second edition of the bestselling textbook broadens its scope to include evaluation measures and enhancement algorithms aimed at impr...

  20. Look Who’s Talking NOW! Parentese Speech, Social Context, and Language Development Across Time

    Science.gov (United States)

    Ramírez-Esparza, Nairán; García-Sierra, Adrián; Kuhl, Patricia K.

    2017-01-01

    In previous studies, we found that the social interactions infants experience in their everyday lives at 11- and 14-months of age affect language ability at 24 months of age. These studies investigated relationships between the speech style (i.e., parentese speech vs. standard speech) and social context [i.e., one-on-one (1:1) vs. group] of language input in infancy and later speech development (i.e., at 24 months of age), controlling for socioeconomic status (SES). Results showed that the amount of exposure to parentese speech-1:1 in infancy was related to productive vocabulary at 24 months. The general goal of the present study was to investigate changes in (1) the pattern of social interactions between caregivers and their children from infancy to childhood and (2) relationships among speech style, social context, and language learning across time. Our study sample consisted of 30 participants from the previously published infant studies, evaluated at 33 months of age. Social interactions were assessed at home using digital first-person perspective recordings of the auditory environment. We found that caregivers use less parentese speech-1:1, and more standard speech-1:1, as their children get older. Furthermore, we found that the effects of parentese speech-1:1 in infancy on later language development at 24 months persist at 33 months of age. Finally, we found that exposure to standard speech-1:1 in childhood was the only social interaction that related to concurrent word production/use. Mediation analyses showed that standard speech-1:1 in childhood fully mediated the effects of parentese speech-1:1 in infancy on language development in childhood, controlling for SES. This study demonstrates that engaging in one-on-one interactions in infancy and later in life has important implications for language development. PMID:28676774

  3. Steganalysis of recorded speech

    Science.gov (United States)

    Johnson, Micah K.; Lyu, Siwei; Farid, Hany

    2005-03-01

    Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.

  4. Speech recovery device

    Energy Technology Data Exchange (ETDEWEB)

    Frankle, Christen M.

    2004-04-20

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  5. Silog: Speech Input Logon

    Science.gov (United States)

    Grau, Sergio; Allen, Tony; Sherkat, Nasser

    Silog is a biometric authentication system that extends the conventional PC logon process using voice verification. Users enter their ID and password using a conventional Windows logon procedure but then the biometric authentication stage makes a Voice over IP (VoIP) call to a VoiceXML (VXML) server. User interaction with this speech-enabled component then allows the user's voice characteristics to be extracted as part of a simple user/system spoken dialogue. If the captured voice characteristics match those of a previously registered voice profile, then network access is granted. If no match is possible, then a potential unauthorised system access has been detected and the logon process is aborted.

  7. Join Cost for Unit Selection Speech Synthesis

    OpenAIRE

    Vepa, Jithendra

    2004-01-01

    Undoubtedly, state-of-the-art unit selection-based concatenative speech systems produce very high quality synthetic speech. This is due to a large speech database containing many instances of each speech unit, with a varied and natural distribution of prosodic and spectral characteristics. The join cost, which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from this large speech database. The ideal join cost is one that measur...

  8. Speech distortion measure based on auditory properties

    Institute of Scientific and Technical Information of China (English)

    CHEN Guo; HU Xiulin; ZHANG Yunyu; ZHU Yaoting

    2000-01-01

    The Perceptual Spectrum Distortion (PSD) measure, based on human auditory properties, is presented for measuring speech distortion. The PSD measure calculates the speech distortion distance by simulating human auditory properties and converting the short-time speech power spectrum to an auditory perceptual spectrum. Preliminary simulation experiments comparing it with the Itakura measure have been done. The results show that the PSD measure is a preferable speech distortion measure and more consistent with subjective assessment of speech quality.

  9. A NOVEL APPROACH TO STUTTERED SPEECH CORRECTION

    OpenAIRE

    Alim Sabur Ajibola; Nahrul Khair bin Alang Md. Rashid; Wahju Sediono; Nik Nur Wahidah Nik Hashim

    2016-01-01

    Stuttered speech is a dysfluency-rich speech, more prevalent in males than females. It has been associated with insufficient air pressure or poor articulation, even though the root causes are more complex. The primary features include prolonged speech and repetitive speech, while some of its secondary features include anxiety, fear, and shame. This study used LPC analysis and synthesis algorithms to reconstruct the stuttered speech. The results were evaluated using cepstral distance, Itakura...

  10. Inhibitory Control Predicts Language Switching Performance in Trilingual Speech Production

    Science.gov (United States)

    Linck, Jared A.; Schwieter, John W.; Sunderman, Gretchen

    2012-01-01

    This study investigated the role of domain-general inhibitory control in trilingual speech production. Taking an individual differences approach, we examined the relationship between performance on a non-linguistic measure of inhibitory control (the Simon task) and a multilingual language switching task for a group of fifty-six native English (L1)…

  11. Hate Speech: Political Correctness v. the First Amendment.

    Science.gov (United States)

    Stern, Ralph D.

    Both freedom of speech and freedom from discrimination are generally accepted expressions of public policy. The application of these policies, however, leads to conflicts that pose both practical and conceptual problems. This paper presents a review of court litigation and addresses the question of how to reconcile the conflicting societal goals…

  12. Refining a model of hearing impairment using speech psychophysics

    DEFF Research Database (Denmark)

    Jepsen, Morten Løve; Dau, Torsten; Ghitza, Oded

    2014-01-01

    The premise of this study is that models of hearing, in general, and of individual hearing impairment, in particular, can be improved by using speech test results as an integral part of the modeling process. A conceptual iterative procedure is presented which, for an individual, considers measures...

  13. Theater, Speech, Light

    Directory of Open Access Journals (Sweden)

    Primož Vitez

    2011-07-01

    This paper considers a medium as a substantial translator: an intermediary between the producers and receivers of a communicational act. A medium is a material support to the spiritual potential of human sources. If the medium is a support to meaning, then the relations between different media can be interpreted as a space for making sense of these meanings, a generator of sense: it means that the interaction of substances creates an intermedial space that conceives of a contextualization of specific meaningful elements in order to combine them into the sense of a communicational intervention. The theater itself is multimedia. A theatrical event is a communicational act based on a combination of several autonomous structures: text, scenography, light design, sound, directing, literary interpretation, speech, and, of course, the one that contains all of these: the actor in a human body. The actor is a physical and symbolic, anatomic, and emblematic figure in the synesthetic theatrical act because he reunites in his body all the essential principles and components of theater itself. The actor is an audio-visual being, made of kinetic energy, speech, and human spirit. The actor’s body, as a source, instrument, and goal of the theater, becomes an intersection of sound and light. However, theater as intermedial art is no intermediate practice; it must be seen as interposing bodies between conceivers and receivers, between authors and auditors. The body is not self-evident; the body in contemporary art forms is being redefined as a privilege. The art needs bodily dimensions to explore the medial qualities of substances: because it is alive, it returns to studying biology. The fact that theater is an archaic art form is also the purest promise of its future.

  14. Speech of people with autism: Echolalia and echolalic speech

    OpenAIRE

    Błeszyński, Jacek Jarosław

    2013-01-01

    Speech of people with autism is recognised as one of the basic diagnostic, therapeutic and theoretical problems. One of the most common symptoms of autism in children is echolalia, described here as being of different types and severity. This paper presents the results of studies into different levels of echolalia, both in normally developing children and in children diagnosed with autism, discusses the differences between simple echolalia and echolalic speech - which can be considered to b...

  15. On Thoughts of Carrying out the Series of Important Speeches of General Secretary Xi Jinping in Teaching of "Moral Cultivation and Legal Basis"

    Institute of Scientific and Technical Information of China (English)

    李红梅

    2015-01-01

    Carrying out the series of important speeches of General Secretary Xi Jinping in the teaching of "Moral Cultivation and Legal Basis" is required by the reality and logic of ideological and political lessons in colleges and universities. The series of speeches provides theoretical guidance and abundant teaching materials for the "Basis" course, which helps to improve the theoretical and political abilities of the faculty and encourages college students to establish correct moral, political and legal outlooks. Teachers should innovate teaching methods based on the ideas of "macro-control, micro-deregulation" to combine the series of important speeches with each chapter of the lesson content in a delightful way, thus truly achieving "into the teaching material, into the classroom, into the brain".

  16. An acoustic feature-based similarity scoring system for speech rehabilitation assistance.

    Science.gov (United States)

    Syauqy, Dahnial; Wu, Chao-Min; Setyawati, Onny

    2016-08-01

    The purpose of this study is to develop a tool to assist speech therapy and rehabilitation, focused on automatic scoring based on comparing the patient's speech with normal speech on several aspects, including pitch, vowels, voiced-unvoiced segments, strident fricatives and sound intensity. The pitch estimation employs a cepstrum-based algorithm for its robustness; the vowel classification uses a multilayer perceptron (MLP) to classify vowels from pitch and formants; and the strident fricative detection is based on the major peak spectral intensity, its location and the presence of pitch in the segment. In order to evaluate the performance of the system, this study analyzed eight patients' speech recordings (four males, four females; 4-58 years old), which had been recorded in a previous study in cooperation with Taipei Veterans General Hospital and Taoyuan General Hospital. The experimental results on the pitch algorithm showed that the cepstrum method had a gross pitch error of 5.3% over a total of 2086 frames. On the vowel classification algorithm, the MLP method provided 93% accuracy (men), 87% (women) and 84% (children). In total, 156 of the tool's 192 grading results (81%) were consistent with audio and visual observations made by four experienced respondents. Implication for Rehabilitation: Difficulties in communication may limit the ability of a person to transfer and exchange information. The fact that speech is one of the primary means of communication has encouraged the needs of speech diagnosis and rehabilitation. The advances of technology in computer-assisted speech therapy (CAST) improve the quality and time efficiency of the diagnosis and treatment of the disorders. The present study attempted to develop a tool to assist speech therapy and rehabilitation, which provides a simple interface to let the assessment be done even by the patient himself, without the need for particular knowledge of speech processing, while at the ...
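    A minimal sketch of the cepstrum-based pitch estimation named above, assuming a single voiced frame spanning a few pitch periods at sample rate sr; the 50-400 Hz search range is an illustrative choice, not the study's parameter.

```python
# Cepstral pitch estimation: the log-magnitude spectrum of voiced speech is
# quasi-periodic, so its inverse FFT (the cepstrum) peaks at the pitch period.
import numpy as np

def cepstral_pitch(frame, sr, fmin=50.0, fmax=400.0):
    frame = frame * np.hamming(len(frame))                 # frame ~30-40 ms long
    spectrum = np.fft.rfft(frame, n=2 * len(frame))
    cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))
    lo, hi = int(sr / fmax), int(sr / fmin)                # plausible pitch lags
    peak = lo + np.argmax(cepstrum[lo:hi])                 # strongest rahmonic
    return sr / peak                                       # pitch estimate in Hz
```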

  17. Proficiency and Linguistic Complexity Influence Speech Motor Control and Performance in Spanish Language Learners.

    Science.gov (United States)

    Nip, Ignatius S B; Blumenfeld, Henrike K

    2015-06-01

    Second-language (L2) production requires greater cognitive resources to inhibit the native language and to retrieve less robust lexical representations. The current investigation identifies how proficiency and linguistic complexity, specifically syntactic and lexical factors, influence speech motor control and performance. Speech movements of 29 native English speakers with low or high proficiency in Spanish were recorded while producing simple and syntactically complex sentences in English and Spanish. Sentences were loaded with cognate (e.g., baby-bebé) or noncognate (e.g., dog-perro) words. Effects of proficiency, lexicality (cognate vs. noncognate), and syntactic complexity on maximum speed, range of movement, duration, and speech movement variability were examined. In general, speakers with lower L2 proficiency differed in their speech motor control and performance from speakers with higher L2 proficiency. Speakers with higher L2 proficiency generally had less speech movement variability, shorter phrase durations, greater maximum speeds, and greater ranges of movement. In addition, lexicality and syntactic complexity affected speech motor control and performance. L2 proficiency, lexicality, and syntactic complexity influence speech motor control and performance in adult L2 learners. Information about relationships between speech motor control, language proficiency, and cognitive-linguistic demands may be used to assess and treat bilingual clients and language learners.

  19. Speech telepractice: installing a speech therapy upgrade for the 21st century.

    Science.gov (United States)

    Towey, Michael P

    2012-01-01

    Much of speech therapy involves the clinician guiding the therapeutic process (e.g., presenting stimuli and eliciting client responses). However, this Brief Communication describes a different approach to speech therapy delivery. Clinicians at Waldo County General Hospital (WCGH) use high definition audio and video to engage clients in telepractice using interactive web-based virtual environments. This technology enables clients and their clinicians to co-create salient treatment activities using authentic materials captured via digital cameras, video and/or curricular materials. Both therapists and clients manipulate the materials and interact online in real-time. The web-based technology engenders highly personalized and engaging activities, such that clients' interactions with these high interest tasks often continue well beyond the therapy sessions.

  20. [Perception of emotions in speech. A review of psychological and physiological research].

    Science.gov (United States)

    Kislova, O O; Rusalova, M N

    2013-01-01

    The article is a review of the general concepts and approaches in research on the recognition of emotions in speech: psychological concepts, principles and methods of study, and physiological data from studies on animals and humans. The concepts of emotional intelligence (the ability to understand and recognize the emotions of other people and to understand and regulate one's own emotions) and emotional hearing (the ability to recognize emotions in speech) are discussed, and a general review of the paradigms is presented. The research on brain mechanisms of speech emotion differentiation is based on the study of local injuries and dysfunctions, along with studies on healthy subjects.

  1. Phoneme Compression: processing of the speech signal and effects on speech intelligibility in hearing-Impaired listeners

    NARCIS (Netherlands)

    A. Goedegebure (Andre)

    2005-01-01

    Hearing-aid users often continue to have problems with poor speech understanding in difficult acoustic conditions. Another commonly reported problem is that certain sounds become too loud whereas other sounds are still not audible. Dynamic range compression is a signal processing tec...

  2. Theta Brain Rhythms Index Perceptual Narrowing in Infant Speech Perception

    Directory of Open Access Journals (Sweden)

    Alexis Bosseler

    2013-10-01

    The development of speech perception shows a dramatic transition between infancy and adulthood. Between 6 and 12 months, infants' initial ability to discriminate all phonetic units across the world's languages narrows: native discrimination increases while nonnative discrimination shows a steep decline. We used magnetoencephalography (MEG) to examine whether brain oscillations in the theta band (4-8 Hz), reflecting increases in attention and cognitive effort, would provide a neural measure of the perceptual narrowing phenomenon in speech. Using an oddball paradigm, we varied speech stimuli in two dimensions, stimulus frequency (frequent vs. infrequent) and language (native vs. nonnative speech syllables), and tested 6-month-old infants, 12-month-old infants, and adults. We hypothesized that 6-month-old infants would show increased relative theta power (RTP) for frequent syllables, regardless of their status as native or nonnative syllables, reflecting young infants' attention and cognitive effort in response to highly frequent stimuli (statistical learning). In adults, we hypothesized increased RTP for nonnative stimuli, regardless of their presentation frequency, reflecting increased cognitive effort for nonnative phonetic categories. The 12-month-old infants were expected to show a pattern in transition, but one more similar to adults than to 6-month-old infants. The MEG brain rhythm results supported these hypotheses. We suggest that perceptual narrowing in speech perception is governed by an implicit learning process. This learning process involves an implicit shift in attention from frequent events (infants) to learned categories (adults). Theta brain oscillatory activity may provide an index of perceptual narrowing beyond speech, and would offer a test of whether the early speech learning process is governed by domain-general or domain-specific processes.

  3. Emotion Recognition using Speech Features

    CERN Document Server

    Rao, K Sreenivasa

    2013-01-01

    “Emotion Recognition Using Speech Features” covers emotion-specific features present in speech and discussion of suitable models for capturing emotion-specific information for distinguishing different emotions.  The content of this book is important for designing and developing  natural and sophisticated speech systems. Drs. Rao and Koolagudi lead a discussion of how emotion-specific information is embedded in speech and how to acquire emotion-specific knowledge using appropriate statistical models. Additionally, the authors provide information about using evidence derived from various features and models. The acquired emotion-specific knowledge is useful for synthesizing emotions. Discussion includes global and local prosodic features at syllable, word and phrase levels, helpful for capturing emotion-discriminative information; use of complementary evidences obtained from excitation sources, vocal tract systems and prosodic features in order to enhance the emotion recognition performance;  and pro...

  4. Why Go to Speech Therapy?

    Science.gov (United States)

    ... teens who stutter make positive changes in their communication skills. As you work with your speech pathologist ...

  5. English Speeches Of Three Minutes

    Institute of Scientific and Technical Information of China (English)

    凌和军; 丁小琴

    2002-01-01

    English speeches, which were made at the beginning of this term, are popular among us English learners, as they are very useful for improving our spoken English, so each of us is very interested in joining the activity.

  6. Speech and Language Developmental Milestones

    Science.gov (United States)

    ... also use special spoken tests to evaluate your child. A hearing test is often included in the evaluation because a hearing problem can affect speech and language development. Depending on the result of the evaluation, the ...

  7. Writing, Inner Speech, and Meditation.

    Science.gov (United States)

    Moffett, James

    1982-01-01

    Examines the interrelationships among meditation, inner speech (stream of consciousness), and writing. Considers the possibilities and implications of using the techniques of meditation in educational settings, especially in the writing classroom. (RL)

  8. Delayed Speech or Language Development

    Science.gov (United States)

    ... Your son ... for communication exchange and participation? What kind of feedback does the child get? When speech, language, hearing, ...

  9. Perceptual learning of interrupted speech.

    Directory of Open Access Journals (Sweden)

    Michel Ruben Benard

    The intelligibility of periodically interrupted speech improves once the silent gaps are filled with noise bursts. This improvement has been attributed to phonemic restoration, a top-down repair mechanism that helps intelligibility of degraded speech in daily life. Two hypotheses were investigated using perceptual learning of interrupted speech. If different cognitive processes played a role in restoring interrupted speech with and without filler noise, the two forms of speech would be learned at different rates and with different perceived mental effort. If the restoration benefit were an artificial outcome of using the ecologically invalid stimulus of speech with silent gaps, this benefit would diminish with training. Two groups of normal-hearing listeners were trained, one with interrupted sentences with the filler noise, and the other without. Feedback was provided with the auditory playback of the unprocessed and processed sentences, as well as the visual display of the sentence text. Training increased the overall performance significantly; however, the restoration benefit did not diminish. The increase in intelligibility and the decrease in perceived mental effort were relatively similar between the groups, implying similar cognitive mechanisms for the restoration of the two types of interruptions. Training effects were generalizable, as both groups also improved their performance with the form of speech they were not trained with, and retainable. Due to null results and the relatively small number of participants (10 per group), further research is needed to more confidently draw conclusions. Nevertheless, training with interrupted speech seems to be effective, stimulating participants to use top-down restoration more actively and efficiently. This finding further implies the potential of this training approach as a rehabilitative tool for hearing-impaired/elderly populations.

  10. The interlanguage speech intelligibility benefit

    Science.gov (United States)

    Bent, Tessa; Bradlow, Ann R.

    2003-09-01

    This study investigated how native language background influences the intelligibility of speech by non-native talkers for non-native listeners from either the same or a different native language background as the talker. Native talkers of Chinese (n=2), Korean (n=2), and English (n=1) were recorded reading simple English sentences. Native listeners of English (n=21), Chinese (n=21), Korean (n=10), and a mixed group from various native language backgrounds (n=12) then performed a sentence recognition task with the recordings from the five talkers. Results showed that for native English listeners, the native English talker was most intelligible. However, for non-native listeners, speech from a relatively high proficiency non-native talker from the same native language background was as intelligible as speech from a native talker, giving rise to the ``matched interlanguage speech intelligibility benefit.'' Furthermore, this interlanguage intelligibility benefit extended to the situation where the non-native talker and listeners came from different language backgrounds, giving rise to the ``mismatched interlanguage speech intelligibility benefit.'' These findings shed light on the nature of the talker-listener interaction during speech communication.

  11. Enhancement of speech signals - with a focus on voiced speech models

    DEFF Research Database (Denmark)

    Nørholm, Sidsel Marie

    This thesis deals with speech enhancement, i.e., noise reduction in speech signals. This has applications in, e.g., hearing aids and teleconference systems. We consider a signal-driven approach to speech enhancement where a model of the speech is assumed and filters are generated based on this model.

  12. Transformation of feelings using pitch parameter for Marathi speech

    Directory of Open Access Journals (Sweden)

    Sangramsing N. Kayte

    2015-11-01

    Much research has been done on the transformation of emotion in speech; however, for Marathi, few studies exist. In this paper we construct a Marathi speech database to study the effects of a change of emotion. Emotion is an important element in expressive speech synthesis and is investigated by many researchers. We describe methods to optimize the database for analysis and study. Pitch information is extracted from the database for different emotions such as joy, anger and sadness. Pitch analysis is done on the database using the extracted pitch points, and a general algorithm is devised for the change from a neutral state to an emotional state. In the experiments, three expressive styles, joy, anger and sadness, are compared with neutral speech.
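    A minimal sketch of the two steps named above, pitch extraction and a neutral-to-emotion change, assuming librosa; the file name and the +4-semitone global shift standing in for joy are illustrative assumptions, since the paper's algorithm operates on the extracted pitch points themselves.

```python
# Extract a pitch contour, then apply a crude global pitch shift as a
# stand-in for a neutral-to-emotion transformation (illustrative only).
import numpy as np
import librosa

y, sr = librosa.load("neutral_marathi.wav", sr=None)      # hypothetical recording
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)
print("median F0 (Hz):", float(np.nanmedian(f0)))         # pitch of neutral reading

joy_like = librosa.effects.pitch_shift(y, sr=sr, n_steps=4.0)  # raise ~4 semitones
```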

  13. Linguistic Processing of Accented Speech Across the Lifespan

    Directory of Open Access Journals (Sweden)

    Alejandrina Cristia

    2012-11-01

    In most of the world, people have regular exposure to multiple accents. Therefore, learning to quickly process accented speech is a prerequisite to successful communication. In this paper, we examine work on the perception of accented speech across the lifespan, from early infancy to late adulthood. Unfamiliar accents initially impair linguistic processing by infants, children, younger adults, and older adults, but listeners of all ages come to adapt to accented speech. Emergent research also goes beyond these perceptual abilities, by assessing links with production and the relative contributions of linguistic knowledge and general cognitive skills. We conclude by underlining points of convergence across ages, and the gaps that remain for future work.

  14. Control of vocal-tract length in speech

    Energy Technology Data Exchange (ETDEWEB)

    Riordan, C.J.

    1977-10-01

    Essential for the correct production of vowels is the accurate control of vocal-tract length. Perkell (Psychology of Speech Production (MIT, Cambridge, MA, 1969)) has suggested that two important determinants of vocal-tract length are vertical larynx position and lip spreading/protrusion, often acting together. The present study was designed to determine whether constraining lip spreading/protrusion induces compensatory vertical larynx displacements, particularly on rounded vowels. Upper lip and larynx movement were monitored photoelectrically while French and Mandarin native speakers produced the vowels /i,y,u/ first under normal-speech conditions and then with lip activity constrained. Significant differences were found in upper-lip protrusion and larynx position depending on the vowel uttered. Moreover, the generally low-larynx position of rounded vowels became even lower when lip protrusion was constrained. These results imply that compensatory articulations contribute to a contrast-preserving strategy in speech production.

  15. Cross-language and second language speech perception

    DEFF Research Database (Denmark)

    Bohn, Ocke-Schwen

    2017-01-01

    This chapter provides an overview of the main research questions and findings in the areas of second language and cross-language speech perception research, and of the most widely used models that have guided this research. The overview is structured in a way that addresses three overarching topics in cross-language and second language speech perception research: the mapping issue (the perceptual relationship of sounds of the native and the nonnative language in the mind of the native listener and the L2 learner), the perceptual and learning difficulty/ease issue (how this relationship may or may not cause perceptual and learning difficulty), and the plasticity issue (whether and how experience with the nonnative language affects the perceptual organization of speech sounds in the mind of L2 learners). One important general conclusion from this research is that perceptual learning is possible at all...

  16. An overview of the SPHINX speech recognition system

    Science.gov (United States)

    Lee, Kai-Fu; Hon, Hsiao-Wuen; Reddy, Raj

    1990-01-01

    A description is given of SPHINX, a system that demonstrates the feasibility of accurate, large-vocabulary, speaker-independent, continuous speech recognition. SPHINX is based on discrete hidden Markov models (HMMs) with linear-predictive-coding derived parameters. To provide speaker independence, knowledge was added to these HMMs in several ways: multiple codebooks of fixed-width parameters, and an enhanced recognizer with carefully designed models and word-duration modeling. To deal with coarticulation in continuous speech, yet still adequately represent a large vocabulary, two new subword speech units are introduced: function-word-dependent phone models and generalized triphone models. With grammars of perplexity 997, 60, and 20, SPHINX attained word accuracies of 71, 94, and 96 percent, respectively, on a 997-word task.
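    As a toy illustration of the discrete HMMs at SPHINX's core, the sketch below scores a sequence of vector-quantized codebook symbols against one left-to-right phone model with the forward algorithm; the matrices are made-up stand-ins, not SPHINX's trained parameters.

```python
# Forward algorithm for a discrete HMM: total log-likelihood of an observed
# symbol sequence under initial (pi), transition (A) and emission (B) probs.
import numpy as np

def forward_log_likelihood(obs, log_pi, log_A, log_B):
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # marginalize over predecessor states, then emit the next symbol
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

# Toy 3-state left-to-right model over a 4-symbol codebook:
pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.6, 0.4, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]])
B = np.array([[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]])
with np.errstate(divide="ignore"):                        # log(0) -> -inf is fine
    print(forward_log_likelihood([0, 1, 2, 3], np.log(pi), np.log(A), np.log(B)))
```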

  17. Neuromotor speech impairment: it's all in the talking.

    Science.gov (United States)

    Ziegler, Wolfram; Ackermann, Hermann

    2013-01-01

    The aim of this article is to explicate the uniqueness of the motor activity implied in spoken language production and to emphasize how important it is, from a theoretical and a clinical perspective, to consider the motor events associated with speaking as domain-specific, i.e., as pertaining to the domain of linguistic expression. First, phylogenetic data are reviewed demonstrating the specificity of the human vocal tract motor network regarding (i) the entrenchment of laryngeal motor skills within the organization of vocal tract movements, (ii) the evolution of a neural basis for skill acquisition within this system, and (iii) the integration of this system into an auditory-motor network. Second, ontogenetic evidence and existing knowledge about the experience-dependent plasticity of the brain are reported to explicate that during speech acquisition the vocal tract motor system is constrained by universal properties of speech production and by the specific phonological properties of the speaker's ambient language. Third, clinical data from dysarthria and apraxia of speech provide the background for a discussion about the theoretical underpinnings of domain-general versus domain-specific views of speech motor control. The article ends with a brief sketch of a holistic neurophonetic approach in experimental inquiries, assessment, and treatment of neuromotor speech impairment. Copyright © 2013 S. Karger AG, Basel.

  18. Feedback delays eliminate auditory-motor learning in speech production.

    Science.gov (United States)

    Max, Ludo; Maffett, Derek G

    2015-03-30

    Neurologically healthy individuals use sensory feedback to alter future movements by updating internal models of the effector system and environment. For example, when visual feedback about limb movements or auditory feedback about speech movements is experimentally perturbed, the planning of subsequent movements is adjusted - i.e., sensorimotor adaptation occurs. A separate line of studies has demonstrated that experimentally delaying the sensory consequences of limb movements causes the sensory input to be attributed to external sources rather than to one's own actions. Yet similar feedback delays have remarkably little effect on visuo-motor adaptation (although the rate of learning varies, the amount of adaptation is only moderately affected with delays of 100-200ms, and adaptation still occurs even with a delay as long as 5000ms). Thus, limb motor learning remains largely intact even in conditions where error assignment favors external factors. Here, we show a fundamentally different result for sensorimotor control of speech articulation: auditory-motor adaptation to formant-shifted feedback is completely eliminated with delays of 100ms or more. Thus, for speech motor learning, real-time auditory feedback is critical. This novel finding informs theoretical models of human motor control in general and speech motor control in particular, and it has direct implications for the application of motor learning principles in the habilitation and rehabilitation of individuals with various sensorimotor speech disorders. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  19. Speech spectrum's correlation with speakers' Eysenck personality traits.

    Science.gov (United States)

    Hu, Chao; Wang, Qiandong; Short, Lindsey A; Fu, Genyue

    2012-01-01

    The current study explored the correlation between speakers' Eysenck personality traits and speech spectrum parameters. Forty-six subjects completed the Eysenck Personality Questionnaire. They were instructed to verbally answer the questions shown on a computer screen and their responses were recorded by the computer. Spectrum parameters of /sh/ and /i/ were analyzed by Praat voice software. Formant frequencies of the consonant /sh/ in lying responses were significantly lower than that in truthful responses, whereas no difference existed on the vowel /i/ speech spectrum. The second formant bandwidth of the consonant /sh/ speech spectrum was significantly correlated with the personality traits of Psychoticism, Extraversion, and Neuroticism, and the correlation differed between truthful and lying responses, whereas the first formant frequency of the vowel /i/ speech spectrum was negatively correlated with Neuroticism in both response types. The results suggest that personality characteristics may be conveyed through the human voice, although the extent to which these effects are due to physiological differences in the organs associated with speech or to a general Pygmalion effect is yet unknown.
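    A minimal sketch of the kind of Praat-based formant measurement described above, using the parselmouth package; the file name, the midpoint sampling time, and the specific accessor names are assumptions to be checked against the installed parselmouth version.

```python
# Measure a formant frequency and bandwidth with Praat via parselmouth.
import parselmouth

snd = parselmouth.Sound("response_sh_i.wav")      # hypothetical recording
formants = snd.to_formant_burg()                  # Burg-method formant tracks
t = snd.duration / 2                              # sample at the midpoint
f1 = formants.get_value_at_time(1, t)             # first formant frequency (Hz)
b2 = formants.get_bandwidth_at_time(2, t)         # second formant bandwidth (Hz)
print(f"F1 = {f1:.0f} Hz, B2 = {b2:.0f} Hz")
```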

  20. Novel Techniques for Dialectal Arabic Speech Recognition

    CERN Document Server

    Elmahdy, Mohamed; Minker, Wolfgang

    2012-01-01

    Novel Techniques for Dialectal Arabic Speech Recognition describes approaches to improve automatic speech recognition for dialectal Arabic. Since speech resources for dialectal Arabic speech recognition are very sparse, the authors describe how existing Modern Standard Arabic (MSA) speech data can be applied to dialectal Arabic speech recognition, while assuming that MSA is always a second language for all Arabic speakers. In this book, Egyptian Colloquial Arabic (ECA) has been chosen as a typical Arabic dialect. ECA is the first-ranked Arabic dialect in terms of number of speakers, and a high-quality ECA speech corpus with accurate phonetic transcription has been collected. MSA acoustic models were trained using news broadcast speech. In order to use MSA cross-lingually in dialectal Arabic speech recognition, the authors have normalized the phoneme sets for MSA and ECA. After this normalization, they have applied state-of-the-art acoustic model adaptation techniques like Maximum Likelihood Linear Regression (MLLR) and M...

  1. Impaired motor speech performance in Huntington's disease.

    Science.gov (United States)

    Skodda, Sabine; Schlegel, Uwe; Hoffmann, Rainer; Saft, Carsten

    2014-04-01

    Dysarthria is a common symptom of Huntington's disease and has been reported to be characterized by, among other features, alterations of speech rate and regularity. However, data on the specific pattern of motor speech impairment and its relationship to other motor and neuropsychological symptoms are sparse. Therefore, the aim of the present study was to describe and objectively analyse different speech parameters, with special emphasis on the timing of connected speech and non-speech verbal utterances. 21 patients with manifest Huntington's disease and 21 age- and gender-matched healthy controls had to perform a reading task and several syllable repetition tasks. Computerized acoustic analysis of different variables for the measurement of speech rate and regularity generated a typical pattern of impaired motor speech performance, with a reduction of speech rate, an increase of pauses and a marked inability to repeat single syllables steadily. Abnormalities of speech parameters were more pronounced in the subgroup of patients with Huntington's disease receiving antidopaminergic medication, but were also present in the drug-naïve patients. Speech rate in connected speech and parameters of syllable repetition correlated with overall motor impairment, tapping capacity in a quantitative motor assessment and some scores of cognitive function. Given these preliminary data, further investigations of patients in different stages of the disease are warranted to determine whether the analysis of speech and non-speech verbal utterances might be a helpful additional tool for monitoring functional disability in Huntington's disease.
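    A minimal sketch of the kind of computerized timing analysis used above, yielding a pause percentage and a net speech rate from an energy threshold; the -40 dB threshold, 25 ms frames, and externally supplied syllable count are illustrative assumptions, not the study's parameters.

```python
# Energy-based pause detection and net speech rate for a recorded task.
import numpy as np

def timing_measures(x, sr, n_syllables, frame_ms=25, thresh_db=-40.0):
    frame = int(sr * frame_ms / 1000)
    n = len(x) // frame
    rms = np.sqrt(np.mean(x[:n * frame].reshape(n, frame) ** 2, axis=1))
    db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    speech = db > thresh_db                               # crude voice activity
    speech_frames = max(int(speech.sum()), 1)
    pause_pct = 100.0 * (1.0 - speech.mean())             # share of paused time
    net_rate = n_syllables / (speech_frames * frame / sr) # syllables/s while speaking
    return pause_pct, net_rate
```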

  2. Tactile Modulation of Emotional Speech Samples

    Directory of Open Access Journals (Sweden)

    Katri Salminen

    2012-01-01

    Traditionally, only speech communicates emotions via mobile phone. However, in daily communication the sense of touch mediates emotional information during conversation. The present aim was to study whether tactile stimulation affects emotional ratings of speech when measured with scales of pleasantness, arousal, approachability, and dominance. In Experiment 1, participants rated speech-only and speech-tactile stimuli. The tactile signal mimicked the amplitude changes of the speech. In Experiment 2, the aim was to study whether the way the tactile signal was produced affected the ratings. The tactile signal either mimicked the amplitude changes of the speech sample in question, or the amplitude changes of another speech sample. Also, concurrent static vibration was included. The results showed that the speech-tactile stimuli were rated as more arousing and dominant than the speech-only stimuli. The speech-only stimuli were rated as more approachable than the speech-tactile stimuli, but only in Experiment 1. Variations in tactile stimulation also affected the ratings. When the tactile stimulation was static vibration, the speech-tactile stimuli were rated as more arousing than when the concurrent tactile stimulation mimicked the speech samples. The results suggest that tactile stimulation offers new ways of modulating and enriching the interpretation of speech.

  3. An Approach to Hide Secret Speech Information

    Institute of Scientific and Technical Information of China (English)

    WU Zhi-jun; DUAN Hai-xin; LI Xing

    2006-01-01

    This paper presented an approach to hiding secret speech information in a code-excited linear prediction (CELP)-based speech coding scheme, adopting an analysis-by-synthesis (ABS)-based algorithm for speech information hiding and extraction for the purpose of secure speech communication. The secret speech is coded in 2.4 Kb/s mixed excitation linear prediction (MELP) and embedded in CELP-type public speech. The ABS algorithm reuses the speech synthesizer in the speech coder, so speech embedding and coding are synchronous, i.e., a fusion of the public and secret speech information data. An experiment embedding 2.4 Kb/s MELP secret speech in G.728-coded public speech transmitted via the public switched telephone network (PSTN) shows that the proposed approach satisfies the requirements of information hiding, meets the speech quality constraints of secure communication, and achieves a high average hiding capacity of 3.2 Kb/s with excellent speech quality, while complicating speaker recognition.
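
    The paper's MELP-in-CELP embedding is not reproduced here, but the general idea of hiding bits inside an analysis-by-synthesis codebook search is easy to illustrate: restrict the search to codevectors whose index carries the secret bit, a quantization-index-modulation style scheme. A toy sketch, not the authors' actual algorithm:

        import numpy as np

        def abs_search_with_hidden_bit(target, codebook, secret_bit):
            """Toy quantization-index-modulation embedding: restrict the
            analysis-by-synthesis codebook search to entries whose index
            parity equals the secret bit. (A generic illustration, not the
            paper's MELP-in-CELP scheme.)"""
            candidates = [i for i in range(len(codebook)) if i % 2 == secret_bit]
            errors = [np.sum((target - codebook[i]) ** 2) for i in candidates]
            return candidates[int(np.argmin(errors))]

        def extract_hidden_bit(index):
            """Decoder side: the transmitted index parity is the secret bit."""
            return index % 2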

  4. Speech sound discrimination training improves auditory cortex responses in a rat model of autism

    Directory of Open Access Journals (Sweden)

    Crystal T Engineer

    2014-08-01

    Full Text Available Children with autism often have language impairments and degraded cortical responses to speech. Extensive behavioral interventions can improve language outcomes and cortical responses. Prenatal exposure to the antiepileptic drug valproic acid (VPA increases the risk for autism and language impairment. Prenatal exposure to VPA also causes weaker and delayed auditory cortex responses in rats. In this study, we document speech sound discrimination ability in VPA exposed rats and document the effect of extensive speech training on auditory cortex responses. VPA exposed rats were significantly impaired at consonant, but not vowel, discrimination. Extensive speech training resulted in both stronger and faster anterior auditory field responses compared to untrained VPA exposed rats, and restored responses to control levels. This neural response improvement generalized to non-trained sounds. The rodent VPA model of autism may be used to improve the understanding of speech processing in autism and contribute to improving language outcomes.

  5. Speech sound discrimination training improves auditory cortex responses in a rat model of autism

    Science.gov (United States)

    Engineer, Crystal T.; Centanni, Tracy M.; Im, Kwok W.; Kilgard, Michael P.

    2014-01-01

    Children with autism often have language impairments and degraded cortical responses to speech. Extensive behavioral interventions can improve language outcomes and cortical responses. Prenatal exposure to the antiepileptic drug valproic acid (VPA) increases the risk for autism and language impairment. Prenatal exposure to VPA also causes weaker and delayed auditory cortex responses in rats. In this study, we document speech sound discrimination ability in VPA exposed rats and document the effect of extensive speech training on auditory cortex responses. VPA exposed rats were significantly impaired at consonant, but not vowel, discrimination. Extensive speech training resulted in both stronger and faster anterior auditory field (AAF) responses compared to untrained VPA exposed rats, and restored responses to control levels. This neural response improvement generalized to non-trained sounds. The rodent VPA model of autism may be used to improve the understanding of speech processing in autism and contribute to improving language outcomes. PMID:25140133

  6. Do age-related word retrieval difficulties appear (or disappear) in connected speech?

    Science.gov (United States)

    Kavé, Gitit; Goral, Mira

    2017-09-01

    We conducted a comprehensive literature review of studies of word retrieval in connected speech in healthy aging and reviewed relevant aphasia research that could shed light on the aging literature. Four main hypotheses guided the review: (1) Significant retrieval difficulties would lead to reduced output in connected speech. (2) Significant retrieval difficulties would lead to a more limited lexical variety in connected speech. (3) Significant retrieval difficulties would lead to an increase in word substitution errors and in pronoun use as well as to greater dysfluency and hesitation in connected speech. (4) Retrieval difficulties on tests of single-word production would be associated with measures of word retrieval in connected speech. Studies on aging did not confirm these four hypotheses, unlike studies on aphasia that generally did. The review suggests that future research should investigate how context facilitates word production in old age.

  7. Noise-robust cortical tracking of attended speech in real-world acoustic scenes

    DEFF Research Database (Denmark)

    Fuglsang, Søren; Dau, Torsten; Hjortkjær, Jens

    2017-01-01

    Selectively attending to one speaker in a multi-speaker scenario is thought to synchronize low-frequency cortical activity to the attended speech signal. In recent studies, reconstruction of speech from single-trial electroencephalogram (EEG) data has been used to decode which talker a listener is attending to in a two-talker situation. It is currently unclear how this generalizes to more complex sound environments. Behaviorally, speech perception is robust to the acoustic distortions that listeners typically encounter in everyday life, but it is unknown whether this is mirrored by a noise-robust neural tracking of attended speech. Here we used advanced acoustic simulations to recreate real-world acoustic scenes in the laboratory. In virtual acoustic realities with varying amounts of reverberation and number of interfering talkers, listeners selectively attended to the speech stream...
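
    The decoding referred to here is commonly implemented as a backward "stimulus-reconstruction" model: ridge regression maps time-lagged multichannel EEG onto the speech envelope, and the attended talker is the one whose envelope is reconstructed with the higher correlation. A minimal sketch under those assumptions, not necessarily the exact model of this study:

        import numpy as np

        def train_envelope_decoder(eeg, envelope, lags, lam=1e3):
            """Fit a ridge-regression decoder from time-lagged EEG to the
            attended speech envelope. eeg: (T, C), envelope: (T,),
            lags: iterable of sample lags, e.g. range(0, 32)."""
            X = np.hstack([np.roll(eeg, -lag, axis=0) for lag in lags])  # (T, C*L)
            g = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]),
                                X.T @ envelope)
            return g

        def decode_attention(eeg, env_a, env_b, g, lags):
            """Attended talker = envelope with the higher reconstruction
            correlation. (np.roll wraps at the edges; fine for a sketch.)"""
            X = np.hstack([np.roll(eeg, -lag, axis=0) for lag in lags])
            rec = X @ g
            r_a = np.corrcoef(rec, env_a)[0, 1]
            r_b = np.corrcoef(rec, env_b)[0, 1]
            return "A" if r_a > r_b else "B"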

  8. Child directed speech, speech in noise and hyperarticulated speech in the Pacific Northwest

    Science.gov (United States)

    Wright, Richard; Carmichael, Lesley; Beckford Wassink, Alicia; Galvin, Lisa

    2001-05-01

    Three types of exaggerated speech are thought to be systematic responses to accommodate the needs of the listener: child-directed speech (CDS), hyperspeech, and the Lombard response. CDS (e.g., Kuhl et al., 1997) occurs in interactions with young children and infants. Hyperspeech (Johnson et al., 1993) is a modification in response to listeners' difficulties in recovering the intended message. The Lombard response (e.g., Lane et al., 1970) is a compensation for increased noise in the signal. While all three result from adaptations to accommodate the needs of the listener, and therefore should share some features, the triggering conditions are quite different, and therefore they should exhibit differences in their phonetic outcomes. While CDS has been the subject of a variety of acoustic studies, it has never been studied in the broader context of the other "exaggerated" speech styles. A large crosslinguistic study was undertaken that compares speech produced under four conditions: spontaneous conversations, CDS aimed at 6-9-month-old infants, hyperarticulated speech, and speech in noise. This talk will present some findings for North American English as spoken in the Pacific Northwest. The measures include f0, vowel duration, F1 and F2 at vowel midpoint, and intensity.

  9. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    Science.gov (United States)

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
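
    The shared recipe behind the STI family, converting per-band modulation transfer values into apparent SNRs and then into a weighted index, fits in a few lines. A sketch using the classical octave-band weights of Steeneken and Houtgast; revised standards use somewhat different weights:

        import numpy as np

        def sti_from_mtf(m):
            """Turn a matrix of modulation transfer values m[k, f] (7 octave
            bands, 125 Hz - 8 kHz, x 14 modulation frequencies, each value
            in (0, 1)) into an STI value."""
            snr_app = 10.0 * np.log10(m / (1.0 - m))   # apparent SNR per cell
            snr_app = np.clip(snr_app, -15.0, 15.0)    # limit to +/- 15 dB
            ti = (snr_app + 15.0) / 30.0               # transmission indices
            mti = ti.mean(axis=1)                      # one MTI per octave band
            w = np.array([0.13, 0.14, 0.11, 0.12, 0.19, 0.17, 0.14])
            return float(w @ mti)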

  10. NICT/ATR Chinese-Japanese-English Speech-to-Speech Translation System

    Institute of Scientific and Technical Information of China (English)

    Tohru Shimizu; Yutaka Ashikari; Eiichiro Sumita; ZHANG Jinsong; Satoshi Nakamura

    2008-01-01

    This paper describes the latest version of the Chinese-Japanese-English handheld speech-to-speech translation system developed by NICT/ATR, which is now ready to be deployed for travelers. With the entire speech-to-speech translation function implemented in one terminal, it realizes real-time, location-free speech-to-speech translation. A new noise-suppression technique notably improves speech recognition performance. Corpus-based approaches to speech recognition, machine translation, and speech synthesis enable coverage of a wide variety of topics and portability to other languages. Test results show a character accuracy of 82%-94% for Chinese speech recognition, and a bilingual evaluation understudy (BLEU) score of 0.55-0.74 for Chinese-Japanese and Chinese-English machine translation.

  11. Critical Thinking Process in English Speech

    Institute of Scientific and Technical Information of China (English)

    WANG Jia-li

    2016-01-01

    With the development of mass media, English speech has become an important vehicle for international cultural exchange in the context of globalization. Whether in a political speech, a motivational speech, or an ordinary public speech, the wisdom and charm of critical thinking are given full play. This study analyzes the cultivation of critical thinking in English speech with the aid of representative examples, which is significant both for cultivating college students' critical thinking and for developing their critical thinking skills in English speech.

  12. Experimental Study of Generalized Subspace Filters for the Cocktail Party Situation

    DEFF Research Database (Denmark)

    Christensen, Knud Bank; Christensen, Mads Græsbøll; Boldt, Jesper B.

    2016-01-01

    This paper investigates the potential performance of generalized subspace filters for speech enhancement in cocktail party situations with very poor signal/noise ratio, e.g. down to -15 dB. Performance metrics output signal/noise ratio, signal/distortion ratio, speech quality rating and speech...

  13. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    Energy Technology Data Exchange (ETDEWEB)

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2006-02-14

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  14. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    Energy Technology Data Exchange (ETDEWEB)

    Burnett, Greg C. (Livermore, CA); Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

    2006-08-08

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  15. System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech

    Energy Technology Data Exchange (ETDEWEB)

    Burnett, Greg C.; Holzrichter, John F.; Ng, Lawrence C.

    2004-03-23

    The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

  16. Effect of concurrent walking and interlocutor distance on conversational speech intensity and rate in Parkinson's disease.

    Science.gov (United States)

    McCaig, Cassandra M; Adams, Scott G; Dykstra, Allyson D; Jog, Mandar

    2016-01-01

    Previous studies have demonstrated a negative effect of concurrent walking and talking on gait in Parkinson's disease (PD) but there is limited information about the effect of concurrent walking on speech production. The present study examined the effect of sitting, standing, and three concurrent walking tasks (slow, normal, fast) on conversational speech intensity and speech rate in fifteen individuals with hypophonia related to idiopathic Parkinson's disease (PD) and fourteen age-equivalent controls. Interlocutor (talker-to-talker) distance effects and walking speed were also examined. Concurrent walking was found to produce a significant increase in speech intensity, relative to standing and sitting, in both the control and PD groups. Faster walking produced significantly greater speech intensity than slower walking. Concurrent walking had no effect on speech rate. Concurrent walking and talking produced significant reductions in walking speed in both the control and PD groups. In general, the results of the present study indicate that concurrent walking tasks and the speed of concurrent walking can have a significant positive effect on conversational speech intensity. These positive, "energizing" effects need to be given consideration in future attempts to develop a comprehensive model of speech intensity regulation and they may have important implications for the development of new evaluation and treatment procedures for individuals with hypophonia related to PD.

  17. Adult-like processing of time-compressed speech by newborns: A NIRS study

    Directory of Open Access Journals (Sweden)

    Cécile Issard

    2017-06-01

    Full Text Available Humans can adapt to a wide range of variations in the speech signal, maintaining an invariant representation of the linguistic information it contains. Among these, adaptation to rapid or time-compressed speech has been well studied in adults, but the developmental origin of this capacity remains unknown. Does this ability depend on experience with speech (if so, as heard in utero or as heard postnatally), or with sounds in general, or is it experience-independent? Using near-infrared spectroscopy, we show that the newborn brain can discriminate between three different compression rates: normal, i.e. 100% of the original duration; moderately compressed, i.e. 60% of the original duration; and highly compressed, i.e. 30% of the original duration. Even more interestingly, responses to normal and moderately compressed speech are similar, showing a canonical hemodynamic response in the left temporoparietal, right frontal and right temporal cortex, while responses to highly compressed speech are inverted, showing a decrease in oxyhemoglobin concentration. These results mirror those found in adults, who readily adapt to moderately compressed, but not to highly compressed speech, showing that adaptation to time-compressed speech requires little or no experience with speech, and happens at an auditory, not at a more abstract linguistic, level.
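
    Time-compressed speech of this kind is conventionally generated with a phase-vocoder-style time stretch, which shortens duration without shifting pitch. A minimal sketch with librosa, using its bundled example clip; the compression rates follow the study, everything else is an arbitrary choice:

        import librosa

        # Load a speech recording (librosa ships downloadable example clips).
        y, sr = librosa.load(librosa.example("libri1"), sr=None)

        # rate > 1 shortens the signal: 1/0.6 leaves 60% of the original
        # duration, 1/0.3 leaves 30%, matching the study's conditions.
        moderate = librosa.effects.time_stretch(y, rate=1 / 0.6)
        high = librosa.effects.time_stretch(y, rate=1 / 0.3)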

  18. Cortical differentiation of speech and nonspeech sounds at 100 ms: implications for dyslexia.

    Science.gov (United States)

    Parviainen, Tiina; Helenius, Päivi; Salmelin, Riitta

    2005-07-01

    Neurophysiological measures indicate cortical sensitivity to speech sounds by 150 ms after stimulus onset. In this time window dyslexic subjects start to show abnormal cortical processing. We investigated whether phonetic analysis is reflected in the robust auditory cortical activation at approximately 100 ms (N100m), and whether dyslexic subjects show abnormal N100m responses to speech or nonspeech sounds. We used magnetoencephalography to record auditory responses of 10 normally reading and 10 dyslexic adults. The speech stimuli were synthetic Finnish speech sounds (/a/, /u/, /pa/, /ka/). The nonspeech stimuli were complex nonspeech sounds and simple sine wave tones, composed of the F1+F2+F3 and F2 formant frequencies of the speech sounds, respectively. All sounds evoked a prominent N100m response in the bilateral auditory cortices. The N100m activation was stronger to speech than nonspeech sounds in the left but not in the right auditory cortex, in both subject groups. The leftward shift of hemispheric balance for speech sounds is likely to reflect analysis at the phonetic level. In dyslexic subjects the overall interhemispheric amplitude balance and timing were altered for all sound types alike. Dyslexic individuals thus seem to have an unusual cortical organization of general auditory processing in the time window of speech-sensitive analysis.

  19. Park Play: a picture description task for assessing childhood motor speech disorders.

    Science.gov (United States)

    Patel, Rupal; Connaghan, Kathryn

    2014-08-01

    The purpose of this study was to develop a picture description task for eliciting connected speech from children with motor speech disorders. The Park Play scene is a child-friendly picture description task aimed at augmenting current assessment protocols for childhood motor speech disorders. The design process included a literature review to: (1) establish optimal design features for child assessment, (2) identify a set of evidence-based speech targets specifically tailored to tax the motor speech system, and (3) enhance current assessment tools. To establish proof of concept, five children (ages 4;3-11;1) with dysarthria or childhood apraxia of speech were audio-recorded while describing the Park Play scene. Feedback from the feasibility test informed iterative design modifications. Descriptive, segmental, and prosodic analyses revealed the task was effective in eliciting desired targets in a connected speech sample, thereby yielding additional information beyond the syllables, words, and sentences generally elicited through imitation during the traditional motor speech examination. Further discussion includes approaches to adapt the task for a variety of clinical needs.

  20. Persist in and Carry Forward the Integrative Medical Research——Speech at the 6th National General Congress of Chinese Association of Integrative Medicine by Academician CHEN Zhu,Minister of Ministry of Health

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    The 6th National General Congress of the Chinese Association of Integrative Medicine (CAIM) was convened on 19-20 April 2008 in Beijing. Academician CHEN Zhu, Minister of the Ministry of Health, indicated at the congress that the integration of Chinese and Western medicine is well in keeping with the situation of our country and the general rule of development in medical science; as a good integration of Chinese and Western medicine, it is mutually beneficial and advantageous to both. Seeing the creativity shown in integrative medical investigation on the theoretical and methodological sides, we should and must persist in it and develop it.

  1. Private speech in preschool children: developmental stability and change, across-task consistency, and relations with classroom behaviour.

    Science.gov (United States)

    Winsler, Adam; De León, Jesus René; Wallace, Beverly A; Carlton, Martha P; Willson-Quayle, Angela

    2003-08-01

    This study examined (a) developmental stability and change in children's private speech during the preschool years, (b) across-task consistency in children's self-speech, and (c) across-setting relations between children's private speech in the laboratory and their behaviour at home and in the preschool classroom. A group of 32 normally developing three- and four-year-old children was observed twice (six-month interobservation interval) while engaging in the same individual problem-solving tasks. Measures of private speech were collected from transcribed videotapes. Naturalistic observations of children's behaviour in the preschool classroom were conducted, and teachers and parents reported on children's behaviour at home and school. Individual differences in preschool children's private speech use were generally stable across tasks and time and related to children's observed and reported behaviour at school and home. Children whose private speech was more partially internalized had fewer externalizing behaviour problems and better social skills as reported by parents and teachers. Children whose private speech was largely task-irrelevant engaged in less goal-directed behaviour in the classroom, expressed more negative affect in the classroom, and were rated as having poorer social skills and more behaviour problems. Developmental change occurred during the preschool years in children's use and internalization of private speech during problem-solving, in the form of a reduction over time in the total number of social speech utterances, a decrease in the average number of words per utterance, and an increase in the proportion of private speech that was partially internalized.

  2. Production and perception of clear speech

    Science.gov (United States)

    Bradlow, Ann R.

    2003-04-01

    When a talker believes that the listener is likely to have speech perception difficulties due to a hearing loss, background noise, or a different native language, she or he will typically adopt a clear speaking style. Previous research has established that, with a simple set of instructions to the talker, ``clear speech'' can be produced by most talkers under laboratory recording conditions. Furthermore, there is reliable evidence that adult listeners with either impaired or normal hearing typically find clear speech more intelligible than conversational speech. Since clear speech production involves listener-oriented articulatory adjustments, a careful examination of the acoustic-phonetic and perceptual consequences of the conversational-to-clear speech transformation can serve as an effective window into talker- and listener-related forces in speech communication. Furthermore, clear speech research has considerable potential for the development of speech enhancement techniques. After reviewing previous and current work on the acoustic properties of clear versus conversational speech, this talk will present recent data from a cross-linguistic study of vowel production in clear speech and a cross-population study of clear speech perception. Findings from these studies contribute to an evolving view of clear speech production and perception as reflecting both universal, auditory and language-specific, phonological contrast enhancement features.

  3. Contextual variability during speech-in-speech recognition.

    Science.gov (United States)

    Brouwer, Susanne; Bradlow, Ann R

    2014-07-01

    This study examined the influence of background language variation on speech recognition. English listeners performed an English sentence recognition task in either "pure" background conditions in which all trials had either English or Dutch background babble or in mixed background conditions in which the background language varied across trials (i.e., a mix of English and Dutch or one of these background languages mixed with quiet trials). This design allowed the authors to compare performance on identical trials across pure and mixed conditions. The data reveal that speech-in-speech recognition is sensitive to contextual variation in terms of the target-background language (mis)match depending on the relative ease/difficulty of the test trials in relation to the surrounding trials.

  4. ``The Boundaries of Nature: Special and general relativity and quantum mechanics, a second course in physics:'' Edwin F. Taylor's acceptance speech for the 1998 Oersted Medal presented by the American Association of Physics Teachers, 6 January 1998

    Science.gov (United States)

    Taylor, Edwin F.

    1998-05-01

    Public hunger for relativity and quantum mechanics is insatiable, and we should use it selectively but shamelessly to attract students, most of whom will not become physics majors, but all of whom can experience "deep physics." Science, engineering, and mathematics students, indeed anyone comfortable with calculus, can now delve deeply into special and general relativity and quantum mechanics. Big chunks of general relativity require only calculus if one starts with the metric describing spacetime around Earth or black hole. Expressions for energy and angular momentum follow, along with orbit predictions for particles and light. Feynman's Sum Over Paths quantum theory simply commands the electron: Explore all paths. Students can model this command with the computer, pointing and clicking to tell the electron which paths to explore; wave functions and bound states arise naturally. A second full-year course in physics covering special relativity, general relativity, and quantum mechanics would have wide appeal—and might also lead to significant advancements in upper-level courses for the physics major.

  5. 46 CFR 197.328 - PVHO-General.

    Science.gov (United States)

    2010-10-01

    ... GENERAL PROVISIONS Commercial Diving Operations Equipment § 197.328 PVHO—General. (a) Each PVHO... controls; (17) Have a speech unscrambler when used with mixed-gas; (18) Have interior electrical...

  6. Upgrading-General Trend of Textile Industrial Development in China

    Institute of Scientific and Technical Information of China (English)

    DU Yuzhou

    2007-01-01

    The leitmotif of my speech today is Industrial Upgrading - the General Trend of Development of the Chinese Textile Industry. Centering on this general trend, I will elaborate on the following four viewpoints:

  7. Oral and Hand Movement Speeds Are Associated with Expressive Language Ability in Children with Speech Sound Disorder

    Science.gov (United States)

    Peter, Beate

    2012-01-01

    This study tested the hypothesis that children with speech sound disorder have generalized slowed motor speeds. It evaluated associations among oral and hand motor speeds and measures of speech (articulation and phonology) and language (receptive vocabulary, sentence comprehension, sentence imitation), in 11 children with moderate to severe SSD…

  8. Long-Term Outcomes of Speech Therapy for Seven Adolescents with Visual Feedback Technologies: Ultrasound and Electropalatography

    Science.gov (United States)

    Bacsfalvi, Penelope; Bernhardt, Barbara May

    2011-01-01

    This follow-up study investigated the speech production of seven adolescents and young adults with hearing impairment 2-4 years after speech intervention with ultrasound and electropalatography. Perceptual judgments by seven expert listeners revealed that five out of seven speakers either continued to generalize post-treatment or maintained their…

  11. Risk of Reading Difficulty among Students with a History of Speech or Language Impairment: Implications for Student Support Teams

    Science.gov (United States)

    Zipoli, Richard P., Jr.; Merritt, Donna D.

    2017-01-01

    Many students with a history of speech or language impairment have an elevated risk of reading difficulty. Specific subgroups of these students remain at risk of reading problems even after clinical manifestations of a speech or language disorder have diminished. These students may require reading intervention within a general education system of…

  12. A PILOT STUDY COMPARING THE BLOCK SYSTEM AND THE INTERMITTENT SYSTEM OF SCHEDULING SPEECH CORRECTION CASES IN THE PUBLIC SCHOOLS.

    Science.gov (United States)

    WEAVER, JOHN B.; WOLLERSHEIM, JANET P.

    TO DETERMINE THE MOST EFFICIENT USES OF THE PUBLIC SCHOOL SPEECH CORRECTIONIST'S SKILLS AND TIME, A STUDY WAS UNDERTAKEN TO INVESTIGATE THE EFFECTIVENESS OF THE INTERMITTENT SYSTEM AND THE BLOCK SYSTEM OF SCHEDULING SPEECH CASES. WITH THE INTERMITTENT SYSTEM THE CORRECTIONIST IS ASSIGNED TO A NUMBER OF SCHOOLS AND GENERALLY SEES CHILDREN TWICE A…

  13. DISORDERS IN THE SPEECH DEVELOPMENT: EARLY DETECTION AND TREATMENT

    OpenAIRE

    Vasilka RAZMOVSKA; Vasilka DOLEVSKA

    1998-01-01

    Introduction; Causes for disorders in the speech development; Disorders in the speech development, mental retardation and treatment; Disorders in the speech development, hearing remainders and treatment; Autism and disorders in the speech development; Bilingual and disordered speech development; Speech of neglected children

  14. Speech Therapy for Children with Cleft Lip and Palate Using a Community-Based Speech Therapy Model with Speech Assistants.

    Science.gov (United States)

    Makarabhirom, Kalyanee; Prathanee, Benjamas; Suphawatjariyakul, Ratchanee; Yoodee, Phanomwan

    2015-08-01

    To evaluate speech services delivered using a Community-Based Speech Therapy model with trained speech assistants (SAs) for improving articulation in children with cleft palate. Seventeen children with repaired cleft palates who lived in Chiang Rai and Phayao provinces were registered for the camp. They received speech therapy in a 4-day intensive camp and five follow-up camps at Chiang Rai's Young Men's Christian Association (YMCA). Eight speech assistants (SAs) were trained to correct articulation errors with specific modeling by the speech-language pathologists (SLPs). SAs encouraged family members to stimulate their children every day with speech exercises at home. Each camp comprised a main speech therapy session and other activities supported by the multidisciplinary team, as well as discussion among SLPs, SAs, and the caregivers for feedback on difficulties. The results showed this to be a sufficient method for treating persistent speech disorders associated with cleft palate. Perceptual analyses showed significant improvement of misarticulated sounds at both word and sentence levels after the speech camp (mean difference = 1.5, 95% confidence interval = 0.5-2.5, p-value < 0.05). The Community-Based Speech Therapy model is a valid and efficient method for providing speech therapy to children with cleft palate.

  15. Connected Speech Processes in Developmental Speech Impairment: Observations from an Electropalatographic Perspective

    Science.gov (United States)

    Howard, Sara

    2004-01-01

    This paper uses a combination of perceptual and electropalatographic (EPG) analysis to explore the presence and characteristics of connected speech processes in the speech output of five older children with developmental speech impairments. Each of the children is shown to use some processes typical of normal speech production but also to use a…

  16. Speech Perception and Short-Term Memory Deficits in Persistent Developmental Speech Disorder

    Science.gov (United States)

    Kenney, Mary Kay; Barac-Cikoja, Dragana; Finnegan, Kimberly; Jeffries, Neal; Ludlow, Christy L.

    2006-01-01

    Children with developmental speech disorders may have additional deficits in speech perception and/or short-term memory. To determine whether these are only transient developmental delays that can accompany the disorder in childhood or persist as part of the speech disorder, adults with a persistent familial speech disorder were tested on speech…

  17. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    Science.gov (United States)

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  18. The treatment of apraxia of speech : Speech and music therapy, an innovative joint effort

    NARCIS (Netherlands)

    Hurkmans, Josephus Johannes Stephanus

    2016-01-01

    Apraxia of Speech (AoS) is a neurogenic speech disorder. A wide variety of behavioural methods have been developed to treat AoS. Various therapy programmes use musical elements to improve speech production. A unique therapy programme combining elements of speech therapy and music therapy is called S

  19. The Role of Visual Speech Information in Supporting Perceptual Learning of Degraded Speech

    Science.gov (United States)

    Wayne, Rachel V.; Johnsrude, Ingrid S.

    2012-01-01

    Following cochlear implantation, hearing-impaired listeners must adapt to speech as heard through their prosthesis. Visual speech information (VSI; the lip and facial movements of speech) is typically available in everyday conversation. Here, we investigate whether learning to understand a popular auditory simulation of speech as transduced by a…

  2. Motor Speech Phenotypes of Frontotemporal Dementia, Primary Progressive Aphasia, and Progressive Apraxia of Speech

    Science.gov (United States)

    Poole, Matthew L.; Brodtmann, Amy; Darby, David; Vogel, Adam P.

    2017-01-01

    Purpose: Our purpose was to create a comprehensive review of speech impairment in frontotemporal dementia (FTD), primary progressive aphasia (PPA), and progressive apraxia of speech in order to identify the most effective measures for diagnosis and monitoring, and to elucidate associations between speech and neuroimaging. Method: Speech and…

  6. Quick Statistics about Voice, Speech, and Language

    Science.gov (United States)

    Quick Statistics About Voice, Speech, Language ... no 205. Hyattsville, MD: National Center for Health Statistics. 2015. Hoffman HJ, Li C-M, Losonczy K, ...

  7. Modeling speech intelligibility in adverse conditions

    DEFF Research Database (Denmark)

    Dau, Torsten

    2012-01-01

    by the normal as well as impaired auditory system. Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting the intelligibility of reverberant speech as well as noisy speech processed by spectral subtraction. However, the sEPSM cannot account for speech subjected to phase jitter, a condition in which the spectral structure of speech is destroyed, while the broadband temporal envelope is kept largely intact. In contrast...
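
    The SNRenv metric at the heart of the sEPSM is the noise envelope power subtracted from, and then divided into, the noisy-speech envelope power, computed per auditory and modulation band. A single-band sketch follows; the full model runs this over a filterbank and combines the per-band values.

        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert

        def envelope_power(x, fs, mod_band=(4.0, 8.0)):
            """Envelope power in one modulation band, normalized by the DC
            (squared-mean) envelope power, for one auditory channel."""
            env = np.abs(hilbert(x))
            b, a = butter(2, [f / (fs / 2) for f in mod_band], btype="band")
            env_ac = filtfilt(b, a, env)
            return np.mean(env_ac ** 2) / np.mean(env) ** 2

        def snr_env(noisy_speech, noise, fs):
            """Single-band sketch of the sEPSM decision metric:
            SNRenv = (P_S+N - P_N) / P_N."""
            p_sn = envelope_power(noisy_speech, fs)
            p_n = envelope_power(noise, fs)
            return max(p_sn - p_n, 1e-4) / p_n  # floor avoids negative SNRenv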

  8. Represented Speech in Qualitative Health Research

    DEFF Research Database (Denmark)

    Musaeus, Peter

    2017-01-01

    Represented speech refers to speech where we reference somebody. Represented speech is an important phenomenon in everyday conversation, health care communication, and qualitative research. This case will draw first from a case study on physicians’ workplace learning and second from a case study on nurses’ apprenticeship learning. The aim of the case is to guide the qualitative researcher to use their own and others’ voices in the interview and to be sensitive to represented speech in everyday conversation. Moreover, reported speech matters to health professionals who aim to represent the voice of their patients. Qualitative researchers and students might learn to encourage interviewees to elaborate different voices or perspectives. Qualitative researchers working with natural speech might pay attention to how people talk and use represented speech. Finally, represented speech might be relevant...

  9. Spectral Psychoanalysis of Speech under Strain | Sharma ...

    African Journals Online (AJOL)

    Spectral Psychoanalysis of Speech under Strain. ... Voice features from the speech signal that are influenced by strain include loudness, fundamental frequency, jitter, zero-crossing rate, ...

  10. Experimental study on phase perception in speech

    Institute of Scientific and Technical Information of China (English)

    BU Fanliang; CHEN Yanpu

    2003-01-01

    As the human ear is dull to phase in speech, little attention has been paid to phase information in speech coding. In fact, perceptual speech quality may be degraded if the phase distortion is very large. The perceptual effect of the STFT (Short-time Fourier transform) phase spectrum is studied by auditory subjective hearing tests. Three main conclusions are: (1) If the phase information is neglected completely, the subjective quality of the reconstructed speech may be very poor; (2) Whether the neglected phase is in the low frequency band or the high frequency band, the difference from the original speech can be perceived by ear; (3) It is very difficult for the human ear to perceive a difference in speech quality between the original speech and the reconstructed speech while the phase quantization step size is smaller than π/7.
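
    Conclusion (3) is straightforward to reproduce: quantize the STFT phase with a uniform step and resynthesize. A sketch with scipy; the frame settings are arbitrary choices:

        import numpy as np
        from scipy.signal import stft, istft

        def quantize_stft_phase(x, fs, step=np.pi / 7):
            """Reconstruct speech with its STFT phase quantized to a given
            step; per the reported tests, steps below ~pi/7 are hard to
            hear."""
            f, t, Z = stft(x, fs, nperseg=512)
            mag, phase = np.abs(Z), np.angle(Z)
            phase_q = np.round(phase / step) * step   # uniform quantization
            _, y = istft(mag * np.exp(1j * phase_q), fs, nperseg=512)
            return y[: len(x)]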

  11. Speech input interfaces for anaesthesia records

    DEFF Research Database (Denmark)

    Alapetite, Alexandre; Andersen, Henning Boje

    2009-01-01

    Speech recognition as a medical transcription tool is now common in hospitals and is steadily increasing...

  12. Speech Recognition: Its Place in Business Education.

    Science.gov (United States)

    Szul, Linda F.; Bouder, Michele

    2003-01-01

    Suggests uses of speech recognition devices in the classroom for students with disabilities. Compares speech recognition software packages and provides guidelines for selection and teaching. (Contains 14 references.) (SK)

  13. What Is Language? What Is Speech?

    Science.gov (United States)

    Speech is the verbal means of communicating. Speech consists of the following: ...

  14. A NOVEL APPROACH TO STUTTERED SPEECH CORRECTION

    Directory of Open Access Journals (Sweden)

    Alim Sabur Ajibola

    2016-06-01

    Full Text Available Stuttered speech is dysfluency-rich speech, more prevalent in males than females. It has been associated with insufficient air pressure or poor articulation, even though the root causes are more complex. Its primary features include prolonged and repetitive speech, while its secondary features include anxiety, fear, and shame. This study used LPC analysis and synthesis algorithms to reconstruct the stuttered speech. The results were evaluated using cepstral distance, Itakura-Saito distance, mean square error, and likelihood ratio. These measures implied perfect speech reconstruction quality. ASR was used for further testing, and the results showed that all the reconstructed speech samples were perfectly recognized, while only three samples of the original speech were perfectly recognized.
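
    LPC analysis and synthesis of the kind used here separates each frame into an all-pole vocal-tract filter and an excitation residual, which can then be resynthesized. A compact sketch; framing, windowing and any fluency modification are left out:

        import numpy as np
        from scipy.signal import lfilter

        def lpc(frame, order):
            """LPC coefficients by Levinson-Durbin on the autocorrelation.
            (Assumes a non-silent frame so r[0] > 0.)"""
            r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][: order + 1]
            a = np.zeros(order + 1)
            a[0] = 1.0
            err = r[0]
            for i in range(1, order + 1):
                k = -(r[i] + a[1:i] @ r[1:i][::-1]) / err
                a[1:i + 1] = a[1:i + 1] + k * np.concatenate([a[1:i][::-1], [1.0]])
                err *= (1.0 - k * k)
            return a

        def analyze_synthesize(frame, order=12):
            """Inverse-filter to a residual, then drive the all-pole synthesis
            filter with it; with an unmodified residual this reconstructs the
            frame (the hook where a modified excitation would go)."""
            a = lpc(frame, order)
            residual = lfilter(a, [1.0], frame)   # A(z) analysis filter
            return lfilter([1.0], a, residual)    # 1/A(z) synthesis filter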

  15. STUDY ON PHASE PERCEPTION IN SPEECH

    Institute of Scientific and Technical Information of China (English)

    Tong Ming; Bian Zhengzhong; Li Xiaohui; Dai Qijun; Chen Yanpu

    2003-01-01

    The perceptual effect of the phase information in speech has been studied by auditory subjective tests. On the condition that the phase spectrum in speech is changed while the amplitude spectrum is unchanged, the tests show that: (1) If the envelope of the reconstructed speech signal is unchanged, there is no distinctive auditory difference between the original speech and the reconstructed speech; (2) The auditory perception of the reconstructed speech mainly depends on the amplitude of the derivative of the additive phase; (3) Let td be the maximum relative time shift between different frequency components of the reconstructed speech signal. The speech quality is excellent when td < 10 ms; good when 10 ms < td < 20 ms; fair when 20 ms < td < 35 ms; and poor when td > 35 ms.

  16. Hate speech, report 1. Research on the nature and extent of hate speech

    OpenAIRE

    Nadim, Marjan; Fladmoe, Audun

    2016-01-01

    The purpose of this report is to gather research-based knowledge concerning: • the extent of online hate speech • which groups in society are particularly subjected to online hate speech • who produces hate speech, and what motivates them Hate speech is commonly understood as any speech that is persecutory, degrading or discriminatory on grounds of the recipient’s minority group identity. To be defined as hate speech, the speech must be conveyed publicly or in the presence of others and be di...

  17. Spatial localization of speech segments

    DEFF Research Database (Denmark)

    Karlsen, Brian Lykkegaard

    1999-01-01

    angle the target is likely to have originated from. The model is trained on the experimental data. On the basis of the experimental results, it is concluded that the human ability to localize speech segments in adverse noise depends on the speech segment as well as its point of origin in space...... the task of the experiment. The psychoacoustical experiment used naturally-spoken Danish consonant-vowel combinations as targets presented in diffuse speech-shaped noise at a peak SNR of -10 dB. The subjects were normal hearing persons. The experiment took place in an anechoic chamber where eight...... loudspeakers were suspended so that they surrounded the subjects in the horizontal plane. The subjects were required to push a button on a pad indicating where they had localized the target to in the horizontal plane. The response pad had twelve buttons arranged uniformly in a circle and two further buttons so...

  18. MUSAN: A Music, Speech, and Noise Corpus

    OpenAIRE

    Snyder, David; Chen, Guoguo; Povey, Daniel

    2015-01-01

    This report introduces a new corpus of music, speech, and noise. This dataset is suitable for training models for voice activity detection (VAD) and music/speech discrimination. Our corpus is released under a flexible Creative Commons license. The dataset consists of music from several genres, speech from twelve languages, and a wide assortment of technical and non-technical noises. We demonstrate use of this corpus for music/speech discrimination on Broadcast news and VAD for speaker identif...

  19. Relative Contributions of the Dorsal vs. Ventral Speech Streams to Speech Perception are Context Dependent: a lesion study

    Directory of Open Access Journals (Sweden)

    Corianne Rogalsky

    2014-04-01

    Full Text Available The neural basis of speech perception has been debated for over a century. While it is generally agreed that the superior temporal lobes are critical for the perceptual analysis of speech, a major current topic is whether the motor system contributes to speech perception, with several conflicting findings attested. In a dorsal-ventral speech stream framework (Hickok & Poeppel, 2007), this debate is essentially about the roles of the dorsal versus ventral speech processing streams. A major roadblock in characterizing the neuroanatomy of speech perception is task-specific effects. For example, much of the evidence for dorsal stream involvement comes from syllable discrimination type tasks, which have been found to behaviorally doubly dissociate from auditory comprehension tasks (Baker et al., 1981). Discrimination task deficits could be a result of difficulty perceiving the sounds themselves, which is the typical assumption, or they could be a result of failures in temporary maintenance of the sensory traces, or in the comparison and/or the decision process. Similar complications arise in perceiving sentences: the extent of inferior frontal (i.e., dorsal stream) activation during listening to sentences increases as a function of increased task demands (Love et al., 2006). Another complication is the stimulus: much evidence for dorsal stream involvement uses speech samples lacking semantic context (CVs, non-words). The present study addresses these issues in a large-scale lesion-symptom mapping study. 158 patients with focal cerebral lesions from the Multi-site Aphasia Research Consortium underwent a structural MRI or CT scan, as well as an extensive psycholinguistic battery. Voxel-based lesion symptom mapping was used to compare the neuroanatomy involved in the following speech perception tasks with varying phonological, semantic, and task loads: (i) two discrimination tasks of syllables (non-words and words, respectively), (ii) two auditory comprehension tasks

  20. THE ONTOGENESIS OF SPEECH DEVELOPMENT

    Directory of Open Access Journals (Sweden)

    T. E. Braudo

    2017-01-01

    Full Text Available The purpose of this article is to acquaint the specialists, working with children having developmental disorders, with age-related norms for speech development. Many well-known linguists and psychologists studied speech ontogenesis (logogenesis). Speech is a higher mental function, which integrates many functional systems. Speech development in infants during the first months after birth is ensured by the innate hearing and emerging ability to fix the gaze on the face of an adult. Innate emotional reactions are also being developed during this period, turning into nonverbal forms of communication. At about 6 months a baby starts to pronounce some syllables; at 7–9 months, it repeats various sound combinations pronounced by adults. At 10–11 months a baby begins to react to words addressed to him/her. The first words usually appear at an age of 1 year; this is the start of the stage of active speech development. At this time it is acceptable if a child confuses or rearranges sounds, distorts or misses them. By the age of 1.5 years a child begins to understand abstract explanations of adults. Significant vocabulary enlargement occurs between 2 and 3 years; grammatical structures of the language are being formed during this period (a child starts to use phrases and sentences). Preschool age (3–7 y. o.) is characterized by incorrect, but steadily improving pronunciation of sounds and phonemic perception. The vocabulary increases; abstract speech and retelling are being formed. Children over 7 y. o. continue to improve grammar, writing and reading skills. The described stages may not have strict age boundaries, since they depend not only on the environment, but also on the child’s mental constitution, heredity and character.

  1. Interventions for Speech Sound Disorders in Children

    Science.gov (United States)

    Williams, A. Lynn, Ed.; McLeod, Sharynne, Ed.; McCauley, Rebecca J., Ed.

    2010-01-01

    With detailed discussion and invaluable video footage of 23 treatment interventions for speech sound disorders (SSDs) in children, this textbook and DVD set should be part of every speech-language pathologist's professional preparation. Focusing on children with functional or motor-based speech disorders from early childhood through the early…

  2. Application of wavelets in speech processing

    CERN Document Server

    Farouk, Mohamed Hesham

    2014-01-01

    This book provides a survey of the widespread use of wavelet analysis in different applications of speech processing. The author examines development and research in different applications of speech processing. The book also summarizes the state-of-the-art research on wavelets in speech processing.

  3. Speech and Debate as Civic Education

    Science.gov (United States)

    Hogan, J. Michael; Kurr, Jeffrey A.; Johnson, Jeremy D.; Bergmaier, Michael J.

    2016-01-01

    In light of the U.S. Senate's designation of March 15, 2016 as "National Speech and Debate Education Day" (S. Res. 398, 2016), it only seems fitting that "Communication Education" devote a special section to the role of speech and debate in civic education. Speech and debate have been at the heart of the communication…

  4. Epoch-based analysis of speech signals

    Indian Academy of Sciences (India)

    B Yegnanarayana; Suryakanth V Gangashetty

    2011-10-01

    Speech analysis is traditionally performed using short-time analysis to extract features in time and frequency domains. The window size for the analysis is fixed somewhat arbitrarily, mainly to account for the time varying vocal tract system during production. However, speech in its primary mode of excitation is produced due to impulse-like excitation in each glottal cycle. Anchoring the speech analysis around the glottal closure instants (epochs) yields significant benefits for speech analysis. Epoch-based analysis of speech helps not only to segment the speech signals based on speech production characteristics, but also helps in accurate analysis of speech. It enables extraction of important acoustic-phonetic features such as glottal vibrations, formants, instantaneous fundamental frequency, etc. Epoch sequence is useful to manipulate prosody in speech synthesis applications. Accurate estimation of epochs helps in characterizing voice quality features. Epoch extraction also helps in speech enhancement and multispeaker separation. In this tutorial article, the importance of epochs for speech analysis is discussed, and methods to extract the epoch information are reviewed. Applications of epoch extraction for some speech applications are demonstrated.
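
    One widely used epoch extractor from this line of work is zero-frequency filtering: integrate the signal through resonators at 0 Hz, remove the slowly varying trend, and read off epochs at the positive-going zero crossings. A rough sketch; the window length and number of trend-removal passes are conventional choices, not an exact replication:

        import numpy as np

        def zero_frequency_epochs(x, fs, avg_pitch_s=0.008):
            """Epoch (glottal closure instant) estimation by zero-frequency
            filtering in the style of Murty & Yegnanarayana (2008)."""
            y = np.diff(x, prepend=x[0])      # first-difference the signal
            for _ in range(2):                # two cascaded 0-Hz resonators
                y = np.cumsum(np.cumsum(y))   # (each is a double integrator)
            win = int(avg_pitch_s * fs) | 1   # odd window ~ 1-2 pitch periods
            for _ in range(3):                # repeated local-mean subtraction
                kernel = np.ones(win) / win   # removes the polynomial trend
                y = y - np.convolve(y, kernel, mode="same")
            # Epochs: negative-to-positive zero crossings of the filtered signal.
            return np.where((y[:-1] < 0) & (y[1:] >= 0))[0] + 1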

  5. Recovering Asynchronous Watermark Tones from Speech

    Science.gov (United States)

    2009-04-01

    …"Audio steganography for covert data transmission by imperceptible tone insertion," Proceedings of Communications Systems and Applications, IEEE, vol. 4, pp. 1647–1653, 2004. …by a comfortable margin. Index Terms: Speech Watermarking, Hidden Tones, Speech Steganography, Speech Data Hiding.
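
    To make the underlying idea concrete, here is a minimal sketch of watermarking by tone insertion: each data bit adds a low-amplitude tone at one of two frequencies to a speech frame, and the decoder compares the energy at the two candidate frequencies. All parameters (frequencies, frame length, amplitude) are illustrative assumptions, and unlike the paper, which addresses asynchronous recovery, this sketch assumes the decoder is frame-synchronized.

```python
# A minimal sketch (assumed parameters) of bit embedding and recovery by
# low-amplitude tone insertion. Requires numpy; assumes synchronized frames.
import numpy as np

def embed_bits(speech, fs, bits, f0=3400.0, f1=3800.0, frame_ms=50, amp=0.002):
    frame = int(fs * frame_ms / 1000)
    out = speech.astype(np.float64).copy()
    t = np.arange(frame) / fs
    for i, b in enumerate(bits):
        start = i * frame
        if start + frame > len(out):
            break  # signal too short for the remaining bits
        f = f1 if b else f0  # one candidate frequency per bit value
        out[start:start + frame] += amp * np.sin(2 * np.pi * f * t)
    return out

def recover_bits(received, fs, n_bits, f0=3400.0, f1=3800.0, frame_ms=50):
    frame = int(fs * frame_ms / 1000)
    t = np.arange(frame) / fs
    bits = []
    for i in range(n_bits):
        seg = received[i * frame:(i + 1) * frame]
        # Single-bin DFT probes at the two candidate tone frequencies.
        e0 = abs(np.dot(seg, np.exp(-2j * np.pi * f0 * t)))
        e1 = abs(np.dot(seg, np.exp(-2j * np.pi * f1 * t)))
        bits.append(int(e1 > e0))
    return bits
```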

  6. Cognitive Functions in Childhood Apraxia of Speech

    Science.gov (United States)

    Nijland, Lian; Terband, Hayo; Maassen, Ben

    2015-01-01

    Purpose: Childhood apraxia of speech (CAS) is diagnosed on the basis of specific speech characteristics, in the absence of problems in hearing, intelligence, and language comprehension. This does not preclude the possibility that children with this speech disorder might demonstrate additional problems. Method: Cognitive functions were investigated…

  7. Audiovisual Asynchrony Detection in Human Speech

    Science.gov (United States)

    Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

    2011-01-01

    Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

  8. Speech-Song Interface of Chinese Speakers

    Science.gov (United States)

    Mang, Esther

    2007-01-01

    Pitch is a psychoacoustic construct crucial in the production and perception of speech and songs. This article is an exploration of the interface of speech and song performance of Chinese speakers. Although parallels might be drawn from the prosodic and sound structures of the linguistic and musical systems, perceiving and producing speech and…

  9. Syllable Structure in Dysfunctional Portuguese Children's Speech

    Science.gov (United States)

    Candeias, Sara; Perdigao, Fernando

    2010-01-01

    The goal of this work is to investigate whether children with speech dysfunctions (SD) show a deficit in planning some Portuguese syllable structures (PSS) in continuous speech production. Knowledge of which aspects of speech production are affected by SD is necessary for efficient improvement in the therapy techniques. The case-study is focused…

  10. Factors of Politeness and Indirect Speech Acts

    Institute of Scientific and Technical Information of China (English)

    杨雪梅

    2016-01-01

    The politeness principle is deeply influenced by a nation's history, culture, customs, and so on; therefore different countries have different understandings and expressions of politeness and indirect speech acts. This paper discusses the main factors that influence polite speech, giving readers a comprehensive picture of politeness and indirect speech acts.

  11. Freedom of Speech as an Academic Discipline.

    Science.gov (United States)

    Haiman, Franklyn S.

    Since its formation, the Speech Communication Association's Committee on Freedom of Speech has played a critical leadership role in course offerings, research efforts, and regional activities in freedom of speech. Areas in which research has been done and in which further research should be carried out include: historical-critical research, in…

  12. Hate Speech and the First Amendment.

    Science.gov (United States)

    Rainey, Susan J.; Kinsler, Waren S.; Kannarr, Tina L.; Reaves, Asa E.

    This document is comprised of California state statutes, federal legislation, and court litigation pertaining to hate speech and the First Amendment. The document provides an overview of California education code sections relating to the regulation of speech; basic principles of the First Amendment; government efforts to regulate hate speech,…

  13. Liberalism, Speech Codes, and Related Problems.

    Science.gov (United States)

    Sunstein, Cass R.

    1993-01-01

    It is argued that universities are pervasively and necessarily engaged in the regulation of speech, which complicates many existing claims about hate speech codes on campus. The ultimate test is whether a restriction on speech is a legitimate part of the institution's mission and its commitment to liberal education. (MSE)

  14. Hate Speech on Campus: A Practical Approach.

    Science.gov (United States)

    Hogan, Patrick

    1997-01-01

    Looks at arguments concerning hate speech and speech codes on college campuses, arguing that speech codes are likely to be of limited value in achieving civil rights objectives, and that there are alternatives less harmful to civil liberties and more successful in promoting civil rights. Identifies specific goals, and considers how restriction of…

  15. Cognitive functions in Childhood Apraxia of Speech

    NARCIS (Netherlands)

    Nijland, L.; Terband, H.; Maassen, B.

    2015-01-01

    Purpose: Childhood Apraxia of Speech (CAS) is diagnosed on the basis of specific speech characteristics, in the absence of problems in hearing, intelligence, and language comprehension. This does not preclude the possibility that children with this speech disorder might demonstrate additional problems.

  16. Speech perception of noise with binary gains

    DEFF Research Database (Denmark)

    Wang, DeLiang; Kjems, Ulrik; Pedersen, Michael Syskind

    2008-01-01

    For a given mixture of speech and noise, an ideal binary time-frequency mask is constructed by comparing speech energy and noise energy within local time-frequency units. It is observed that listeners achieve nearly perfect speech recognition from gated noise with binary gains prescribed...
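
    A minimal sketch of the construction described above, assuming access to the separate speech and noise signals: compare speech and noise energy in each short-time Fourier transform unit, then apply the resulting binary gains to the noise alone to produce a gated-noise stimulus. The STFT settings and the 0 dB local criterion are illustrative assumptions.

```python
# A minimal sketch (assumed parameters) of ideal-binary-mask gating.
# Requires numpy and scipy.
import numpy as np
from scipy.signal import stft, istft

def gated_noise(speech, noise, fs, lc_db=0.0, nperseg=256):
    """Build the IBM from separate speech and noise signals, then apply the
    binary gains to the noise alone (the gated-noise stimulus)."""
    _, _, S = stft(speech, fs=fs, nperseg=nperseg)
    _, _, N = stft(noise, fs=fs, nperseg=nperseg)
    local_snr_db = 20.0 * (np.log10(np.abs(S) + 1e-12) -
                           np.log10(np.abs(N) + 1e-12))
    mask = local_snr_db > lc_db  # 1 where speech dominates, else 0
    _, out = istft(N * mask, fs=fs, nperseg=nperseg)
    return out
```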

  17. Development of binaural speech transmission index

    NARCIS (Netherlands)

    Wijngaarden, S.J. van; Drullman, R.

    2006-01-01

    Although the speech transmission index (STI) is a well-accepted and standardized method for objective prediction of speech intelligibility in a wide range of environments and applications, it is essentially a monaural model. Advantages of binaural hearing to the intelligibility of speech are disregarded.
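
    For orientation, the monaural core that such a binaural model extends maps modulation transfer values m to an index as follows: apparent SNR = 10 log10(m / (1 - m)), clipped to +/- 15 dB, rescaled to [0, 1], then averaged over modulation frequencies and weighted across octave bands. Below is a minimal sketch; the band weights follow the classic Steeneken and Houtgast scheme as an assumption (the current standard defines its own weights and redundancy corrections, omitted here).

```python
# A minimal sketch of the monaural STI computation from modulation transfer
# values. Requires numpy; octave-band weights are an assumption.
import numpy as np

def sti_from_mtf(m):
    """m: (7 octave bands, 14 modulation frequencies) array of modulation
    transfer values in the open interval (0, 1)."""
    snr = 10.0 * np.log10(m / (1.0 - m))  # apparent SNR per unit
    snr = np.clip(snr, -15.0, 15.0)       # limit to +/- 15 dB
    ti = (snr + 15.0) / 30.0              # transmission index in [0, 1]
    band_ti = ti.mean(axis=1)             # average over modulation frequencies
    weights = np.array([0.13, 0.14, 0.11, 0.12, 0.19, 0.17, 0.14])
    return float(np.dot(weights, band_ti))
```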

  18. Current trends in multilingual speech processing

    Indian Academy of Sciences (India)

    Hervé Bourlard; John Dines; Mathew Magimai-Doss; Philip N Garner; David Imseng; Petr Motlicek; Hui Liang; Lakshmi Saheer; Fabio Valente

    2011-10-01

    In this paper, we describe recent work at Idiap Research Institute in the domain of multilingual speech processing and provide some insights into emerging challenges for the research community. Multilingual speech processing has been a topic of ongoing interest to the research community for many years and the field is now receiving renewed interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers. For example, discriminative features are seeing wide application by the speech recognition community, but additional issues arise when using such features in a multilingual setting. Another example is the apparent convergence of speech recognition and speech synthesis technologies in the form of statistical parametric methodologies. This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS) as well as cross-lingual speaker adaptation for TTS. The second driving force is the impetus being provided by both government and industry for technologies to help break down domestic and international language barriers, these also being barriers to the expansion of policy and commerce. Speech-to-speech and speech-to-text translation are thus emerging as key technologies at the heart of which lies multilingual speech processing.

  19. Audiovisual Speech Integration and Lipreading in Autism

    Science.gov (United States)

    Smith, Elizabeth G.; Bennetto, Loisa

    2007-01-01

    Background: During speech perception, the ability to integrate auditory and visual information causes speech to sound louder and be more intelligible, and leads to quicker processing. This integration is important in early language development, and also continues to affect speech comprehension throughout the lifespan. Previous research shows that…