Viswanathan, Navin; Kokkinakis, Kostas; Williams, Brittany T.
Purpose: The purpose of this study was to evaluate whether listeners with normal hearing perceiving noise-vocoded speech-in-speech demonstrate better intelligibility of target speech when the background speech was mismatched in language (linguistic release from masking [LRM]) and/or location (spatial release from masking [SRM]) relative to the…
Rosen, Stuart; Souza, Pamela; Ekelund, Caroline; Majeed, Arooj A
Some of the most common interfering background sounds a listener experiences are the sounds of other talkers. In Experiment 1, recognition for natural Institute of Electrical and Electronics Engineers (IEEE) sentences was measured in normal-hearing adults at two fixed signal-to-noise ratios (SNRs) in 16 backgrounds with the same long-term spectrum: unprocessed speech babble (1, 2, 4, 8, and 16 talkers), noise-vocoded versions of the babbles (12 channels), noise modulated with the wide-band envelope of the speech babbles, and unmodulated noise. All talkers were adult males. For a given number of talkers, natural speech was always the most effective masker. The greatest changes in performance occurred as the number of talkers in the maskers increased from 1 to 2 or 4, with small changes thereafter. In Experiment 2, the same targets and maskers (1, 2, and 16 talkers) were used to measure speech reception thresholds (SRTs) adaptively. Periodicity in the target was also manipulated by noise-vocoding, which led to considerably higher SRTs. The greatest masking effect always occurred for the masker type most similar to the target, while the effects of the number of talkers were generally small. Implications are drawn with reference to glimpsing, informational vs energetic masking, overall SNR, and aspects of periodicity.
Liu, Chang; Azimi, Behnam; Tahmina, Qudsia; Hu, Yi
This study investigated the contribution of low-frequency harmonics to identifying Mandarin tones in natural and vocoded speech in quiet and noisy conditions. Results showed that low-frequency harmonics of natural speech led to highly accurate tone identification; however, for vocoded speech, low-frequency harmonics yielded lower tone identification than stimuli with full harmonics, except for tone 4. Analysis of the correlation between tone accuracy and the amplitude-F0 correlation index suggested that "more" speech contents (i.e., more harmonics) did not necessarily yield better tone recognition for vocoded speech, especially when the amplitude contour of the signals did not co-vary with the F0 contour.
Full Text Available Noise-vocoded speech is commonly used to simulate the sensation after cochlear implantation as it consists of spectrally degraded speech. High individual variability exists in learning to understand both noise-vocoded speech and speech perceived through a cochlear implant (CI. This variability is partly ascribed to differing cognitive abilities like working memory, verbal skills or attention. Although clinically highly relevant, up to now, no consensus has been achieved about which cognitive factors exactly predict the intelligibility of speech in noise-vocoded situations in healthy subjects or in patients after cochlear implantation. We aimed to establish a test battery that can be used to predict speech understanding in patients prior to receiving a CI. Young and old healthy listeners completed a noise-vocoded speech test in addition to cognitive tests tapping on verbal memory, working memory, lexicon and retrieval skills as well as cognitive flexibility and attention. Partial-least-squares analysis revealed that six variables were important to significantly predict vocoded-speech performance. These were the ability to perceive visually degraded speech tested by the Text Reception Threshold, vocabulary size assessed with the Multiple Choice Word Test, working memory gauged with the Operation Span Test, verbal learning and recall of the Verbal Learning and Retention Test and task switching abilities tested by the Comprehensive Trail-Making Test. Thus, these cognitive abilities explain individual differences in noise-vocoded speech understanding and should be considered when aiming to predict hearing-aid outcome.
Full Text Available A vocoder technique is described in which speech signals are represented by their short-time phase and amplitude spectra. A complete transmission system utilizing this approach is simulated on a digital computer. The encoding method leads to an economy in transmission bandwidth and to a means for time compression and expansion of speech signals.
Roman, Adrienne S; Pisoni, David B; Kronenberger, William G; Faulkner, Kathleen F
Noise-vocoded speech is a valuable research tool for testing experimental hypotheses about the effects of spectral degradation on speech recognition in adults with normal hearing (NH). However, very little research has utilized noise-vocoded speech with children with NH. Earlier studies with children with NH focused primarily on the amount of spectral information needed for speech recognition without assessing the contribution of neurocognitive processes to speech perception and spoken word recognition. In this study, we first replicated the seminal findings reported by ) who investigated effects of lexical density and word frequency on noise-vocoded speech perception in a small group of children with NH. We then extended the research to investigate relations between noise-vocoded speech recognition abilities and five neurocognitive measures: auditory attention (AA) and response set, talker discrimination, and verbal and nonverbal short-term working memory. Thirty-one children with NH between 5 and 13 years of age were assessed on their ability to perceive lexically controlled words in isolation and in sentences that were noise-vocoded to four spectral channels. Children were also administered vocabulary assessments (Peabody Picture Vocabulary test-4th Edition and Expressive Vocabulary test-2nd Edition) and measures of AA (NEPSY AA and response set and a talker discrimination task) and short-term memory (visual digit and symbol spans). Consistent with the findings reported in the original ) study, we found that children perceived noise-vocoded lexically easy words better than lexically hard words. Words in sentences were also recognized better than the same words presented in isolation. No significant correlations were observed between noise-vocoded speech recognition scores and the Peabody Picture Vocabulary test-4th Edition using language quotients to control for age effects. However, children who scored higher on the Expressive Vocabulary test-2nd Edition
William L Schuerman
Full Text Available In different tasks involving action perception, performance has been found to be facilitated when the presented stimuli were produced by the participants themselves rather than by another participant. These results suggest that the same mental representations are accessed during both production and perception. However, with regard to spoken word perception, evidence also suggests that listeners' representations for speech reflect the input from their surrounding linguistic community rather than their own idiosyncratic productions. Furthermore, speech perception is heavily influenced by indexical cues that may lead listeners to frame their interpretations of incoming speech signals with regard to speaker identity. In order to determine whether word recognition evinces similar self-advantages as found in action perception, it was necessary to eliminate indexical cues from the speech signal. We therefore asked participants to identify noise-vocoded versions of Dutch words that were based on either their own recordings or those of a statistically average speaker. The majority of participants were more accurate for the average speaker than for themselves, even after taking into account differences in intelligibility. These results suggest that the speech representations accessed during perception of noise-vocoded speech are more reflective of the input of the speech community, and hence that speech perception is not necessarily based on representations of one's own speech.
Roman, Adrienne S.; Pisoni, David B.; Kronenberger, William G.; Faulkner, Kathleen F.
Objectives Noise-vocoded speech is a valuable research tool for testing experimental hypotheses about the effects of spectral-degradation on speech recognition in adults with normal hearing (NH). However, very little research has utilized noise-vocoded speech with children with NH. Earlier studies with children with NH focused primarily on the amount of spectral information needed for speech recognition without assessing the contribution of neurocognitive processes to speech perception and spoken word recognition. In this study, we first replicated the seminal findings reported by Eisenberg et al. (2002) who investigated effects of lexical density and word frequency on noise-vocoded speech perception in a small group of children with NH. We then extended the research to investigate relations between noise-vocoded speech recognition abilities and five neurocognitive measures: auditory attention and response set, talker discrimination and verbal and nonverbal short-term working memory. Design Thirty-one children with NH between 5 and 13 years of age were assessed on their ability to perceive lexically controlled words in isolation and in sentences that were noise-vocoded to four spectral channels. Children were also administered vocabulary assessments (PPVT-4 and EVT-2) and measures of auditory attention (NEPSY Auditory Attention (AA) and Response Set (RS) and a talker discrimination task (TD)) and short-term memory (visual digit and symbol spans). Results Consistent with the findings reported in the original Eisenberg et al. (2002) study, we found that children perceived noise-vocoded lexically easy words better than lexically hard words. Words in sentences were also recognized better than the same words presented in isolation. No significant correlations were observed between noise-vocoded speech recognition scores and the PPVT-4 using language quotients to control for age effects. However, children who scored higher on the EVT-2 recognized lexically easy words
Megnin-Viggars, Odette; Goswami, Usha
Visual speech inputs can enhance auditory speech information, particularly in noisy or degraded conditions. The natural statistics of audiovisual speech highlight the temporal correspondence between visual and auditory prosody, with lip, jaw, cheek and head movements conveying information about the speech envelope. Low-frequency spatial and…
Full Text Available For a broad range of sound transformations, quality is measured according to the common expectation about the result: if a male’s voice has to be changed in a female’s one, there exists a common reference for the perceptive evaluation of the result; the same holds if an instrumental sound has to be made longer, or shorter. Following the argument in Röbel, “Between Physics and Perception: Signal Models for High Level Audio Processing”, a fundamental requirement for these transformation algorithms is their need of signal models that are strongly linked to perceptually relevant physical properties of the sound source. This paper is a short survey about the phase vocoder technique, together with its extensions and improvements relying on appropriate sound models, which have led to high level audio processing algorithms.
Ng, L C; Holzrichter, J F; Larson, P E
Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real-time, without acoustic interference . By combining these data with the corresponding acoustic signal, we've demonstrated an almost 10-fold bandwidth reduction in speech compression, compared to a standard 2.4 kbps LPC10 protocol used in the STU-III (Secure Terminal Unit, third generation) telephone. This paper describes a potential EM sensor/acoustic based vocoder implementation
Van der Burg, Erik; Goodbourn, Patrick T
The brain is adaptive. The speed of propagation through air, and of low-level sensory processing, differs markedly between auditory and visual stimuli; yet the brain can adapt to compensate for the resulting cross-modal delays. Studies investigating temporal recalibration to audiovisual speech have used prolonged adaptation procedures, suggesting that adaptation is sluggish. Here, we show that adaptation to asynchronous audiovisual speech occurs rapidly. Participants viewed a brief clip of an actor pronouncing a single syllable. The voice was either advanced or delayed relative to the corresponding lip movements, and participants were asked to make a synchrony judgement. Although we did not use an explicit adaptation procedure, we demonstrate rapid recalibration based on a single audiovisual event. We find that the point of subjective simultaneity on each trial is highly contingent upon the modality order of the preceding trial. We find compelling evidence that rapid recalibration generalizes across different stimuli, and different actors. Finally, we demonstrate that rapid recalibration occurs even when auditory and visual events clearly belong to different actors. These results suggest that rapid temporal recalibration to audiovisual speech is primarily mediated by basic temporal factors, rather than higher-order factors such as perceived simultaneity and source identity. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
Thordis Marisa Neger
Full Text Available Within a few sentences, listeners learn to understand severely degraded speech such as noise-vocoded speech. However, individuals vary in the amount of such perceptual learning and it is unclear what underlies these differences. The present study investigates whether perceptual learning in speech relates to statistical learning, as sensitivity to probabilistic information may aid identification of relevant cues in novel speech input. If statistical learning and perceptual learning (partly draw on the same general mechanisms, then statistical learning in a non-auditory modality using non-linguistic sequences should predict adaptation to degraded speech.In the present study, 73 older adults (aged over 60 years and 60 younger adults (aged between 18 and 30 years performed a visual artificial grammar learning task and were presented with sixty meaningful noise-vocoded sentences in an auditory recall task. Within age groups, sentence recognition performance over exposure was analyzed as a function of statistical learning performance, and other variables that may predict learning (i.e., hearing, vocabulary, attention switching control, working memory and processing speed. Younger and older adults showed similar amounts of perceptual learning, but only younger adults showed significant statistical learning. In older adults, improvement in understanding noise-vocoded speech was constrained by age. In younger adults, amount of adaptation was associated with lexical knowledge and with statistical learning ability. Thus, individual differences in general cognitive abilities explain listeners' variability in adapting to noise-vocoded speech. Results suggest that perceptual and statistical learning share mechanisms of implicit regularity detection, but that the ability to detect statistical regularities is impaired in older adults if visual sequences are presented quickly.
Ries, Stephanie; Janssen, Niels; Dufau, Stephane; Alario, F.-Xavier; Burle, Boris
The concept of "monitoring" refers to our ability to control our actions on-line. Monitoring involved in speech production is often described in psycholinguistic models as an inherent part of the language system. We probed the specificity of speech monitoring in two psycholinguistic experiments where electroencephalographic activities were…
Drijvers, L.; Özyürek, A.
Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech
Drijvers, Linda; Ozyurek, Asli
Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech comprehension have only been performed separately. Method:…
Full Text Available The Phase Vocoder plays a central role in sound analysis and synthesis, allowing us to represent a sound signal in both time and frequency, similar to a music score – but possibly at much finer time and frequency scales – describing the evolution of sound events. According to the uncertainty principle, time and frequency are not independent variables so that any time-frequency representation is the result of a compromise between time and frequency resolutions, the product of which cannot be smaller than a given constant. Therefore, finer frequency resolution can only be achieved with coarser time resolution and, similarly, finer time resolution results in coarser frequency resolution.While most of the conventional methods for time-frequency representations are based on uniform time and uniform frequency resolutions, perception and physical characteristics of sound signals suggest the need for nonuniform analysis and synthesis. As the results of psycho-acoustic research show, human hearing is naturally organized in nonuniform frequency bands. On the physical side, the sounds of percussive instruments as well as piano in the low register, show partials whose frequencies are not uniformly spaced, as opposed to the uniformly spaced partial frequencies found in harmonic sounds. Moreover, the different characteristics of sound signals at the onset transients with respect to stationary segments suggest the need for nonuniform time resolution. In the effort to exploit the time-frequency resolution compromise at its best, a tight time-frequency suit should be tailored to snuggly fit the sound body.In this paper we overview flexible design methods for phase vocoders with nonuniform resolutions. The methods are based on remapping the time or the frequency axis, or both, by employing suitable functions acting as warping maps, which locally change the characteristics of the time-frequency plane. As a result, the sliding windows may have time dependent
Emelyanova, Irina A.; Borisova, Elena A.; Shapovalova, Olga E.; Karynbaeva, Olga V.; Vorotilkina, Irina M.
The relevance of the research is due to the necessity of creating the pedagogical conditions for correction and development of speech in children having the general speech underdevelopment. For them, difficulties generating a coherent utterance are characteristic, which prevents a sufficient speech readiness for schooling forming in them as well…
Pilling, Michael; Thomas, Sharon
Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…
Tucker, Raymond K.
General systems theory can be applied to problems in the teaching of speech communication courses. The author describes general systems theory as it is applied to the designing, conducting and evaluation of speech communication courses. (Author/MS)
Full Text Available A review of the cultural origins of the speaking machines is proposed. These origins are identified in the separation between soul and body, proposed by Descartes. The speaking machines, and then the vocoder, are substantially formed by a controlled part and an automatic part; in the voder, the synthetic speaker is directly controlled by a human being, conversely in the vocoder an authomatic process is implemented. Through historical examples such as the ‘automata’ by Mical, Kratzenstein, von Kempelen, Wheatstone, Helmoltz, until the invention of the Dudley’s vocoder, are described a growth and a stratification of devices and synthesizers applied to varied fields (signal representation, encrypted transmission, medical apparatus, cybernetic, music. The historical period treated ideally ends with Dolson’s article written in 1986, “The Phase Vocoder: A Tutorial”. Through musical examples is described the origin of control data handling in composition. The musical application of the phase vocoder describes the distinction between control and synthesis better than other technologies. Both control (parameter values and synthesis (instruments design are compositional problems that involve all the electroacoustic music composers.
Schepers, E; Bodar, L; van Os, J; Lousberg, R
There is evidence that experimentally elicited auditory illusions in the general population index risk for psychotic symptoms. As little is known about underlying cortical mechanisms of auditory illusions, an experiment was conducted to analyze processing of auditory illusions in a general population sample. In a follow-up design with two measurement moments (baseline and 6 months), participants (n = 83) underwent the White Noise task under simultaneous recording with a 14-lead EEG. An auditory illusion was defined as hearing any speech in a sound fragment containing white noise. A total number of 256 speech illusions (SI) were observed over the two measurements, with a high degree of stability of SI over time. There were 7 main effects of speech illusion on the EEG alpha band-the most significant indicating a decrease in activity at T3 (t = -4.05). Other EEG frequency bands (slow beta, fast beta, gamma, delta, theta) showed no significant associations with SI. SIs are characterized by reduced alpha activity in non-clinical populations. Given the association of SIs with psychosis, follow-up research is required to examine the possibility of reduced alpha activity mediating SIs in high risk and symptomatic populations.
Kolbæk, Morten; Tan, Zheng-Hua; Jensen, Jesper
In this paper, we study aspects of single microphone speech enhancement (SE) based on deep neural networks (DNNs). Specifically, we explore the generalizability capabilities of state-of-the-art DNN-based SE systems with respect to the background noise type, the gender of the target speaker...... general. Finally, we compare how a DNN-based SE system trained to be noise type general, speaker general, and SNR general performs relative to a state-of-the-art short-time spectral amplitude minimum mean square error (STSA-MMSE) based SE algorithm. We show that DNN-based SE systems, when trained...... a state-of-the-art STSA-MMSE based SE method, when tested using a range of unseen speakers and noise types. Finally, a listening test using several DNN-based SE systems tested in unseen speaker conditions show that these systems can improve SI for some SNR and noise type configurations but degrade SI...
Rose, Richard C.; Barnwell, Thomas P., III
The performance of a broad class of analysis-by-synthesis linear predictive speech coders is quantified experimentally. The class of coders includes a number of well-known techniques as well as a very large number of speech coders which have not been named or studied. A general formulation for deriving the parametric representation used in all of the coders in the class is presented. A new coder, named the self-excited vocoder, is discussed because of its good performance with low complexity, and because of the insight this coder gives to analysis-by-synthesis coders in general. The results of a study comparing the performances of different members of this class are presented. The study takes the form of a series of formal subjective and objective speech quality tests performed on selected coders. The results of this study lead to some interesting and important observations concerning the controlling parameters for analysis-by-synthesis speech coders.
Ottosen, Emil Solsbæk; Dörfler, Monika
We propose a new algorithm for time stretching music signals based on the theory of nonstationary Gabor frames (NSGFs). The algorithm extends the techniques of the classical phase vocoder (PV) by incorporating adaptive timefrequency (TF) representations and adaptive phase locking. The adaptive TF...
Recent experiments using low power EM-radar like sensors (e.g, GEMs) have demonstrated a new method for measuring vocal fold activity and the onset times of voiced speech, as vocal fold contact begins to take place. Similarly the end time of a voiced speech segment can be measured. Secondly it appears that in most normal uses of American English speech, unvoiced-speech segments directly precede or directly follow voiced-speech segments. For many applications, it is useful to know typical duration times of these unvoiced speech segments. A corpus, assembled earlier of spoken ''Timit'' words, phrases, and sentences and recorded using simultaneously measured acoustic and EM-sensor glottal signals, from 16 male speakers, was used for this study. By inspecting the onset (or end) of unvoiced speech, using the acoustic signal, and the onset (or end) of voiced speech using the EM sensor signal, the average duration times for unvoiced segments preceding onset of vocalization were found to be 300ms, and for following segments, 500ms. An unvoiced speech period is then defined in time, first by using the onset of the EM-sensed glottal signal, as the onset-time marker for the voiced speech segment and end marker for the unvoiced segment. Then, by subtracting 300ms from the onset time mark of voicing, the unvoiced speech segment start time is found. Similarly, the times for a following unvoiced speech segment can be found. While data of this nature have proven to be useful for work in our laboratory, a great deal of additional work remains to validate such data for use with general populations of users. These procedures have been useful for applying optimal processing algorithms over time segments of unvoiced, voiced, and non-speech acoustic signals. For example, these data appear to be of use in speaker validation, in vocoding, and in denoising algorithms.
Foxx, R. M.; And Others
Three mentally handicapped students (aged 13, 36, and 40) with maladaptive speech received training to answer questions with verbal labels. The results of their cues-pause-point training showed that the students replaced their maladaptive speech with correct labels (answers) to questions in the training setting and three generalization settings.…
Rossi, N. F.; Giacheti, C. M.
Background: Williams syndrome (WS) phenotype is described as unique and intriguing. The aim of this study was to investigate the associations between speech-language abilities, general cognitive functioning and behavioural problems in individuals with WS, considering age effects and speech-language characteristics of WS sub-groups. Methods: The…
Christiansen, Claus; MacDonald, Ewen; Dau, Torsten
Masking release (MR) is the improvement in speech intelligibility for a fluctuating interferer compared to stationary noise. Reduction in MR due to vocoder processing is usually linked to distortions in the temporal fine structure of the stimuli and a corresponding reduction in the fundamental fr...
Congdon, Eliza L; Novack, Miriam A; Brooks, Neon; Hemani-Lopez, Naureen; O'Keefe, Lucy; Goldin-Meadow, Susan
When teachers gesture during instruction, children retain and generalize what they are taught (Goldin-Meadow, 2014). But why does gesture have such a powerful effect on learning? Previous research shows that children learn most from a math lesson when teachers present one problem-solving strategy in speech while simultaneously presenting a different, but complementary, strategy in gesture (Singer & Goldin-Meadow, 2005). One possibility is that gesture is powerful in this context because it presents information simultaneously with speech. Alternatively, gesture may be effective simply because it involves the body, in which case the timing of information presented in speech and gesture may be less important for learning. Here we find evidence for the importance of simultaneity: 3 rd grade children retain and generalize what they learn from a math lesson better when given instruction containing simultaneous speech and gesture than when given instruction containing sequential speech and gesture. Interpreting these results in the context of theories of multimodal learning, we find that gesture capitalizes on its synchrony with speech to promote learning that lasts and can be generalized.
Ottosen, Emil Solsbæk; Dörfler, Monika
We propose a new algorithm for time stretching music signals based on the theory of nonstationary Gabor frames (NSGFs). The algorithm extends the techniques of the classical phase vocoder (PV) by incorporating adaptive timefrequency (TF) representations and adaptive phase locking. The adaptive TF...... representations imply good time resolution for the onsets of attack transients and good frequency resolution for the sinusoidal components. We estimate the phase values only at peak channels and the remaining phases are then locked to the values of the peaks in an adaptive manner. During attack transients we keep...... that with just three times as many TF coefficients as signal samples, artifacts such as phasiness and transient smearing can be greatly reduced compared to the classical PV. The proposed algorithm is tested on both synthetic and real world signals and compared with state of the art algorithms in a reproducible...
Christiansen, Claus Forup Corlin; Pedersen, Michael Syskind; Dau, Torsten
in noise experiment was used for training and an ideal binary mask experiment was used for evaluation. All three models were able to capture the trends in the speech in noise training data well, but the proposed model provides a better prediction of the binary mask test data, particularly when the binary...... masks degenerate to a noise vocoder....
Maidment, David W.; Kang, Hi Jee; Stewart, Hannah J.; Amitay, Sygal
Purpose: The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Method: Children (n = 69) and adults (n = 15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in…
Kleinschmidt, Dave F.; Jaeger, T. Florian
Successful speech perception requires that listeners map the acoustic signal to linguistic categories. These mappings are not only probabilistic, but change depending on the situation. For example, one talker’s /p/ might be physically indistinguishable from another talker’s /b/ (cf. lack of invariance). We characterize the computational problem posed by such a subjectively non-stationary world and propose that the speech perception system overcomes this challenge by (1) recognizing previously encountered situations, (2) generalizing to other situations based on previous similar experience, and (3) adapting to novel situations. We formalize this proposal in the ideal adapter framework: (1) to (3) can be understood as inference under uncertainty about the appropriate generative model for the current talker, thereby facilitating robust speech perception despite the lack of invariance. We focus on two critical aspects of the ideal adapter. First, in situations that clearly deviate from previous experience, listeners need to adapt. We develop a distributional (belief-updating) learning model of incremental adaptation. The model provides a good fit against known and novel phonetic adaptation data, including perceptual recalibration and selective adaptation. Second, robust speech recognition requires listeners learn to represent the structured component of cross-situation variability in the speech signal. We discuss how these two aspects of the ideal adapter provide a unifying explanation for adaptation, talker-specificity, and generalization across talkers and groups of talkers (e.g., accents and dialects). The ideal adapter provides a guiding framework for future investigations into speech perception and adaptation, and more broadly language comprehension. PMID:25844873
Speech recognition by normal-hearing listeners improves as a function of the number of spectral channels when tested with a noiseband vocoder simulating cochlear implant signal processing. Speech recognition by the best cochlear implant users, however, saturates around eight channels and does not
Full Text Available Recent findings [Dilley and Pitt, 2010. Psych. Science. 21, 1664-1670] have shown that manipulating context speech rate in English can cause entire syllables to disappear or appear perceptually. The current studies tested two rate-based explanations of this phenomenon while attempting to replicate and extend these findings to another language, Russian. In Experiment 1, native Russian speakers listened to Russian sentences which had been subjected to rate manipulations and performed a lexical report task. Experiment 2 investigated speech rate effects in cross-language speech perception; non-native speakers of Russian of both high and low proficiency were tested on the same Russian sentences as in Experiment 1. They decided between two lexical interpretations of a critical portion of the sentence, where one choice contained more phonological material than the other (e.g., /stərʌ'na/ side vs. /strʌ'na/ country. In both experiments, with native and non-native speakers of Russian, context speech rate and the relative duration of the critical sentence portion were found to influence the amount of phonological material perceived. The results support the generalized rate normalization hypothesis, according to which the content perceived in a spectrally ambiguous stretch of speech depends on the duration of that content relative to the surrounding speech, while showing that the findings of Dilley and Pitt (2010 extend to a variety of morphosyntactic contexts and a new language, Russian. Findings indicate that relative timing cues across an utterance can be critical to accurate lexical perception by both native and non-native speakers.
Fenn, Kimberly M; Margoliash, Daniel; Nusbaum, Howard C
Sleep-dependent consolidation has been demonstrated for declarative and procedural memory but few theories of consolidation distinguish between rote and generalized learning, suggesting similar consolidation should occur for both. However, studies using rote and generalized learning have suggested different patterns of consolidation may occur, although different tasks have been used across studies. Here we directly compared consolidation of rote and generalized learning using a single speech identification task. Training on a large set of novel stimuli resulted in substantial generalized learning, and sleep restored performance that had degraded after 12 waking hours. Training on a small set of repeated stimuli primarily resulted in rote learning and performance also degraded after 12 waking hours but was not restored by sleep. Moreover performance was significantly worse 24-h after rote training. Our results suggest a functional dissociation between the mechanisms of consolidation for rote and generalized learning which has broad implications for memory models. Copyright © 2013 Elsevier B.V. All rights reserved.
spectrograms of these phrases were generated by a List 13 Processing Language (LISP) on a Symbolics 3670 artificial intelligence computer (see Figure 10). The...speech and the amount of difference varies with the type of vocoder. 26 ADPC INTELIGIBILITY AND TOE OF MAING 908 78- INTELLIGIBILITY 48 LI OS NORMA 30
Cabrera, Laurianne; Bertoncini, Josiane; Lorenzi, Christian
Purpose: The capacity of 6-month-old infants to discriminate a voicing contrast (/aba/--/apa/) on the basis of "amplitude modulation (AM) cues" and "frequency modulation (FM) cues" was evaluated. Method: Several vocoded speech conditions were designed to either degrade FM cues in 4 or 32 bands or degrade AM in 32 bands. Infants…
This CD is multimedia presentation of programme safety upgrading of Bohunice V1 NPP. This chapter consist of introductory commentary and 4 introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering
Kriengwatana, B.; Escudero, P.; Kerkhoven, A.H.; ten Cate, C.
Different speakers produce the same speech sound differently, yet listeners are still able to reliably identify the speech sound. How listeners can adjust their perception to compensate for speaker differences in speech, and whether these compensatory processes are unique only to humans, is still
Carbonell, Kathy M.
One of the lasting concerns in audiology is the unexplained individual differences in speech perception performance even for individuals with similar audiograms. One proposal is that there are cognitive/perceptual individual differences underlying this vulnerability and that these differences are present in normal hearing (NH) individuals but do not reveal themselves in studies that use clear speech produced in quiet (because of a ceiling effect). However, previous studies have failed to uncover cognitive/perceptual variables that explain much of the variance in NH performance on more challenging degraded speech tasks. This lack of strong correlations may be due to either examining the wrong measures (e.g., working memory capacity) or to there being no reliable differences in degraded speech performance in NH listeners (i.e., variability in performance is due to measurement noise). The proposed project has 3 aims; the first, is to establish whether there are reliable individual differences in degraded speech performance for NH listeners that are sustained both across degradation types (speech in noise, compressed speech, noise-vocoded speech) and across multiple testing sessions. The second aim is to establish whether there are reliable differences in NH listeners' ability to adapt their phonetic categories based on short-term statistics both across tasks and across sessions; and finally, to determine whether performance on degraded speech perception tasks are correlated with performance on phonetic adaptability tasks, thus establishing a possible explanatory variable for individual differences in speech perception for NH and hearing impaired listeners.
Casini, Laurence; Burle, Boris; Nguyen, Noel
Time is essential to speech. The duration of speech segments plays a critical role in the perceptual identification of these segments, and therefore in that of spoken words. Here, using a French word identification task, we show that vowels are perceived as shorter when attention is divided between two tasks, as compared to a single task control…
Lotto, A J; Kluender, K R
When members of a series of synthesized stop consonants varying acoustically in F3 characteristics and varying perceptually from /da/ to /ga/ are preceded by /al/, subjects report hearing more /ga/ syllables relative to when each member is preceded by /ar/ (Mann, 1980). It has been suggested that this result demonstrates the existence of a mechanism that compensates for coarticulation via tacit knowledge of articulatory dynamics and constraints, or through perceptual recovery of vocal-tract dynamics. The present study was designed to assess the degree to which these perceptual effects are specific to qualities of human articulatory sources. In three experiments, series of consonant-vowel (CV) stimuli varying in F3-onset frequency (/da/-/ga/) were preceded by speech versions or nonspeech analogues of /al/ and /ar/. The effect of liquid identity on stop consonant labeling remained when the preceding VC was produced by a female speaker and the CV syllable was modeled after a male speaker's productions. Labeling boundaries also shifted when the CV was preceded by a sine wave glide modeled after F3 characteristics of /al/ and /ar/. Identifications shifted even when the preceding sine wave was of constant frequency equal to the offset frequency of F3 from a natural production. These results suggest an explanation in terms of general auditory processes as opposed to recovery of or knowledge of specific articulatory dynamics.
Full Text Available The article reports the author’s experiences with the phase vocoder. Starting from the first attempts during the years 1973-77 – in connection with a speculative project to morph the sounds of a speaking voice into sounds from the natural world, project subsequently developed at Ircam in Paris between 1979 and 1986 – up to the most recent experiences in 2011-12 associated with the realization of Supernova, an 8-channel sound-surround piece, where the phase vocoder data format is used as a synthesis tool.
Full Text Available Different speakers produce the same speech sound differently, yet listeners are still able to reliably identify the speech sound. How listeners can adjust their perception to compensate for speaker differences in speech, and whether these compensatory processes are unique only to humans, is still not fully understood. In this study we compare the ability of humans and zebra finches to categorize vowels despite speaker variation in speech in order to test the hypothesis that accommodating speaker and gender differences in isolated vowels can be achieved without prior experience with speaker-related variability. Using a behavioural Go/No-go task and identical stimuli, we compared Australian English adults’ (naïve to Dutch and zebra finches’ (naïve to human speech ability to categorize /ɪ/ and /ɛ/ vowels of an novel Dutch speaker after learning to discriminate those vowels from only one other speaker. Experiment 1 and 2 presented vowels of two speakers interspersed or blocked, respectively. Results demonstrate that categorization of vowels is possible without prior exposure to speaker-related variability in speech for zebra finches, and in non-native vowel categories for humans. Therefore, this study is the first to provide evidence for what might be a species-shared auditory bias that may supersede speaker-related information during vowel categorization. It additionally provides behavioural evidence contradicting a prior hypothesis that accommodation of speaker differences is achieved via the use of formant ratios. Therefore, investigations of alternative accounts of vowel normalization that incorporate the possibility of an auditory bias for disregarding inter-speaker variability are warranted.
Conroy, Paul; Sage, Karen; Ralph, Matt Lambon
Naming accuracy for nouns and verbs in aphasia can vary across different elicitation contexts, for example, simple picture naming, composite picture description, narratives, and conversation. For some people with aphasia, naming may be more accurate to simple pictures as opposed to naming in spontaneous, connected speech; for others, the opposite pattern may be evident. These differences have, in some instances, been related to word class (for example, noun or verb) as well as aphasia subtype. Given that the aim of picture-naming therapies is to improve word-finding in general, these differences in naming accuracy across contexts may have important implications for the potential functional benefits of picture-naming therapies. This study aimed to explore single-word therapy for both nouns and verbs, and to answer the following questions. (1) To what extent does an increase in naming accuracy after picture-naming therapy (for both nouns and verbs) predict accurate naming of the same items in less constrained spontaneous connected speech tasks such as composite picture description and retelling of a narrative? (2) Does the word class targeted in therapy (verb or noun) dictate whether there is 'carry-over' of the therapy item to connected speech tasks? (3) Does the speed at which the picture is named after therapy predict whether it will also be used appropriately in connected speech tasks? Seven participants with aphasia of varying degrees of severity and subtype took part in ten therapy sessions over five weeks. A set of potentially useful items was collected from control participant accounts of the Cookie Theft Picture Description and the Cinderella Story from the Quantitative Production Analysis. Twenty-four of these words (twelve verbs and twelve nouns) were collated for each participant, on the basis that they had failed to name them in either simple picture naming or connected speech tasks (picture-supported narrative and unsupported retelling of a narrative
Anand, Ankita; Kar, Asutosh; Swamy, M.N.S.
a BLPC vocoderbased adaptive feedback canceller with probe noise with an objective of reducing the low-frequency bias in digital hearing-aids. A step-wise mathematical analysis of the proposed feedback canceller is presented employing the recursive least square and normalized least mean square adaptive......The band-limited linear predictive coding (BLPC) vocoder-based adaptive feedback cancellation (AFC) removes the high-frequency bias, while the low frequency bias persists between the desired input signal and the loudspeaker signal in the estimate of the feedback path. In this paper, we present...... algorithms. It is observed that the optimal solution of the feedback path is unbiased for an unshaped probe noise, but is biased for a shaped probe signal; the bias term does not consist of correlation between the desired input and the loudspeaker output. The identifiability conditions are analysed...
In this speech the author points out two main recommendations. The first message concerns the necessity of a whole mobilization in favor of the sustainable development, from the government policy and the enterprises management to the human behavior. He presents then three main axis to heighten the enterprises (reinforce the information on the environmental and social impact of the economic activities, the development of sustainable investments, the development of the environmental sponsorship). The second message concerns the necessity to place the environment in the economic growth by the development of the ecology and the eco-technology. (A.L.B.)
Stone, Michael A; Füllgrabe, Christian; Mackinnon, Robert C; Moore, Brian C J
Spectrally shaped steady noise is commonly used as a masker of speech. The effects of inherent random fluctuations in amplitude of such a noise are typically ignored. Here, the importance of these random fluctuations was assessed by comparing two cases. For one, speech was mixed with steady speech-shaped noise and N-channel tone vocoded, a process referred to as signal-domain mixing (SDM); this preserved the random fluctuations of the noise. For the second, the envelope of speech alone was extracted for each vocoder channel and a constant was added corresponding to the root-mean-square value of the noise envelope for that channel. This is referred to as envelope-domain mixing (EDM); it removed the random fluctuations of the noise. Sinusoidally modulated noise and a single talker were also used as backgrounds, with both SDM and EDM. Speech intelligibility was measured for N = 12, 19, and 30, with the target-to-background ratio fixed at -7 dB. For SDM, performance was best for the speech background and worst for the steady noise. For EDM, this pattern was reversed. Intelligibility with steady noise was consistently very poor for SDM, but near-ceiling for EDM, demonstrating that the random fluctuations in steady noise have a large effect.
Schulthess, H. von
The speech to the Geneva conference relates to recent and future developments in Switzerland in the production of electricity. Total electricity consumption in Switzerland fell by 1.7% during the 1975 recession, but otherwise has continued to grow steadily and is expected to increase at about 3.7% annually until 1984 and 2.5% thereafter. Hydro-electricity has contributed 71% of total electricity generation so far. References are made to the recent legal changes permitting increased use in nuclear energy and to nuclear stations already built or under development. In other energy consumption fields, natural gas has expanded most rapidly, almost three times in 10 years. Expansion into oil usage was retarded by the Middle East crisis. Some plans made for rationing electricity consumption to meet future crises are quoted. (G.C.)
Full Text Available Until a few months to the 2015 general elections, many political parties that have conveniently hibernated for a better part of their existence, perhaps owing to lack of proper organizational structure or support base, uncoordinated programmes or were registered because of pecuniary gains or admittance of anticipated poor electoral outing, suddenly began to jostle for political space. The main opposition party and the ruling party were either perfecting a merger processes or engulfed in internal wrangling such that communication with the electorate on fundamental issues became inconsequential. In fact, the two dominant parties, the Peoples Democratic Party and All Progressive Congress only produced their presidential candidates less than five months to the election; and the electoral campaign assumed desperate contestation in a climate of prejudice and intolerance. Hate speeches and violence were the hallmarks of their electoral campaigns. The 2015 general elections therefore, offer a unique context to interrogate the place of party political communication in an emerging democracy and specifically how hate campaigns among political gladiators/contending parties could generate violence, and if not tamed, derail democratic consolidation. This essay affirms that hate speech is not only inspired by some social circumstances but also part of a general democratic process. It attests to the fact that Nigerian politicians have become more desperate and daring in taking and retaining political power; and more intolerant of opposition, criticism and efforts at replacing them. Relying extensively on secondary sources with the aid of descriptive and narrative tools, this essay concludes that the political culture of a country determines the behavior and attitude of the population towards the political system and that democratic transition from one administration to another, particularly in emerging democracies, has often been accompanied by violence
Lee, Yune-Sang; Min, Nam Eun; Wingfield, Arthur; Grossman, Murray; Peelle, Jonathan E
The information contained in a sensory signal plays a critical role in determining what neural processes are engaged. Here we used interleaved silent steady-state (ISSS) functional magnetic resonance imaging (fMRI) to explore how human listeners cope with different degrees of acoustic richness during auditory sentence comprehension. Twenty-six healthy young adults underwent scanning while hearing sentences that varied in acoustic richness (high vs. low spectral detail) and syntactic complexity (subject-relative vs. object-relative center-embedded clause structures). We manipulated acoustic richness by presenting the stimuli as unprocessed full-spectrum speech, or noise-vocoded with 24 channels. Importantly, although the vocoded sentences were spectrally impoverished, all sentences were highly intelligible. These manipulations allowed us to test how intelligible speech processing was affected by orthogonal linguistic and acoustic demands. Acoustically rich speech showed stronger activation than acoustically less-detailed speech in a bilateral temporoparietal network with more pronounced activity in the right hemisphere. By contrast, listening to sentences with greater syntactic complexity resulted in increased activation of a left-lateralized network including left posterior lateral temporal cortex, left inferior frontal gyrus, and left dorsolateral prefrontal cortex. Significant interactions between acoustic richness and syntactic complexity occurred in left supramarginal gyrus, right superior temporal gyrus, and right inferior frontal gyrus, indicating that the regions recruited for syntactic challenge differed as a function of acoustic properties of the speech. Our findings suggest that the neural systems involved in speech perception are finely tuned to the type of information available, and that reducing the richness of the acoustic signal dramatically alters the brain's response to spoken language, even when intelligibility is high. Copyright © 2015 Elsevier
McKee, John M.; And Others
Many people take for granted the use of language as a tool for coping with everyday occupational and social problems. However, there are those, such as prison inmates, who have difficulty using language in this manner. Realizing that prison inmates are not always able to communicate effectively through standard patterns of speech and thus are…
Apoux, Frédéric; Healy, Eric W
Many behavioral studies have reported a significant decrease in intelligibility when the temporal fine structure (TFS) of a sound mixture is replaced with noise or tones (i.e., vocoder processing). This finding has led to the conclusion that TFS information is critical for speech recognition in noise. How the normal -auditory system takes advantage of the original TFS, however, remains unclear. Three -experiments on the role of TFS in noise are described. All three experiments measured speech recognition in various backgrounds while manipulating the envelope, TFS, or both. One experiment tested the hypothesis that vocoder processing may artificially increase the apparent importance of TFS cues. Another experiment evaluated the relative contribution of the target and masker TFS by disturbing only the TFS of the target or that of the masker. Finally, a last experiment evaluated the -relative contribution of envelope and TFS information. In contrast to previous -studies, however, the original envelope and TFS were both preserved - to some extent - in all conditions. Overall, the experiments indicate a limited influence of TFS and suggest that little speech information is extracted from the TFS. Concomitantly, these experiments confirm that most speech information is carried by the temporal envelope in real-world conditions. When interpreted within the framework of the glimpsing model, the results of these experiments suggest that TFS is primarily used as a grouping cue to select the time-frequency regions -corresponding to the target speech signal.
Shahin, Antoine J; Shen, Stanley; Kerlin, Jess R
We examined the relationship between tolerance for audiovisual onset asynchrony (AVOA) and the spectrotemporal fidelity of the spoken words and the speaker's mouth movements. In two experiments that only varied in the temporal order of sensory modality, visual speech leading (exp1) or lagging (exp2) acoustic speech, participants watched intact and blurred videos of a speaker uttering trisyllabic words and nonwords that were noise vocoded with 4-, 8-, 16-, and 32-channels. They judged whether the speaker's mouth movements and the speech sounds were in-sync or out-of-sync . Individuals perceived synchrony (tolerated AVOA) on more trials when the acoustic speech was more speech-like (8 channels and higher vs. 4 channels), and when visual speech was intact than blurred (exp1 only). These findings suggest that enhanced spectrotemporal fidelity of the audiovisual (AV) signal prompts the brain to widen the window of integration promoting the fusion of temporally distant AV percepts.
This article provides an overview of how the constitutional protections for commercial speech affect the Food and Drug Administration's (FDA) regulation of drugs, and the emerging issues about the scope of these protections. A federal district court has already found that commercial speech allows manufacturers to distribute reprints of medical articles about a new off-label use of a drug as long as it contains disclosures to prevent deception and to inform readers about the lack of FDA review. This paper summarizes the current agency guidance that accepts the manufacturer's distribution of reprints with disclosures. Allergan, the maker of Botox, recently maintained in a lawsuit that the First Amendment permits drug companies to provide "truthful information" to doctors about "widely accepted" off-label uses of a drug. While the case was settled as part of a fraud and abuse case on other grounds, extending constitutional protections generally to "widely accepted" uses is not warranted, especially if it covers the use of a drug for a new purpose that needs more proof of efficacy, and that can involve substantial risks. A health law academic pointed out in an article examining a fraud and abuse case that off-label use of drugs is common, and that practitioners may lack adequate dosage information about the off-label uses. Drug companies may obtain approval of a drug for a narrow use, such as for a specific type of pain, but practitioners use the drug for similar uses based on their experience. The writer maintained that a controlled study may not be necessary to establish efficacy for an expanded use of a drug for pain. Even if this is the case, as discussed below in this paper, added safety risks may exist if the expansion covers a longer period of time and use by a wider number of patients. The protections for commercial speech should not be extended to allow manufacturers to distribute information about practitioner use with a disclosure about the lack of FDA
Bernstein, Lynne E; Auer, Edward T; Eberhardt, Silvio P; Jiang, Jintao
Speech perception under audiovisual (AV) conditions is well known to confer benefits to perception such as increased speed and accuracy. Here, we investigated how AV training might benefit or impede auditory perceptual learning of speech degraded by vocoding. In Experiments 1 and 3, participants learned paired associations between vocoded spoken nonsense words and nonsense pictures. In Experiment 1, paired-associates (PA) AV training of one group of participants was compared with audio-only (AO) training of another group. When tested under AO conditions, the AV-trained group was significantly more accurate than the AO-trained group. In addition, pre- and post-training AO forced-choice consonant identification with untrained nonsense words showed that AV-trained participants had learned significantly more than AO participants. The pattern of results pointed to their having learned at the level of the auditory phonetic features of the vocoded stimuli. Experiment 2, a no-training control with testing and re-testing on the AO consonant identification, showed that the controls were as accurate as the AO-trained participants in Experiment 1 but less accurate than the AV-trained participants. In Experiment 3, PA training alternated AV and AO conditions on a list-by-list basis within participants, and training was to criterion (92% correct). PA training with AO stimuli was reliably more effective than training with AV stimuli. We explain these discrepant results in terms of the so-called "reverse hierarchy theory" of perceptual learning and in terms of the diverse multisensory and unisensory processing resources available to speech perception. We propose that early AV speech integration can potentially impede auditory perceptual learning; but visual top-down access to relevant auditory features can promote auditory perceptual learning.
Cheng, Octavian; Dines, John; Magimai.-Doss, Mathew
We propose a generalized dynamic composition algorithm of weighted finite state transducers (WFST), which avoids the creation of non-coaccessible paths, performs weight look-ahead and does not impose any constraints to the topology of the WFSTs. Experimental results on Wall Street Journal (WSJ1) 20k-word trigram task show that at 17\\% WER (moderately-wide beam width), the decoding time of the proposed approach is about 48\\% and 65\\% of the other two dynamic composition approaches. In comparis...
Oh, Soo Hee; Donaldson, Gail S; Kong, Ying-Yee
Low-frequency acoustic cues have been shown to enhance speech perception by cochlear-implant users, particularly when target speech occurs in a competing background. The present study examined the extent to which a continuous representation of low-frequency harmonicity cues contributes to bimodal benefit in simulated bimodal listeners. Experiment 1 examined the benefit of restoring a continuous temporal envelope to the low-frequency ear while the vocoder ear received a temporally interrupted stimulus. Experiment 2 examined the effect of providing continuous harmonicity cues in the low-frequency ear as compared to restoring a continuous temporal envelope in the vocoder ear. Findings indicate that bimodal benefit for temporally interrupted speech increases when continuity is restored to either or both ears. The primary benefit appears to stem from the continuous temporal envelope in the low-frequency region providing additional phonetic cues related to manner and F1 frequency; a secondary contribution is provided by low-frequency harmonicity cues when a continuous representation of the temporal envelope is present in the low-frequency, or both ears. The continuous temporal envelope and harmonicity cues of low-frequency speech are thought to support bimodal benefit by facilitating identification of word and syllable boundaries, and by restoring partial phonetic cues that occur during gaps in the temporally interrupted stimulus.
... Staying Safe Videos for Educators Search English Español Speech Problems KidsHealth / For Teens / Speech Problems What's in ... a person's ability to speak clearly. Some Common Speech and Language Disorders Stuttering is a problem that ...
Micheyl, Christophe; Oxenham, Andrew J
The "combined-stimulation advantage" refers to an improvement in speech recognition when cochlear-implant or vocoded stimulation is supplemented by low-frequency acoustic information. Previous studies have been interpreted as evidence for "super-additive" or "synergistic" effects in the combination of low-frequency and electric or vocoded speech information by human listeners. However, this conclusion was based on predictions of performance obtained using a suboptimal high-threshold model of information combination. The present study shows that a different model, based on Gaussian signal detection theory, can predict surprisingly large combined-stimulation advantages, even when performance with either information source alone is close to chance, without involving any synergistic interaction. A reanalysis of published data using this model reveals that previous results, which have been interpreted as evidence for super-additive effects in perception of combined speech stimuli, are actually consistent with a more parsimonious explanation, according to which the combined-stimulation advantage reflects an optimal combination of two independent sources of information. The present results do not rule out the possible existence of synergistic effects in combined stimulation; however, they emphasize the possibility that the combined-stimulation advantages observed in some studies can be explained simply by non-interactive combination of two information sources.
Sohoglu, Ediz; Peelle, Jonathan E; Carlyon, Robert P; Davis, Matthew H
An unresolved question is how the reported clarity of degraded speech is enhanced when listeners have prior knowledge of speech content. One account of this phenomenon proposes top-down modulation of early acoustic processing by higher-level linguistic knowledge. Alternative, strictly bottom-up accounts argue that acoustic information and higher-level knowledge are combined at a late decision stage without modulating early acoustic processing. Here we tested top-down and bottom-up accounts using written text to manipulate listeners' knowledge of speech content. The effect of written text on the reported clarity of noise-vocoded speech was most pronounced when text was presented before (rather than after) speech (Experiment 1). Fine-grained manipulation of the onset asynchrony between text and speech revealed that this effect declined when text was presented more than 120 ms after speech onset (Experiment 2). Finally, the influence of written text was found to arise from phonological (rather than lexical) correspondence between text and speech (Experiment 3). These results suggest that prior knowledge effects are time-limited by the duration of auditory echoic memory for degraded speech, consistent with top-down modulation of early acoustic processing by linguistic knowledge. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Jerry D. Gibson
Full Text Available Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.
CERN 50th Anniversary Official Celebration : keynote speech from Professor Federico Mayor Zaragoza, Professor in the Molecular Biology at the Universidad Autónoma of Madrid, Former Director-General of UNESCO
CERN 50th Anniversary Official Celebration : keynote speech from Professor Federico Mayor Zaragoza, Professor in the Molecular Biology at the Universidad Autónoma of Madrid, Former Director-General of UNESCO
The VDE system developed had the capability of recognizing up to 248 separate words in syntactic structures. 4 The two systems described are isolated...AND SPEAKER RECOGNITION by M.J.Hunt 5 ASSESSMENT OF SPEECH SYSTEMS ’ ..- * . by R.K.Moore 6 A SURVEY OF CURRENT EQUIPMENT AND RESEARCH’ by J.S.Bridle...TECHNOLOGY IN NAVY TRAINING SYSTEMS by R.Breaux, M.Blind and R.Lynchard 10 9 I-I GENERAL REVIEW OF MILITARY APPLICATIONS OF VOICE PROCESSING DR. BRUNO
Maidment, David W; Kang, Hi Jee; Stewart, Hannah J; Amitay, Sygal
The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Children (n=69) and adults (n=15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in auditory-only or audiovisual conditions. The number of bands was adaptively varied to modulate the degradation of the auditory signal, with the number of bands required for approximately 79% correct identification calculated as the threshold. The youngest children (4- to 5-year-olds) did not benefit from accompanying visual information, in comparison to 6- to 11-year-old children and adults. Audiovisual gain also increased with age in the child sample. The current data suggest that children younger than 6 years of age do not fully utilize visual speech cues to enhance speech perception when the auditory signal is degraded. This evidence not only has implications for understanding the development of speech perception skills in children with normal hearing but may also inform the development of new treatment and intervention strategies that aim to remediate speech perception difficulties in pediatric cochlear implant users.
Rader, Tobias; Adel, Youssef; Fastl, Hugo; Baumann, Uwe
The aim of this study is to simulate speech perception with combined electric-acoustic stimulation (EAS), verify the advantage of combined stimulation in normal-hearing (NH) subjects, and then compare it with cochlear implant (CI) and EAS user results from the authors' previous study. Furthermore, an automatic speech recognition (ASR) system was built to examine the impact of low-frequency information and is proposed as an applied model to study different hypotheses of the combined-stimulation advantage. Signal-detection-theory (SDT) models were applied to assess predictions of subject performance without the need to assume any synergistic effects. Speech perception was tested using a closed-set matrix test (Oldenburg sentence test), and its speech material was processed to simulate CI and EAS hearing. A total of 43 NH subjects and a customized ASR system were tested. CI hearing was simulated by an aurally adequate signal spectrum analysis and representation, the part-tone-time-pattern, which was vocoded at 12 center frequencies according to the MED-EL DUET speech processor. Residual acoustic hearing was simulated by low-pass (LP)-filtered speech with cutoff frequencies 200 and 500 Hz for NH subjects and in the range from 100 to 500 Hz for the ASR system. Speech reception thresholds were determined in amplitude-modulated noise and in pseudocontinuous noise. Previously proposed SDT models were lastly applied to predict NH subject performance with EAS simulations. NH subjects tested with EAS simulations demonstrated the combined-stimulation advantage. Increasing the LP cutoff frequency from 200 to 500 Hz significantly improved speech reception thresholds in both noise conditions. In continuous noise, CI and EAS users showed generally better performance than NH subjects tested with simulations. In modulated noise, performance was comparable except for the EAS at cutoff frequency 500 Hz where NH subject performance was superior. The ASR system showed similar behavior
Gaudrain, Etienne; Carlyon, Robert P
Previous studies have suggested that cochlear implant users may have particular difficulties exploiting opportunities to glimpse clear segments of a target speech signal in the presence of a fluctuating masker. Although it has been proposed that this difficulty is associated with a deficit in linking the glimpsed segments across time, the details of this mechanism are yet to be explained. The present study introduces a method called Zebra-speech developed to investigate the relative contribution of simultaneous and sequential segregation mechanisms in concurrent speech perception, using a noise-band vocoder to simulate cochlear implants. One experiment showed that the saliency of the difference between the target and the masker is a key factor for Zebra-speech perception, as it is for sequential segregation. Furthermore, forward masking played little or no role, confirming that intelligibility was not limited by energetic masking but by across-time linkage abilities. In another experiment, a binaural cue was used to distinguish the target and the masker. It showed that the relative contribution of simultaneous and sequential segregation depended on the spectral resolution, with listeners relying more on sequential segregation when the spectral resolution was reduced. The potential of Zebra-speech as a segregation enhancement strategy for cochlear implants is discussed.
Steinmetzger, Kurt; Rosen, Stuart
The ability of normal-hearing listeners to perceive sentences in quiet and in background noise was investigated in a variety of conditions mixing the presence and absence of periodicity (i.e., voicing) in both target and masker. Experiment 1 showed that in quiet, aperiodic noise-vocoded speech and speech with a natural amount of periodicity were equally intelligible, while fully periodic speech was much harder to understand. In Experiments 2 and 3, speech reception thresholds for these targets were measured in the presence of four different maskers: speech-shaped noise, harmonic complexes with a dynamically varying F0 contour, and 10 Hz amplitude-modulated versions of both. For experiment 2, results of experiment 1 were used to identify conditions with equal intelligibility in quiet, while in experiment 3 target intelligibility in quiet was near ceiling. In the presence of a masker, periodicity in the target speech mattered little, but listeners strongly benefited from periodicity in the masker. Substantial fluctuating-masker benefits required the target speech to be almost perfectly intelligible in quiet. In summary, results suggest that the ability to exploit periodicity cues may be an even more important factor when attempting to understand speech embedded in noise than the ability to benefit from masker fluctuations.
Hasse Jørgensen, Stina
About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavillion, the Venice Biannual 2011.......About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavillion, the Venice Biannual 2011....
Consumer Guide Speech to Speech Relay Service Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...
... Health Info » Voice, Speech, and Language Apraxia of Speech On this page: What is apraxia of speech? ... about apraxia of speech? What is apraxia of speech? Apraxia of speech (AOS)—also known as acquired ...
Booth, Alannah; Choto, Fadziso; Gotlieb, Jessica; Robertson, Rebecca; Morris, Gabriella; Stockley, Nicola; Mauff, Katya
Background Upon graduation, newly qualified speech-language therapists are expected to provide services independently. This study describes new graduates’ perceptions of their preparedness to provide services across the scope of the profession and explores associations between perceptions of dysphagia theory and clinical learning curricula with preparedness for adult and paediatric dysphagia service delivery. Methods New graduates of six South African universities were recruited to participate in a survey by completing an electronic questionnaire exploring their perceptions of the dysphagia curricula and their preparedness to practise across the scope of the profession of speech-language therapy. Results Eighty graduates participated in the study yielding a response rate of 63.49%. Participants perceived themselves to be well prepared in some areas (e.g. child language: 100%; articulation and phonology: 97.26%), but less prepared in other areas (e.g. adult dysphagia: 50.70%; paediatric dysarthria: 46.58%; paediatric dysphagia: 38.36%) and most unprepared to provide services requiring sign language (23.61%) and African languages (20.55%). There was a significant relationship between perceptions of adequate theory and clinical learning opportunities with assessment and management of dysphagia and perceptions of preparedness to provide dysphagia services. Conclusion There is a need for review of existing curricula and consideration of developing a standard speech-language therapy curriculum across universities, particularly in service provision to a multilingual population, and in both the theory and clinical learning of the assessment and management of adult and paediatric dysphagia, to better equip graduates for practice. PMID:26304217
Grieco-Calub, Tina M; Ward, Kristina M; Brehm, Laurel
Multitasking requires individuals to allocate their cognitive resources across different tasks. The purpose of the current study was to assess school-age children's multitasking abilities during degraded speech recognition. Children (8 to 12 years old) completed a dual-task paradigm including a sentence recognition (primary) task containing speech that was either unprocessed or noise-band vocoded with 8, 6, or 4 spectral channels and a visual monitoring (secondary) task. Children's accuracy and reaction time on the visual monitoring task was quantified during the dual-task paradigm in each condition of the primary task and compared with single-task performance. Children experienced dual-task costs in the 6- and 4-channel conditions of the primary speech recognition task with decreased accuracy on the visual monitoring task relative to baseline performance. In all conditions, children's dual-task performance on the visual monitoring task was strongly predicted by their single-task (baseline) performance on the task. Results suggest that children's proficiency with the secondary task contributes to the magnitude of dual-task costs while multitasking during degraded speech recognition.
Full Text Available Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN trained on electromagnetic articulography (EMA data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open to future speech BCI applications using such articulatory-based speech synthesizer.
Full Text Available Speech comprehension abilities decline with age and with age-related hearing loss, but it is unclear how this decline expresses in terms of central neural mechanisms. The current study examined neural speech processing in a group of older adults (aged 56–77, n=16, with varying degrees of sensorineural hearing loss, and compared them to a cohort of young adults (aged 22–31, n=30, self-reported normal hearing. In an fMRI experiment, listeners heard and repeated back degraded sentences (4-band vocoding, which preserves the temporal envelope of the acoustic signal, while substantially degrading spectral information. Behaviourally, older adults adapted to degraded speech at the same rate as young listeners, although their overall comprehension of degraded speech was lower. Neurally, both older and young adults relied on the left anterior insula for degraded more than clear speech perception. However, anterior insula engagement in older adults was dependent on hearing acuity. Young adults additionally employed the anterior cingulate cortex (ACC. Interestingly, this age group × degradation interaction was driven by a reduced dynamic range in older adults, who displayed elevated levels of ACC activity in both conditions, consistent with a persistent upregulation in cognitive control irrespective of task difficulty. For correct speech comprehension, older adults recruited the middle frontal gyrus in addition to a core speech comprehension network on which young adults relied, suggestive of a compensatory mechanism. Taken together, the results indicate that older adults increasingly recruit cognitive control networks, even under optimal listening conditions, at the expense of these systems’ dynamic range.
Eberhardt, Silvio P; Auer, Edward T; Bernstein, Lynne E
In a series of studies we have been investigating how multisensory training affects unisensory perceptual learning with speech stimuli. Previously, we reported that audiovisual (AV) training with speech stimuli can promote auditory-only (AO) perceptual learning in normal-hearing adults but can impede learning in congenitally deaf adults with late-acquired cochlear implants. Here, impeder and promoter effects were sought in normal-hearing adults who participated in lipreading training. In Experiment 1, visual-only (VO) training on paired associations between CVCVC nonsense word videos and nonsense pictures demonstrated that VO words could be learned to a high level of accuracy even by poor lipreaders. In Experiment 2, visual-auditory (VA) training in the same paradigm but with the addition of synchronous vocoded acoustic speech impeded VO learning of the stimuli in the paired-associates paradigm. In Experiment 3, the vocoded AO stimuli were shown to be less informative than the VO speech. Experiment 4 combined vibrotactile speech stimuli with the visual stimuli during training. Vibrotactile stimuli were shown to promote visual perceptual learning. In Experiment 5, no-training controls were used to show that training with visual speech carried over to consonant identification of untrained CVCVC stimuli but not to lipreading words in sentences. Across this and previous studies, multisensory training effects depended on the functional relationship between pathways engaged during training. Two principles are proposed to account for stimulus effects: (1) Stimuli presented to the trainee's primary perceptual pathway will impede learning by a lower-rank pathway. (2) Stimuli presented to the trainee's lower rank perceptual pathway will promote learning by a higher-rank pathway. The mechanisms supporting these principles are discussed in light of multisensory reverse hierarchy theory (RHT).
Full Text Available There is a wide range of acoustic and visual variability across different talkers and different speaking contexts. Listeners with normal hearing accommodate that variability in ways that facilitate efficient perception, but it is not known whether listeners with cochlear implants can do the same. In this study, listeners with normal hearing (NH and listeners with cochlear implants (CIs were tested for accommodation to auditory and visual phonetic contexts created by gender-driven speech differences as well as vowel coarticulation and lip rounding in both consonants and vowels. Accommodation was measured as the shifting of perceptual boundaries between /s/ and /ʃ/ sounds in various contexts, as modeled by mixed-effects logistic regression. Owing to the spectral contrasts thought to underlie these context effects, CI listeners were predicted to perform poorly, but showed considerable success. Listeners with cochlear implants not only showed sensitivity to auditory cues to gender, they were also able to use visual cues to gender (i.e. faces as a supplement or proxy for information in the acoustic domain, in a pattern that was not observed for listeners with normal hearing. Spectrally-degraded stimuli heard by listeners with normal hearing generally did not elicit strong context effects, underscoring the limitations of noise vocoders and/or the importance of experience with electric hearing. Visual cues for consonant lip rounding and vowel lip rounding were perceived in a manner consistent with coarticulation and were generally used more heavily by listeners with CIs. Results suggest that listeners with cochlear implants are able to accommodate various sources of acoustic variability either by attending to appropriate acoustic cues or by inferring them via the visual signal.
Methods: New graduates of six South African universities were recruited to participate in a survey by completing an electronic questionnaire exploring their perceptions of the dysphagia curricula and their preparedness to practise across the scope of the profession of speechlanguage therapy. Results: Eighty graduates participated in the study yielding a response rate of 63.49%. Participants perceived themselves to be well prepared in some areas (e.g. child language: 100%; articulation and phonology: 97.26%, but less prepared in other areas (e.g. adult dysphagia: 50.70%; paediatric dysarthria: 46.58%; paediatric dysphagia: 38.36% and most unprepared to provide services requiring sign language (23.61% and African languages (20.55%. There was a significant relationship between perceptions of adequate theory and clinical learning opportunities with assessment and management of dysphagia and perceptions of preparedness to provide dysphagia services. Conclusion: There is a need for review of existing curricula and consideration of developing a standard speech-language therapy curriculum across universities, particularly in service provision to a multilingual population, and in both the theory and clinical learning of the assessment and management of adult and paediatric dysphagia, to better equip graduates for practice.
Moberly, Aaron C; Patel, Tirth R; Castellanos, Irina
As a result of their hearing loss, adults with cochlear implants (CIs) would self-report poorer executive functioning (EF) skills than normal-hearing (NH) peers, and these EF skills would be associated with performance on speech recognition tasks. EF refers to a group of high order neurocognitive skills responsible for behavioral and emotional regulation during goal-directed activity, and EF has been found to be poorer in children with CIs than their NH age-matched peers. Moreover, there is increasing evidence that neurocognitive skills, including some EF skills, contribute to the ability to recognize speech through a CI. Thirty postlingually deafened adults with CIs and 42 age-matched NH adults were enrolled. Participants and their spouses or significant others (informants) completed well-validated self-reports or informant-reports of EF, the Behavior Rating Inventory of Executive Function - Adult (BRIEF-A). CI users' speech recognition skills were assessed in quiet using several measures of sentence recognition. NH peers were tested for recognition of noise-vocoded versions of the same speech stimuli. CI users self-reported difficulty on EF tasks of shifting and task monitoring. In CI users, measures of speech recognition correlated with several self-reported EF skills. The present findings provide further evidence that neurocognitive factors, including specific EF skills, may decline in association with hearing loss, and that some of these EF skills contribute to speech processing under degraded listening conditions.
Ravishankar, C., Hughes Network Systems, Germantown, MD
Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained to be the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of speech signal getting corrupted by noise, cross-talk and distortion Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand digital transmission is relatively immune to noise, cross-talk and distortion primarily because of the capability to faithfully regenerate digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link Hence from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modem requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding is often referred to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques that is often interchangeably used with speech coding is the term voice coding. This term is more generic in the sense that the
Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M J; Poletto, Christopher J; Ludlow, Christy L
The issue of whether speech is supported by the same neural substrates as non-speech vocal tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as non-sense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, was compared to the production of speech syllables without meaning. Brain activation related to overt production was captured with BOLD fMRI using a sparse sampling design for both conditions. Speech and non-speech were compared using voxel-wise whole brain analyses, and ROI analyses focused on frontal and temporoparietal structures previously reported to support speech production. Results showed substantial activation overlap between speech and non-speech function in regions. Although non-speech gesture production showed greater extent and amplitude of activation in the regions examined, both speech and non-speech showed comparable left laterality in activation for both target perception and production. These findings posit a more general role of the previously proposed "auditory dorsal stream" in the left hemisphere--to support the production of vocal tract gestures that are not limited to speech processing.
Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias
Seeing the talker’s articulatory mouth movements can influence the auditory speech percept both in speech identification and detection tasks. Here we show that these audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers...... often fail to perceive as speech. While audiovisual integration in the identification task only occurred when observers were informed of the speech-like nature of SWS, integration occurred in the detection task both for informed and naïve observers. This shows that both speech-specific and general...... mechanisms underlie audiovisual integration of speech....
Lynne E Bernstein
Full Text Available This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1 The visual perception of speech relies on visual pathway representations of speech qua speech. (2 A proposed site of these representations, the temporal visual speech area (TVSA has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS. (3 Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.
Heimbauer, Lisa A; Beran, Michael J; Owren, Michael J
A long-standing debate concerns whether humans are specialized for speech perception, which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words, asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuographic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users. Experiment 2 tested "impossibly unspeechlike" sine-wave (SW) synthesis, which reduces speech to just three moving tones. Although receiving only intermittent and noncontingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate but improved in experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human. Copyright © 2011 Elsevier Ltd. All rights reserved.
Power, Alan J; Colling, Lincoln J; Mead, Natasha; Barnes, Lisa; Goswami, Usha
Developmental dyslexia is consistently associated with difficulties in processing phonology (linguistic sound structure) across languages. One view is that dyslexia is characterised by a cognitive impairment in the "phonological representation" of word forms, which arises long before the child presents with a reading problem. Here we investigate a possible neural basis for developmental phonological impairments. We assess the neural quality of speech encoding in children with dyslexia by measuring the accuracy of low-frequency speech envelope encoding using EEG. We tested children with dyslexia and chronological age-matched (CA) and reading-level matched (RL) younger children. Participants listened to semantically-unpredictable sentences in a word report task. The sentences were noise-vocoded to increase reliance on envelope cues. Envelope reconstruction for envelopes between 0 and 10Hz showed that the children with dyslexia had significantly poorer speech encoding in the 0-2Hz band compared to both CA and RL controls. These data suggest that impaired neural encoding of low frequency speech envelopes, related to speech prosody, may underpin the phonological deficit that causes dyslexia across languages. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Shafiro, Valeriy; Sheft, Stanley; Gygi, Brian; Ho, Kim Thien N
Perceptual training with spectrally degraded environmental sounds results in improved environmental sound identification, with benefits shown to extend to untrained speech perception as well. The present study extended those findings to examine longer-term training effects as well as effects of mere repeated exposure to sounds over time. Participants received two pretests (1 week apart) prior to a week-long environmental sound training regimen, which was followed by two posttest sessions, separated by another week without training. Spectrally degraded stimuli, processed with a four-channel vocoder, consisted of a 160-item environmental sound test, word and sentence tests, and a battery of basic auditory abilities and cognitive tests. Results indicated significant improvements in all speech and environmental sound scores between the initial pretest and the last posttest with performance increments following both exposure and training. For environmental sounds (the stimulus class that was trained), the magnitude of positive change that accompanied training was much greater than that due to exposure alone, with improvement for untrained sounds roughly comparable to the speech benefit from exposure. Additional tests of auditory and cognitive abilities showed that speech and environmental sound performance were differentially correlated with tests of spectral and temporal-fine-structure processing, whereas working memory and executive function were correlated with speech, but not environmental sound perception. These findings indicate generalizability of environmental sound training and provide a basis for implementing environmental sound training programs for cochlear implant (CI) patients.
Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne
Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and
Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: A biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; Phonetic factors in letter detection; categorical perception; Short-term recall by deaf signers of American sign language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaires; and vowel information in postvocalic frictions.
Anne Birgitta Nilsen
Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance.The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience.The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the
Benesty, Jacob; Chen, Jingdong
We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be ""cleaned"" with digital signal processing tools before it is played out, transmitted, or stored.This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red
Mehta, G.; Cutler, A.
Although spontaneous speech occurs more frequently in most listeners’ experience than read speech, laboratory studies of human speech recognition typically use carefully controlled materials read from a script. The phonological and prosodic characteristics of spontaneous and read speech differ considerably, however, which suggests that laboratory results may not generalize to the recognition of spontaneous and read speech materials, and their response time to detect word-initial target phonem...
Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena like the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, benefit using hearing aids or combinations of these things.
Evans, Samuel; Davis, Matthew H
How humans extract the identity of speech sounds from highly variable acoustic signals remains unclear. Here, we use searchlight representational similarity analysis (RSA) to localize and characterize neural representations of syllables at different levels of the hierarchically organized temporo-frontal pathways for speech perception. We asked participants to listen to spoken syllables that differed considerably in their surface acoustic form by changing speaker and degrading surface acoustics using noise-vocoding and sine wave synthesis while we recorded neural responses with functional magnetic resonance imaging. We found evidence for a graded hierarchy of abstraction across the brain. At the peak of the hierarchy, neural representations in somatomotor cortex encoded syllable identity but not surface acoustic form, at the base of the hierarchy, primary auditory cortex showed the reverse. In contrast, bilateral temporal cortex exhibited an intermediate response, encoding both syllable identity and the surface acoustic form of speech. Regions of somatomotor cortex associated with encoding syllable identity in perception were also engaged when producing the same syllables in a separate session. These findings are consistent with a hierarchical account of how variable acoustic signals are transformed into abstract representations of the identity of speech sounds. © The Author 2015. Published by Oxford University Press.
...-Speech Services for Individuals with Hearing and Speech Disabilities, Report and Order (Order), document...] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...
... disorder; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... evaluation tools that can help identify and diagnose speech disorders: Denver Articulation Screening Examination Goldman-Fristoe Test of ...
Full Text Available This paper presents a method of speech recognition by pattern recognition techniques. Learning consists in determining the unique characteristics of a word (cepstral coefficients by eliminating those characteristics that are different from one word to another. For learning and recognition, the system will build a dictionary of words by determining the characteristics of each word to be used in the recognition. Determining the characteristics of an audio signal consists in the following steps: noise removal, sampling it, applying Hamming window, switching to frequency domain through Fourier transform, calculating the magnitude spectrum, filtering data, determining cepstral coefficients.
Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z
The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.
Gocen, Gokcen; Okur, Alpaslan
Generally, the speaking aspect is not properly debated when discussing the positive and negative effects of television (TV), especially on children. So, to highlight this point, this study was first initialized by asking the question: "What are the effects of TV on speech?" and secondly, to transform the effects that TV has on speech in…
Engels, H.J.M.; Heynick, F.; Staak, C.P.F. van der
Freud's contemporary fellow psychiatrist Emil Kraepelin collected over the course of several decades some 700 specimens of speech in dreams, mostly his own, along with various concomitant data. These generally exhibit far more obvious primary-process influence than do the dream speech specimens
The primary objective of this position paper is to assess the theoretical and empirical support that exists for the Mayo Clinic view of motor speech disorders in general, and for oromotor, nonverbal tasks as a window to speech production processes in particular. Literature both in support of and against the Mayo clinic view and the associated use…
Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.
This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...
Nittrouer, Susan; Lowenstein, Joanna H.
Developmental dyslexia is a condition in which children encounter difficulty learning to read in spite of adequate instruction. Although considerable effort has been expended trying to identify the source of the problem, no single solution has been agreed upon. The current study explored a new hypothesis, that developmental dyslexia may be due to faulty perceptual organization of linguistically relevant sensory input. To test that idea, sentence-length speech signals were processed to create either sine-wave or noise-vocoded analogs. Seventy children between 8 and 11 years of age, with and without dyslexia participated. Children with dyslexia were selected to have phonological awareness deficits, although those without such deficits were retained in the study. The processed sentences were presented for recognition, and measures of reading, phonological awareness, and expressive vocabulary were collected. Results showed that children with dyslexia, regardless of phonological subtype, had poorer recognition scores than children without dyslexia for both kinds of degraded sentences. Older children with dyslexia recognized the sine-wave sentences better than younger children with dyslexia, but no such effect of age was found for the vocoded materials. Recognition scores were used as predictor variables in regression analyses with reading, phonological awareness, and vocabulary measures used as dependent variables. Scores for both sorts of sentence materials were strong predictors of performance on all three dependent measures when all children were included, but only performance for the sine-wave materials explained significant proportions of variance when only children with dyslexia were included. Finally, matching young, typical readers with older children with dyslexia on reading abilities did not mitigate the group difference in recognition of vocoded sentences. Conclusions were that children with dyslexia have difficulty organizing linguistically relevant sensory
... OTC Relief for Diarrhea Home Diseases and Conditions Speech and Language Delay Condition Speech and Language Delay Share Print Table of Contents1. ... Treatment6. Everyday Life7. Questions8. Resources What is a speech and language delay? A speech and language delay ...
In this speech the author points out two main recommendations. The first message concerns the necessity of a whole mobilization in favor of the sustainable development, from the government policy and the enterprises management to the human behavior. He presents then three main axis to heighten the enterprises (reinforce the information on the environmental and social impact of the economic activities, the development of sustainable investments, the development of the environmental sponsorship). The second message concerns the necessity to place the environment in the economic growth by the development of the ecology and the eco-technology. (A.L.B.)
Morrill, Tuuli H; McAuley, J Devin; Dilley, Laura C; Hambrick, David Z
Do the same mechanisms underlie processing of music and language? Recent investigations of this question have yielded inconsistent results. Likely factors contributing to discrepant findings are use of small samples and failure to control for individual differences in cognitive ability. We investigated the relationship between music and speech prosody processing, while controlling for cognitive ability. Participants (n = 179) completed a battery of cognitive ability tests, the Montreal Battery of Evaluation of Amusia (MBEA) to assess music perception, and a prosody test of pitch peak timing discrimination (early, as in insight vs. late, incite). Structural equation modeling revealed that only music perception was a significant predictor of prosody test performance. Music perception accounted for 34.5% of variance on prosody test performance; cognitive abilities and music training added only about 8%. These results indicate musical pitch and temporal processing are highly predictive of pitch discrimination in speech processing, even after controlling for other possible predictors of this aspect of language processing. (c) 2015 APA, all rights reserved).
Eskelund, Kasper; Andersen, Tobias
Speech perception is audiovisual as evidenced by bimodal integration in the McGurk effect. This integration effect may be specific to speech or be applied to all stimuli in general. To investigate this, Tuomainen et al. (2005) used sine-wave speech, which naÃ¯ve observers may perceive as non......-speech, but hear as speech once informed of the linguistic origin of the signal. Combinations of sine-wave speech and incongruent video of the talker elicited a McGurk effect only for informed observers. This indicates that the audiovisual integration effect is specific to speech perception. However, observers...... that observers did look near the mouth. We conclude that eye-movements did not influence the results of Tuomainen et al. and that their results thus can be taken as evidence of a speech specific mode of audiovisual integration underlying the McGurk illusion....
Spiel, G; Brunner, E; Allmayer, B; Pletz, A
Speech disabilities (articulation deficits) and language disorders--expressive (vocabulary) receptive (language comprehension) are not uncommon in children. An overview of these along with a global description of the impairment of communication as well as clinical characteristics of language developmental disorders are presented in this article. The diagnostic tables, which are applied in the European and Anglo-American speech areas, ICD-10 and DSM-IV, have been explained and compared. Because of their strengths and weaknesses an alternative classification of language and speech developmental disorders is proposed, which allows a differentiation between expressive and receptive language capabilities with regard to the semantic and the morphological/syntax domains. Prevalence and comorbidity rates, psychosocial influences, biological factors and the biological social interaction have been discussed. The necessity of the use of standardized examinations is emphasised. General logopaedic treatment paradigms, specific therapy concepts and an overview of prognosis have been described.
Chung, Tae Sub; Suh, Jung Ho; Kim, Dong Ik; Kim, Gwi Eon; Hong, Won Phy; Lee, Won Sang
Total laryngectomee requires some form of alaryngeal speech for communication. Generally, esophageal speech is regarded as the most available and comfortable technique for alaryngeal speech. But esophageal speech is difficult to train, so many patients are unable to attain esophageal speech for communication. To understand mechanism of esophageal of esophageal speech on total laryngectomee, evaluation of anatomical change of the pharyngoesophageal segment is very important. We used video fluoroscopy for evaluation of pharyngesophageal segment during esophageal speech. Eighteen total laryngectomees were evaluated with video fluoroscopy from Dec. 1986 to May 1987 at Y.U.M.C. Our results were as follows: 1. Peseudoglottis is the most important factor for esophageal speech, which is visualized in 7 cases among 8 cases of excellent esophageal speech group. 2. Two cases of longer A-P diameter at the pseudoglottis have the best quality of esophageal speech than others. 3. Two cases of mucosal vibration at the pharyngoesophageal segment can make excellent esophageal speech. 4. The cases of failed esophageal speech are poor aerophagia in 6 cases, abscence of pseudoglottis in 4 cases and poor air ejection in 3 cases. 5. Aerophagia synchronizes with diaphragmatic motion in 8 cases of excellent esophageal speech.
Eskelund, Kasper; Andersen, Tobias
Speech perception is audiovisual as evidenced by the McGurk effect in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate...... of audiovisual integration specific to speech perception. However, the results of Tuomainen et al. might have been influenced by another effect. When observers were naïve, they had little motivation to look at the face. When informed, they knew that the face was relevant for the task and this could increase...... visual detection task. In our first experiment, observers presented with congruent and incongruent audiovisual sine-wave speech stimuli did only show a McGurk effect when informed of the speech nature of the stimulus. Performance on the secondary visual task was very good, thus supporting the finding...
... to being completely unable to speak or understand speech. Causes include Hearing disorders and deafness Voice problems, ... or those caused by cleft lip or palate Speech problems like stuttering Developmental disabilities Learning disorders Autism ...
Chan, Alexander M; Dykstra, Andrew R; Jayaram, Vinay; Leonard, Matthew K; Travis, Katherine E; Gygi, Brian; Baker, Janet M; Eskandar, Emad; Hochberg, Leigh R; Halgren, Eric; Cash, Sydney S
How the brain extracts words from auditory signals is an unanswered question. We recorded approximately 150 single and multi-units from the left anterior superior temporal gyrus of a patient during multiple auditory experiments. Against low background activity, 45% of units robustly fired to particular spoken words with little or no response to pure tones, noise-vocoded speech, or environmental sounds. Many units were tuned to complex but specific sets of phonemes, which were influenced by local context but invariant to speaker, and suppressed during self-produced speech. The firing of several units to specific visual letters was correlated with their response to the corresponding auditory phonemes, providing the first direct neural evidence for phonological recoding during reading. Maximal decoding of individual phonemes and words identities was attained using firing rates from approximately 5 neurons within 200 ms after word onset. Thus, neurons in human superior temporal gyrus use sparse spatially organized population encoding of complex acoustic-phonetic features to help recognize auditory and visual words. © The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: firstname.lastname@example.org.
McGettigan, Carolyn; Faulkner, Andrew; Altarelli, Irene; Obleser, Jonas; Baverstock, Harriet; Scott, Sophie K.
Speech comprehension is a complex human skill, the performance of which requires the perceiver to combine information from several sources – e.g. voice, face, gesture, linguistic context – to achieve an intelligible and interpretable percept. We describe a functional imaging investigation of how auditory, visual and linguistic information interact to facilitate comprehension. Our specific aims were to investigate the neural responses to these different information sources, alone and in interaction, and further to use behavioural speech comprehension scores to address sites of intelligibility-related activation in multifactorial speech comprehension. In fMRI, participants passively watched videos of spoken sentences, in which we varied Auditory Clarity (with noise-vocoding), Visual Clarity (with Gaussian blurring) and Linguistic Predictability. Main effects of enhanced signal with increased auditory and visual clarity were observed in overlapping regions of posterior STS. Two-way interactions of the factors (auditory × visual, auditory × predictability) in the neural data were observed outside temporal cortex, where positive signal change in response to clearer facial information and greater semantic predictability was greatest at intermediate levels of auditory clarity. Overall changes in stimulus intelligibility by condition (as determined using an independent behavioural experiment) were reflected in the neural data by increased activation predominantly in bilateral dorsolateral temporal cortex, as well as inferior frontal cortex and left fusiform gyrus. Specific investigation of intelligibility changes at intermediate auditory clarity revealed a set of regions, including posterior STS and fusiform gyrus, showing enhanced responses to both visual and linguistic information. Finally, an individual differences analysis showed that greater comprehension performance in the scanning participants (measured in a post-scan behavioural test) were associated with
Phifer, Gregg, Ed.
The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…
Ward, Caitlin M; Rogers, Chad S; Van Engen, Kristin J; Peelle, Jonathan E
A common goal during speech comprehension is to remember what we have heard. Encoding speech into long-term memory frequently requires processes such as verbal working memory that may also be involved in processing degraded speech. Here the authors tested whether young and older adult listeners' memory for short stories was worse when the stories were acoustically degraded, or whether the additional contextual support provided by a narrative would protect against these effects. The authors tested 30 young adults (aged 18-28 years) and 30 older adults (aged 65-79 years) with good self-reported hearing. Participants heard short stories that were presented as normal (unprocessed) speech or acoustically degraded using a noise vocoding algorithm with 24 or 16 channels. The degraded stories were still fully intelligible. Following each story, participants were asked to repeat the story in as much detail as possible. Recall was scored using a modified idea unit scoring approach, which included separately scoring hierarchical levels of narrative detail. Memory for acoustically degraded stories was significantly worse than for normal stories at some levels of narrative detail. Older adults' memory for the stories was significantly worse overall, but there was no interaction between age and acoustic clarity or level of narrative detail. Verbal working memory (assessed by reading span) significantly correlated with recall accuracy for both young and older adults, whereas hearing ability (better ear pure tone average) did not. The present findings are consistent with a framework in which the additional cognitive demands caused by a degraded acoustic signal use resources that would otherwise be available for memory encoding for both young and older adults. Verbal working memory is a likely candidate for supporting both of these processes.
Ward, Caitlin M.; Rogers, Chad S.; Van Engen, Kristin J.; Peelle, Jonathan E.
Background A common goal during speech comprehension is to remember what we have heard. Encoding speech into long-term memory frequently requires processes such as verbal working memory that may also be involved in processing degraded speech. Here we tested whether young and older adult listeners’ memory for short stories was worse when the stories were acoustically degraded, or whether the additional contextual support provided by a narrative would protect against these effects. Methods We tested 30 young adults (aged 18–28 years) and 30 older adults (aged 65–79 years) with good self-reported hearing. Participants heard short stories that were presented as normal (unprocessed) speech, or acoustically degraded using a noise vocoding algorithm with 24 or 16 channels. The degraded stories were still fully intelligible. Following each story, participants were asked to repeat the story in as much detail as possible. Recall was scored using a modified idea unit scoring approach, which included separately scoring hierarchical levels of narrative detail. Results Memory for acoustically degraded stories was significantly worse than for normal stories at some levels of narrative detail. Older adults’ memory for the stories was significantly worse overall, but there was no interaction between age and acoustic clarity or level of narrative detail. Verbal working memory (assessed by reading span) significantly correlated with recall accuracy for both young and older adults, whereas hearing ability (better ear pure-tone average) did not. Conclusion Our findings are consistent with a framework in which the additional cognitive demands caused by a degraded acoustic signal use resources that would otherwise be available for memory encoding for both young and older adults. Verbal working memory is a likely candidate for supporting both of these processes. PMID:26683044
Chyl, Katarzyna; Kossowski, Bartosz; Dębska, Agnieszka; Łuniewska, Magdalena; Banaszkiewicz, Anna; Żelechowska, Agata; Frost, Stephen J; Mencl, William Einar; Wypych, Marek; Marchewka, Artur; Pugh, Kenneth R; Jednoróg, Katarzyna
Literacy acquisition is a demanding process that induces significant changes in the brain, especially in the spoken and written language networks. Nevertheless, large-scale paediatric fMRI studies are still limited. We analyzed fMRI data to show how individual differences in reading performance correlate with brain activation for speech and print in 111 children attending kindergarten or first grade and examined group differences between a matched subset of emergent-readers and prereaders. Across the entire cohort, individual differences analysis revealed that reading skill was positively correlated with the magnitude of activation difference between words and symbol strings in left superior temporal, inferior frontal and fusiform gyri. Group comparisons of the matched subset of pre- and emergent-readers showed higher activity for emergent-readers in left inferior frontal, precentral, and postcentral gyri. Individual differences in activation for natural versus vocoded speech were also positively correlated with reading skill, primarily in the left temporal cortex. However, in contrast to studies on adult illiterates, group comparisons revealed higher activity in prereaders compared to readers in the frontal lobes. Print-speech coactivation was observed only in readers and individual differences analyses revealed a positive correlation between convergence and reading skill in the left superior temporal sulcus. These results emphasise that a child's brain undergoes several modifications to both visual and oral language systems in the process of learning to read. They also suggest that print-speech convergence is a hallmark of acquiring literacy. © 2017 Association for Child and Adolescent Mental Health.
Full Text Available One of the major problems concerning the evolution of human language is to understand how sounds became associated to meaningful gestures. It has been proposed that the circuit controlling gestures and speech evolved from a circuit involved in the control of arm and mouth movements related to ingestion. This circuit contributed to the evolution of spoken language, moving from a system of communication based on arm gestures. The discovery of the mirror neurons has provided strong support for the gestural theory of speech origin because they offer a natural substrate for the embodiment of language and create a direct link between sender and receiver of a message. Behavioural studies indicate that manual gestures are linked to mouth movements used for syllable emission. Grasping with the hand selectively affected movement of inner or outer parts of the mouth according to syllable pronunciation and hand postures, in addition to hand actions, influenced the control of mouth grasp and vocalization. Gestures and words are also related to each other. It was found that when producing communicative gestures (emblems the intention to interact directly with a conspecific was transferred from gestures to words, inducing modification in voice parameters. Transfer effects of the meaning of representational gestures were found on both vocalizations and meaningful words. It has been concluded that the results of our studies suggest the existence of a system relating gesture to vocalization which was precursor of a more general system reciprocally relating gesture to word.
Saigusa, Hideto; Yamaguchi, Satoshi; Nakamura, Tsuyoshi; Komachi, Taro; Kadosono, Osamu; Ito, Hiroyuki; Saigusa, Makoto; Niimi, Seiji
Amyotrophic lateral sclerosis (ALS) is a progressive debilitating neurological disease. ALS disturbs the quality of life by affecting speech, swallowing and free mobility of the arms without affecting intellectual function. It is therefore of significance to improve intelligibility and quality of speech sounds, especially for ALS patients with slowly progressive courses. Currently, however, there is no effective or established approach to improve speech disorder caused by ALS. We investigated a surgical procedure to improve speech disorder for some patients with neuromuscular diseases with velopharyngeal closure incompetence. In this study, we performed the surgical procedure for two patients suffering from severe speech disorder caused by slowly progressing ALS. The patients suffered from speech disorder with hypernasality and imprecise and weak articulation during a 6-year course (patient 1) and a 3-year course (patient 2) of slowly progressing ALS. We narrowed bilateral lateral palatopharyngeal wall at velopharyngeal port, and performed this surgery under general anesthesia without muscle relaxant for the two patients. Postoperatively, intelligibility and quality of their speech sounds were greatly improved within one month without any speech therapy. The patients were also able to generate longer speech phrases after the surgery. Importantly, there was no serious complication during or after the surgery. In summary, we performed bilateral narrowing of lateral palatopharyngeal wall as a speech surgery for two patients suffering from severe speech disorder associated with ALS. With this technique, improved intelligibility and quality of speech can be maintained for longer duration for the patients with slowly progressing ALS.
Shahsavarani, Somayeh Bahar
High-level, top-down information such as linguistic knowledge is a salient cortical resource that influences speech perception under most listening conditions. But, are all listeners able to exploit these resources for speech facilitation to the same extent? It was found that children with cochlear implants showed different patterns of benefit from contextual information in speech perception compared with their normal-haring peers. Previous studies have discussed the role of non-acoustic factors such as linguistic and cognitive capabilities to account for this discrepancy. Given the fact that the amount of acoustic information encoded and processed by auditory nerves of listeners with cochlear implants differs from normal-hearing listeners and even varies across individuals with cochlear implants, it is important to study the interaction of specific acoustic properties of the speech signal with contextual cues. This relationship has been mostly neglected in previous research. In this dissertation, we aimed to explore how different acoustic dimensions interact to affect listeners' abilities to combine top-down information with bottom-up information in speech perception beyond the known effects of linguistic and cognitive capacities shown previously. Specifically, the present study investigated whether there were any distinct context effects based on the resolution of spectral versus slowly-varying temporal information in perception of spectrally impoverished speech. To that end, two experiments were conducted. In both experiments, a noise-vocoded technique was adopted to generate spectrally-degraded speech to approximate acoustic cues delivered to listeners with cochlear implants. The frequency resolution was manipulated by varying the number of frequency channels. The temporal resolution was manipulated by low-pass filtering of amplitude envelope with varying low-pass cutoff frequencies. The stimuli were presented to normal-hearing native speakers of American
Spectral integration (or formant averaging) was proposed in vowel perception research to account for the observation that a reduction of the intensity of one of two closely spaced formants (as in /u/) produced a predictable shift in vowel quality [Delattre et al., Word 8, 195-210 (1952)]. A related observation was reported in psychoacoustics, indicating that when the components of a two-tone periodic complex differ in amplitude and frequency, its perceived pitch is shifted toward that of the more intense tone [Helmholtz, App. XIV (1875/1948)]. Subsequent research in both fields focused on the frequency interval that separates these two spectral components, in an attempt to determine the size of the bandwidth for spectral integration to occur. This talk will review the accumulated evidence for and against spectral integration within the hypothesized limit of 3.5 Bark for static and dynamic signals in speech perception and psychoacoustics. Based on similarities in the processing of speech and non-speech sounds, it is suggested that spectral integration may reflect a general property of the auditory system. A larger frequency bandwidth, possibly close to 3.5 Bark, may be utilized in integrating acoustic information, including speech, complex signals, or sound quality of a violin.
Kovacevic, Branko; Veinović, Mladen; Marković, Milan
This book focuses on speech signal phenomena, presenting a robustification of the usual speech generation models with regard to the presumed types of excitation signals, which is equivalent to the introduction of a class of nonlinear models and the corresponding criterion functions for parameter estimation. Compared to the general class of nonlinear models, such as various neural networks, these models possess good properties of controlled complexity, the option of working in “online” mode, as well as a low information volume for efficient speech encoding and transmission. Providing comprehensive insights, the book is based on the authors’ research, which has already been published, supplemented by additional texts discussing general considerations of speech modeling, linear predictive analysis and robust parameter estimation.
Schalling, Ellika; Hartelius, Lena
Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.
Gopi, E S
Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.
Zheng, Zane; Cusack, Rhodri; Johnsrude, Ingrid
The everyday act of speaking involves the complex processes of speech motor control. One important feature of such control is regulation of articulation when auditory concomitants of speech do not correspond to the intended motor gesture. While theoretical accounts of speech monitoring posit...... multiple functional components required for detection of errors in speech planning (e.g., Levelt, 1983), neuroimaging studies generally indicate either single brain regions sensitive to speech production errors, or small, discrete networks. Here we demonstrate that the complex system controlling speech...... is supported by a complex neural network that is involved in linguistic, motoric and sensory processing. With the aid of novel real-time acoustic analyses and representational similarity analyses of fMRI signals, our data show functionally differentiated networks underlying auditory feedback control of speech....
Bailly, G.; Theune, Mariet; Meijs, Koen; Campbell, N.; Hamza, W.; Heylen, Dirk K.J.; Ordelman, Roeland J.F.; Hoge, H.; Jianhua, T.
Work on expressive speech synthesis has long focused on the expression of basic emotions. In recent years, however, interest in other expressive styles has been increasing. The research presented in this paper aims at the generation of a storytelling speaking style, which is suitable for storytelling applications and more in general, for applications aimed at children. Based on an analysis of human storytellers' speech, we designed and implemented a set of prosodic rules for converting "neutr...
Lalonde, Kaylah; Holt, Rachael Frush
This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the level of perceptual processing required to complete them. Adults and children demonstrated visual speech influence at all levels of perceptual processing. Whereas children demonstrated the same visual speech influence at each level of perceptual processing, adults demonstrated greater visual speech influence on tasks requiring higher levels of perceptual processing. These results support previous research demonstrating multiple mechanisms of AV speech processing (general perceptual and speech-specific mechanisms) with independent maturational time courses. The results suggest that adults rely on both general perceptual mechanisms that apply to all levels of perceptual processing and speech-specific mechanisms that apply when making phonetic decisions and/or accessing the lexicon. Six- to eight-year-old children seem to rely only on general perceptual mechanisms across levels. As expected, developmental differences in AV benefit on this and other recognition tasks likely reflect immature speech-specific mechanisms and phonetic processing in children.
Wijayasiri, Pramudi; Hartley, Douglas E H; Wiggins, Ian M
The purpose of this study was to establish whether functional near-infrared spectroscopy (fNIRS), an emerging brain-imaging technique based on optical principles, is suitable for studying the brain activity that underlies effortful listening. In an event-related fNIRS experiment, normally-hearing adults listened to sentences that were either clear or degraded (noise vocoded). These sentences were presented simultaneously with a non-speech distractor, and on each trial participants were instructed to attend either to the speech or to the distractor. The primary region of interest for the fNIRS measurements was the left inferior frontal gyrus (LIFG), a cortical region involved in higher-order language processing. The fNIRS results confirmed findings previously reported in the functional magnetic resonance imaging (fMRI) literature. Firstly, the LIFG exhibited an elevated response to degraded versus clear speech, but only when attention was directed towards the speech. This attention-dependent increase in frontal brain activation may be a neural marker for effortful listening. Secondly, during attentive listening to degraded speech, the haemodynamic response peaked significantly later in the LIFG than in superior temporal cortex, possibly reflecting the engagement of working memory to help reconstruct the meaning of degraded sentences. The homologous region in the right hemisphere may play an equivalent role to the LIFG in some left-handed individuals. In conclusion, fNIRS holds promise as a flexible tool to examine the neural signature of effortful listening. Copyright © 2017 Elsevier B.V. All rights reserved.
... Your Speech 249 Appendix D- How About the ROI? 265 Index 269 ... 11182$ CNTS 01-19-05 10:15:57 PS PAGE vi AcknowledgmentsAcknowledgments In the general chatter after a major speech given by a client ...
Sandor, Aniko; Moses, Haifa
Speech alarms have been used extensively in aviation and included in International Building Codes (IBC) and National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.
An introduction is given to the the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001)......An introduction is given to the the anatomy and the function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001)...
It is becoming increasingly apparent that all forms of communication-including voice-will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. Outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networksOffering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the
Full Text Available Language therapy has trafficked from a medical focus until a preventive focus. However, difficulties are evidenced in the development of this last task, because he is devoted bigger space to the correction of the disorders of the language. Because the speech disorders is the dysfunction with more frequently appearance, acquires special importance the preventive work that is developed to avoid its appearance. Speech education since early age of the childhood makes work easier for prevent the appearance of speech disorders in the children. The present work has as objective to offer different activities for the prevention of the speech disorders.
Full Text Available The learning-based speech recovery approach using statistical spectral conversion has been used for some kind of distorted speech as alaryngeal speech and body-conducted speech (or bone-conducted speech. This approach attempts to recover clean speech (undistorted speech from noisy speech (distorted speech by converting the statistical models of noisy speech into that of clean speech without the prior knowledge on characteristics and distributions of noise source. Presently, this approach has still not attracted many researchers to apply in general noisy speech enhancement because of some major problems: those are the difficulties of noise adaptation and the lack of noise robust synthesizable features in different noisy environments. In this paper, we adopted the methods of state-of-the-art voice conversions and speaker adaptation in speech recognition to the proposed speech recovery approach applied in different kinds of noisy environment, especially in adverse environments with joint compensation of additive and convolutive noises. We proposed to use the decorrelated wavelet packet coefficients as a low-dimensional robust synthesizable feature under noisy environments. We also proposed a noise adaptation for speech recovery with the eigennoise similar to the eigenvoice in voice conversion. The experimental results showed that the proposed approach highly outperformed traditional nonlearning-based approaches.
Meijers, A.W.M.; Tsohatzidis, S.L.
From its early development in the 1960s, speech act theory always had an individualistic orientation. It focused exclusively on speech acts performed by individual agents. Paradigmatic examples are ‘I promise that p’, ‘I order that p’, and ‘I declare that p’. There is a single speaker and a single
Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…
Kane, Peter E., Ed.
The 11 articles in this collection deal with theoretical and practical freedom of speech issues. The topics covered are (1) the United States Supreme Court and communication theory; (2) truth, knowledge, and a democratic respect for diversity; (3) denial of freedom of speech in Jock Yablonski's campaign for the presidency of the United Mine…
Shearer, William M.
Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…
Kane, Peter E., Ed.
This issue of "Free Speech" contains the following articles: "Daniel Schoor Relieved of Reporting Duties" by Laurence Stern, "The Sellout at CBS" by Michael Harrington, "Defending Dan Schorr" by Tome Wicker, "Speech to the Washington Press Club, February 25, 1976" by Daniel Schorr, "Funds…
Benesty, Jacob; Christensen, Mads Græsbøll; Jensen, Jesper Rindom
In this paper, we provide an overview of some recently introduced principles and ideas for speech enhancement with linear filtering and explore how these are related and how they can be used in various applications. This is done in a general framework where the speech enhancement problem is stated......-to-noise ratio (SNR), and Wiener filters are derived from the conventional speech enhancement approach and the recently introduced orthogonal decomposition approach. For each of the filters, we derive their properties in terms of output SNR and speech distortion. We then demonstrate how the ideas can be applied...
Objective The Internet provides the general public with information about speech pathology services, including client groups and service delivery models, as well as the professionals providing the services. Although this information assists the general public and other professionals to both access and understand speech pathology services, it also potentially provides information about speech pathology as a prospective career, including the types of people who are speech pathologists (i.e. demographics). The aim of the present study was to collect baseline data on how the speech pathology profession was presented via images on the Internet. Methods A pilot prospective observational study using content analysis methodology was conducted to analyse publicly available Internet images related to the speech pathology profession. The terms 'Speech Pathology' and 'speech pathologist' to represent both the profession and the professional were used, resulting in the identification of 200 images. These images were considered across a range of areas, including who was in the image (e.g. professional, client, significant other), the technology used and the types of intervention. Results The majority of images showed both a client and a professional (i.e. speech pathologist). While the professional was predominantly presented as female, the gender of the client was more evenly distributed. The clients were more likely to be preschool or school aged, however male speech pathologists were presented as providing therapy to selected age groups (i.e. school aged and younger adults). Images were predominantly of individual therapy and the few group images that were presented were all paediatric. Conclusion Current images of speech pathology continue to portray narrow professional demographics and client groups (e.g. paediatrics). Promoting images of wider scope to fully represent the depth and breadth of speech pathology professional practice may assist in attracting a more diverse
O'Connell, Daniel C.; Kowal, Sabine; Sabin, Edward J.; Lamia, John F.; Dannevik, Margaret
Our purpose in the following was to investigate the start-up rhetoric employed by U.S. President Barack Obama in his speeches. The initial 5 min from eight of his speeches from May to September of 2009 were selected for their variety of setting, audience, theme, and purpose. It was generally hypothesized that Barack Obama, widely recognized for…
The general topic addressed by this dissertation is that of bilingualism, and more specifically, the topic of bilingual acquisition of speech sounds. The central question in this study is the following: does bilingualism affect children’s perceptual development of speech sounds? The term bilingual
Menegueti, Katia Ignacio; Mangilli, Laura Davison; Alonso, Nivaldo; Andrade, Claudia Regina Furquim de
To characterize the profile and speech characteristics of patients undergoing primary palatoplasty in a Brazilian university hospital, considering the time of intervention (early, before two years of age; late, after two years of age). Participants were 97 patients of both genders with cleft palate and/or cleft and lip palate, assigned to the Speech-language Pathology Department, who had been submitted to primary palatoplasty and presented no prior history of speech-language therapy. Patients were divided into two groups: early intervention group (EIG) - 43 patients undergoing primary palatoplasty before 2 years of age and late intervention group (LIG) - 54 patients undergoing primary palatoplasty after 2 years of age. All patients underwent speech-language pathology assessment. The following parameters were assessed: resonance classification, presence of nasal turbulence, presence of weak intraoral air pressure, presence of audible nasal air emission, speech understandability, and compensatory articulation disorder (CAD). At statistical significance level of 5% (p≤0.05), no significant difference was observed between the groups in the following parameters: resonance classification (p=0.067); level of hypernasality (p=0.113), presence of nasal turbulence (p=0.179); presence of weak intraoral air pressure (p=0.152); presence of nasal air emission (p=0.369), and speech understandability (p=0.113). The groups differed with respect to presence of compensatory articulation disorders (p=0.020), with the LIG presenting higher occurrence of altered phonemes. It was possible to assess the general profile and speech characteristics of the study participants. Patients submitted to early primary palatoplasty present better speech profile.
Başkent, Deniz; Gaudrain, Etienne
Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level
Novelli-Olmstead, Tina; Ling, Daniel
Seven hearing impaired children (five to seven years old) assigned to the Speakers group made highly significant gains in speech production and auditory discrimination of speech, while Listeners made only slight speech production gains and no gains in auditory discrimination. Combined speech and auditory training was more effective than auditory…
Hwee Ling eLee
Full Text Available This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogues of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300 ±240, ±180, ±120, ±60, and 0 ms. Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past three years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and to a marginally significant degree to natural speech.
Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A
Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Al, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95, p speech and language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech due to perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.
Edeal, Denice Michelle; Gildersleeve-Neumann, Christina Elke
Purpose: This study explores the importance of production frequency during speech therapy to determine whether more practice of speech targets leads to increased performance within a treatment session, as well as to motor learning, in the form of generalization to untrained words. Method: Two children with childhood apraxia of speech were treated…
Hussmann, Katja; Grande, Marion; Meffert, Elisabeth; Christoph, Swetlana; Piefke, Martina; Willmes, Klaus; Huber, Walter
Although generally accepted as an important part of aphasia assessment, detailed analysis of spontaneous speech is rarely carried out in clinical practice mostly due to time limitations. The Aachener Sprachanalyse (ASPA; Aachen Speech Analysis) is a computer-assisted method for the quantitative analysis of German spontaneous speech that allows for…
Cole, Roberta; And Others
The 16-unit language and speech improvement handbook for kindergarten and first grade students contains an introductory section which includes a discussion of the child's developmental speech and language characteristics, a sound development chart, a speech and hearing language screening test, the Henja articulation test, and a general outline of…
Harley, Trevor A.
Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…
Mario T Carreon
Full Text Available This paper discusses the Speech and Phoneme Recognition as an Educational Aid for the Deaf and Hearing Impaired (SPREAD application and the ongoing research on its deployment as a tool for motivating deaf and hearing impaired students to learn and appreciate speech. This application uses the Sphinx-4 voice recognition system to analyze the vocalization of the student and provide prompt feedback on their pronunciation. The packaging of the application as an interactive game aims to provide additional motivation for the deaf and hearing impaired student through visual motivation for them to learn and appreciate speech.
Binderup, Lars Grassme
, as opposed to a legal norm, that curbs exercises of the right to free speech that offend the feelings or beliefs of members from other cultural groups. The paper rejects the suggestion that acceptance of such a norm is in line with liberal egalitarian thinking. Following a review of the classical liberal...... egalitarian reasons for free speech - reasons from overall welfare, from autonomy and from respect for the equality of citizens - it is argued that these reasons outweigh the proposed reasons for curbing culturally offensive speech. Currently controversial cases such as that of the Danish Cartoon Controversy...
Niebuhr, Oliver; Brem, Alexander; Novák-Tót, Eszter
to business speeches. Consistent with the public opinion, our findings are indicative of Steve Jobs being a more charismatic speaker than Mark Zuckerberg. Beyond previous studies, our data suggest that rhythm and emphatic accentuation are also involved in conveying charisma. Furthermore, the differences...... between Steve Jobs and Mark Zuckerberg and the investor- and customer-related sections of their speeches support the modern understanding of charisma as a gradual, multiparametric, and context-sensitive concept....
Vích, Robert; Vondra, Martin
Vol. 4775, - (2007), s. 129-137 ISSN 0302-9743. [COST Action 2102 International Workshop. Vietri sul Mare, 29.03.2007-31.03.2007] R&D Projects: GA AV ČR(CZ) 1ET301710509 Institutional research plan: CEZ:AV0Z20670512 Keywords : speech * speech processing * cepstral analysis Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering Impact factor: 0.302, year: 2005
Coppens-Hofman, Marjolein C.; Terband, Hayo R.; Maassen, Ben A. M.; Lantman-De Valk, Henny M. J. van Schrojenstein; Hof, Yvonne Van Zaalen-op't; Snik, Ad F. M.
Background: In individuals with an intellectual disability, speech dysfluencies are more common than in the general population. In clinical practice, these fluency disorders are generally diagnosed and treated as stuttering rather than cluttering. Purpose: To characterise the type of dysfluencies in
Speech errors are generally said to accommodate to their new phonological context. This accommodation has been validated by several transcription studies. The transcription methodology is not the best choice for detecting errors at this level, however, as this type of error can be difficult to perceive. This paper presents an acoustic analysis of…
Willems, R.M.; Özyürek, A.; Hagoort, P.
Although generally studied in isolation, action observation and speech comprehension go hand in hand during everyday human communication. That is, people gesture while they speak. From previous research it is known that a tight link exists between spoken language and such hand gestures. This study
Locke, J L; Kutz, K J
Thirty kindergarteners, 15 who substituted /w/ for /r/ and 15 with correct articulation, received two perception tests and a memory test that included /w/ and /r/ in minimally contrastive syllables. Although both groups had nearly perfect perception of the experimenter's productions of /w/ and /r/, misarticulating subjects perceived their own tape-recorded w/r productions as /w/. In the memory task these same misarticulating subjects committed significantly more /w/-/r/ confusions in unspoken recall. The discussion considers why people subvocally rehearse; a developmental period in which children do not rehearse; ways subvocalization may aid recall, including motor and acoustic encoding; an echoic store that provides additional recall support if subjects rehearse vocally, and perception of self- and other- produced phonemes by misarticulating children-including its relevance to a motor theory of perception. Evidence is presented that speech for memory can be sufficiently impaired to cause memory disorder. Conceptions that restrict speech disorder to an impairment of communication are challenged.
Jørgensen, Søren; Dau, Torsten
The speech-based envelope power spectrum model (sEPSM; ) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-tonoise ratio in the envelope domain (SNRenv), which was demonstrated...... to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating...
Signoret, Carine; Johnsrude, Ingrid; Classon, Elisabet; Rudner, Mary
The perceptual clarity of speech is influenced by more than just the acoustic quality of the sound; it also depends on contextual support. For example, a degraded sentence is perceived to be clearer when the content of the speech signal is provided with matching text (i.e., form-based predictability) before hearing the degraded sentence. Here, we investigate whether sentence-level semantic coherence (i.e., meaning-based predictability), enhances perceptual clarity of degraded sentences, and if so, whether the mechanism is the same as that underlying enhancement by matching text. We also ask whether form- and meaning-based predictability are related to individual differences in cognitive abilities. Twenty participants listened to spoken sentences that were either clear or degraded by noise vocoding and rated the clarity of each item. The sentences had either high or low semantic coherence. Each spoken word was preceded by the homologous printed word (matching text), or by a meaningless letter string (nonmatching text). Cognitive abilities were measured with a working memory test. Results showed that perceptual clarity was significantly enhanced both by matching text and by semantic coherence. Importantly, high coherence enhanced the perceptual clarity of the degraded sentences even when they were preceded by matching text, suggesting that the effects of form- and meaning-based predictions on perceptual clarity are independent and additive. However, when working memory capacity indexed by the Size-Comparison Span Test was controlled for, only form-based predictions enhanced perceptual clarity, and then only at some sound quality levels, suggesting that prediction effects are to a certain extent dependent on cognitive abilities. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Freitas, João; Dias, Miguel Sales; Silva, Samuel
This book provides a broad and comprehensive overview of the existing technical approaches in the area of silent speech interfaces (SSI), both in theory and in application. Each technique is described in the context of the human speech production process, allowing the reader to clearly understand the principles behind SSI in general and across different methods. Additionally, the book explores the combined use of different data sources, collected from various sensors, in order to tackle the limitations of simpler SSI approaches, addressing current challenges of this field. The book also provides information about existing SSI applications, resources and a simple tutorial on how to build an SSI.
Van Engen, Kristin J
This study investigated whether clear speech reduces the cognitive demands of lexical competition by crossing speaking style with lexical difficulty. Younger and older adults identified more words in clear versus conversational speech and more easy words than hard words. An initial analysis suggested that the effect of lexical difficulty was reduced in clear speech, but more detailed analyses within each age group showed this interaction was significant only for older adults. The results also showed that both groups improved over the course of the task and that clear speech was particularly helpful for individuals with poorer hearing: for younger adults, clear speech eliminated hearing-related differences that affected performance on conversational speech. For older adults, clear speech was generally more helpful to listeners with poorer hearing. These results suggest that clear speech affords perceptual benefits to all listeners and, for older adults, mitigates the cognitive challenge associated with identifying words with many phonological neighbors.
Nie, Yingjiu; Galvin, John J; Morikawa, Michael; André, Victoria; Wheeler, Harley; Fu, Qian-Jie
This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet was significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners.
Peretz, Isabelle; Vuvan, Dominique; Lagrois, Marie-Élaine; Armony, Jorge L
Neural overlap in processing music and speech, as measured by the co-activation of brain regions in neuroimaging studies, may suggest that parts of the neural circuitries established for language may have been recycled during evolution for musicality, or vice versa that musicality served as a springboard for language emergence. Such a perspective has important implications for several topics of general interest besides evolutionary origins. For instance, neural overlap is an important premise for the possibility of music training to influence language acquisition and literacy. However, neural overlap in processing music and speech does not entail sharing neural circuitries. Neural separability between music and speech may occur in overlapping brain regions. In this paper, we review the evidence and outline the issues faced in interpreting such neural data, and argue that converging evidence from several methodologies is needed before neural overlap is taken as evidence of sharing. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
Peretz, Isabelle; Vuvan, Dominique; Lagrois, Marie-Élaine; Armony, Jorge L.
Neural overlap in processing music and speech, as measured by the co-activation of brain regions in neuroimaging studies, may suggest that parts of the neural circuitries established for language may have been recycled during evolution for musicality, or vice versa that musicality served as a springboard for language emergence. Such a perspective has important implications for several topics of general interest besides evolutionary origins. For instance, neural overlap is an important premise for the possibility of music training to influence language acquisition and literacy. However, neural overlap in processing music and speech does not entail sharing neural circuitries. Neural separability between music and speech may occur in overlapping brain regions. In this paper, we review the evidence and outline the issues faced in interpreting such neural data, and argue that converging evidence from several methodologies is needed before neural overlap is taken as evidence of sharing. PMID:25646513
Lewis, James R
Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application
Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar
Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
Full Text Available Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a...
Sandor, A.; Moses, H. R.
Currently on the International Space Station (ISS) and other space vehicles Caution & Warning (C&W) alerts are represented with various auditory tones that correspond to the type of event. This system relies on the crew's ability to remember what each tone represents in a high stress, high workload environment when responding to the alert. Furthermore, crew receive a year or more in advance of the mission that makes remembering the semantic meaning of the alerts more difficult. The current system works for missions conducted close to Earth where ground operators can assist as needed. On long duration missions, however, they will need to work off-nominal events autonomously. There is evidence that speech alarms may be easier and faster to recognize, especially during an off-nominal event. The Information Presentation Directed Research Project (FY07-FY09) funded by the Human Research Program included several studies investigating C&W alerts. The studies evaluated tone alerts currently in use with NASA flight deck displays along with candidate speech alerts. A follow-on study used four types of speech alerts to investigate how quickly various types of auditory alerts with and without a speech component - either at the beginning or at the end of the tone - can be identified. Even though crew were familiar with the tone alert from training or direct mission experience, alerts starting with a speech component were identified faster than alerts starting with a tone. The current study replicated the results from the previous study in a more rigorous experimental design to determine if the candidate speech alarms are ready for transition to operations or if more research is needed. Four types of alarms (caution, warning, fire, and depressurization) were presented to participants in both tone and speech formats in laboratory settings and later in the Human Exploration Research Analog (HERA). In the laboratory study, the alerts were presented by software and participants were
The purpose of this study is to examine speech intelligibility of children with primary speech and sound disorders aged 3 to 6 years in everyday life. The research problem is based on the degree to which parents or guardians, immediate family members (sister, brother, grandparents), extended family members (aunt, uncle, cousin), child's friends, other acquaintances, child's teachers and strangers understand the speech of children with speech sound disorders. We examined whether the level ...
Huijbregts, M.A.H.; de Jong, Franciska M.G.
In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because
section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore......, we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about...... the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations...
Coppens-Hofman, Marjolein C; Terband, Hayo R; Maassen, Ben A M; van Schrojenstein Lantman-De Valk, Henny M J; van Zaalen-op't Hof, Yvonne; Snik, Ad F M
In individuals with an intellectual disability, speech dysfluencies are more common than in the general population. In clinical practice, these fluency disorders are generally diagnosed and treated as stuttering rather than cluttering. To characterise the type of dysfluencies in adults with intellectual disabilities and reported speech difficulties with an emphasis on manifestations of stuttering and cluttering, which distinction is to help optimise treatment aimed at improving fluency and intelligibility. The dysfluencies in the spontaneous speech of 28 adults (18-40 years; 16 men) with mild and moderate intellectual disabilities (IQs 40-70), who were characterised as poorly intelligible by their caregivers, were analysed using the speech norms for typically developing adults and children. The speakers were subsequently assigned to different diagnostic categories by relating their resulting dysfluency profiles to mean articulatory rate and articulatory rate variability. Twenty-two (75%) of the participants showed clinically significant dysfluencies, of which 21% were classified as cluttering, 29% as cluttering-stuttering and 25% as clear cluttering at normal articulatory rate. The characteristic pattern of stuttering did not occur. The dysfluencies in the speech of adults with intellectual disabilities and poor intelligibility show patterns that are specific for this population. Together, the results suggest that in this specific group of dysfluent speakers interventions should be aimed at cluttering rather than stuttering. The reader will be able to (1) describe patterns of dysfluencies in the speech of adults with intellectual disabilities that are specific for this group of people, (2) explain that a high rate of dysfluencies in speech is potentially a major determiner of poor intelligibility in adults with ID and (3) describe suggestions for intervention focusing on cluttering rather than stuttering in dysfluent speakers with ID. Copyright © 2013 Elsevier Inc
Full Text Available This fMRI study examines shared and distinct cortical areas involved in the auditory perception of song and speech at the level of their underlying constituents: words, pitch and rhythm. Univariate and multivariate analyses were performed on the brain activity patterns of six conditions, arranged in a subtractive hierarchy: sung sentences including words, pitch and rhythm; hummed speech prosody and song melody containing only pitch patterns and rhythm; as well as the pure musical or speech rhythm.Systematic contrasts between these balanced conditions following their hierarchical organization showed a great overlap between song and speech at all levels in the bilateral temporal lobe, but suggested a differential role of the inferior frontal gyrus (IFG and intraparietal sulcus (IPS in processing song and speech. The left IFG was involved in word- and pitch-related processing in speech, the right IFG in processing pitch in song.Furthermore, the IPS showed sensitivity to discrete pitch relations in song as opposed to the gliding pitch in speech. Finally, the superior temporal gyrus and premotor cortex coded for general differences between words and pitch patterns, irrespective of whether they were sung or spoken. Thus, song and speech share many features which are reflected in a fundamental similarity of brain areas involved in their perception. However, fine-grained acoustic differences on word and pitch level are reflected in the activity of IFG and IPS.
Proper speech functioning in human being, depends on the precise coordination and timing balances in a series of complex neuro nuscular movements and actions. Starting from the prime organ of energy source of expelled air from respirato y system; deliver such air to trigger vocal cords; swift changes of this phonatory episode to a comprehensible sound in RESONACE and final coordination of all head and neck structures to elicit final speech in ...
The paper contains a transcript of a speech by the chairman of the UKAEA, to mark the publication of the 1985/6 annual report. The topics discussed in the speech include: the Chernobyl accident and its effect on public attitudes to nuclear power, management and disposal of radioactive waste, the operation of UKAEA as a trading fund, and the UKAEA development programmes. The development programmes include work on the following: fast reactor technology, thermal reactors, reactor safety, health and safety aspects of water cooled reactors, the Joint European Torus, and under-lying research. (U.K.)
Herbelin, Bruno; Jensen, Karl Kristoffer; Graugaard, Lars
Speech is both beautiful and informative. In this work, a conceptual study of the speech, through investigation of the tower of Babel, the archetypal phonemes, and a study of the reasons of uses of language is undertaken in order to create an artistic work investigating the nature of speech. The ....... The artwork is presented at the Re:New festival in May 2008....
Ma, Ning; Morris, Saffron; Kitterick, Pádraig Thomas
This study used vocoder simulations with normal-hearing (NH) listeners to (1) measure their ability to integrate speech information from an NH ear and a simulated cochlear implant (CI), and (2) investigate whether binaural integration is disrupted by a mismatch in the delivery of spectral information between the ears arising from a misalignment in the mapping of frequency to place. Eight NH volunteers participated in the study and listened to sentences embedded in background noise via headphones. Stimuli presented to the left ear were unprocessed. Stimuli presented to the right ear (referred to as the CI-simulation ear) were processed using an eight-channel noise vocoder with one of the three processing strategies. An Ideal strategy simulated a frequency-to-place map across all channels that matched the delivery of spectral information between the ears. A Realistic strategy created a misalignment in the mapping of frequency to place in the CI-simulation ear where the size of the mismatch between the ears varied across channels. Finally, a Shifted strategy imposed a similar degree of misalignment in all channels, resulting in consistent mismatch between the ears across frequency. The ability to report key words in sentences was assessed under monaural and binaural listening conditions and at signal to noise ratios (SNRs) established by estimating speech-reception thresholds in each ear alone. The SNRs ensured that the monaural performance of the left ear never exceeded that of the CI-simulation ear. The advantages of binaural integration were calculated by comparing binaural performance with monaural performance using the CI-simulation ear alone. Thus, these advantages reflected the additional use of the experimentally constrained left ear and were not attributable to better-ear listening. Binaural performance was as accurate as, or more accurate than, monaural performance with the CI-simulation ear alone. When both ears supported a similar level of monaural
Zekveld, Adriana A; Kramer, Sophia E; Kessens, Judith M; Vlaming, Marcel S M G; Houtgast, Tammo
The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that speech comprehension in noise by young listeners with normal hearing improves when presenting partly incorrect, automatically generated subtitles. The current study focused on the effects of age, hearing loss, visual working memory capacity, and linguistic skills on the benefit obtained from automatically generated subtitles during listening to speech in noise. In order to investigate the effects of age and hearing loss, three groups of participants were included: 22 young persons with normal hearing (YNH, mean age = 21 years), 22 middle-aged adults with normal hearing (MA-NH, mean age = 55 years) and 30 middle-aged adults with hearing impairment (MA-HI, mean age = 57 years). The benefit from automatic subtitling was measured by Speech Reception Threshold (SRT) tests (Plomp & Mimpen, 1979). Both unimodal auditory and bimodal audiovisual SRT tests were performed. In the audiovisual tests, the subtitles were presented simultaneously with the speech, whereas in the auditory test, only speech was presented. The difference between the auditory and audiovisual SRT was defined as the audiovisual benefit. Participants additionally rated the listening effort. We examined the influences of ASR accuracy level and text delay on the audiovisual benefit and the listening effort using a repeated measures General Linear Model analysis. In a correlation analysis, we evaluated the relationships between age, auditory SRT, visual working memory capacity and the audiovisual benefit and listening effort. The automatically generated subtitles improved speech comprehension in noise for all ASR accuracies and delays covered by the current study. Higher ASR accuracy levels resulted in more benefit obtained
Yoon, Heakyung C.; Loftness, Vivian
Lack of speech privacy has been reported to be the main dissatisfaction among occupants in open workplaces, according to workplace surveys. Two speech privacy measurements, Articulation Index (AI), standardized by the American National Standards Institute in 1969, and Speech Privacy Noise Isolation Class (NIC', Noise Isolation Class Prime), adapted from Noise Isolation Class (NIC) by U. S. General Services Administration (GSA) in 1979, have been claimed as objective tools to measure speech privacy in open offices. To evaluate which of them, normal privacy for AI or satisfied privacy for NIC', is a better tool in terms of speech privacy in a dynamic open office environment, measurements were taken in the field. AIs and NIC's in the different partition heights and workplace configurations have been measured following ASTM E1130 (Standard Test Method for Objective Measurement of Speech Privacy in Open Offices Using Articulation Index) and GSA test PBS-C.1 (Method for the Direct Measurement of Speech-Privacy Potential (SPP) Based on Subjective Judgments) and PBS-C.2 (Public Building Service Standard Method of Test Method for the Sufficient Verification of Speech-Privacy Potential (SPP) Based on Objective Measurements Including Methods for the Rating of Functional Interzone Attenuation and NC-Background), respectively.
The welcoming speech underlines the fact that any validation process starting with calculation methods and ending with studies on the long-term behaviour of a repository system can only be effected through laboratory, field and natural-analogue studies. The use of natural analogues (NA) is to secure the biosphere and to verify whether this safety really exists. (HP) [de
Ekström, Seth-Reino; Borg, Erik
The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (Ptempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (Pmusic offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.
Full Text Available The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA noise and speech spectrum-filtered noise (SPN]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA. The results showed a significant effect of piano performance speed and octave (P<.01. Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01 and SPN (P<.05. Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01, but there were smaller differences between masking conditions (P<.01. It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.
Kane, Peter E., Ed.
The seven articles in this collection deal with theoretical and practical freedom of speech issues. Topics covered are: the United States Supreme Court, motion picture censorship, and the color line; judicial decision making; the established scientific community's suppression of the ideas of Immanuel Velikovsky; the problems of avant-garde jazz,…
Full Text Available Listeners must accomplish two complementary perceptual feats in extracting a message from speech. They must discriminate linguistically-relevant acoustic variability and generalize across irrelevant variability. Said another way, they must categorize speech. Since the mapping of acoustic variability is language-specific, these categories must be learned from experience. Thus, understanding how, in general, the auditory system acquires and represents categories can inform us about the toolbox of mechanisms available to speech perception. This perspective invites consideration of findings from cognitive neuroscience literatures outside of the speech domain as a means of constraining models of speech perception. Although neurobiological models of speech perception have mainly focused on cerebral cortex, research outside the speech domain is consistent with the possibility of significant subcortical contributions in category learning. Here, we review the functional role of one such structure, the basal ganglia. We examine research from animal electrophysiology, human neuroimaging, and behavior to consider characteristics of basal ganglia processing that may be advantageous for speech category learning. We also present emerging evidence for a direct role for basal ganglia in learning auditory categories in a complex, naturalistic task intended to model the incidental manner in which speech categories are acquired. To conclude, we highlight new research questions that arise in incorporating the broader neuroscience research literature in modeling speech perception, and suggest how understanding contributions of the basal ganglia can inform attempts to optimize training protocols for learning non-native speech categories in adulthood.
Stephen Gichuhi Kimotho
Full Text Available Ethnicity in Kenya permeates all spheres of life. However, it is in politics that ethnicity is most visible. Election time in Kenya often leads to ethnic competition and hatred, often expressed through various media. Ethnic hate speech characterized the 2007 general elections in party rallies and through text messages, emails, posters and leaflets. This resulted in widespread skirmishes that left over 1200 people dead, and many displaced (KNHRC, 2008. In 2013, however, the new battle zone was the war of words on social media platform. More than any other time in Kenyan history, Kenyans poured vitriolic ethnic hate speech through digital media like Facebook, tweeter and blogs. Although scholars have studied the role and effects of the mainstream media like television and radio in proliferating the ethnic hate speech in Kenya (Michael Chege, 2008; Goldstein & Rotich, 2008a; Ismail & Deane, 2008; Jacqueline Klopp & Prisca Kamungi, 2007, little has been done in regard to social media. This paper investigated the nature of digitized hate speech by: describing the forms of ethnic hate speech on social media in Kenya; the effects of ethnic hate speech on Kenyan’s perception of ethnic entities; ethnic conflict and ethics of citizen journalism. This study adopted a descriptive interpretive design, and utilized Austin’s Speech Act Theory, which explains use of language to achieve desired purposes and direct behaviour (Tarhom & Miracle, 2013. Content published between January and April 2013 from six purposefully identified blogs was analysed. Questionnaires were used to collect data from university students as they form a good sample of Kenyan population, are most active on social media and are drawn from all parts of the country. Qualitative data were analysed using NVIVO 10 software, while responses from the questionnaire were analysed using IBM SPSS version 21. The findings indicated that Facebook and Twitter were the main platforms used to
Rasmussen, Morten Højfeldt; Tan, Zheng-Hua; Lindberg, Børge
While miscue detection in general is a well explored research field little attention has so far been paid to miscue detection in dyslexic read speech. This domain differs substantially from the domains that are commonly researched, as for example dyslexic read speech includes frequent regressions...... that the system detects miscues at a false alarm rate of 5.3% and a miscue detection rate of 40.1%. These results are worse than current state of the art reading tutors perhaps indicating that dyslexic read speech is a challenge to handle....
Kimotho, Stephen Gichuhi; Nyaga, Rahab Njeri
Ethnicity in Kenya permeates all spheres of life. However, it is in politics that ethnicity is most visible. Election time in Kenya often leads to ethnic competition and hatred, often expressed through various media. Ethnic hate speech characterized the 2007 general elections in party rallies and through text messages, emails, posters and…
Full Text Available The Nobel Peace Prize has long been considered the premier peace prize in the world. According to Geir Lundestad, Secretary of the Nobel Committee, of the 300 some peace prizes awarded worldwide, “none is in any way as well known and as highly respected as the Nobel Peace Prize” (Lundestad, 2001. Nobel peace speech is a unique and significant international site of public discourse committed to articulating the universal grammar of peace. Spanning over 100 years of sociopolitical history on the world stage, Nobel Peace Laureates richly represent an important cross-section of domestic and international issues increasingly germane to many publics. Communication scholars’ interest in this rhetorical genre has increased in the past decade. Yet, the norm has been to analyze a single speech artifact from a prestigious or controversial winner rather than examine the collection of speeches for generic commonalities of import. In this essay, we analyze the discourse of Nobel peace speech inductively and argue that the organizing principle of the Nobel peace speech genre is the repetitive form of normative liberal principles and values that function as rhetorical topoi. These topoi include freedom and justice and appeal to the inviolable, inborn right of human beings to exercise certain political and civil liberties and the expectation of equality of protection from totalitarian and tyrannical abuses. The significance of this essay to contemporary communication theory is to expand our theoretical understanding of rhetoric’s role in the maintenance and development of an international and cross-cultural vocabulary for the grammar of peace.
Full Text Available Of all the issues in the general theory of language usage, speech act theory has probably aroused the widest interest. Psychologists, forexample, have suggested that the acquisition of the concepts underlying speech acts may be a prerequisite for the acquisition of language in general,literary critics have looked to speech act theory for an illumination of textual subtleties or for an understanding of the nature of literary genres,anthropologists have hoped to find in the theory some account of the nature of magical incantations, philosophers have seen potential applications to,amongst other things, the status of ethical statements, while linguists have seen the notions of speech act theory as variously applicable to problemsin syntax, semantics, second language learning, and elsewhere.
This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that aid in the quality and intelligibility of degraded speech. They present powerful optimization methods to speech enhancement that can help to solve the noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how the speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.
Bebout, L; Arthur, B
Speech-language pathologists who serve clients from cultural backgrounds that are not familiar to them may encounter culturally influenced attitudinal differences. A questionnaire with statements about 4 speech disorders (dysfluency, cleft pallet, speech of the deaf, and misarticulations) was given to a focus group of Chinese Americans and a comparison group of non-Chinese Americans. The focus group was much more likely to believe that persons with speech disorders could improve their own speech by "trying hard," was somewhat more likely to say that people who use deaf speech and people with cleft palates might be "emotionally disturbed," and generally more likely to view deaf speech as a limitation. The comparison group was more pessimistic about stuttering children's acceptance by their peers than was the focus group. The two subject groups agreed about other items, such as the likelihood that older children with articulation problems are "less intelligent" than their peers.
Full Text Available One receives much more information through a visual sense than through a tactile one. However, most visual aids for hearing-impaired persons are not wearable because it is difficult to make them compact and it is not a best way to mask always their vision.Generally it is difficult to get the integrated patterns by a single mathematical transform of signals, such as a Foruier transform. In order to obtain the integrated pattern speech parameters should be carefully extracted by an analysis according as each parameter, and a visual pattern, which can intuitively be understood by anyone, must be synthesized from them. Successful integration of speech parameters will never disturb understanding of individual features, so that the system can be used for speech training and communication.
Igor V. Nefedov
Full Text Available National radio, like television, is called upon to bring to the masses not only relevant information, but also a high culture of language. There were always serious demands to oral public speech from the point of view of the correctness and uniformity of the pronunciation. However, today the analysis of the language practice of broadcasting often indicates a discrepancy between the use of linguistic resources in existing literary norms. The author of the article from the end of December 2016 to early April 2017 listened and analyzed from the point of view of language correctness the majority of programs on the radio Komsomolskaya Pravda (KP. In general, recognizing the good speech qualification of the workers of this radio, as well as their «guests» (political scientists, lawyers, historians, etc., one can not but note the presence of a significant number of errors in their speech. The material presented in the article allows us to conclude that at present, broadcasting is losing its position in the field of speech culture. Neglect of the rules of the Russian language on the radio «Komsomolskaya Pravda» negatively affects the image of the Russian language, which is formed in the minds of listeners. The language of radio should strive to become a standard of cleanliness and high culture for the population, since it has the enormous power of mass impact and supports the unity of the cultural and linguistic space.
Gordon, Peter C.
Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal to noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.
van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.
Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…
Gallardo, L.F.; Möller, S.; Beerends, J.
The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility
Kirrie J Ballard
Full Text Available A critical examination of speech motor control depends on an in-depth understanding of network connectivity associated with Brodmann areas 44 and 45 and surrounding cortices. Damage to these areas has been associated with two conditions - the speech motor programming disorder apraxia of speech (AOS and the linguistic / grammatical disorder of Broca’s aphasia. Here we focus on AOS, which is most commonly associated with damage to posterior Broca's area and adjacent cortex. We provide an overview of our own studies into the nature of AOS, including behavioral and neuroimaging methods, to explore components of the speech motor network that are associated with normal and disordered speech motor programming in AOS. Behavioral, neuroimaging, and computational modeling studies are indicating that AOS is associated with impairment in learning feedforward models and/or implementing feedback mechanisms and with the functional contribution of BA6. While functional connectivity methods are not yet routinely applied to the study of AOS, we highlight the need for focusing on the functional impact of localised lesions throughout the speech network, as well as larger scale comparative studies to distinguish the unique behavioral and neurological signature of AOS. By coupling these methods with neural network models, we have a powerful set of tools to improve our understanding of the neural mechanisms that underlie AOS, and speech production generally.
Pedro Tiago Martins
Full Text Available There is still no categorical answer for why humans, and no other species, have speech, or why speech is the way it is. Several purely anatomical arguments have been put forward, but they have been shown to be false, biologically implausible, or of limited scope. This perspective paper supports the idea that evolutionary theories of speech could benefit from a focus on the cognitive mechanisms that make speech possible, for which antecedents in evolutionary history and brain correlates can be found. This type of approach is part of a very recent, but rapidly growing tradition, which has provided crucial insights on the nature of human speech by focusing on the biological bases of vocal learning. Here, we call attention to what might be an important ingredient for speech. We contend that a general mechanism of attention, which manifests itself not only in visual but also auditory (and possibly other modalities, might be one of the key pieces of human speech, in addition to the mechanisms underlying vocal learning, and the pairing of facial gestures with vocalic units.
Rusz, Jan; Benova, Barbora; Ruzickova, Hana; Novotny, Michal; Tykalova, Tereza; Hlavnicka, Jan; Uher, Tomas; Vaneckova, Manuela; Andelova, Michaela; Novotna, Klara; Kadrnozkova, Lucie; Horakova, Dana
Motor speech disorders in multiple sclerosis (MS) are poorly understood and their quantitative, objective acoustic characterization remains limited. Additionally, little data regarding relationships between the severity of speech disorders and neurological involvement in MS, as well as the contribution of pyramidal and cerebellar functional systems on speech phenotypes, is available. Speech data were acquired from 141 MS patients with Expanded Disability Status Scale (EDSS) ranging from 1 to 6.5 and 70 matched healthy controls. Objective acoustic speech assessment including subtests on phonation, oral diadochokinesis, articulation and prosody was performed. The prevalence of dysarthria in our MS cohort was 56% while the severity was generally mild and primarily consisted of a combination of spastic and ataxic components. Prosodic-articulatory disorder presenting with monopitch, articulatory decay, excess loudness variations and slow rate was the most salient. Speech disorders reflected subclinical motor impairment with 78% accuracy in discriminating between a subgroup of asymptomatic MS (EDSS oral diadochokinesis and the 9-Hole Peg Test (r = - 0.65, p oral diadochokinesis and excess loudness variations significantly separated pure pyramidal and mixed pyramidal-cerebellar MS subgroups. Automated speech analyses may provide valuable biomarkers of disease progression in MS as dysarthria represents common and early manifestation that reflects disease disability and underlying pyramidal-cerebellar pathophysiology. Copyright © 2017 Elsevier B.V. All rights reserved.
Juel Henrichsen, Peter
on the supply side. The present article reports on a new public action strategy which has taken shape in the course of 2013-14. While Denmark is a small language area, our public sector is well organised and has considerable purchasing power. Across this past year, Danish local authorities have organised around......Most of the Danish municipalities are ready to begin to adopt automatic speech recognition, but at the same time remain nervous following a long series of bad business cases in the recent past. Complaints are voiced over costly licences and low service levels, typical effects of a de facto monopoly...... the speech technology challenge, they have formulated a number of joint questions and new requirements to be met by suppliers and have deliberately worked towards formulating tendering material which will allow fair competition. Public researchers have contributed to this work, including the author...
Kevin Michael Trewartha
Full Text Available It has been argued that speech production errors are monitored by the same neural system involved in monitoring other types of action errors. Behavioral evidence has shown that speech errors can be detected and corrected prior to articulation, yet the neural basis for such pre-articulatory speech error monitoring is poorly understood. The current study investigated speech error monitoring using a phoneme-substitution task known to elicit speech errors. Stimulus-locked event-related potential (ERP analyses comparing correct and incorrect utterances were used to assess pre-articulatory error monitoring and response-locked ERP analyses were used to assess post-articulatory monitoring. Our novel finding in the stimulus-locked analysis revealed that words that ultimately led to a speech error were associated with a larger P2 component at midline sites (FCz, Cz, and CPz. This early positivity may reflect the detection of an error in speech formulation, or a predictive mechanism to signal the potential for an upcoming speech error. The data also revealed that general conflict monitoring mechanisms are involved during this task as both correct and incorrect responses elicited an anterior N2 component typically associated with conflict monitoring. The response-locked analyses corroborated previous observations that self-produced speech errors led to a fronto-central ERN. These results demonstrate that speech errors can be detected prior to articulation, and that speech error monitoring relies on a central error monitoring mechanism.
Guddattu, Vasudeva; Krishna, Y.
The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…
... Staying Safe Videos for Educators Search English Español Speech-Language Therapy KidsHealth / For Parents / Speech-Language Therapy ... most kids with speech and/or language disorders. Speech Disorders, Language Disorders, and Feeding Disorders A speech ...
Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A
The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise.In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg
Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole
Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.
McClain, Matthew; Romanowski, Brian
Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known apriori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden-Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detection certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.
Full Text Available AbstractObjectiveMany tinnitus patients complain about difficulties regarding speech comprehension. In spite of the high clinical relevance little is known about underlying mechanisms and predisposing factors. Here, we performed an exploratory investigation in a large sample of tinnitus patients to (1 estimate the prevalence of speech comprehension difficulties among tinnitus patients, to (2 compare subjective reports of speech comprehension difficulties with objective measurements in a standardized speech comprehension test and to (3 explore underlying mechanisms by analyzing the relationship between speech comprehension difficulties and peripheral hearing function (pure tone audiogram, as well as with co-morbid hyperacusis as a central auditory processing disorder. Subjects and MethodsSpeech comprehension was assessed in 361 tinnitus patients presenting between 07/2012 and 08/2014 at the Interdisciplinary Tinnitus Clinic at the University of Regensburg. The assessment included standard audiological assessment (pure tone audiometry, tinnitus pitch and loudness matching, the Goettingen sentence test (in quiet for speech audiometric evaluation, two questions about hyperacusis, and two questions about speech comprehension in quiet and noisy environments (How would you rate your ability to understand speech?; How would you rate your ability to follow a conversation when multiple people are speaking simultaneously?. Results Subjectively reported speech comprehension deficits are frequent among tinnitus patients, especially in noisy environments (cocktail party situation. 74.2% of all investigated patients showed disturbed speech comprehension (indicated by values above 21.5 dB SPL in the Goettingen sentence test. Subjective speech comprehension complaints (both in general and in noisy environment were correlated with hearing level and with audiologically-assessed speech comprehension ability. In contrast, co-morbid hyperacusis was only correlated
Viswanathan, Navin; Kelty-Stephen, Damian G
Context effects are ubiquitous in speech perception and reflect the ability of human listeners to successfully perceive highly variable speech signals. In the study of how listeners compensate for coarticulatory variability, past studies have used similar effects speech and tone analogues of speech as strong support for speech-neutral, general auditory mechanisms for compensation for coarticulation. In this manuscript, we revisit compensation for coarticulation by replacing standard button-press responses with mouse-tracking responses and examining both standard geometric measures of uncertainty as well as newer information-theoretic measures that separate fast from slow mouse movements. We found that when our analyses were restricted to end-state responses, tones and speech contexts appeared to produce similar effects. However, a more detailed time-course analysis revealed systematic differences between speech and tone contexts such that listeners' responses to speech contexts, but not to tone contexts, changed across the experimental session. Analyses of the time course of effects within trials using mouse tracking indicated that speech contexts elicited fewer x-position flips but more area under the curve (AUC) and maximum deviation (MD), and they did so in the slower portions of mouse-tracking movements. Our results indicate critical differences between the time course of speech and nonspeech context effects and that general auditory explanations, motivated by their apparent similarity, be reexamined.
Charlop, M H; Milstein, J P
We assessed the effects of video modeling on acquisition and generalization of conversational skills among autistic children. Three autistic boys observed videotaped conversations consisting of two people discussing specific toys. When criterion for learning was met, generalization of conversational skills was assessed with untrained topics of conversation; new stimuli (toys); unfamiliar persons, siblings, and autistic peers; and other settings. The results indicated that the children learned through video modeling, generalized their conversational skills, and maintained conversational speech over a 15-month period. Video modeling shows much promise as a rapid and effective procedure for teaching complex verbal skills such as conversational speech. PMID:2793634
Rodriguez Moreno, Diana; Schiff, Nicholas D; Hirsch, Joy
Speech comprehension studies have generally focused on the isolation and function of regions with positive blood oxygen level dependent (BOLD) signals with respect to a resting baseline. Although regions with negative BOLD signals in comparison to a resting baseline have been reported in language-related tasks, their relationship to regions of positive signals is not fully appreciated. Based on the emerging notion that the negative signals may represent an active function in language tasks, the authors test the hypothesis that negative BOLD signals during receptive language are more associated with comprehension than content-free versions of the same stimuli. Regions associated with comprehension of speech were isolated by comparing responses to passive listening to natural speech to two incomprehensible versions of the same speech: one that was digitally time reversed and one that was muffled by removal of high frequencies. The signal polarity was determined by comparing the BOLD signal during each speech condition to the BOLD signal during a resting baseline. As expected, stimulation-induced positive signals relative to resting baseline were observed in the canonical language areas with varying signal amplitudes for each condition. Negative BOLD responses relative to resting baseline were observed primarily in frontoparietal regions and were specific to the natural speech condition. However, the BOLD signal remained indistinguishable from baseline for the unintelligible speech conditions. Variations in connectivity between brain regions with positive and negative signals were also specifically related to the comprehension of natural speech. These observations of anticorrelated signals related to speech comprehension are consistent with emerging models of cooperative roles represented by BOLD signals of opposite polarity.
Informed consent mandates for abortion providers may infringe the First Amendment's freedom of speech. On the other hand, they may reinforce the physician's duty to obtain informed consent. Courts can promote both doctrines by ensuring that compelled physician speech pertains to medical facts about abortion rather than abortion ideology and that compelled speech is truthful and not misleading. © 2015 American Society of Law, Medicine & Ethics, Inc.
Tan, Zheng-Hua; Lindberg, Børge
in the mobile context covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within......The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR...
Dorman, Michael F; Natale, Sarah Cook; Butts, Austin M; Zeitler, Daniel M; Carlson, Matthew L
The goal of the present study was to assess the sound quality of a cochlear implant for single-sided deaf (SSD) patients fit with a cochlear implant (CI). One of the fundamental, unanswered questions in CI research is "what does an implant sound like?" Conventional CI patients must use the memory of a clean signal, often decades old, to judge the sound quality of their CIs. In contrast, SSD-CI patients can rate the similarity of a clean signal presented to the CI ear and candidate, CI-like signals presented to the ear with normal hearing. For Experiment 1 four types of stimuli were created for presentation to the normal hearing ear: noise vocoded signals, sine vocoded signals, frequency shifted, sine vocoded signals and band-pass filtered, natural speech signals. Listeners rated the similarity of these signals to unmodified signals sent to the CI on a scale of 0 to 10 with 10 being a complete match to the CI signal. For Experiment 2 multitrack signal mixing was used to create natural speech signals that varied along multiple dimensions. In Experiment 1 for eight adult SSD-CI listeners, the best median similarity rating to the sound of the CI for noise vocoded signals was 1.9; for sine vocoded signals 2.9; for frequency upshifted signals, 1.9; and for band pass filtered signals, 5.5. In Experiment 2 for three young listeners, combinations of band pass filtering and spectral smearing lead to ratings of 10. The sound quality of noise and sine vocoders does not generally correspond to the sound quality of cochlear implants fit to SSD patients. Our preliminary conclusion is that natural speech signals that have been muffled to one degree or another by band pass filtering and/or spectral smearing provide a close, but incomplete, match to CI sound quality for some patients.
Full Text Available Listeners vary in their ability to understand speech in noisy environments. Hearing sensitivity, as measured by pure-tone audiometry, can only partly explain these results, and cognition has emerged as another key concept. Although cognition relates to speech perception, the exact nature of the relationship remains to be fully understood. This study investigates how different aspects of cognition, particularly working memory and attention, relate to speech intelligibility for various tests.Perceptual accuracy of speech perception represents just one aspect of functioning in a listening environment. Activity and participation limits imposed by hearing loss, in addition to the demands of a listening environment, are also important and may be better captured by self-report questionnaires. Understanding how speech perception relates to self-reported aspects of listening forms the second focus of the study.Forty-four listeners aged between 50-74 years with mild SNHL were tested on speech perception tests differing in complexity from low (phoneme discrimination in quiet, to medium (digit triplet perception in speech-shaped noise to high (sentence perception in modulated noise; cognitive tests of attention, memory, and nonverbal IQ; and self-report questionnaires of general health-related and hearing-specific quality of life.Hearing sensitivity and cognition related to intelligibility differently depending on the speech test: neither was important for phoneme discrimination, hearing sensitivity alone was important for digit triplet perception, and hearing and cognition together played a role in sentence perception. Self-reported aspects of auditory functioning were correlated with speech intelligibility to different degrees, with digit triplets in noise showing the richest pattern. The results suggest that intelligibility tests can vary in their auditory and cognitive demands and their sensitivity to the challenges that auditory environments pose on
Heinrich, Antje; Henshaw, Helen; Ferguson, Melanie A.
Listeners vary in their ability to understand speech in noisy environments. Hearing sensitivity, as measured by pure-tone audiometry, can only partly explain these results, and cognition has emerged as another key concept. Although cognition relates to speech perception, the exact nature of the relationship remains to be fully understood. This study investigates how different aspects of cognition, particularly working memory and attention, relate to speech intelligibility for various tests. Perceptual accuracy of speech perception represents just one aspect of functioning in a listening environment. Activity and participation limits imposed by hearing loss, in addition to the demands of a listening environment, are also important and may be better captured by self-report questionnaires. Understanding how speech perception relates to self-reported aspects of listening forms the second focus of the study. Forty-four listeners aged between 50 and 74 years with mild sensorineural hearing loss were tested on speech perception tests differing in complexity from low (phoneme discrimination in quiet), to medium (digit triplet perception in speech-shaped noise) to high (sentence perception in modulated noise); cognitive tests of attention, memory, and non-verbal intelligence quotient; and self-report questionnaires of general health-related and hearing-specific quality of life. Hearing sensitivity and cognition related to intelligibility differently depending on the speech test: neither was important for phoneme discrimination, hearing sensitivity alone was important for digit triplet perception, and hearing and cognition together played a role in sentence perception. Self-reported aspects of auditory functioning were correlated with speech intelligibility to different degrees, with digit triplets in noise showing the richest pattern. The results suggest that intelligibility tests can vary in their auditory and cognitive demands and their sensitivity to the challenges that
; speech-to-speech translation; language identiﬁcation. ... interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers.
Full Text Available Increasingly, speech and language pathologists (SLPs around the world are faced with the unique set of issues presented by their bilingual clients. Some professional associations in different countries have presented recommendations when assessing and treating bilingual populations. In children, most of the studies have focused on intervention for language and phonology/ articulation impairments and very few focus on stuttering. In general, studies of language intervention tend to agree that intervention in the first language (L1 either increase performance on L2 or does not hinder it. In bilingual adults, monolingual versus bilingual intervention is especially relevant in cases of aphasia; dysarthria in bilinguals has been barely approached. Most studies of cross-linguistic effects in bilingual aphasics have focused on lexical retrieval training. It has been noted that even though a majority of studies have disclosed a cross-linguistic generalization from one language to the other, some methodological weaknesses are evident. It is concluded that even though speech and language intervention in bilinguals represents a most important clinical area in speech language pathology, much more research using larger samples and controlling for potentially confounding variables is evidently required.
Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam
Speech-language pathologists (SLPs) are trained to correct articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcome while patients speak. To assist SLPs in this task, we are presenting the multimodal speech capture system (MSCS) that records and displays kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. Collected speech modalities, tongue motion, lips gestures, and voice are visualized not only in real-time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components, and demonstrate its basic visualization capabilities by a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose, and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern matching algorithms to be applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods that are mostly subjective, and may vary from one SLP to another.
Ossewaarde, Roelant; Jonkers, Roel; Jalvingh, Fedor; Bastiaanse, Yvonne
Measurement of speech parameters in casual speech of dementia patients Roelant Adriaan Ossewaarde1,2, Roel Jonkers1, Fedor Jalvingh1,3, Roelien Bastiaanse1 1CLCG, University of Groningen (NL); 2HU University of Applied Sciences Utrecht (NL); 33St. Marienhospital - Vechta, Geriatric Clinic Vechta
Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas
Assistive speech-enabled systems are proposed to help both French and English speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement of the Perceptual Evaluation of the Speech Quality (PESQ) value of 5% and more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
Rosenblum, Lawrence D.
Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal s...
Malmasi, Shervin; Zampieri, Marcos
In this study, we approach the problem of distinguishing general profanity from hate speech in social media, something which has not been widely considered. Using a new dataset annotated specifically for this task, we employ supervised classification along with a set of features that includes ?-grams, skip-grams and clustering-based word representations. We apply approaches based on single classifiers as well as more advanced ensemble classifiers and stacked generalisation, achieving the best result of ? accuracy for this 3-class classification task. Analysis of the results reveals that discriminating hate speech and profanity is not a simple task, which may require features that capture a deeper understanding of the text not always possible with surface ?-grams. The variability of gold labels in the annotated data, due to differences in the subjective adjudications of the annotators, is also an issue. Other directions for future work are discussed.
... digital filtering for noise cancellation which interfaces to speech recognition software. It uses auditory features in speech recognition training, and provides applications to multilingual spoken language translation...
Teaching Speech Acts
Full Text Available In this paper I argue that pragmatic ability must become part of what we teach in the classroom if we are to realize the goals of communicative competence for our students. I review the research on pragmatics, especially those articles that point to the effectiveness of teaching pragmatics in an explicit manner, and those that posit methods for teaching. I also note two areas of scholarship that address classroom needs—the use of authentic data and appropriate assessment tools. The essay concludes with a summary of my own experience teaching speech acts in an advanced-level Portuguese class.
Parkinson, Michael G.; Dobkins, David H.
Using a computerized content analysis, the authors demonstrate changes in speech behaviors of prison inmates. They conclude that two to four hours of public speaking training can have only limited effect on students who live in a culture in which "prison speech" is the expected and rewarded form of behavior. (PD)
Pedersen, Michael Syskind; Wang, DeLiang; Larsen, Jan
a method for underdetermined blind source separation of convolutive mixtures. The proposed framework is applicable for separation of instantaneous as well as convolutive speech mixtures. It is possible to iteratively extract each speech signal from the mixture by combining blind source separation...
Some of the history of gradual infusion of the modulation spectrum concept into Automatic recognition of speech (ASR) comes next, pointing to the relationship of modulation spectrum processing to wellaccepted ASR techniques such as dynamic speech features or RelAtive SpecTrAl (RASTA) ﬁltering. Next, the frequency ...
Casper, Maureen A.; Raphael, Lawrence J.; Harris, Katherine S.; Geibel, Jennifer M.
Persons with cerebellar ataxia exhibit changes in physical coordination and speech and voice production. Previously, these alterations of speech and voice production were described primarily via perceptual coordinates. In this study, the spatial-temporal properties of syllable production were examined in 12 speakers, six of whom were healthy…
This PhD thesis in human-computer interfaces (informatics) studies the case of the anaesthesia record used during medical operations and the possibility to supplement it with speech recognition facilities. Problems and limitations have been identified with the traditional paper-based anaesthesia...... and inaccuracies in the anaesthesia record. Supplementing the electronic anaesthesia record interface with speech input facilities is proposed as one possible solution to a part of the problem. The testing of the various hypotheses has involved the development of a prototype of an electronic anaesthesia record...... interface with speech input facilities in Danish. The evaluation of the new interface was carried out in a full-scale anaesthesia simulator. This has been complemented by laboratory experiments on several aspects of speech recognition for this type of use, e.g. the effects of noise on speech recognition...
May, Tobias; Dau, Torsten
A monaural speech segregation system is presented that estimates the ideal binary mask from noisy speech based on the supervised learning of amplitude modulation spectrogram (AMS) features. Instead of using linearly scaled modulation filters with constant absolute bandwidth, an auditory- inspired...... about speech activity present in neighboring time-frequency units. In order to evaluate the generalization performance of the system to unseen acoustic conditions, the speech segregation system is trained with a limited set of low signal-to-noise ratio (SNR) conditions, but tested over a wide range...
Pope, Diana S; Miller-Klein, Erik T
Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On an average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measureable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption, and compact and more fragmented nursing unit floor plate shapes should be considered.
Diana S Pope
Full Text Available Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient′s bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s′ standard hospital construction and the other was newly refurbished (2013 with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On an average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measureable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption, and compact and more fragmented nursing unit floor plate shapes should be considered.
Pope, Diana S.; Miller-Klein, Erik T.
Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s’ standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On an average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measureable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption, and compact and more fragmented nursing unit floor plate shapes should be considered. PMID:26780959
Nijland, L.; Maassen, B.A.M.; Hulstijn, W.; Peters, H.F.M.
Developmental apraxia of speech (DAS) is generally classified as a 'speech motor' disorder. Direct measurement of articulatory movement is, however, virtually non-existent. In the present study we investigated the coordination between articulators in children with DAS using kinematic measurements.
Abuya, Eromosele John
The study was an examination through the pragma-stylistic approach to meaning of the linguistic acts that manifest in the Inaugural Speech of Goodluck Ebele Jonathan as the democratically elected president in May 2011 General Elections in Nigeria. Hence, the study focused on speech acts type of locution, illocutionary and perlocutionary in the…
Medhurst, Martin J.
Examines speech delivered by President Eisenhower to General Assembly of the United Nations in December 1953. Demonstrates how a complex rhetorical situation resulted in the crafting and exploitation of a public policy address. Speech bolstered international image of the United States as peacemaker, warned the Soviets against a preemptive nuclear…
Camarata, Stephen; Yoder, Paul; Camarata, Mary
Children with Down syndrome often display speech-comprehensibility and grammatical deficits beyond what would be predicted based upon general mental age. Historically, speech-comprehensibility has often been treated using traditional articulation therapy and oral-motor training so there may be little or no coordination of grammatical and…
Viswanathan, Navin; Magnuson, James S.; Fowler, Carol A.
According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different…
Al-Ghazalli, Mehdi Falih; Al-Shammary, Mohanad A. Amert
The purpose of this paper is to investigate the pragmatic difficulties encountered by Iraqi EFL university students in producing the speech act of apology. Although the act of apology is easy to recognize or use by native speakers of English, non-native speakers generally encounter difficulties in discriminating one speech act from another. The…
...-language pathology services. 409.17 Section 409.17 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES..., and speech-language pathology services. (a) General rules. (1) Except as specified in this section, physical therapy, occupational therapy, or speech-language pathology services must be furnished by...
Jelinek, H. J.
This is the Final Report of Electronic Design Associates on its Phase I SBIR project. The purpose of this project is to develop a method for correcting helium speech, as experienced in diver-surface communication. The goal of the Phase I study was to design, prototype, and evaluate a real time helium speech corrector system based upon digital signal processing techniques. The general approach was to develop hardware (an IBM PC board) to digitize helium speech and software (a LAMBDA computer based simulation) to translate the speech. As planned in the study proposal, this initial prototype may now be used to assess expected performance from a self contained real time system which uses an identical algorithm. The Final Report details the work carried out to produce the prototype system. Four major project tasks were: a signal processing scheme for converting helium speech to normal sounding speech was generated. The signal processing scheme was simulated on a general purpose (LAMDA) computer. Actual helium speech was supplied to the simulation and the converted speech was generated. An IBM-PC based 14 bit data Input/Output board was designed and built. A bibliography of references on speech processing was generated.
Monson, Brian Bruce
While human speech and the human voice generate acoustical energy up to (and beyond) 20 kHz, the energy above approximately 5 kHz has been largely neglected. Evidence is accruing that this high-frequency energy contains perceptual information relevant to speech and voice, including percepts of quality, localization, and intelligibility. The present research was an initial step in the long-range goal of characterizing high-frequency energy in singing voice and speech, with particular regard for its perceptual role and its potential for modification during voice and speech production. In this study, a database of high-fidelity recordings of talkers was created and used for a broad acoustical analysis and general characterization of high-frequency energy, as well as specific characterization of phoneme category, voice and speech intensity level, and mode of production (speech versus singing) by high-frequency energy content. Directionality of radiation of high-frequency energy from the mouth was also examined. The recordings were used for perceptual experiments wherein listeners were asked to discriminate between speech and voice samples that differed only in high-frequency energy content. Listeners were also subjected to gender discrimination tasks, mode-of-production discrimination tasks, and transcription tasks with samples of speech and singing that contained only high-frequency content. The combination of these experiments has revealed that (1) human listeners are able to detect very subtle level changes in high-frequency energy, and (2) human listeners are able to extract significant perceptual information from high-frequency energy.
freedom of speech .1 In America, where freedom of speech is an inalienable right and civilian control of the military is a...military. 2 This paper will examine the Generals’ Revolt as well as review legal restrictions on the freedom of speech of retired officers. It will also...Spring of 2006 does not correct the problem, nor does it bring comfort to the families who lost servicemembers in Iraq. 8 Freedom of Speech
Hjortkjaer, Jens; Märcher-Rørsted, Jonatan; Fuglsang, Søren A; Dau, Torsten
Neuronal oscillations are thought to play an important role in working memory (WM) and speech processing. Listening to speech in real-life situations is often cognitively demanding but it is unknown whether WM load influences how auditory cortical activity synchronizes to speech features. Here, we developed an auditory n-back paradigm to investigate cortical entrainment to speech envelope fluctuations under different degrees of WM load. We measured the electroencephalogram, pupil dilations and behavioural performance from 22 subjects listening to continuous speech with an embedded n-back task. The speech stimuli consisted of long spoken number sequences created to match natural speech in terms of sentence intonation, syllabic rate and phonetic content. To burden different WM functions during speech processing, listeners performed an n-back task on the speech sequences in different levels of background noise. Increasing WM load at higher n-back levels was associated with a decrease in posterior alpha power as well as increased pupil dilations. Frontal theta power increased at the start of the trial and increased additionally with higher n-back level. The observed alpha-theta power changes are consistent with visual n-back paradigms suggesting general oscillatory correlates of WM processing load. Speech entrainment was measured as a linear mapping between the envelope of the speech signal and low-frequency cortical activity (level) decreased cortical speech envelope entrainment. Although entrainment persisted under high load, our results suggest a top-down influence of WM processing on cortical speech entrainment. © 2018 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Stephen Gichuhi Kimotho; Rahab Njeri Nyaga
Ethnicity in Kenya permeates all spheres of life. However, it is in politics that ethnicity is most visible. Election time in Kenya often leads to ethnic competition and hatred, often expressed through various media. Ethnic hate speech characterized the 2007 general elections in party rallies and through text messages, emails, posters and leaflets. This resulted in widespread skirmishes that left over 1200 people dead, and many displaced (KNHRC, 2008). In 2013, however, the new battle zone wa...
Speech processing by human listeners derives meaning from acoustic input via intermediate steps involving abstract representations of what has been heard. Recent results from several lines of research are here brought together to shed light on the nature and role of these representations. In spoken-word recognition, representations of phonological form and of conceptual content are dissociable. This follows from the independence of patterns of priming for a word's form and its meaning. The nature of the phonological-form representations is determined not only by acoustic-phonetic input but also by other sources of information, including metalinguistic knowledge. This follows from evidence that listeners can store two forms as different without showing any evidence of being able to detect the difference in question when they listen to speech. The lexical representations are in turn separate from prelexical representations, which are also abstract in nature. This follows from evidence that perceptual learning about speaker-specific phoneme realization, induced on the basis of a few words, generalizes across the whole lexicon to inform the recognition of all words containing the same phoneme. The efficiency of human speech processing has its basis in the rapid execution of operations over abstract representations.
Davidow, Jason H; Grossman, Heather L; Edge, Robin L
Voluntary stuttering techniques involve persons who stutter purposefully interjecting disfluencies into their speech. Little research has been conducted on the impact of these techniques on the speech pattern of persons who stutter. The present study examined whether changes in the frequency of voluntary stuttering accompanied changes in stuttering frequency, articulation rate, speech naturalness, and speech effort. In total, 12 persons who stutter aged 16-34 years participated. Participants read four 300-syllable passages during a control condition, and three voluntary stuttering conditions that involved attempting to produce purposeful, tension-free repetitions of initial sounds or syllables of a word for two or more repetitions (i.e., bouncing). The three voluntary stuttering conditions included bouncing on 5%, 10%, and 15% of syllables read. Friedman tests and follow-up Wilcoxon signed ranks tests were conducted for the statistical analyses. Stuttering frequency, articulation rate, and speech naturalness were significantly different between the voluntary stuttering conditions. Speech effort did not differ between the voluntary stuttering conditions. Stuttering frequency was significantly lower during the three voluntary stuttering conditions compared to the control condition, and speech effort was significantly lower during two of the three voluntary stuttering conditions compared to the control condition. Due to changes in articulation rate across the voluntary stuttering conditions, it is difficult to conclude, as has been suggested previously, that voluntary stuttering is the reason for stuttering reductions found when using voluntary stuttering techniques. Additionally, future investigations should examine different types of voluntary stuttering over an extended period of time to determine their impact on stuttering frequency, speech rate, speech naturalness, and speech effort.
Kent, Ray D; Vorperian, Houri K
This review summarizes research on disorders of speech production in Down syndrome (DS) for the purposes of informing clinical services and guiding future research. Review of the literature was based on searches using MEDLINE, Google Scholar, PsycINFO, and HighWire Press, as well as consideration of reference lists in retrieved documents (including online sources). Search terms emphasized functions related to voice, articulation, phonology, prosody, fluency, and intelligibility. The following conclusions pertain to four major areas of review: voice, speech sounds, fluency and prosody, and intelligibility. The first major area is voice. Although a number of studies have reported on vocal abnormalities in DS, major questions remain about the nature and frequency of the phonatory disorder. Results of perceptual and acoustic studies have been mixed, making it difficult to draw firm conclusions or even to identify sensitive measures for future study. The second major area is speech sounds. Articulatory and phonological studies show that speech patterns in DS are a combination of delayed development and errors not seen in typical development. Delayed (i.e., developmental) and disordered (i.e., nondevelopmental) patterns are evident by the age of about 3 years, although DS-related abnormalities possibly appear earlier, even in infant babbling. The third major area is fluency and prosody. Stuttering and/or cluttering occur in DS at rates of 10%-45%, compared with about 1% in the general population. Research also points to significant disturbances in prosody. The fourth major area is intelligibility. Studies consistently show marked limitations in this area, but only recently has the research gone beyond simple rating scales.
Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Herve
To communicate, children must discriminate and identify speech sounds. Because visual speech plays an important role in this process, we explored how visual speech influences phoneme discrimination and identification by children. Critical items had intact visual speech (e.g. baez) coupled to non-intact (excised onsets) auditory speech (signified…
Naidu, D.H.R.; Srinivasan, S.
Several speech enhancement approaches utilize trained models of clean speech data, such as codebooks, Gaussian mixtures, and hidden Markov models. These models are typically trained on neutral clean speech data, without any emotion. However, in practical scenarios, emotional speech is a common
Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc
Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…
Shannon, Robert V
Music and speech share many acoustic cues but not all are equally important. For example, harmonic pitch is essential for music but not for speech. When birds communicate is their song more like speech or music? A new study contrasting pitch and spectral patterns shows that birds perceive their song more like humans perceive speech. Copyright © 2016 Elsevier Ltd. All rights reserved.
Allen, Winfred G., Jr., Ed.
The Freedom of Speech Newsletter is the communication medium for the Freedom of Speech Interest Group of the Western Speech Communication Association. The newsletter contains such features as a statement of concern by the National Ad Hoc Committee Against Censorship; Reticence and Free Speech, an article by James F. Vickrey discussing the subtle…
Frankle, Christen M.
There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.
Frankle, Christen M.
There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.
Johnson, Micah K.; Lyu, Siwei; Farid, Hany
Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.
Loizou, Philipos C
With the proliferation of mobile devices and hearing devices, including hearing aids and cochlear implants, there is a growing and pressing need to design algorithms that can improve speech intelligibility without sacrificing quality. Responding to this need, Speech Enhancement: Theory and Practice, Second Edition introduces readers to the basic problems of speech enhancement and the various algorithms proposed to solve these problems. Updated and expanded, this second edition of the bestselling textbook broadens its scope to include evaluation measures and enhancement algorithms aimed at impr
Nešić Blagoje V.
Full Text Available On the territory of the northern part of Kosovo and Metohija (Kosovo municipalities Mitrovica, Zvecan, Leposavic and Zubin Potok a study is conducted in primary schools in order to determine the presence of speech-language disorders in children of early school age. Data were collected from the teachers of the third and fourth grades of primary schools in these municipalities (n = 36, which include a total number of 641 student. The results show that the number of children with speech and language disorders represented in the different municipalities of the research vary (the largest is in Leposavic, the smallest is in Zvecan, then 3/4 the total number of children with speech and language disorders are boys. It is also found that the speech-language disorders usually appear from the very beginning of schooling and that the examined teachers recognize 12 types of speech-language disorders in their students. Teachers recognize dyslexia as the most common speech-language disorder, while dysphasia and distortion are the least common, in the opinion of the teachers. The results show that children are generally accepted by their peers, but only during schooling; then, there is a difference in school success between children with speech and language disorders and children without any speech-language disorders. It also found that the teachers' work is generally not affected by the children with speech and language disorders, and that there is generally an intensive cooperation between teachers and parents of children with speech and language disorders. The research and the results on prevalence of speech-language disorders in children in northern Kosovo and Metohija can be considered as an important guidelines in future work.
Full Text Available This paper provides an interface between the machine translation and speech synthesis system for converting English speech to Tamil text in English to Tamil speech to speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation and text to speech synthesis. Many procedures for incorporation of speech recognition and machine translation have been projected. Still speech synthesis system has not yet been measured. In this paper, we focus on integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of speech synthesis, machine translation and the integration of machine translation and speech synthesis components. Here we implement a hybrid machine translation (combination of rule based and statistical machine translation and concatenative syllable based speech synthesis technique. In order to retain the naturalness and intelligibility of synthesized speech Auto Associative Neural Network (AANN prosody prediction is used in this work. The results of this system investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.
Błeszyński, Jacek Jarosław
Speech of people with autism is recognised as one of the basic diagnostic, therapeutic and theoretical problems. One of the most common symptoms of autism in children is echolalia, described here as being of different types and severity. This paper presents the results of studies into different levels of echolalia, both in normally developing children and in children diagnosed with autism, discusses the differences between simple echolalia and echolalic speech - which can be considered to b...
tecture, are either wrapped natural-language processing ( NLP ) components or objects developed from scratch using the architecture’s API. GATE is...framework, we put together a demonstration Arabic -to- English speech translation system using both internally developed ( Arabic speech recognition and MT...conditions of our Arabic S2S demonstration system described earlier. Once again, the data size was varied and eighty identical requests were
Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C
Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.
Hux, Karen; Knollman-Porter, Kelly; Brown, Jessica; Wallace, Sarah E
Using text-to-speech technology to provide simultaneous written and auditory content presentation may help compensate for chronic reading challenges if people with aphasia can understand synthetic speech output; however, inherent auditory comprehension challenges experienced by people with aphasia may make understanding synthetic speech difficult. This study's purpose was to compare the preferences and auditory comprehension accuracy of people with aphasia when listening to sentences generated with digitized natural speech, Alex synthetic speech (i.e., Macintosh platform), or David synthetic speech (i.e., Windows platform). The methodology required each of 20 participants with aphasia to select one of four images corresponding in meaning to each of 60 sentences comprising three stimulus sets. Results revealed significantly better accuracy given digitized natural speech than either synthetic speech option; however, individual participant performance analyses revealed three patterns: (a) comparable accuracy regardless of speech condition for 30% of participants, (b) comparable accuracy between digitized natural speech and one, but not both, synthetic speech option for 45% of participants, and (c) greater accuracy with digitized natural speech than with either synthetic speech option for remaining participants. Ranking and Likert-scale rating data revealed a preference for digitized natural speech and David synthetic speech over Alex synthetic speech. Results suggest many individuals with aphasia can comprehend synthetic speech options available on popular operating systems. Further examination of synthetic speech use to support reading comprehension through text-to-speech technology is thus warranted. Copyright © 2017 Elsevier Inc. All rights reserved.
Morgan, Lewis B.
This article focuses on speech mannerisms often employed by clients in a helping relationship. Eight mannerisms are presented and discussed, as well as possible interpretations. Suggestions are given to help counselors respond to them. (Author)
Carrier nature of speech; modulation spectrum; spectral dynamics ... the relationships between phonetic values of sounds and their short-term spectral envelopes .... the number of free parameters that need to be estimated from training data.
Veenendaal, Nathalie J; Groen, Margriet A; Verhoeven, Ludo
Text reading prosody has been associated with reading comprehension. However, text reading prosody is a reading-dependent measure that relies heavily on decoding skills. Investigation of the contribution of speech prosody - which is independent from reading skills - in addition to text reading prosody, to reading comprehension could provide more insight into the general role of prosody in reading comprehension. The current study investigates how much variance in reading comprehension scores is explained by speech prosody and text reading prosody, after controlling for decoding, vocabulary, and syntactic awareness. A battery of reading and language assessments was performed by 106 Dutch fourth-grade primary school children. Speech prosody was assessed using a storytelling task and text reading prosody by oral text reading performance. Decoding skills, vocabulary, syntactic awareness, and reading comprehension were assessed using standardized tests. Hierarchical regression analyses showed that text reading prosody explained 6% of variance and that speech prosody explained 8% of variance in reading comprehension scores, after controlling for decoding, vocabulary, and syntactic awareness. Phrasing was the significant factor in both speech and text reading. When added in consecutive order, phrasing in speech added 5% variance to phrasing in reading. In contrast, phrasing in reading added only 3% variance to phrasing in speech. The variance that speech prosody explained in reading comprehension scores should not be neglected. Speech prosody seems to facilitate the construction of meaning in written language. © 2014 The British Psychological Society.
Liss, Julie M.; Utianski, Rene; Lansford, Kaitlin
Background Rhythmic disturbances are a hallmark of motor speech disorders, in which the motor control deficits interfere with the outward flow of speech and by extension speech understanding. As the functions of rhythm are language-specific, breakdowns in rhythm should have language-specific consequences for communication. Objective The goals of this paper are to (i) provide a review of the cognitive- linguistic role of rhythm in speech perception in a general sense and crosslinguistically; (ii) present new results of lexical segmentation challenges posed by different types of dysarthria in American English, and (iii) offer a framework for crosslinguistic considerations for speech rhythm disturbances in the diagnosis and treatment of communication disorders associated with motor speech disorders. Summary This review presents theoretical and empirical reasons for considering speech rhythm as a critical component of communication deficits in motor speech disorders, and addresses the need for crosslinguistic research to explore language-universal versus language-specific aspects of motor speech disorders. PMID:24157596
Roche, Laura; Sigafoos, Jeff; Lancioni, Giulio E; O'Reilly, Mark F; Schlosser, Ralf W; Stevens, Michelle; van der Meer, Larah; Achmadi, Donna; Kagohara, Debora; James, Ruth; Carnett, Amarie; Hodis, Flaviu; Green, Vanessa A; Sutherland, Dean; Lang, Russell; Rispoli, Mandy; Machalicek, Wendy; Marschik, Peter B
Children with neurodevelopmental disorders often present with little or no speech. Augmentative and alternative communication (AAC) aims to promote functional communication using non-speech modes, but it might also influence natural speech production. To investigate this possibility, we provided AAC intervention to two boys with neurodevelopmental disorders and severe communication impairment. Intervention focused on teaching the boys to use a tablet computer-based speech-generating device (SGD) to request preferred stimuli. During SGD intervention, both boys began to utter relevant single words. In an effort to induce more speech, and investigate the relation between SGD availability and natural speech production, the SGD was removed during some requesting opportunities. With intervention, both participants learned to use the SGD to request preferred stimuli. After learning to use the SGD, both participants began to respond more frequently with natural speech when the SGD was removed. The results suggest that a rehabilitation program involving initial SGD intervention, followed by subsequent withdrawal of the SGD, might increase the frequency of natural speech production in some children with neurodevelopmental disorders. This effect could be an example of response generalization. Copyright © 2014 ISDN. Published by Elsevier Ltd. All rights reserved.
Full Text Available In previous studies, we found that the social interactions infants experience in their everyday lives at 11- and 14-months of age affect language ability at 24 months of age. These studies investigated relationships between the speech style (i.e., parentese speech vs. standard speech and social context [i.e., one-on-one (1:1 vs. group] of language input in infancy and later speech development (i.e., at 24 months of age, controlling for socioeconomic status (SES. Results showed that the amount of exposure to parentese speech-1:1 in infancy was related to productive vocabulary at 24 months. The general goal of the present study was to investigate changes in (1 the pattern of social interactions between caregivers and their children from infancy to childhood and (2 relationships among speech style, social context, and language learning across time. Our study sample consisted of 30 participants from the previously published infant studies, evaluated at 33 months of age. Social interactions were assessed at home using digital first-person perspective recordings of the auditory environment. We found that caregivers use less parentese speech-1:1, and more standard speech-1:1, as their children get older. Furthermore, we found that the effects of parentese speech-1:1 in infancy on later language development at 24 months persist at 33 months of age. Finally, we found that exposure to standard speech-1:1 in childhood was the only social interaction that related to concurrent word production/use. Mediation analyses showed that standard speech-1:1 in childhood fully mediated the effects of parentese speech-1:1 in infancy on language development in childhood, controlling for SES. This study demonstrates that engaging in one-on-one interactions in infancy and later in life has important implications for language development.
This study asks how speakers adjust their speech to their addressees, focusing on the potential roles of cognitive representations such as partner models, automatic processes such as interactive alignment, and social processes such as interactional negotiation. The nature of addressee orientation......, psycholinguistics and conversation analysis, and offers both overviews of child-directed, foreigner-directed and robot-directed speech and in-depth analyses of the processes involved in adjusting to a communication partner....
The article shows the differences between the speech etiquette of different peoples. The most important thing is to find a common language with this or that interlocutor. Knowledge of national etiquette, national character helps to learn the principles of speech of another nation. The article indicates in which cases certain forms of etiquette considered acceptable. At the same time, the rules of etiquette emphasized in the conduct of a dialogue in official meetings and for example, in the ex...
What happens to a person who speaks out about corruption in their organization, and finds themselves excluded from their profession? In this article, I argue that whistleblowers experience exclusions because they have engaged in ‘impossible speech’, that is, a speech act considered to be unacceptable or illegitimate. Drawing on Butler’s theories of recognition and censorship, I show how norms of acceptable speech working through recruitment practices, alongside the actions of colleagues, can ...
Rehana, Ridha; Silitonga, Sortha
One aim of this article is to show through a concrete example how speech function and speech role used in movie. The illustrative example is taken from the dialogue of Up movie. Central to the analysis proper form of dialogue on Up movie that contain of speech function and speech role; i.e. statement, offer, question, command, giving, and demanding. 269 dialogue were interpreted by actor, and it was found that the use of speech function and speech role.
Michael P. Towey
Full Text Available Much of speech therapy involves the clinician guiding the therapeutic process (e.g., presenting stimuli and eliciting client responses; however, this Brief Communication describes a different approach to speech therapy delivery. Clinicians at Waldo County General Hospital (WCGH use high definition audio and video to engage clients in telepractice using interactive web-based virtual environments. This technology enables clients and their clinicians to co-create salient treatment activities using authentic materials captured via digital cameras, video and/or curricular materials. Both therapists and clients manipulate the materials and interact online in real-time. The web-based technology engenders highly personalized and engaging activities, such that clients’ interactions with these high interest tasks often continue well beyond the therapy sessions.
Won, Jong Ho; Shim, Hyun Joon; Lorenzi, Christian; Rubinstein, Jay T
Won et al. (J Acoust Soc Am 132:1113-1119, 2012) reported that cochlear implant (CI) speech processors generate amplitude-modulation (AM) cues recovered from broadband speech frequency modulation (FM) and that CI users can use these cues for speech identification in quiet. The present study was designed to extend this finding for a wide range of listening conditions, where the original speech cues were severely degraded by manipulating either the acoustic signals or the speech processor. The manipulation of the acoustic signals included the presentation of background noise, simulation of reverberation, and amplitude compression. The manipulation of the speech processor included changing the input dynamic range and the number of channels. For each of these conditions, multiple levels of speech degradation were tested. Speech identification was measured for CI users and compared for stimuli having both AM and FM information (intact condition) or FM information only (FM condition). Each manipulation degraded speech identification performance for both intact and FM conditions. Performance for the intact and FM conditions became similar for stimuli having the most severe degradations. Identification performance generally overlapped for the intact and FM conditions. Moreover, identification performance for the FM condition was better than chance performance even at the maximum level of distortion. Finally, significant correlations were found between speech identification scores for the intact and FM conditions. Altogether, these results suggest that despite poor frequency selectivity, CI users can make efficient use of AM cues recovered from speech FM in difficult listening situations.
M. Yu. Bobylova
Full Text Available There has been recently an increase in the number of children diagnosed with delayed speech development. There is delay compensation with age, but mild deficiency often remains for life. Delayed speech development is more common in boys than in girls. Its etiology is unknown in most cases, so a child should be followed up to make an accurate diagnosis. Genetic predisposition or environmental factors frequently influence speech development. The course of its delays is various. In the history of a number of disorders (childhood disintegrative disorder, Landau–Kleffner syndrome, there is evidence for the normal development of speech to a certain period and then stops or even regresses. By way of comparison, there are generally speech developmental changes in autism even during the preverbal stage (a complex of revival fails to form; babbling is poor, low emotional, gibberish; at the same time, the baby recipes whole phrases without using them to communicate. These speech disorders are considered not only as a delay, but also as a developmental abnormality. Speech disorders in children should be diagnosed as early as possible in order to initiative corrective measures in time. In this case, a physician makes a diagnosis and a special education teacher does corrective work. The successful collaboration and mutual understanding of the specialists in these areas will determine quality of life for a child in the future. This paper focusses on the terminology and classification of delays, which are necessary for physicians and teachers to speak the same language.
Rumbach, Anna F; Rose, Tanya A; Cheah, Mynn
To explore Australian speech-language pathologists' use of non-speech oral motor exercises, and rationales for using/not using non-speech oral motor exercises in clinical practice. A total of 124 speech-language pathologists practising in Australia, working with paediatric and/or adult clients with speech sound difficulties, completed an online survey. The majority of speech-language pathologists reported that they did not use non-speech oral motor exercises when working with paediatric or adult clients with speech sound difficulties. However, more than half of the speech-language pathologists working with adult clients who have dysarthria reported using non-speech oral motor exercises with this population. The most frequently reported rationale for using non-speech oral motor exercises in speech sound difficulty management was to improve awareness/placement of articulators. The majority of speech-language pathologists agreed there is no clear clinical or research evidence base to support non-speech oral motor exercise use with clients who have speech sound difficulties. This study provides an overview of Australian speech-language pathologists' reported use and perceptions of non-speech oral motor exercises' applicability and efficacy in treating paediatric and adult clients who have speech sound difficulties. The research findings provide speech-language pathologists with insight into how and why non-speech oral motor exercises are currently used, and adds to the knowledge base regarding Australian speech-language pathology practice of non-speech oral motor exercises in the treatment of speech sound difficulties. Implications for Rehabilitation Non-speech oral motor exercises refer to oral motor activities which do not involve speech, but involve the manipulation or stimulation of oral structures including the lips, tongue, jaw, and soft palate. Non-speech oral motor exercises are intended to improve the function (e.g., movement, strength) of oral structures. The
Moore, Brian C J; Tyler, Lorraine K; Marslen-Wilson, William
Spoken language communication is arguably the most important activity that distinguishes humans from non-human species. This paper provides an overview of the review papers that make up this theme issue on the processes underlying speech communication. The volume includes contributions from researchers who specialize in a wide range of topics within the general area of speech perception and language processing. It also includes contributions from key researchers in neuroanatomy and functional neuro-imaging, in an effort to cut across traditional disciplinary boundaries and foster cross-disciplinary interactions in this important and rapidly developing area of the biological and cognitive sciences.
in cross-language and second language speech perception research: The mapping issue (the perceptual relationship of sounds of the native and the nonnative language in the mind of the native listener and the L2 learner), the perceptual and learning difficulty/ease issue (how this relationship may or may...... not cause perceptual and learning difficulty), and the plasticity issue (whether and how experience with the nonnative language affects the perceptual organization of speech sounds in the mind of L2 learners). One important general conclusion from this research is that perceptual learning is possible at all...
Elmahdy, Mohamed; Minker, Wolfgang
Novel Techniques for Dialectal Arabic Speech describes approaches to improve automatic speech recognition for dialectal Arabic. Since speech resources for dialectal Arabic speech recognition are very sparse, the authors describe how existing Modern Standard Arabic (MSA) speech data can be applied to dialectal Arabic speech recognition, while assuming that MSA is always a second language for all Arabic speakers. In this book, Egyptian Colloquial Arabic (ECA) has been chosen as a typical Arabic dialect. ECA is the first ranked Arabic dialect in terms of number of speakers, and a high quality ECA speech corpus with accurate phonetic transcription has been collected. MSA acoustic models were trained using news broadcast speech. In order to cross-lingually use MSA in dialectal Arabic speech recognition, the authors have normalized the phoneme sets for MSA and ECA. After this normalization, they have applied state-of-the-art acoustic model adaptation techniques like Maximum Likelihood Linear Regression (MLLR) and M...
A. Goedegebure (Andre)
textabstractHearing-aid users often continue to have problems with poor speech understanding in difficult acoustical conditions. Another generally accounted problem is that certain sounds become too loud whereas other sounds are still not audible. Dynamic range compression is a signal processing
The results demonstrate that in general, NF proved more effective than NNF. With regard to individual processes of connected speech, NF was more effective in production, whereas no such effect was found for perception.
Larm, Petra; Hongisto, Valtteri
During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
Full Text Available In most of the world, people have regular exposure to multiple accents. Therefore, learning to quickly process accented speech is a prerequisite to successful communication. In this paper, we examine work on the perception of accented speech across the lifespan, from early infancy to late adulthood. Unfamiliar accents initially impair linguistic processing by infants, children, younger adults, and older adults, but listeners of all ages come to adapt to accented speech. Emergent research also goes beyond these perceptual abilities, by assessing links with production and the relative contributions of linguistic knowledge and general cognitive skills. We conclude by underlining points of convergence across ages, and the gaps left to face in future work.
Full Text Available The current study explored the correlation between speakers' Eysenck personality traits and speech spectrum parameters. Forty-six subjects completed the Eysenck Personality Questionnaire. They were instructed to verbally answer the questions shown on a computer screen and their responses were recorded by the computer. Spectrum parameters of /sh/ and /i/ were analyzed by Praat voice software. Formant frequencies of the consonant /sh/ in lying responses were significantly lower than that in truthful responses, whereas no difference existed on the vowel /i/ speech spectrum. The second formant bandwidth of the consonant /sh/ speech spectrum was significantly correlated with the personality traits of Psychoticism, Extraversion, and Neuroticism, and the correlation differed between truthful and lying responses, whereas the first formant frequency of the vowel /i/ speech spectrum was negatively correlated with Neuroticism in both response types. The results suggest that personality characteristics may be conveyed through the human voice, although the extent to which these effects are due to physiological differences in the organs associated with speech or to a general Pygmalion effect is yet unknown.
Full Text Available The automatic recognition of foreign-accented Arabic speech is a challenging task since it involves a large number of nonnative accents. As well, the nonnative speech data available for training are generally insufficient. Moreover, as compared to other languages, the Arabic language has sparked a relatively small number of research efforts. In this paper, we are concerned with the problem of nonnative speech in a speaker independent, large-vocabulary speech recognition system for modern standard Arabic (MSA. We analyze some major differences at the phonetic level in order to determine which phonemes have a significant part in the recognition performance for both native and nonnative speakers. Special attention is given to specific Arabic phonemes. The performance of an HMM-based Arabic speech recognition system is analyzed with respect to speaker gender and its native origin. The WestPoint modern standard Arabic database from the language data consortium (LDC and the hidden Markov Model Toolkit (HTK are used throughout all experiments. Our study shows that the best performance in the overall phoneme recognition is obtained when nonnative speakers are involved in both training and testing phases. This is not the case when a language model and phonetic lattice networks are incorporated in the system. At the phonetic level, the results show that female nonnative speakers perform better than nonnative male speakers, and that emphatic phonemes yield a significant decrease in performance when they are uttered by both male and female nonnative speakers.
Full Text Available The automatic recognition of foreign-accented Arabic speech is a challenging task since it involves a large number of nonnative accents. As well, the nonnative speech data available for training are generally insufficient. Moreover, as compared to other languages, the Arabic language has sparked a relatively small number of research efforts. In this paper, we are concerned with the problem of nonnative speech in a speaker independent, large-vocabulary speech recognition system for modern standard Arabic (MSA. We analyze some major differences at the phonetic level in order to determine which phonemes have a significant part in the recognition performance for both native and nonnative speakers. Special attention is given to specific Arabic phonemes. The performance of an HMM-based Arabic speech recognition system is analyzed with respect to speaker gender and its native origin. The WestPoint modern standard Arabic database from the language data consortium (LDC and the hidden Markov Model Toolkit (HTK are used throughout all experiments. Our study shows that the best performance in the overall phoneme recognition is obtained when nonnative speakers are involved in both training and testing phases. This is not the case when a language model and phonetic lattice networks are incorporated in the system. At the phonetic level, the results show that female nonnative speakers perform better than nonnative male speakers, and that emphatic phonemes yield a significant decrease in performance when they are uttered by both male and female nonnative speakers.
Botha, Hugo; Duffy, Joseph R; Whitwell, Jennifer L; Strand, Edythe A; Machulda, Mary M; Spychalla, Anthony J; Tosakulwong, Nirubol; Senjem, Matthew L; Knopman, David S; Petersen, Ronald C; Jack, Clifford R; Lowe, Val J; Josephs, Keith A
In recent years a large and growing body of research has greatly advanced our understanding of primary progressive apraxia of speech. Handedness has emerged as one potential marker of selective vulnerability in degenerative diseases. This study evaluated the clinical and imaging findings in non-right handed compared to right handed participants in a prospective cohort diagnosed with primary progressive apraxia of speech. A total of 30 participants were included. Compared to the expected rate in the population, there was a higher prevalence of non-right handedness among those with primary progressive apraxia of speech (6/30, 20%). Small group numbers meant that these results did not reach statistical significance, although the effect sizes were moderate-to-large. There were no clinical differences between right handed and non-right handed participants. Bilateral hypometabolism was seen in primary progressive apraxia of speech compared to controls, with non-right handed participants showing more right hemispheric involvement. This is the first report of a higher rate of non-right handedness in participants with isolated apraxia of speech, which may point to an increased vulnerability for developing this disorder among non-right handed participants. This challenges prior hypotheses about a relative protective effect of non-right handedness for tau-related neurodegeneration. We discuss potential avenues for future research to investigate the relationship between handedness and motor disorders more generally. Copyright © 2018 Elsevier B.V. All rights reserved.
Ou, Jinghua; Law, Sam-Po; Fung, Roxana
A growing body of research has suggested that cognitive abilities may play a role in individual differences in speech processing. The present study took advantage of a widespread linguistic phenomenon of sound change to systematically assess the relationships between speech processing and various components of attention and working memory in the auditory and visual modalities among typically developed Cantonese-speaking individuals. The individual variations in speech processing are captured in an ongoing sound change-tone merging in Hong Kong Cantonese, in which typically developed native speakers are reported to lose the distinctions between some tonal contrasts in perception and/or production. Three groups of participants were recruited, with a first group of good perception and production, a second group of good perception but poor production, and a third group of good production but poor perception. Our findings revealed that modality-independent abilities of attentional switching/control and working memory might contribute to individual differences in patterns of speech perception and production as well as discrimination latencies among typically developed speakers. The findings not only have the potential to generalize to speech processing in other languages, but also broaden our understanding of the omnipresent phenomenon of language change in all languages.
Partanen, Eino; Kujala, Teija; Näätänen, Risto; Liitola, Auli; Sambeth, Anke; Huotilainen, Minna
Learning, the foundation of adaptive and intelligent behavior, is based on plastic changes in neural assemblies, reflected by the modulation of electric brain responses. In infancy, auditory learning implicates the formation and strengthening of neural long-term memory traces, improving discrimination skills, in particular those forming the prerequisites for speech perception and understanding. Although previous behavioral observations show that newborns react differentially to unfamiliar sounds vs. familiar sound material that they were exposed to as fetuses, the neural basis of fetal learning has not thus far been investigated. Here we demonstrate direct neural correlates of human fetal learning of speech-like auditory stimuli. We presented variants of words to fetuses; unlike infants with no exposure to these stimuli, the exposed fetuses showed enhanced brain activity (mismatch responses) in response to pitch changes for the trained variants after birth. Furthermore, a significant correlation existed between the amount of prenatal exposure and brain activity, with greater activity being associated with a higher amount of prenatal speech exposure. Moreover, the learning effect was generalized to other types of similar speech sounds not included in the training material. Consequently, our results indicate neural commitment specifically tuned to the speech features heard before birth and their memory representations.
Arslan, Seçkin; Bamyacı, Elif; Bastiaanse, Roelien
This study investigates the characteristics of narrative-speech production and the use of verbs in Turkish agrammatic speakers (n = 10) compared to non-brain-damaged controls (n = 10). To elicit narrative-speech samples, personal interviews and storytelling tasks were conducted. Turkish has a large and regular verb inflection paradigm where verbs are inflected for evidentiality (i.e. direct versus indirect evidence available to the speaker). Particularly, we explored the general characteristics of the speech samples (e.g. utterance length) and the uses of lexical, finite and non-finite verbs and direct and indirect evidentials. The results show that speech rate is slow, verbs per utterance are lower than normal and the verb diversity is reduced in the agrammatic speakers. Verb inflection is relatively intact; however, a trade-off pattern between inflection for direct evidentials and verb diversity is found. The implications of the data are discussed in connection with narrative-speech production studies on other languages.
Mehta, G.; Cutler, A.
Although spontaneous speech occurs more frequently in most listeners' experience than read speech, laboratory studies of human speech recognition typically use carefully controlled materials read from a script. The phonological and prosodic characteristics of spontaneous and read speech differ
Bartos, Anthony L; Cipr, Tomas; Nelson, Douglas J; Schwarz, Petr; Banowetz, John; Jerabek, Ladislav
A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNR of the testing and training data were close or identical. In this current effort multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization were significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < +10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).
Fuglsang, Søren; Dau, Torsten; Hjortkjær, Jens
Selectively attending to one speaker in a multi-speaker scenario is thought to synchronize low-frequency cortical activity to the attended speech signal. In recent studies, reconstruction of speech from single-trial electroencephalogram (EEG) data has been used to decode which talker a listener...... is attending to in a two-talker situation. It is currently unclear how this generalizes to more complex sound environments. Behaviorally, speech perception is robust to the acoustic distortions that listeners typically encounter in everyday life, but it is unknown whether this is mirrored by a noise......-robust neural tracking of attended speech. Here we used advanced acoustic simulations to recreate real-world acoustic scenes in the laboratory. In virtual acoustic realities with varying amounts of reverberation and number of interfering talkers, listeners selectively attended to the speech stream...
Kavé, Gitit; Goral, Mira
We conducted a comprehensive literature review of studies of word retrieval in connected speech in healthy aging and reviewed relevant aphasia research that could shed light on the aging literature. Four main hypotheses guided the review: (1) Significant retrieval difficulties would lead to reduced output in connected speech. (2) Significant retrieval difficulties would lead to a more limited lexical variety in connected speech. (3) Significant retrieval difficulties would lead to an increase in word substitution errors and in pronoun use as well as to greater dysfluency and hesitation in connected speech. (4) Retrieval difficulties on tests of single-word production would be associated with measures of word retrieval in connected speech. Studies on aging did not confirm these four hypotheses, unlike studies on aphasia that generally did. The review suggests that future research should investigate how context facilitates word production in old age.
Brännström, K Jonas; Lantz, Johannes; Nielsen, Lars Holme
reported results from American studies. Generally, significant differences were seen between test conditions using different types of noise within ears in each population. Significant differences were seen for ANL across populations, also when the non-semantic ISTS was used as speech signal. Conclusions......Abstract Objective: Acceptable noise level (ANL) has been established as a method to quantify the acceptance of background noise while listening to speech presented at the most comfortable level. The aim of the present study was to generate Danish, Swedish, and a non-semantic version of the ANL...... test and investigate normal-hearing Danish and Swedish subjects' performance on these tests. Design: ANL was measured using Danish and Swedish running speech with two different noises: Speech-weighted amplitude-modulated noise, and multitalker speech babble. ANL was also measured using the non...
E. V. Zhulina
Full Text Available This article presents with the author's understanding of approaches to studying a child's external speech and its influence on formation of his or her mentality. The authors defin structure of this phenomenon, components and levels of those parts of mental activity whose development is directly dependent on the external speech. The authors show in theoretical aspect the structure of assimilation of oral speech includs three main subsystems of regulation (speech, emotional, communicative regulation which are based on resources of an individual and are tightly integrated, creating a specific pattern of regulation. Because of the structure of a child's mentality in some cases the delay of the external speech leads to affective and communicative violations that, in turn, negatively affect personal development. Without rendering early help, a deviation in development becomes more noticeable, affect all spheres of mentality, communication, social and psychological adaptation of f child in general.
Ramirez, J.; Gorriz, J. M.; Segura, J. C.
This chapter has shown an overview of the main challenges in robust speech detection and a review of the state of the art and applications. VADs are frequently used in a number of applications including speech coding, speech enhancement and speech recognition. A precise VAD extracts a set of discriminative speech features from the noisy speech and formulates the decision in terms of well defined rule. The chapter has summarized three robust VAD methods that yield high speech/non-speech discri...
Iuzzini-Seigel, Jenya; Hogan, Tiffany P.; Green, Jordan R.
Purpose: The current research sought to determine (a) if speech inconsistency is a core feature of childhood apraxia of speech (CAS) or if it is driven by comorbid language impairment that affects a large subset of children with CAS and (b) if speech inconsistency is a sensitive and specific diagnostic marker that can differentiate between CAS and…
Jensen, Jesper Rindom; Benesty, Jacob; Christensen, Mads Græsbøll
In this work, we consider enhancement of multichannel speech recordings. Linear filtering and subspace approaches have been considered previously for solving the problem. The current linear filtering methods, although many variants exist, have limited control of noise reduction and speech...
Represented speech refers to speech where we reference somebody. Represented speech is an important phenomenon in everyday conversation, health care communication, and qualitative research. This case will draw first from a case study on physicians’ workplace learning and second from a case study...... on nurses’ apprenticeship learning. The aim of the case is to guide the qualitative researcher to use own and others’ voices in the interview and to be sensitive to represented speech in everyday conversation. Moreover, reported speech matters to health professionals who aim to represent the voice...... of their patients. Qualitative researchers and students might learn to encourage interviewees to elaborate different voices or perspectives. Qualitative researchers working with natural speech might pay attention to how people talk and use represented speech. Finally, represented speech might be relevant...
... here Home » Health Info » Statistics and Epidemiology Quick Statistics About Voice, Speech, Language Voice, Speech, Language, and ... no 205. Hyattsville, MD: National Center for Health Statistics. 2015. Hoffman HJ, Li C-M, Losonczy K, ...
Alim Sabur Ajibola
Full Text Available Stuttered speech is a dysfluency rich speech, more prevalent in males than females. It has been associated with insufficient air pressure or poor articulation, even though the root causes are more complex. The primary features include prolonged speech and repetitive speech, while some of its secondary features include, anxiety, fear, and shame. This study used LPC analysis and synthesis algorithms to reconstruct the stuttered speech. The results were evaluated using cepstral distance, Itakura-Saito distance, mean square error, and likelihood ratio. These measures implied perfect speech reconstruction quality. ASR was used for further testing, and the results showed that all the reconstructed speech samples were perfectly recognized while only three samples of the original speech were perfectly recognized.
Poole, Matthew L.; Brodtmann, Amy; Darby, David; Vogel, Adam P.
Purpose: Our purpose was to create a comprehensive review of speech impairment in frontotemporal dementia (FTD), primary progressive aphasia (PPA), and progressive apraxia of speech in order to identify the most effective measures for diagnosis and monitoring, and to elucidate associations between speech and neuroimaging. Method: Speech and…
Lee, Jimin; Hustad, Katherine C.; Weismer, Gary
Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…
An experimental Dutch keyboard-to-speech system has been developed to explor the possibilities and limitations of Dutch speech synthesis in a communication aid for the speech impaired. The system uses diphones and a formant synthesizer chip for speech synthesis. Input to the system is in
Groenewold, Rimke; Bastiaanse, Roelien; Nickels, Lyndsey; Huiskes, Mike
Background: Previous studies have shown that in semi-spontaneous speech, individuals with Broca's and anomic aphasia produce relatively many direct speech constructions. It has been claimed that in "healthy" communication direct speech constructions contribute to the liveliness, and indirectly to the comprehensibility, of speech.…
Zuk, Jennifer; Iuzzini-Seigel, Jenya; Cabbage, Kathryn; Green, Jordan R.; Hogan, Tiffany P.
Purpose: Childhood apraxia of speech (CAS) is hypothesized to arise from deficits in speech motor planning and programming, but the influence of abnormal speech perception in CAS on these processes is debated. This study examined speech perception abilities among children with CAS with and without language impairment compared to those with…
Hurkmans, Josephus Johannes Stephanus
Apraxia of Speech (AoS) is a neurogenic speech disorder. A wide variety of behavioural methods have been developed to treat AoS. Various therapy programmes use musical elements to improve speech production. A unique therapy programme combining elements of speech therapy and music therapy is called
Kenney, Mary Kay; Barac-Cikoja, Dragana; Finnegan, Kimberly; Jeffries, Neal; Ludlow, Christy L.
Children with developmental speech disorders may have additional deficits in speech perception and/or short-term memory. To determine whether these are only transient developmental delays that can accompany the disorder in childhood or persist as part of the speech disorder, adults with a persistent familial speech disorder were tested on speech…
Issard, Cécile; Gervain, Judit
Humans can adapt to a wide range of variations in the speech signal, maintaining an invariant representation of the linguistic information it contains. Among them, adaptation to rapid or time-compressed speech has been well studied in adults, but the developmental origin of this capacity remains unknown. Does this ability depend on experience with speech (if yes, as heard in utero or as heard postnatally), with sounds in general or is it experience-independent? Using near-infrared spectroscopy, we show that the newborn brain can discriminate between three different compression rates: normal, i.e. 100% of the original duration, moderately compressed, i.e. 60% of original duration and highly compressed, i.e. 30% of original duration. Even more interestingly, responses to normal and moderately compressed speech are similar, showing a canonical hemodynamic response in the left temporoparietal, right frontal and right temporal cortex, while responses to highly compressed speech are inverted, showing a decrease in oxyhemoglobin concentration. These results mirror those found in adults, who readily adapt to moderately compressed, but not to highly compressed speech, showing that adaptation to time-compressed speech requires little or no experience with speech, and happens at an auditory, and not at a more abstract linguistic level. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
T. E. Braudo
Full Text Available The purpose of this article is to acquaint the specialists, working with children having developmental disorders, with age-related norms for speech development. Many well-known linguists and psychologists studied speech ontogenesis (logogenesis. Speech is a higher mental function, which integrates many functional systems. Speech development in infants during the first months after birth is ensured by the innate hearing and emerging ability to fix the gaze on the face of an adult. Innate emotional reactions are also being developed during this period, turning into nonverbal forms of communication. At about 6 months a baby starts to pronounce some syllables; at 7–9 months – repeats various sounds combinations, pronounced by adults. At 10–11 months a baby begins to react on the words, referred to him/her. The first words usually appear at an age of 1 year; this is the start of the stage of active speech development. At this time it is acceptable, if a child confuses or rearranges sounds, distorts or misses them. By the age of 1.5 years a child begins to understand abstract explanations of adults. Significant vocabulary enlargement occurs between 2 and 3 years; grammatical structures of the language are being formed during this period (a child starts to use phrases and sentences. Preschool age (3–7 y. o. is characterized by incorrect, but steadily improving pronunciation of sounds and phonemic perception. The vocabulary increases; abstract speech and retelling are being formed. Children over 7 y. o. continue to improve grammar, writing and reading skills. The described stages may not have strict age boundaries, as soon as they are dependent not only on environment, but also on the child’s mental constitution, heredity and character.
Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M.J.; Poletto, Christopher J.; Ludlow, Christy L.
The issue of whether speech is supported by the same neural substrates as non-speech vocal-tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as non-sense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, were compared to the production of speech sylla...
Full Text Available Dereverberation is required in various speech processing applications such as handsfree telephony and voice-controlled systems, especially when signals are applied that are recorded in a moderately or highly reverberant environment. In this paper, we compare a number of classical and more recently developed multimicrophone dereverberation algorithms, and validate the different algorithmic settings by means of two performance indices and a speech recognition system. It is found that some of the classical solutions obtain a moderate signal enhancement. More advanced subspace-based dereverberation techniques, on the other hand, fail to enhance the signals despite their high-computational load.
In this book, we introduce the background and mainstream methods of probabilistic modeling and discriminative parameter optimization for speech recognition. The specific models treated in depth include the widely used exponential-family distributions and the hidden Markov model. A detailed study is presented on unifying the common objective functions for discriminative learning in speech recognition, namely maximum mutual information (MMI), minimum classification error, and minimum phone/word error. The unification is presented, with rigorous mathematical analysis, in a common rational-functio
Doubé, Wendy; Carding, Paul; Flanagan, Kieran; Kaufman, Jordy; Armitage, Hannah
Children with speech sound disorders benefit from feedback about the accuracy of sounds they make. Home practice can reinforce feedback received from speech pathologists. Games in mobile device applications could encourage home practice, but those currently available are of limited value because they are unlikely to elaborate "Correct"/"Incorrect" feedback with information that can assist in improving the accuracy of the sound. This protocol proposes a "Wizard of Oz" experiment that aims to provide evidence for the provision of effective multimedia feedback for speech sound development. Children with two common speech sound disorders will play a game on a mobile device and make speech sounds when prompted by the game. A human "Wizard" will provide feedback on the accuracy of the sound but the children will perceive the feedback as coming from the game. Groups of 30 young children will be randomly allocated to one of five conditions: four types of feedback and a control which does not play the game. The results of this experiment will inform not only speech sound therapy, but also other types of language learning, both in general, and in multimedia applications. This experiment is a cost-effective precursor to the development of a mobile application that employs pedagogically and clinically sound processes for speech development in young children.
Doubé, Wendy; Carding, Paul; Flanagan, Kieran; Kaufman, Jordy; Armitage, Hannah
Children with speech sound disorders benefit from feedback about the accuracy of sounds they make. Home practice can reinforce feedback received from speech pathologists. Games in mobile device applications could encourage home practice, but those currently available are of limited value because they are unlikely to elaborate “Correct”/”Incorrect” feedback with information that can assist in improving the accuracy of the sound. This protocol proposes a “Wizard of Oz” experiment that aims to provide evidence for the provision of effective multimedia feedback for speech sound development. Children with two common speech sound disorders will play a game on a mobile device and make speech sounds when prompted by the game. A human “Wizard” will provide feedback on the accuracy of the sound but the children will perceive the feedback as coming from the game. Groups of 30 young children will be randomly allocated to one of five conditions: four types of feedback and a control which does not play the game. The results of this experiment will inform not only speech sound therapy, but also other types of language learning, both in general, and in multimedia applications. This experiment is a cost-effective precursor to the development of a mobile application that employs pedagogically and clinically sound processes for speech development in young children. PMID:29674986
Full Text Available Children with speech sound disorders benefit from feedback about the accuracy of sounds they make. Home practice can reinforce feedback received from speech pathologists. Games in mobile device applications could encourage home practice, but those currently available are of limited value because they are unlikely to elaborate “Correct”/”Incorrect” feedback with information that can assist in improving the accuracy of the sound. This protocol proposes a “Wizard of Oz” experiment that aims to provide evidence for the provision of effective multimedia feedback for speech sound development. Children with two common speech sound disorders will play a game on a mobile device and make speech sounds when prompted by the game. A human “Wizard” will provide feedback on the accuracy of the sound but the children will perceive the feedback as coming from the game. Groups of 30 young children will be randomly allocated to one of five conditions: four types of feedback and a control which does not play the game. The results of this experiment will inform not only speech sound therapy, but also other types of language learning, both in general, and in multimedia applications. This experiment is a cost-effective precursor to the development of a mobile application that employs pedagogically and clinically sound processes for speech development in young children.
Holzrichter, J.F.; Ng, L.C.
The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs
Holzrichter, John F.; Ng, Lawrence C.
The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.
Fontan, Lionel; Tardieu, Julien; Gaillard, Pascal; Woisard, Virginie; Ruiz, Robert
Purpose: The authors investigated the relationship between the intelligibility and comprehension of speech presented in babble noise. Method: Forty participants listened to French imperative sentences (commands for moving objects) in a multitalker babble background for which intensity was experimentally controlled. Participants were instructed to…
Standard articulation tests are not always sensitive enough to discriminate between speech samples which are of high intelligibility. One can increase the sensitivity of such tests by presenting the test materials in noise. In this way, small differences in intelligibility can be magnified into
Larson, Martha; Ordelman, Roeland J.F.; de Jong, Franciska M.G.; Kohler, Joachim; Kraaij, Wessel
After two successful years at SIGIR in 2007 and 2008, the third workshop on Searching Spontaneous Conversational Speech (SSCS 2009) was held conjunction with the ACM Multimedia 2009. The goal of the SSCS series is to serve as a forum that brings together the disciplines that collaborate on spoken
from movements of certain organs with his (man‟s) throat and mouth…. By means ... In other words, government engages language; and how this affects the ... address the audience in a social gathering in order to have a new dawn. ..... Agbedo, C. U. Speech Act Analysis of Political discourse in the Nigerian Print Media in.
Nijland, Lian; Terband, Hayo; Maassen, Ben
Purpose: Childhood apraxia of speech (CAS) is diagnosed on the basis of specific speech characteristics, in the absence of problems in hearing, intelligence, and language comprehension. This does not preclude the possibility that children with this speech disorder might demonstrate additional problems. Method: Cognitive functions were investigated…
Keetels, M.N.; Schakel, L.; de Bonte, M.; Vroomen, J.
Listeners adjust their phonetic categories to cope with variations in the speech signal (phonetic recalibration). Previous studies have shown that lipread speech (and word knowledge) can adjust the perception of ambiguous speech and can induce phonetic adjustments (Bertelson, Vroomen, & de Gelder in
Hogan, J. Michael; Kurr, Jeffrey A.; Johnson, Jeremy D.; Bergmaier, Michael J.
In light of the U.S. Senate's designation of March 15, 2016 as "National Speech and Debate Education Day" (S. Res. 398, 2016), it only seems fitting that "Communication Education" devote a special section to the role of speech and debate in civic education. Speech and debate have been at the heart of the communication…
The experimental addition of speech output to computer-based Esperanto lessons using speech synthesized from text is described. Because of Esperanto's phonetic spelling and simple rhythm, it is particularly easy to describe the mechanisms of Esperanto synthesis. Attention is directed to how the text-to-speech conversion is performed and the ways…
on speech production characteristics, but also helps in accurate analysis of speech. .... include time delay estimation, speech enhancement from single and multi- ...... log. (. E[k]. ∑K−1 l=0. E[l]. ) ,. (7) where K is the number of samples in the ...
Minifie, Fred. D., Ed.; And Others
This book is written as a guide to the understanding of the processes involved in human speech communication. Ten authorities contributed material to provide an introduction to the physiological aspects of speech production and reception, the acoustical aspects of speech production and transmission, the psychophysics of sound reception, the nature…
Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta
Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…
The aim of this article is to describe and classify a number of different forms of English reported speech (or thought), and subsequently to analyze and represent them within the theory of FDG. First, the most prototypical forms of reported speech are discussed (direct and indirect speech);
Nijland, L.; Terband, H.; Maassen, B.
Purpose: Childhood Apraxia of Speech (CAS) is diagnosed on the basis of specific speech characteristics, in the absence of problems in hearing, intelligence, and language comprehension. This does not preclude the possibility that children with this speech disorder might demonstrate additional
Maussen, M.; Grillo, R.
This book focuses on the way in which public debate and legal practice intersect when it comes to the value of free speech and the need to regulate "offensive", "blasphemous" or "hate" speech, especially, though not exclusively where such speech is thought to be offensive to members of ethnic and
Carney, John J., Jr.
The exercise of freedom of speech within our nation has deteriorated. A practical value in teaching free speech is the possibility of restoring a commitment to its principles by educators. What must be taught is why freedom of speech is important, why it has been compromised, and the extent to which it has been compromised. Every technological…
Williams, A. Lynn, Ed.; McLeod, Sharynne, Ed.; McCauley, Rebecca J., Ed.
With detailed discussion and invaluable video footage of 23 treatment interventions for speech sound disorders (SSDs) in children, this textbook and DVD set should be part of every speech-language pathologist's professional preparation. Focusing on children with functional or motor-based speech disorders from early childhood through the early…
Farouk, Mohamed Hesham
This book provides a survey on wide-spread of employing wavelets analysis in different applications of speech processing. The author examines development and research in different application of speech processing. The book also summarizes the state of the art research on wavelet in speech processing.
KARLIN, ISAAC W.; AND OTHERS
THE GROWTH, DEVELOPMENT, AND ABNORMALITIES OF SPEECH IN CHILDHOOD ARE DESCRIBED IN THIS TEXT DESIGNED FOR PEDIATRICIANS, PSYCHOLOGISTS, EDUCATORS, MEDICAL STUDENTS, THERAPISTS, PATHOLOGISTS, AND PARENTS. THE NORMAL DEVELOPMENT OF SPEECH AND LANGUAGE IS DISCUSSED, INCLUDING THEORIES ON THE ORIGIN OF SPEECH IN MAN AND FACTORS INFLUENCING THE NORMAL…
Curtin, Suzanne; Vouloumanos, Athena
We examined whether infants' preference for speech at 12 months is associated with autistic-like behaviors at 18 months in infants who are at increased risk for autism spectrum disorder (ASD) because they have an older sibling diagnosed with ASD and in low-risk infants. Only low-risk infants listened significantly longer to speech than to nonspeech at 12 months. In both groups, relative preference for speech correlated positively with general cognitive ability at 12 months. However, in high-risk infants only, preference for speech was associated with autistic-like behavior at 18 months, while in low-risk infants, preference for speech correlated with language abilities. This suggests that in children at risk for ASD an atypical species-specific bias for speech may underlie atypical social development.
Full Text Available The neural basis of speech perception has been debated for over a century. While it is generally agreed that the superior temporal lobes are critical for the perceptual analysis of speech, a major current topic is whether the motor system contributes to speech perception, with several conflicting findings attested. In a dorsal-ventral speech stream framework (Hickok & Poeppel 2007, this debate is essentially about the roles of the dorsal versus ventral speech processing streams. A major roadblock in characterizing the neuroanatomy of speech perception is task-specific effects. For example, much of the evidence for dorsal stream involvement comes from syllable discrimination type tasks, which have been found to behaviorally doubly dissociate from auditory comprehension tasks (Baker et al. 1981. Discrimination task deficits could be a result of difficulty perceiving the sounds themselves, which is the typical assumption, or it could be a result of failures in temporary maintenance of the sensory traces, or the comparison and/or the decision process. Similar complications arise in perceiving sentences: the extent of inferior frontal (i.e. dorsal stream activation during listening to sentences increases as a function of increased task demands (Love et al. 2006. Another complication is the stimulus: much evidence for dorsal stream involvement uses speech samples lacking semantic context (CVs, non-words. The present study addresses these issues in a large-scale lesion-symptom mapping study. 158 patients with focal cerebral lesions from the Mutli-site Aphasia Research Consortium underwent a structural MRI or CT scan, as well as an extensive psycholinguistic battery. Voxel-based lesion symptom mapping was used to compare the neuroanatomy involved in the following speech perception tasks with varying phonological, semantic, and task loads: (i two discrimination tasks of syllables (non-words and words, respectively, (ii two auditory comprehension tasks
Martínez-Montes, Eduardo; Hernández-Pérez, Heivet; Chobert, Julie; Morgado-Rodríguez, Lisbet; Suárez-Murias, Carlos; Valdés-Sosa, Pedro A; Besson, Mireille
The aim of this experiment was to investigate the influence of musical expertise on the automatic perception of foreign syllables and harmonic sounds. Participants were Cuban students with high level of expertise in music or in visual arts and with the same level of general education and socio-economic background. We used a multi-feature Mismatch Negativity (MMN) design with sequences of either syllables in Mandarin Chinese or harmonic sounds, both comprising deviants in pitch contour, duration and Voice Onset Time (VOT) or equivalent that were either far from (Large deviants) or close to (Small deviants) the standard. For both Mandarin syllables and harmonic sounds, results were clear-cut in showing larger MMNs to pitch contour deviants in musicians than in visual artists. Results were less clear for duration and VOT deviants, possibly because of the specific characteristics of the stimuli. Results are interpreted as reflecting similar processing of pitch contour in speech and non-speech sounds. The implications of these results for understanding the influence of intense musical training from childhood to adulthood and of genetic predispositions for music on foreign language perception are discussed.
Full Text Available The aim of this experiment was to investigate the influence of musical expertise on the automatic perception of foreign syllables and harmonic sounds. Participants were Cuban students with high level of expertise in music or in visual arts and with the same level of general education and socio-economic background. We used a multi-feature Mismatch Negativity (MMN design with sequences of either syllables in Mandarin Chinese or harmonic sounds, both comprising deviants in pitch contour, duration and Voice Onset Time (VOT or equivalent that were either far from (Large deviants or close to (Small deviants the standard. For both Mandarin syllables and harmonic sounds, results were clear-cut in showing larger MMNs to pitch contour deviants in musicians than in visual artists. Results were less clear for duration and VOT deviants, possibly because of the specific characteristics of the stimuli. Results are interpreted as reflecting similar processing of pitch contour in speech and non-speech sounds. The implications of these results for understanding the influence of intense musical training from childhood to adulthood and of genetic predispositions for music on foreign language perception is discussed.
Full Text Available We distinguish two general approaches to inner speech (IS –the ‘format’ and the ‘activity’ views–, and defend the activity view. The format view grounds the utility of IS on features of the representational format of language, and is related to the thesis that the proper function of IS is to make conscious thinking possible. IS appears typically as a product constituted by representations of phonological features. The view also has implications for the idea that passivity phenomena in cognition may be misattributed IS. The activity view sees IS as a speaking activity that does not have a proper function in cognition. It simply inherits the array of functions of outer speech. We argue that it is methodologically advisable to start from this variety of uses, which suggests commonalities between internal and external activities. The format view has several problems; it has to deny unsymbolized thinking; it cannot easily explain how IS makes thoughts available to consciousness, and it cannot explain those uses of IS where its format features apparently play no role. The activity view not only lacks these problems but also has explanatory advantages: construing IS as an activity allows it to be integrally constituted by its content; the view is able to construe unsymbolized thinking as part of a continuum of phenomena that exploit the same mechanisms, and it offers a simple explanation for the variety of uses of IS.
Fourie, Robert J.
Background: Considerations of the negotiated therapeutic relationship in speech and language therapy are somewhat scarce, with specific therapeutic factors generally framed from psycholinguistic, behavioural, or neurological perspectives. Aims: To explore the therapeutic relationship in speech and language therapy, focusing on the personal…
This study tested the hypothesis that children with speech sound disorder have generalized slowed motor speeds. It evaluated associations among oral and hand motor speeds and measures of speech (articulation and phonology) and language (receptive vocabulary, sentence comprehension, sentence imitation), in 11 children with moderate to severe SSD…
Pontoppidan, Niels Henrik; Dyrholm, Mads
a Factorial Hidden Markov Model, with non-stationary assumptions on the source autocorrelations modelled through the Factorial Hidden Markov Model, leads to separation in the monaural case. By extending Hansens work we find that Roweis' assumptions are necessary for monaural speech separation. Furthermore we...
... for stuttering to change over time or for emotions and attitudes about your speech to change as you have new experiences. It is important for you to have a clear idea about your motivation for going to therapy because your reasons for ...
Dunin-Kȩplicz, Barbara; Strachocka, Alina; Szałas, Andrzej; Verbrugge, Rineke
This paper discusses an implementation of four speech acts: assert, concede, request and challenge in a paraconsistent framework. A natural four-valued model of interaction yields multiple new cognitive situations. They are analyzed in the context of communicative relations, which partially replace
Bradley, Bert E.
Argues for the continuation of liberal education over career-oriented programs. Defines liberal education as one that develops abilities that transcend occupational concerns, and that enables individuals to cope with shifts in values, vocations, careers, and the environment. Argues that speech communication makes a significant contribution to…
Mar 4, 2014 ... In reflecting on possible responses to this ... Through the actions of a prophet, as Philip Wogamen (1998:4) reasons, people are supposed to have a ... The main argument in this article is that the person called to prophetic speech needs to become ..... were like dumb bricks and blocks to be forcefully moved.
White, Keith S.
Continuous speech recognition (SR) is an emerging technology that allows direct digital transcription of dictated radiology reports. The SR systems are being widely deployed in the radiology community. This is a review of technical and practical issues that should be considered when implementing an SR system. (orig.)
Bryant, Gregory A.
Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…
Verbal communication does not rely only on the simple perception of auditory signals. It is rather a parallel and integrative processing of linguistic and non-linguistic information, involving temporal and frontal areas in particular. This review describes the inherent complexity of auditory speech comprehension from a functional-neuroanatomical perspective. The review is divided into two parts. In the first part, structural and functional asymmetry of language relevant structures will be discus. The second part of the review will discuss recent neuroimaging studies, which coherently demonstrate that speech comprehension processes rely on a hierarchical network involving the temporal, parietal, and frontal lobes. Further, the results support the dual-stream model for speech comprehension, with a dorsal stream for auditory-motor integration, and a ventral stream for extracting meaning but also the processing of sentences and narratives. Specific patterns of functional asymmetry between the left and right hemisphere can also be demonstrated. The review article concludes with a discussion on interactions between the dorsal and ventral streams, particularly the involvement of motor related areas in speech perception processes, and outlines some remaining unresolved issues. This article is part of a Special Issue entitled Human Auditory Neuroimaging. Copyright © 2013 Elsevier B.V. All rights reserved.
Mar 4, 2014 ... It is expected that people will be drawn into the reality of God by authentic prophetic speech, .... strands of the DNA molecule show themselves to be arranged ... explains, chemical patterns act like the letters of a code, .... viewing the self-reflection regarding the ministry of renewal from the .... Irresistible force.
Full Text Available Speech is a means of communication which is intrinsically bimodal: the audio signal originates from the dynamics of the articulators. This paper reviews recent works in the field of audiovisual speech, and more specifically techniques developed to measure the level of correspondence between audio and visual speech. It overviews the most common audio and visual speech front-end processing, transformations performed on audio, visual, or joint audiovisual feature spaces, and the actual measure of correspondence between audio and visual speech. Finally, the use of synchrony measure for biometric identity verification based on talking faces is experimented on the BANCA database.
Massaro, Dominic W; Chen, Trevor H
Galantucci, Fowler, and Turvey (2006) have claimed that perceiving speech is perceiving gestures and that the motor system is recruited for perceiving speech. We make the counter argument that perceiving speech is not perceiving gestures, that the motor system is not recruitedfor perceiving speech, and that speech perception can be adequately described by a prototypical pattern recognition model, the fuzzy logical model of perception (FLMP). Empirical evidence taken as support for gesture and motor theory is reconsidered in more detail and in the framework of the FLMR Additional theoretical and logical arguments are made to challenge gesture and motor theory.
Full Text Available In this paper a method for speech quality estimation is evaluated by simulating the transfer of speech over packet switched and mobile networks. The proposed system uses Dynamic Time Warping algorithm for test and received speech comparison. Several tests have been made on a test speech sample of a single speaker with simulated packet (frame loss effects on the perceived speech. The achieved results have been compared with measured PESQ values on the used transmission channel and their correlation has been observed.
SCHIPOR, O. A.
Full Text Available In this paper we highlight the relations between LOGOMON - a Computer Based Speech Therapy System and dyslalia's training steps. Dyslalia is a speech disorder that affects pronunciation of one or many sounds. This presentation of the system is completed by a research regarding end-user (i.e. teachers and parents attitude about the speech assisted therapy in general and about LOGOMON System in particular. The results of this research allow the improvement of our CBST system because the obtained information can be a source of adaptability to different expectations of the beneficiaries.
Full Text Available . In addition, speech-based access to information may empower illiterate or semi-literate peo- ple, 98% of whom live in the developing world. SDSs can play a useful role in a wide range of applications. Of particular importance in Africa are applications... speech (i.e. appropriate for the recognition task in terms of the language used, the profile of the speakers, speaking style, etc.) This speech generally needs to be curated and transcribed prior to the development of ASR sys- tems, and for most...
Greene, Beth G; Logan, John S; Pisoni, David B
We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.
GREENE, BETH G.; LOGAN, JOHN S.; PISONI, DAVID B.
We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered. PMID:23225916
Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris
Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. Number of different words per minute was calculated as a speech output measure for each task, with the difference between SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage in order to predict lesion locations associated with the fluency-inducing response to speech entrainment. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during speech entrainment versus spontaneous speech. A similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed damage to the inferior frontal gyrus predicted this response. Results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of speech entrainment to improve speech production and may help select patients for speech entrainment treatment. PMID:25989443
Светлана Викторовна Иванова
Full Text Available Discourse and media communication researchers pay attention to the fact that popular discursive and communicative practices have a tendency to hybridization and convergence. Discourse which is understood as language in use is flexible. Consequently, it turns out that one and the same text can represent several types of discourses. A vivid example of this tendency is revealed in American commencement speech / commencement address / graduation speech. A commencement speech is a speech university graduates are addressed with which in compliance with the modern trend is delivered by outstanding media personalities (politicians, athletes, actors, etc.. The objective of this study is to define the specificity of the realization of polydiscursive practices within commencement speech. The research involves discursive, contextual, stylistic and definitive analyses. Methodologically the study is based on the discourse analysis theory, in particular the notion of a discursive practice as a verbalized social practice makes up the conceptual basis of the research. This research draws upon a hundred commencement speeches delivered by prominent representatives of American society since 1980s till now. In brief, commencement speech belongs to institutional discourse public speech embodies. Commencement speech institutional parameters are well represented in speeches delivered by people in power like American and university presidents. Nevertheless, as the results of the research indicate commencement speech institutional character is not its only feature. Conceptual information analysis enables to refer commencement speech to didactic discourse as it is aimed at teaching university graduates how to deal with challenges life is rich in. Discursive practices of personal discourse are also actively integrated into the commencement speech discourse. More than that, existential discursive practices also find their way into the discourse under study. Commencement
Zhao, T. Christina; Kuhl, Patricia K.
Musicians show enhanced musical pitch and meter processing, effects that generalize to speech. Yet potential differences between musicians and nonmusicians limit conclusions. We examined the effects of a randomized laboratory-controlled music intervention on music and speech processing in 9-mo-old infants. The Intervention exposed infants to music in triple meter (the waltz) in a social environment. Controls engaged in similar social play without music. After 12 sessions, infants’ temporal in...
Besson, Mireille; Chobert, Julie; Marie, Céline
After a brief historical perspective of the relationship between language and music, we review our work on transfer of training from music to speech that aimed at testing the general hypothesis that musicians should be more sensitive than non-musicians to speech sounds. In light of recent results in the literature, we argue that when long-term experience in one domain influences acoustic processing in the other domain, results can be interpreted as common acoustic processing. But when long-te...
Rimvall, M K; Clemmensen, L; Munkholm, A; Rask, C U; Larsen, J T; Skovgaard, A M; Simons, C J P; van Os, J; Jeppesen, P
Auditory verbal hallucinations (AVH) are common during development and may arise due to dysregulation in top-down processing of sensory input. This study was designed to examine the frequency and correlates of speech illusions measured using the White Noise (WN) task in children from the general population. Associations between speech illusions and putative risk factors for psychotic disorder and negative affect were examined. A total of 1486 children aged 11-12 years of the Copenhagen Child Cohort 2000 were examined with the WN task. Psychotic experiences and negative affect were determined using the Kiddie-SADS-PL. Register data described family history of mental disorders. Exaggerated Theory of Mind functioning (hyper-ToM) was measured by the ToM Storybook Frederik. A total of 145 (10%) children experienced speech illusions (hearing speech in the absence of speech stimuli), of which 102 (70%) experienced illusions perceived by the child as positive or negative (affectively salient). Experiencing hallucinations during the last month was associated with affectively salient speech illusions in the WN task [general cognitive ability: adjusted odds ratio (aOR) 2.01, 95% confidence interval (CI) 1.03-3.93]. Negative affect, both last month and lifetime, was also associated with affectively salient speech illusions (aOR 2.01, 95% CI 1.05-3.83 and aOR 1.79, 95% CI 1.11-2.89, respectively). Speech illusions were not associated with delusions, hyper-ToM or family history of mental disorders. Speech illusions were elicited in typically developing children in a WN-test paradigm, and point to an affective pathway to AVH mediated by dysregulation in top-down processing of sensory input.
Preston, Jonathan L; Brick, Nickole; Landi, Nicole
The purpose of this study was to evaluate the efficacy of a treatment program that includes ultrasound biofeedback for children with persisting speech sound errors associated with childhood apraxia of speech (CAS). Six children ages 9-15 years participated in a multiple baseline experiment for 18 treatment sessions during which treatment focused on producing sequences involving lingual sounds. Children were cued to modify their tongue movements using visual feedback from real-time ultrasound images. Probe data were collected before, during, and after treatment to assess word-level accuracy for treated and untreated sound sequences. As participants reached preestablished performance criteria, new sequences were introduced into treatment. All participants met the performance criterion (80% accuracy for 2 consecutive sessions) on at least 2 treated sound sequences. Across the 6 participants, performance criterion was met for 23 of 31 treated sequences in an average of 5 sessions. Some participants showed no improvement in untreated sequences, whereas others showed generalization to untreated sequences that were phonetically similar to the treated sequences. Most gains were maintained 2 months after the end of treatment. The percentage of phonemes correct increased significantly from pretreatment to the 2-month follow-up. A treatment program including ultrasound biofeedback is a viable option for improving speech sound accuracy in children with persisting speech sound errors associated with CAS.
Yi, Han-Gyol; Maddox, W Todd; Mumford, Jeanette A; Chandrasekaran, Bharath
One of the most difficult category learning problems for humans is learning nonnative speech categories. While feedback-based category training can enhance speech learning, the mechanisms underlying these benefits are unclear. In this functional magnetic resonance imaging study, we investigated neural and computational mechanisms underlying feedback-dependent speech category learning in adults. Positive feedback activated a large corticostriatal network including the dorsolateral prefrontal cortex, inferior parietal lobule, middle temporal gyrus, caudate, putamen, and the ventral striatum. Successful learning was contingent upon the activity of domain-general category learning systems: the fast-learning reflective system, involving the dorsolateral prefrontal cortex that develops and tests explicit rules based on the feedback content, and the slow-learning reflexive system, involving the putamen in which the stimuli are implicitly associated with category responses based on the reward value in feedback. Computational modeling of response strategies revealed significant use of reflective strategies early in training and greater use of reflexive strategies later in training. Reflexive strategy use was associated with increased activation in the putamen. Our results demonstrate a critical role for the reflexive corticostriatal learning system as a function of response strategy and proficiency during speech category learning. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: email@example.com.
François, Clément; Chobert, Julie; Besson, Mireille; Schön, Daniele
The role of music training in fostering brain plasticity and developing high cognitive skills, notably linguistic abilities, is of great interest from both a scientific and a societal perspective. Here, we report results of a longitudinal study over 2 years using both behavioral and electrophysiological measures and a test-training-retest procedure to examine the influence of music training on speech segmentation in 8-year-old children. Children were pseudo-randomly assigned to either music or painting training and were tested on their ability to extract meaningless words from a continuous flow of nonsense syllables. While no between-group differences were found before training, both behavioral and electrophysiological measures showed improved speech segmentation skills across testing sessions for the music group only. These results show that music training directly causes facilitation in speech segmentation, thereby pointing to the importance of music for speech perception and more generally for children's language development. Finally these results have strong implications for promoting the development of music-based remediation strategies for children with language-based learning impairments.
Nørholm, Sidsel Marie
This thesis deals with speech enhancement, i.e., noise reduction in speech signals. This has applications in, e.g., hearing aids and teleconference systems. We consider a signal-driven approach to speech enhancement where a model of the speech is assumed and filters are generated based...... on this model. The basic model used in this thesis is the harmonic model which is a commonly used model for describing the voiced part of the speech signal. We show that it can be beneficial to extend the model to take inharmonicities or the non-stationarity of speech into account. Extending the model...
Agus, Trevor R; Akeroyd, Michael A; Noble, William; Bhullar, Navjot
Many of the items in the "Speech, Spatial, and Qualities of Hearing" scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85-99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study if this self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively higher scores for competing speech (six items), energetic masking (one item), and no masking (three items). The results suggest significant masking by competing speech in everyday listening situations.
Full Text Available This present study investigated the link between speech-in-speech perception capacities and four executive function components: response suppression, inhibitory control, switching and working memory. We constructed a cross-modal semantic priming paradigm using a written target word and a spoken prime word, implemented in one of two concurrent auditory sentences (cocktail party situation. The prime and target were semantically related or unrelated. Participants had to perform a lexical decision task on visual target words and simultaneously listen to only one of two pronounced sentences. The attention of the participant was manipulated: The prime was in the pronounced sentence listened to by the participant or in the ignored one. In addition, we evaluate the executive function abilities of participants (switching cost, inhibitory-control cost and response-suppression cost and their working memory span. Correlation analyses were performed between the executive and priming measurements. Our results showed a significant interaction effect between attention and semantic priming. We observed a significant priming effect in the attended but not in the ignored condition. Only priming effects obtained in the ignored condition were significantly correlated with some of the executive measurements. However, no correlation between priming effects and working memory capacity was found. Overall, these results confirm, first, the role of attention for semantic priming effect and, second, the implication of executive functions in speech-in-noise understanding capacities.
Gao, Pei-pei; Liu, Feng
With the development of information technology and artificial intelligence, speech synthesis plays a significant role in the fields of Human-Computer Interaction Techniques. However, the main problem of current speech synthesis techniques is lacking of naturalness and expressiveness so that it is not yet close to the standard of natural language. Another problem is that the human-computer interaction based on the speech synthesis is too monotonous to realize mechanism of user subjective drive. This thesis introduces the historical development of speech synthesis and summarizes the general process of this technique. It is pointed out that prosody generation module is an important part in the process of speech synthesis. On the basis of further research, using eye activity rules when reading to control and drive prosody generation was introduced as a new human-computer interaction method to enrich the synthetic form. In this article, the present situation of speech synthesis technology is reviewed in detail. Based on the premise of eye gaze data extraction, using eye movement signal in real-time driving, a speech synthesis method which can express the real speech rhythm of the speaker is proposed. That is, when reader is watching corpora with its eyes in silent reading, capture the reading information such as the eye gaze duration per prosodic unit, and establish a hierarchical prosodic pattern of duration model to determine the duration parameters of synthesized speech. At last, after the analysis, the feasibility of the above method is verified.
Chevret , Patrick; EBISSOU , Ange; Parizet , Etienne
International audience; In open-plan offices, ambient noise made of intelligible conversations is generally perceived as one of the most important annoyance for tasks requiring concentration efforts. This annoyance has been proved to lead to a decrease of task performance and to health troubles for people in the mean and long term (tiredness, stress, etc.) Consequently, the improvement of working conditions should pass by the evaluation of speech annoyance that could give rise to recommendati...
Bruderer, Alison G; Danielson, D Kyle; Kandhadai, Padmapriya; Werker, Janet F
The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception-production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants' speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants' tongues. With a looking-time procedure, we found that temporarily restraining infants' articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral-motor movements influence speech sound discrimination. Moreover, an experimentally induced "impairment" in articulator movement can compromise speech perception performance, raising the question of whether long-term oral-motor impairments may impact perceptual development.
Mauszycki, Shannon C; Wright, Sandra; Dingus, Nicole; Wambaugh, Julie L
This investigation was designed to examine the effects of an articulatory-kinematic treatment in conjunction with visual biofeedback (VBFB) via electropalatography (EPG) on the accuracy of articulation for acquired apraxia of speech (AOS). A multiple-baseline design across participants and behaviors was used with 4 individuals with chronic AOS and aphasia. Accuracy of target speech sounds in treated and untreated phrases in probe sessions served as the dependent variable. Participants received an articulatory-kinematic treatment in combination with VBFB, which was sequentially applied to 3 stimulus sets composed of 2-word phrases with a target speech sound for each set. Positive changes in articulatory accuracy were observed for participants for the majority of treated speech sounds. Also, there was generalization to untreated phrases for most trained speech sounds. Two participants had better long-term maintenance of treated speech sounds in both trained and untrained stimuli. Findings indicate EPG may be a potential treatment tool for AOS. It appears that individuals with AOS can benefit from VBFB via EPG in improving articulatory accuracy. However, further research is needed to determine if VBFB is more advantageous than behavioral treatments that have been proven effective in improving speech production for speakers with AOS.
Judge, Simon; Robertson, Zoë; Hawley, Mark; Enderby, Pam
To explore users' experiences and perceptions of speech-driven environmental control systems (SPECS) as part of a larger project aiming to develop a new SPECS. The motivation for this part of the project was to add to the evidence base for the use of SPECS and to determine the key design specifications for a new speech-driven system from a user's perspective. Semi-structured interviews were conducted with 12 users of SPECS from around the United Kingdom. These interviews were transcribed and analysed using a qualitative method based on framework analysis. Reliability is the main influence on the use of SPECS. All the participants gave examples of occasions when their speech-driven system was unreliable; in some instances, this unreliability was reported as not being a problem (e.g., for changing television channels); however, it was perceived as a problem for more safety critical functions (e.g., opening a door). Reliability was cited by participants as the reason for using a switch-operated system as back up. Benefits of speech-driven systems focused on speech operation enabling access when other methods were not possible; quicker operation and better aesthetic considerations. Overall, there was a perception of increased independence from the use of speech-driven environmental control. In general, speech was considered a useful method of operating environmental controls by the participants interviewed; however, their perceptions regarding reliability often influenced their decision to have backup or alternative systems for certain functions.
Full Text Available The speech is a tool for accurate communication of ideas. When we talk about speech prevention as a practical realization of the language, we are referring to the fact that it should be comprised of the elements of the criteria as viewed from the perspective of the standards. This criteria, in the broad sense of the word, presupposes an exact realization of the thought expressed between the speaker and the recipient.The absence of this criterion catches the eye through the practical realization of the language and brings forth consequences, often hidden very deeply in the human psyche. Their outer manifestation already represents a delayed reaction of the social environment. The foundation for overcoming and standardization of this phenomenon must be the anatomy-physiological patterns of the body, accomplished through methods in concordance with the nature of the body.
Asadi, Sima; Wexler, Anthony S.; Cappa, Christopher D.; Bouvier, Nicole M.; Barreda-Castanon, Santiago; Ristenpart, William D.
We show that the rate of aerosol particle emission during healthy human speech is strongly correlated with the loudness (amplitude) of vocalization. Emission rates range from approximately 1 to 50 particles per second for quiet to loud amplitudes, regardless of language spoken (English, Spanish, Mandarin, or Arabic). Intriguingly, a small fraction of individuals behave as ``super emitters,'' consistently emitting an order of magnitude more aerosol particles than their peers. We interpret the results in terms of the eggressive flowrate during vocalization, which is known to vary significantly for different types of vocalization and for different individuals. The results suggest that individual speech patterns could affect the probability of airborne disease transmission. The results also provide a possible explanation for the existence of ``super spreaders'' who transmit pathogens much more readily than average and who play a key role in the spread of epidemics.
The Speech Cycling Task is a novel experimental paradigm developed together with Robert Port and Keiichi Tajima at Indiana University. In a task of this sort, subjects repeat a phrase containing multiple prominent, or stressed, syllables in time with an auditory metronome, which can be simple or complex. A phase-based collective variable is defined in the acoustic speech signal. This paper reports on two experiments using speech cycling which together reveal many of the hallmarks of hierarchically coupled oscillatory processes. The first experiment requires subjects to place the final stressed syllable of a small phrase at specified phases within the overall Phrase Repetition Cycle (PRC). It is clearly demonstrated that only three patterns, characterized by phases around 1/3, 1/2 or 2/3 are reliably produced, and these points are attractors for other target phases. The system is thus multistable, and the attractors correspond to stable couplings between the metrical foot and the PRC. A second experiment examines the behavior of these attractors at increased rates. Faster rates lead to mode jumps between attractors. Previous experiments have also illustrated hysteresis as the system moves from one mode to the next. The dynamical organization is particularly interesting from a modeling point of view, as there is no single part of the speech production system which cycles at the level of either the metrical foot or the phrase repetition cycle. That is, there is no continuous kinematic observable in the system. Nonetheless, there is strong evidence that the oscopic behavior of the entire production system is correctly described as hierarchically coupled oscillators. There are many parallels between this organization and the forms of inter-limb coupling observed in locomotion and rhythmic manual tasks.
Full Text Available It has been suggested that intra- and inter-speaker variability in speech are correlated. Interlocutors have been shown to converge on various phonetic dimensions. In addition, speakers imitate the phonetic properties of voices they are exposed to in shadowing, repetition, and even passive listening tasks. We review three theoretical accounts of speech imitation and convergence phenomena: (i the Episodic Theory (ET of speech perception and production (Goldinger, 1998; (ii the Motor Theory (MT of speech perception (Liberman and Whalen, 2000;Galantucci et al., 2006 ; (iii Communication Accommodation Theory (CAT; Giles et al., 1991;Giles and Coupland, 1991. We argue that no account is able to explain all the available evidence. In particular, there is a need to integrate low-level, mechanistic accounts (like ET and MT and higher-level accounts (like CAT. We propose that this is possible within the framework of an integrated theory of production and comprehension (Pickering & Garrod, in press. Similarly to both ET and MT, this theory assumes parity between production and perception. Uniquely, however, it posits that listeners simulate speakers’ utterances by computing forward-model predictions at many different levels, which are then compared to the incoming phonetic input. In our account phonetic imitation can be achieved via the same mechanism that is responsible for sensorimotor adaptation; i.e. the correction of prediction errors. In addition, the model assumes that the degree to which sensory prediction errors lead to motor adjustments is context-dependent. The notion of context subsumes both the preceding linguistic input and non-linguistic attributes of the situation (e.g., the speaker’s and listener’s social identities, their conversational roles, the listener’s intention to imitate.
enough from the truth. Subjects were then interviewed individually in a sound booth to obtain “norming” speech data, pre- interview. We also...e.g. pitch, intensity, speaking rate, voice quality), gender, ethnicity and personality information, our machine learning experiments can classify...Have you ever been in trouble with the police?” vs. open-ended (e.g. “What is the last movie you saw that you really hated ?”) DISTRIBUTION A
Schaadt, Gesa; van der Meer, Elke; Pannekamp, Ann; Oberecker, Regine; Männel, Claudia
During information processing, individuals benefit from bimodally presented input, as has been demonstrated for speech perception (i.e., printed letters and speech sounds) or the perception of emotional expressions (i.e., facial expression and voice tuning). While typically developing individuals show this bimodal benefit, school children with dyslexia do not. Currently, it is unknown whether the bimodal processing deficit in dyslexia also occurs for visual-auditory speech processing that is independent of reading and spelling acquisition (i.e., no letter-sound knowledge is required). Here, we tested school children with and without spelling problems on their bimodal perception of video-recorded mouth movements pronouncing syllables. We analyzed the event-related potential Mismatch Response (MMR) to visual-auditory speech information and compared this response to the MMR to monomodal speech information (i.e., auditory-only, visual-only). We found a reduced MMR with later onset to visual-auditory speech information in children with spelling problems compared to children without spelling problems. Moreover, when comparing bimodal and monomodal speech perception, we found that children without spelling problems showed significantly larger responses in the visual-auditory experiment compared to the visual-only response, whereas children with spelling problems did not. Our results suggest that children with dyslexia exhibit general difficulties in bimodal speech perception independently of letter-speech sound knowledge, as apparent in altered bimodal speech perception and lacking benefit from bimodal information. This general deficit in children with dyslexia may underlie the previously reported reduced bimodal benefit for letter-speech sound combinations and similar findings in emotion perception. Copyright © 2018 Elsevier Ltd. All rights reserved.
Van Bree, K.C.
For many speech telecommunication technologies a robust speech activity detector is important. An audio-only speech detector will givefalse positives when the interfering signal is speech or has speech characteristics. The modality video is suitable to solve this problem. In this report the approach
Shriberg, Lawrence D.; Fourakis, Marios; Hall, Sheryl D.; Karlsson, Heather B.; Lohmeier, Heather L.; McSweeny, Jane L.; Potter, Nancy L.; Scheer-Cohen, Alison R.; Strand, Edythe A.; Tilkens, Christie M.; Wilson, David L.
This report describes three extensions to a classification system for paediatric speech sound disorders termed the Speech Disorders Classification System (SDCS). Part I describes a classification extension to the SDCS to differentiate motor speech disorders from speech delay and to differentiate among three sub-types of motor speech disorders.…
In his speech to the 21st session of the General Conference of the IAEA the Director General presented the history of establishment of the IAEA, main objectives and main achievements during its 20-year operation
Badalamenti, A F
This paper presents evidence that six of the seven parts of speech occur in written text as Poisson processes, simple or recurring. The six major parts are nouns, verbs, adjectives, adverbs, prepositions, and conjunctions, with the interjection occurring too infrequently to support a model. The data consist of more than the first 5000 words of works by four major authors coded to label the parts of speech, as well as periods (sentence terminators). Sentence length is measured via the period and found to be normally distributed with no stochastic model identified for its occurrence. The models for all six speech parts but the noun significantly distinguish some pairs of authors and likewise for the joint use of all words types. Any one author is significantly distinguished from any other by at least one word type and sentence length very significantly distinguishes each from all others. The variety of word type use, measured by Shannon entropy, builds to about 90% of its maximum possible value. The rate constants for nouns are close to the fractions of maximum entropy achieved. This finding together with the stochastic models and the relations among them suggest that the noun may be a primitive organizer of written text.
Malik, H.; Darma, S.; Soekirno, S.
This research reported a comparison from a success rate of speech recognition systems that used two types of databases they were existing databases and new databases, that were implemented into quadcopter as motion control. Speech recognition system was using Mel frequency cepstral coefficient method (MFCC) as feature extraction that was trained using recursive neural network method (RNN). MFCC method was one of the feature extraction methods that most used for speech recognition. This method has a success rate of 80% - 95%. Existing database was used to measure the success rate of RNN method. The new database was created using Indonesian language and then the success rate was compared with results from an existing database. Sound input from the microphone was processed on a DSP module with MFCC method to get the characteristic values. Then, the characteristic values were trained using the RNN which result was a command. The command became a control input to the single board computer (SBC) which result was the movement of the quadcopter. On SBC, we used robot operating system (ROS) as the kernel (Operating System).
Shriberg, Lawrence D.; Strand, Edythe A.; Fourakis, Marios; Jakielski, Kathy J.; Hall, Sheryl D.; Karlsson, Heather B.; Mabie, Heather L.; McSweeny, Jane L.; Tilkens, Christie M.; Wilson, David L.
Purpose: The goal of this article is to introduce the pause marker (PM), a single-sign diagnostic marker proposed to discriminate early or persistent childhood apraxia of speech (CAS) from speech delay.
Full Text Available The authors analyse the effect of speech rate variation on Afrikaans phone stability from an acoustic perspective. Specifically they introduce two techniques for the acoustic analysis of speech rate variation, apply these techniques to an Afrikaans...
Jørgensen, Søren; Cubick, Jens; Dau, Torsten
In the development process of modern telecommunication systems, such as mobile phones, it is common practice to use computer models to objectively evaluate the transmission quality of the system, instead of time-consuming perceptual listening tests. Such models have typically focused on the quality...... of the transmitted speech, while little or no attention has been provided to speech intelligibility. The present study investigated to what extent three state-of-the art speech intelligibility models could predict the intelligibility of noisy speech transmitted through mobile phones. Sentences from the Danish...... Dantale II speech material were mixed with three different kinds of background noise, transmitted through three different mobile phones, and recorded at the receiver via a local network simulator. The speech intelligibility of the transmitted sentences was assessed by six normal-hearing listeners...
Jung, Youngsin; Duffy, Joseph R; Josephs, Keith A
Primary progressive aphasia is a neurodegenerative syndrome characterized by progressive language dysfunction. The majority of primary progressive aphasia cases can be classified into three subtypes: nonfluent/agrammatic, semantic, and logopenic variants. Each variant presents with unique clinical features, and is associated with distinctive underlying pathology and neuroimaging findings. Unlike primary progressive aphasia, apraxia of speech is a disorder that involves inaccurate production of sounds secondary to impaired planning or programming of speech movements. Primary progressive apraxia of speech is a neurodegenerative form of apraxia of speech, and it should be distinguished from primary progressive aphasia given its discrete clinicopathological presentation. Recently, there have been substantial advances in our understanding of these speech and language disorders. The clinical, neuroimaging, and histopathological features of primary progressive aphasia and apraxia of speech are reviewed in this article. The distinctions among these disorders for accurate diagnosis are increasingly important from a prognostic and therapeutic standpoint. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.
Faundez-Zanuy, Marcos; Esposito, Antonietta; Cordasco, Gennaro; Drugman, Thomas; Solé-Casals, Jordi; Morabito, Francesco
This book presents recent advances in nonlinear speech processing beyond nonlinear techniques. It shows that it exploits heuristic and psychological models of human interaction in order to succeed in the implementations of socially believable VUIs and applications for human health and psychological support. The book takes into account the multifunctional role of speech and what is “outside of the box” (see Björn Schuller’s foreword). To this aim, the book is organized in 6 sections, each collecting a small number of short chapters reporting advances “inside” and “outside” themes related to nonlinear speech research. The themes emphasize theoretical and practical issues for modelling socially believable speech interfaces, ranging from efforts to capture the nature of sound changes in linguistic contexts and the timing nature of speech; labors to identify and detect speech features that help in the diagnosis of psychological and neuronal disease, attempts to improve the effectiveness and performa...
Lü, Tao; Guo, Jin; Zhang, He-yong; Yan, Chun-hui; Wang, Can-jin
To address the challenges of non-cooperative and remote acoustic detection, an all-fiber laser Doppler vibrometer (LDV) is established. The all-fiber LDV system can offer the advantages of smaller size, lightweight design and robust structure, hence it is a better fit for remote speech detection. In order to improve the performance and the efficiency of LDV for long-range hearing, the speech enhancement technology based on optimally modified log-spectral amplitude (OM-LSA) algorithm is used. The experimental results show that the comprehensible speech signals within the range of 150 m can be obtained by the proposed LDV. The signal-to-noise ratio ( SNR) and mean opinion score ( MOS) of the LDV speech signal can be increased by 100% and 27%, respectively, by using the speech enhancement technology. This all-fiber LDV, which combines the speech enhancement technology, can meet the practical demand in engineering.
Mobile Speech and Advanced Natural Language Solutions provides a comprehensive and forward-looking treatment of natural speech in the mobile environment. This fourteen-chapter anthology brings together lead scientists from Apple, Google, IBM, AT&T, Yahoo! Research and other companies, along with academicians, technology developers and market analysts. They analyze the growing markets for mobile speech, new methodological approaches to the study of natural language, empirical research findings on natural language and mobility, and future trends in mobile speech. Mobile Speech opens with a challenge to the industry to broaden the discussion about speech in mobile environments beyond the smartphone, to consider natural language applications across different domains. Among the new natural language methods introduced in this book are Sequence Package Analysis, which locates and extracts valuable opinion-related data buried in online postings; microintonation as a way to make TTS truly human-like; and se...
The role of speech production mechanisms in difficult speech comprehension is the subject of on-going debate in speech science. Two Activation Likelihood Estimation (ALE) analyses were conducted on neuroimaging studies investigating difficult speech comprehension or speech production. Meta-analysis 1 included 10 studies contrasting comprehension…
...] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...; speech-to-speech (STS); pay-per-call (900) calls; types of calls; and equal access to interexchange... of a report, due April 16, 2011, addressing whether it is necessary for the waivers to remain in...
...] Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities... proposed compensation rates for Interstate TRS, Speech-to-Speech Services (STS), Captioned Telephone... costs reported in the data submitted to NECA by VRS providers. In this regard, document DA 10-761 also...
H. B. Kekre; Tanuja K. Sarode
Mostly transforms are used for speech data compressions which are lossy algorithms. Such algorithms are tolerable for speech data compression since the loss in quality is not perceived by the human ear. However the vector quantization (VQ) has a potential to give more data compression maintaining the same quality. In this paper we propose speech data compression algorithm using vector quantization technique. We have used VQ algorithms LBG, KPE and FCG. The results table s...
Full Text Available This paper presents new Czech language two-channel (stereo speech database recorded in car environment. The created database was designed for experiments with speech enhancement for communication purposes and for the study and the design of a robust speech recognition systems. Tools for automated phoneme labelling based on Baum-Welch re-estimation were realised. The noise analysis of the car background environment was done.
Pollak, P.; Vopicka, J.; Hanzl, V.; Sovka, Pavel
This paper presents new Czech language two-channel (stereo) speech database recorded in car environment. The created database was designed for experiments with speech enhancement for communication purposes and for the study and the design of a robust speech recognition systems. Tools for automated phoneme labelling based on Baum-Welch re-estimation were realised. The noise analysis of the car background environment was done.
Agus, Trevor R.; Akeroyd, Michael A.; Noble, William; Bhullar, Navjot
Many of the items in the “Speech, Spatial, and Qualities of Hearing” scale questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol.43, 85–99 (2004)] are concerned with speech understanding in a variety of backgrounds, both speech and nonspeech. To study if this self-report data reflected informational masking, previously collected data on 414 people were analyzed. The lowest scores (greatest difficulties) were found for the two items in which there were two speech targets, with successively ...
Barnes, T; Burnett, G; Gable, T; Holzrichter, J F; Ng, L
Low power Electromagnetic (EM) Wave sensors can measure general properties of human speech articulator motions, as speech is produced. See Holzrichter, Burnett, Ng, and Lea, J.Acoust.Soc.Am. 103 (1) 622 (1998). Experiments have demonstrated extremely accurate pitch measurements ( and lt; 1 Hz per pitch cycle) and accurate onset of voiced speech. Recent measurements of pressure-induced tracheal motions enable very good spectra and amplitude estimates of a voiced excitation function. The use of the measured excitation functions and pitch synchronous processing enable the determination of each pitch cycle of an accurate transfer function and, indirectly, of the corresponding articulator motions. In addition, direct measurements have been made of EM wave reflections from articulator interfaces, including jaw, tongue, and palate, simultaneously with acoustic and glottal open/close signals. While several types of EM sensors are suitable for speech articulator measurements, the homodyne sensor has been found to provide good spatial and temporal resolution for several applications
Sabine van der Ham
Full Text Available When learning language, humans have a tendency to produce more extreme distributions of speech sounds than those observed most frequently: In rapid, casual speech, vowel sounds are centralized, yet cross-linguistically, peripheral vowels occur almost universally. We investigate whether adults’ generalization behavior reveals selective pressure for communication when they learn skewed distributions of speech-like sounds from a continuous signal space. The domain-specific hypothesis predicts that the emergence of sound categories is driven by a cognitive bias to make these categories maximally distinct, resulting in more skewed distributions in participants’ reproductions. However, our participants showed more centered distributions, which goes against this hypothesis, indicating that there are no strong innate linguistic biases that affect learning these speech-like sounds. The centralization behavior can be explained by a lack of communicative pressure to maintain categories.
Maysa Luchesi Cera
Full Text Available Abstract The literature on apraxia of speech describes the types and characteristics of phonological errors in this disorder. In general, phonemes affected by errors are described, but the distinctive features involved have not yet been investigated. Objective: To analyze the features involved in substitution errors produced by Brazilian-Portuguese speakers with apraxia of speech. Methods: 20 adults with apraxia of speech were assessed. Phonological analysis of the distinctive features involved in substitution type errors was carried out using the protocol for the evaluation of verbal and non-verbal apraxia. Results: The most affected features were: voiced, continuant, high, anterior, coronal, posterior. Moreover, the mean of the substitutions of marked to markedness features was statistically greater than the markedness to marked features. Conclusions: This study contributes toward a better characterization of the phonological errors found in apraxia of speech, thereby helping to diagnose communication disorders and the selection criteria of phonemes for rehabilitation in these patients.
Macdonald, Ewen N; Raufer, Stefan
The Lombard effect refers to the phenomenon where talkers automatically increase their level of speech in a noisy environment. While many studies have characterized how the Lombard effect influences different measures of speech production (e.g., F0, spectral tilt, etc.), few have investigated...... the consequences of temporally fluctuating noise. In the present study, 20 talkers produced speech in a variety of noise conditions, including both steady-state and amplitude-modulated white noise. While listening to noise over headphones, talkers produced randomly generated five word sentences. Similar...... of noisy environments and will alter their speech accordingly....
In this paper I argue that one way of explaining what is wrong with hate speech is by critically assessing what kind of freedom free speech involves and, relatedly, what kind of freedom hate speech undermines. More specifically, I argue that the main arguments for freedom of speech (e.g. from truth, from autonomy, and from democracy) rely on a “positive” conception of freedom intended as autonomy and self-mastery (Berlin, 2006), and can only partially help us to understand what is wrong with ...
Polyanskaya, Leona; Ordin, Mikhail
Analysis of English rhythm in speech produced by children and adults revealed that speech rhythm becomes increasingly more stress-timed as language acquisition progresses. Children reach the adult-like target by 11 to 12 years. The employed speech elicitation paradigm ensured that the sentences produced by adults and children at different ages were comparable in terms of lexical content, segmental composition, and phonotactic complexity. Detected differences between child and adult rhythm and between rhythm in child speech at various ages cannot be attributed to acquisition of phonotactic language features or vocabulary, and indicate the development of language-specific phonetic timing in the course of acquisition.
M. L. Henry
Full Text Available There is a growing body of literature examining the utility of behavioral treatment in primary progressive aphasia (PPA. There are, however, no studies exploring treatment approaches to improve speech production in individuals with apraxia of speech (AOS associated with the nonfluent variant of PPA. The purpose of this study was to examine a novel approach to treatment of AOS in nonfluent PPA. We implemented a treatment method using structured oral reading as a tool for improving production of multisyllabic words in an individual with mild AOS and nonfluent variant PPA. Our participant showed a reduction in speech errors during reading of novel text that was maintained at one year post-treatment. Generalization of improved speech production was observed on repetition of words and sentences and the participant showed stability of speech production over time in connected speech. Results suggest that oral reading treatment may offer an efficient and effective means of addressing multisyllabic word production in AOS associated with nonfluent PPA, with lasting and generalized treatment effects.
Developmental apraxia of speech (DAS) in children is a speech disorder, supposed to have a neurological origin, which is commonly considered to result from particular deficits in speech processing (i.e., phonological planning, motor programming). However, the label DAS has often been used as
Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.
Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…
Groenewold, Rimke; Bastiaanse, Roelien; Nickels, Lyndsey; Huiskes, Mike
Background: Previous studies have shown that in semi-spontaneous speech, individuals with Broca's and anomic aphasia produce relatively many direct speech constructions. It has been claimed that in 'healthy' communication direct speech constructions contribute to the liveliness, and indirectly to
Imad Hayif Sameer
Full Text Available The theory of speech acts, which clarifies what people do when they speak, is not about individual words or sentences that form the basic elements of human communication, but rather about particular speech acts that are performed when uttering words. A speech act is the attempt at doing something purely by speaking. Many things can be done by speaking. Speech acts are studied under what is called speech act theory, and belong to the domain of pragmatics. In this paper, two Egyptian inaugural speeches from El-Sadat and El-Sisi, belonging to different periods were analyzed to find out whether there were differences within this genre in the same culture or not. The study showed that there was a very small difference between these two speeches which were analyzed according to Searle’s theory of speech acts. In El Sadat’s speech, commissives came to occupy the first place. Meanwhile, in El–Sisi’s speech, assertives occupied the first place. Within the speeches of one culture, we can find that the differences depended on the circumstances that surrounded the elections of the Presidents at the time. Speech acts were tools they used to convey what they wanted and to obtain support from their audiences.
Full Text Available A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
De Keyser, Kim; Santens, Patrick; Bockstael, Annelies; Botteldooren, Dick; Talsma, Durk; De Vos, Stefanie; Van Cauwenberghe, Mieke; Verheugen, Femke; Corthals, Paul; De Letter, Miet
Purpose: This study investigated the possible relationship between hypokinetic speech production and speech intensity perception in patients with Parkinson's disease (PD). Method: Participants included 14 patients with idiopathic PD and 14 matched healthy controls (HCs) with normal hearing and cognition. First, speech production was objectified…
Jamal, Norezmi; Shanta, Shahnoor; Mahmud, Farhanahani; Sha'abani, MNAH
This paper reviews the state-of-the-art an automatic speech recognition (ASR) based approach for speech therapy of aphasic patients. Aphasia is a condition in which the affected person suffers from speech and language disorder resulting from a stroke or brain injury. Since there is a growing body of evidence indicating the possibility of improving the symptoms at an early stage, ASR based solutions are increasingly being researched for speech and language therapy. ASR is a technology that transfers human speech into transcript text by matching with the system's library. This is particularly useful in speech rehabilitation therapies as they provide accurate, real-time evaluation for speech input from an individual with speech disorder. ASR based approaches for speech therapy recognize the speech input from the aphasic patient and provide real-time feedback response to their mistakes. However, the accuracy of ASR is dependent on many factors such as, phoneme recognition, speech continuity, speaker and environmental differences as well as our depth of knowledge on human language understanding. Hence, the review examines recent development of ASR technologies and its performance for individuals with speech and language disorders.
Soskey, Laura N; Allen, Paul D; Bennetto, Loisa
One of the earliest observable impairments in autism spectrum disorder (ASD) is a failure to orient to speech and other social stimuli. Auditory spatial attention, a key component of orienting to sounds in the environment, has been shown to be impaired in adults with ASD. Additionally, specific deficits in orienting to social sounds could be related to increased acoustic complexity of speech. We aimed to characterize auditory spatial attention in children with ASD and neurotypical controls, and to determine the effect of auditory stimulus complexity on spatial attention. In a spatial attention task, target and distractor sounds were played randomly in rapid succession from speakers in a free-field array. Participants attended to a central or peripheral location, and were instructed to respond to target sounds at the attended location while ignoring nearby sounds. Stimulus-specific blocks evaluated spatial attention for simple non-speech tones, speech sounds (vowels), and complex non-speech sounds matched to vowels on key acoustic properties. Children with ASD had significantly more diffuse auditory spatial attention than neurotypical children when attending front, indicated by increased responding to sounds at adjacent non-target locations. No significant differences in spatial attention emerged based on stimulus complexity. Additionally, in the ASD group, more diffuse spatial attention was associated with more severe ASD symptoms but not with general inattention symptoms. Spatial attention deficits have important implications for understanding social orienting deficits and atypical attentional processes that contribute to core deficits of ASD. Autism Res 2017, 10: 1405-1416. © 2017 International Society for Autism Research, Wiley Periodicals, Inc. © 2017 International Society for Autism Research, Wiley Periodicals, Inc.
Vivien Arief Wardhany
Full Text Available The purpose of multimedia devices development is controlling through voice. Nowdays voice that can be recognized only in English. To overcome the issue, then recognition using Indonesian language model and accousticc model and dictionary. Automatic Speech Recognizier is build using engine CMU Sphinx with modified english language to Indonesian Language database and XBMC used as the multimedia player. The experiment is using 10 volunteers testing items based on 7 commands. The volunteers is classifiedd by the genders, 5 Male & 5 female. 10 samples is taken in each command, continue with each volunteer perform 10 testing command. Each volunteer also have to try all 7 command that already provided. Based on percentage clarification table, the word “Kanan” had the most recognize with percentage 83% while “pilih” is the lowest one. The word which had the most wrong clarification is “kembali” with percentagee 67%, while the word “kanan” is the lowest one. From the result of Recognition Rate by male there are several command such as “Kembali”, “Utama”, “Atas “ and “Bawah” has the low Recognition Rate. Especially for “kembali” cannot be recognized as the command in the female voices but in male voice that command has 4% of RR this is because the command doesn’t have similar word in english near to “kembali” so the system unrecognize the command. Also for the command “Pilih” using the female voice has 80% of RR but for the male voice has only 4% of RR. This problem is mostly because of the different voice characteristic between adult male and female which male has lower voice frequencies (from 85 to 180 Hz than woman (165 to 255 Hz.The result of the experiment showed that each man had different number of recognition rate caused by the difference tone, pronunciation, and speed of speech. For further work needs to be done in order to improving the accouracy of the Indonesian Automatic Speech Recognition system
Montgomery, Judy K.
This article is an interview with Dr. Pam Enderby--a speech language therapist and professor at the Institute of General Practice and Primary Care at the University of Sheffield, Community Sciences Centre, Northern General Hospital, in the United Kingdom--conducted by Judy Montgomery, Editor in Chief, of "Communication Disorders…
ABSTRACT: The aim of this article is two-folded. First, it offers a general overview of the legal repression of various categories of expressive behaviours ascribed to the notion of hate speech. Especially in Europe, such repression often passes through criminalization, long since used by law to suppress thoughts or points of view for some reason considered harmful. The social groups nowadays protected by such repressive measures, however, are different from the past: weak and/or minoritarian groups rather than strong and/or majoritarian ones. The second aim of this work is to identify the reasons for the legal repression of expressions, beliefs or doctrines certainly in many cases morally deplorable, yet protected by other high-ranking legal reasons: the principles of freedom of thought and speech. It will be noted that especially for untargeted hate speech not connected to a clear and present danger of unlawful actions, there is no general agreement about those reasons. Furthermore, they tend to be structured in justifications showing some typical flaws of, respectively, deontological and teleological reasoning: the former tend to be superficial as apodictic, generic and vacuous; the latter tend to evade the argumentative burdens of factual order from which their validity depends.
The Ethnic Hate Speech was Networked: What Social Media Online Political Discussions Reveal about the 2013 General Elections in Kenya / El discurso étnico del odio se difundió por las redes. Análisis de las discusiones políticas en las redes sociales dur
Jacinta Mwende Maweu
Full Text Available This chapter examines if the increased political discussions on social media especially Twitter and Face Book before and after the March 4th, 2013 general elections in Kenya translated to a more robust Alternative Public Sphere that broke the hegemony of the traditional media as Agenda setters or an alternative space for the audience to vent out their frustrations and grievances about the election. In the last most contentious elections in 2007, in Kenya, both new and old media were blamed for fueling ethnic hate speech which culminated into the 2007/ 2008 post election violence. It is argued in this chapter that although voting patterns in the March 2013 elections were clearly along ethnic lines just like in 2007, there was no physical post election violence like was the case in 2008. What was clearly evident was Ethnic hate speech before and after the general elections on social media networks. We therefore observe that unlike in 2008 where ethnic violence was fought in the streets, in the 2013 general elections, the ‘Ethnic War was Networked’. The chapter analyses the content of some of the messages send on Twitter and Face Book to argue that social media platforms only acted as alternative spaces for Kenyans to fight out their ethnic political wars and not alternative public spheres for constructive political deliberation. The chapter concludes by observing that social media networks in the 2013 general elections in Kenya acted as ‘Opiums of the Masses’ only serving the function of keeping Kenya ‘quiet and peaceful’ to prevent a repeat of the 2008 post election violence, but not alternative public spheres to facilitate constructive political deliberation. En este artículo se analiza si el aumento de las discusiones políticas sobre las redes sociales, especialmente Twitter y Facebook, antes y después de las elecciones generales en Kenia del 04 de marzo de 2013, se tradujeron en una esfera pública alternativa sólida que
Linear prediction (LP) analysis has been applied to speech system over the last few decades. LP technique is well-suited for speech analysis due to its ability to model speech production process approximately. Hence LP analysis has been widely used for speech enhancement, low-bit-rate speech coding in cellular telephony, speech recognition, characteristic parameter extraction (vocal tract resonances frequencies, fundamental frequency called pitch) and so on. However, the performance of the co...
abridging the freedom of speech .” Speech is construed broadly and includes both oral and written speech, as well as expressive conduct and displays when...intended to convey a message that is likely to be understood.7 Religious speech is certainly included. As a bedrock constitutional right, freedom of speech has...to good order and discipline or of a nature to bring discredit upon the armed forces)—the First Amendment’s freedom of speech will not provide them
Maas, Edwin; Mailend, Marja-Liisa
Purpose: The purpose of this article is to present an argument for the use of online reaction time (RT) methods to the study of apraxia of speech (AOS) and to review the existing small literature in this area and the contributions it has made to our fundamental understanding of speech planning (deficits) in AOS. Method: Following a brief…
Stekelenburg, J.J.; Keetels, M.N.; Vroomen, J.H.M.
Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect.
Full Text Available Understanding Political Influence in Modern-Era Conflict: A Qualitative Historical Analysis of Hassan Nasrallah’s Speeches 'Abstract' 'This research examines and closely analyzes speeches delivered by Hezbollah’s secretary general and spokesman, Hassan Nasrallah, from a content analysis perspective. We reveal that several significant political phenomena that have occurred in Lebanon were impacted by the intensity of speeches delivered by Nasrallah; these three events being the 2006 War, the Doha Agreement, and the 2008 prisoner exchange. Data has been collected from transcribed speeches and analyzed using a qualitative historical analysis. Furthermore, we use latent analysis to assess Nasrallah’s underlying implications of his speeches and identify the themes he uses to influence his audience.'
Chabot-Leclerc, Alexandre; Jørgensen, Søren; Dau, Torsten
Speech intelligibility models typically consist of a preprocessing part that transforms stimuli into some internal (auditory) representation and a decision metric that relates the internal representation to speech intelligibility. The present study analyzed the role of modulation filtering...... in the preprocessing of different speech intelligibility models by comparing predictions from models that either assume a spectro-temporal (i.e., two-dimensional) or a temporal-only (i.e., one-dimensional) modulation filterbank. Furthermore, the role of the decision metric for speech intelligibility was investigated...... subtraction. The results suggested that a decision metric based on the SNRenv may provide a more general basis for predicting speech intelligibility than a metric based on the MTF. Moreover, the one-dimensional modulation filtering process was found to be sufficient to account for the data when combined...
Full Text Available One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming realtively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process which implies rigidity of processingd with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing by masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augementation or
Peelle, Jonathan E; Sommers, Mitchell S
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration
Peelle, Jonathan E.; Sommers, Mitchell S.
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported
Varley, Rosemary; Cowell, Patricia E; Dyson, Lucy; Inglis, Lesley; Roper, Abigail; Whiteside, Sandra P
There is currently little evidence on effective interventions for poststroke apraxia of speech. We report outcomes of a trial of self-administered computer therapy for apraxia of speech. Effects of speech intervention on naming and repetition of treated and untreated words were compared with those of a visuospatial sham program. The study used a parallel-group, 2-period, crossover design, with participants receiving 2 interventions. Fifty participants with chronic and stable apraxia of speech were randomly allocated to 1 of 2 order conditions: speech-first condition versus sham-first condition. Period 1 design was equivalent to a randomized controlled trial. We report results for this period and profile the effect of the period 2 crossover. Period 1 results revealed significant improvement in naming and repetition only in the speech-first group. The sham-first group displayed improvement in speech production after speech intervention in period 2. Significant improvement of treated words was found in both naming and repetition, with little generalization to structurally similar and dissimilar untreated words. Speech gains were largely maintained after withdrawal of intervention. There was a significant relationship between treatment dose and response. However, average self-administered dose was modest for both groups. Future software design would benefit from incorporation of social and gaming components to boost motivation. Single-word production can be improved in chronic apraxia of speech with behavioral intervention. Self-administered computerized therapy is a promising method for delivering high-intensity speech/language rehabilitation. URL: http://orcid.org/0000-0002-1278-0601. Unique identifier: ISRCTN88245643. © 2016 American Heart Association, Inc.
Full Text Available listener simply commented that “this speaker comes from a different region”. We therefore believe that we can improve the quality of synthesis substantially by explicitly aiming for monotone recordings. Of course, the eventual aim is to produce “natural... an explicit duration model, but weigh the syllable position heavily in the calculation of the target costs during synthesis. Again, listeners find this to be an acceptable compromise. Development of appropriate target-cost function To select...
of reduction levels and perceived speaker attributes in which moderate reduction can make a better impression on listeners than no reduction. In addition to its relevance in reduction models and theories, this interplay is instructive for various fields of speech application from social robotics to charisma...... whether variation in the degree of reduction also has a systematic effect on the attributes we ascribe to the speaker who produces the speech signal. A perception experiment was carried out for German in which 46 listeners judged whether or not speakers showing 3 different combinations of segmental...... and prosodic reduction levels (unreduced, moderately reduced, strongly reduced) are appropriately described by 13 physical, social, and cognitive attributes. The experiment shows that clear speech is not mere speech, and less clear speech is not just reduced either. Rather, results revealed a complex interplay...
Oard, Douglas W
Libraries and archives collect recorded speech and multimedia objects that contain recorded speech, and such material may comprise a substantial portion of the collection in future digital libraries...
.... To aid in the assessment of various commercially available speech recognition systems, several aircraft speech databases have been developed at the Air Force Research Laboratory's Human Effectiveness Directorate...
... speech intelligibility. Speech intelligibility for signals generated by an acoustic microphone, a throat microphone, and the two microphones together was assessed using the Modified Rhyme Test (MRT...
Weinstein, C. J.
This report documents work performed during FY 1981 on the DCA-sponsored Network Speech Systems Technology Program. The two areas of work reported are: (1) communication system studies in support of the evolving Defense Switched Network (DSN) and (2) design and implementation of satellite/terrestrial interfaces for the Experimental Integrated Switched Network (EISN). The system studies focus on the development and evaluation of economical and endurable network routing procedures. Satellite/terrestrial interface development includes circuit-switched and packet-switched connections to the experimental wideband satellite network. Efforts in planning and coordination of EISN experiments are reported in detail in a separate EISN Experiment Plan.
Actual spoken language of man developed only approximately 200,000 to 100,000 years ago. As a result of natural selection, man has developed hearing, which is most sensitive in the frequency regions of 200 to 4000 Hz, corresponding to those of spoken sounds. Functional hearing has been one of the prerequisites for the development of speech, although according to current opinion the language itself may have evolved by mimicking gestures with the so-called mirror neurons. Due to hearing, gesticulation was no longer necessary, and the hands became available for other purposes.
Schönmeyr, Björn; Wendby, Lisa; Sharma, Mitali; Jacobson, Lia; Restrepo, Carolina; Campbell, Alex
Many patients with cleft palate deformities worldwide receive treatment at a later age than is recommended for normal speech to develop. The outcomes after late palate repairs in terms of speech and quality of life (QOL) still remain largely unstudied. In the current study, questionnaires were used to assess the patients' perception of speech and QOL before and after primary palate repair. All of the patients were operated at a cleft center in northeast India and had a cleft palate with a normal lip or with a cleft lip that had been previously repaired. A total of 134 patients (7-35 years) were interviewed preoperatively and 46 patients (7-32 years) were assessed in the postoperative survey. The survey showed that scores based on the speech handicap index, concerning speech and speech-related QOL, did not improve postoperatively. In fact, the questionnaires indicated that the speech became more unpredictable (P reported that their self-confidence had improved after the operation. Thus, the majority of interviewed patients who underwent late primary palate repair were satisfied with the surgery. At the same time, speech and speech-related QOL did not improve according to the speech handicap index-based survey. Speech predictability may even become worse and nasal regurgitation may increase after late palate repair, according to these results.
Nielsen, Jens Bo; Dau, Torsten; Neher, Tobias
Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs......) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded...... with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed...
Balasubramanian, Venu; Max, Ludo
The present study reports on the first case of crossed apraxia of speech (CAS) in a 69-year-old right-handed female (SE). The possibility of occurrence of apraxia of speech (AOS) following right hemisphere lesion is discussed in the context of known occurrences of ideomotor apraxias and acquired neurogenic stuttering in several cases with right…
Yegnanarayana, B.; Satyanarayana Murthy, P.; Eggen, J.H.
In this paper we propose a speech-analysis method to bring out characteristics of the vocal tract system in short segments which are much less than a pitch period. The method performs windowing in the source and system components of the speech signal and recombines them to obtain a signal reflecting
RUTHERFORD, DAVID; WESTLAKE, HAROLD
DESIGNED TO PROVIDE AN ESSENTIAL CORE OF INFORMATION, THIS BOOK TREATS NORMAL AND ABNORMAL DEVELOPMENT, STRUCTURE, AND FUNCTION OF THE LIPS AND PALATE AND THEIR RELATIONSHIPS TO CLEFT LIP AND CLEFT PALATE SPEECH. PROBLEMS OF PERSONAL AND SOCIAL ADJUSTMENT, HEARING, AND SPEECH IN CLEFT LIP OR CLEFT PALATE INDIVIDUALS ARE DISCUSSED. NASAL RESONANCE…
Barker, Larry L.; And Others
The purposes of this paper are (1) to review the background and nature of hypnosis, (2) to synthesize research on hypnosis related to speech communication, and (3) to delineate and compare two potential techniques for reducing speech anxiety--hypnosis and systematic desensitization. Hypnosis has been defined as a mental state characterised by…
not only affect the listener of speech communication in a noisy environment, HPDs can also affect the speaker . Tufts and Frank (2003) found that...of hearing protection on speech intelligibility in noise. Sound and Vibration . 20(10): 12-14. Berger, E. H. 1980. EARLog #4 – The
Namasivayam, Aravind K.; Pukonen, Margit; Goshulak, Debra; Hard, Jennifer; Rudzicz, Frank; Rietveld, Toni; Maassen, Ben; Kroll, Robert; van Lieshout, Pascal
Background: Intensive treatment has been repeatedly recommended for the treatment of speech deficits in childhood apraxia of speech (CAS). However, differences in treatment outcomes as a function of treatment intensity have not been systematically studied in this population. Aim: To investigate the effects of treatment intensity on outcome…
Full Text Available Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants’ attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4-13 months of age were exposed to happy-sounding infant-directed speech versus hummed lullabies by the same woman. They listened significantly longer to the speech, which had considerably greater acoustic variability and expressiveness, than to the lullabies. In Experiment 2, infants of comparable age who heard the lyrics of a Turkish children’s song spoken versus sung in a joyful/happy manner did not exhibit differential listening. Infants in Experiment 3 heard the happily sung lyrics of the Turkish children’s song versus a version that was spoken in an adult-directed or affectively neutral manner. They listened significantly longer to the sung version. Overall, happy voice quality rather than vocal mode (speech or singing was the principal contributor to infant attention, regardless of age.
A keynote speech outlining the importance of collaboration and diversity in the workplace. The 20-minute speech describes NASA's challenges and accomplishments over the years and what lies ahead. Topics include: diversity and inclusion principles, international cooperation, Kennedy Space Center planning and development, opportunities for cooperation, and NASA's vision for exploration.
Riis, Søren Kamaric
We evaluate the hidden neural network HMM/NN hybrid on two speech recognition benchmark tasks; (1) task independent isolated word recognition on the Phonebook database, and (2) recognition of broad phoneme classes in continuous speech from the TIMIT database. It is shown how hidden neural networks...
Holzrichter, John F.
A non-acoustic sensor is used to measure a user's speech and then broadcasts an obscuring acoustic signal diminishing the user's vocal acoustic output intensity and/or distorting the voice sounds making them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate or contacting a user's neck or head skin tissue for sensing speech production information.
Lalonde, Kaylah; Holt, Rachael Frush
Purpose: This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. They also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method: Twelve adults and 27 typically developing 3-…
Harris, Katherine Safford
Speaking is universally acknowledged as an important human talent, yet as a topic of educated common knowledge, it is peculiarly neglected. Partly, this is a consequence of the relatively recent growth of research on speech perception, production, and development, but also a function of the way that information is sliced up by undergraduate colleges. Although the basic acoustic mechanism of vowel production was known to Helmholtz, the ability to view speech production as a physiological event is evolving even now with such techniques as fMRI. Intensive research on speech perception emerged only in the early 1930s as Fletcher and the engineers at Bell Telephone Laboratories developed the transmission of speech over telephone lines. The study of speech development was revolutionized by the papers of Eimas and his colleagues on speech perception in infants in the 1970s. Dissemination of knowledge in these fields is the responsibility of no single academic discipline. It forms a center for two departments, Linguistics, and Speech and Hearing, but in the former, there is a heavy emphasis on other aspects of language than speech and, in the latter, a focus on clinical practice. For psychologists, it is a rather minor component of a very diverse assembly of topics. I will focus on these three fields in proposing possible remedies.
Cooper, James W.; Viswanathan, Mahesh; Byron, Donna; Chan, Margaret
The study has applied speech recognition and text-mining technologies to a set of recorded outbound marketing calls and analyzed the results. Since speaker-independent speech recognition technology results in a significantly lower recognition rate than that found when the recognizer is trained for a particular speaker, a number of post-processing…
Shriberg, Lawrence D.; Strand, Edythe A.; Fourakis, Marios; Jakielski, Kathy J.; Hall, Sheryl D.; Karlsson, Heather B.; Mabie, Heather L.; McSweeny, Jane L.; Tilkens, Christie M.; Wilson, David L.
Purpose: Previous articles in this supplement described rationale for and development of the pause marker (PM), a diagnostic marker of childhood apraxia of speech (CAS), and studies supporting its validity and reliability. The present article assesses the theoretical coherence of the PM with speech processing deficits in CAS. Method: PM and other…
Morgan, Lewis B.
Introduces counselors to 10 commonly used mannerisms of speech and the part that each mannerism plays in the communication process, especially in the counseling context. Offers suggestions on how to respond to these speech mannerisms in a straightforward and effective manner. (Author)
This paper presents a study of pulmonic ingressive speech, a severely understudied phenomenon within varieties of English. While ingressive speech has been reported for several parts of the British Isles, New England, and eastern Canada, thus far Newfoundland appears to be the only locality where researchers have managed to provide substantial…
Groenewold, Rimke; Bastiaanse, Roelien; Huiskes, Mike
Background: Previous studies have shown that individuals with aphasia are usually able to produce direct reported speech constructions. So far these studies have mainly been conducted in English. The results show that direct speech is beneficial for aphasic speakers for various reasons. In Dutch the
Prelock, Patricia A.; Deppe, Janet
The purpose of this article is to explain the role of speech-language pathology in early intervention. The expected credentials of professionals in the field are described, and the current numbers of practitioners serving young children are identified. Several resource documents available from the American Speech-Language Hearing Association are…
Full Text Available Hidden Markov Model (HMM)-based synthesis in combination with speaker adaptation has proven to be an approach that is well-suited for child speech synthesis. This paper describes the development and evaluation of different HMM-based child speech...
Bender, Brenda K.; Cannito, Michael P.; Murry, Thomas; Woodson, Gayle E.
This study compared speech intelligibility in nondisabled speakers and speakers with adductor spasmodic dysphonia (ADSD) before and after botulinum toxin (Botox) injection. Standard speech samples were obtained from 10 speakers diagnosed with severe ADSD prior to and 1 month following Botox injection, as well as from 10 age- and gender-matched…
Marcus, Laurence R.
This book explores issues typified by a series of hateful speech events at Kean College (New Jersey) and on other U.S. campuses in the early 1990s, by examining the dichotomies that exist between the First and the Fourteenth Amendments and between civil liberties and civil rights, and by contrasting the values of free speech and academic freedom…
New York State Education Dept., Albany. Office of the Professions.
The handbook contains State Education Department rules and regulations that govern speech-language pathology and audiology in New York State. The handbook also describes licensure and first registration as a licensed speech-language pathologist or audiologist. The introduction discusses professional regulation in New York State while the second…
Namasivayam, Aravind K.; Pukonen, Margit; Goshulak, Debra; Hard, Jennifer; Rudzicz, Frank; Rietveld, Toni; Maassen, Ben; Kroll, Robert; van Lieshout, Pascal
BackgroundIntensive treatment has been repeatedly recommended for the treatment of speech deficits in childhood apraxia of speech (CAS). However, differences in treatment outcomes as a function of treatment intensity have not been systematically studied in this population. AimTo investigate the
Theune, Mariet; Walker, M.; Rambow, O.
In concept-to-speech systems, spoken output is generated on the basis of a text that has been produced by the system itself. In such systems, linguistic information from the text generation component may be exploited to achieve a higher prosodic quality of the speech output than can be obtained in a
Jonathan L Preston
Full Text Available Ultrasound imaging is an adjunct to traditional speech therapy that has shown to be beneficial in the remediation of speech sound errors. Ultrasound biofeedback can be utilized during therapy to provide clients additional knowledge about their tongue shapes when attempting to produce sounds that are in error. The additional feedback may assist children with childhood apraxia of speech in stabilizing motor patterns, thereby facilitating more consistent and accurate productions of sounds and syllables. However, due to its specialized nature, ultrasound visual feedback is a technology that is not widely available to clients. Short-term intensive treatment programs are one option that can be utilized to expand access to ultrasound biofeedback. Schema-based motor learning theory suggests that short-term intensive treatment programs (massed practice may assist children in acquiring more accurate motor patterns. In this case series, three participants ages 10-14 diagnosed with childhood apraxia of speech attended 16 hours of speech therapy over a two-week period to address residual speech sound errors. Two participants had distortions on rhotic sounds, while the third participant demonstrated lateralization of sibilant sounds. During therapy, cues were provided to assist participants in obtaining a tongue shape that facilitated a correct production of the erred sound. Additional practice without ultrasound was also included. Results suggested that all participants showed signs of acquisition of sounds in error. Generalization and retention results were mixed. One participant showed generalization and retention of sounds that were treated; one showed generalization but limited retention; and the third showed no evidence of generalization or retention. Individual characteristics that may facilitate generalization are discussed. Short-term intensive treatment programs using ultrasound biofeedback may result in the acquisition of more accurate motor
Basharirad, Babak; Moradhaseli, Mohammadreza
Recently, attention of the emotional speech signals research has been boosted in human machine interfaces due to availability of high computation capability. There are many systems proposed in the literature to identify the emotional state through speech. Selection of suitable feature sets, design of a proper classifications methods and prepare an appropriate dataset are the main key issues of speech emotion recognition systems. This paper critically analyzed the current available approaches of speech emotion recognition methods based on the three evaluating parameters (feature set, classification of features, accurately usage). In addition, this paper also evaluates the performance and limitations of available methods. Furthermore, it highlights the current promising direction for improvement of speech emotion recognition systems.
of the effectiveness of their application in audio processing. The second part of the thesis deals with introducing sparsity directly in the linear prediction analysis-by-synthesis (LPAS) speech coding paradigm. We first propose a novel near-optimal method to look for a sparse approximate excitation using a compressed...... one with direct applications to coding but also consistent with the speech production model of voiced speech, where the excitation of the all-pole filter can be modeled as an impulse train, i.e., a sparse sequence. Introducing sparsity in the LP framework will also bring to de- velop the concept...... sensing formulation. Furthermore, we define a novel re-estimation procedure to adapt the predictor coefficients to the given sparse excitation, balancing the two representations in the context of speech coding. Finally, the advantages of the compact parametric representation of a segment of speech, given...
Jørgensen, Søren; Dau, Torsten
Jørgensen and Dau (J Acoust Soc Am 130:1475-1487, 2011) proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech...... subjected to phase jitter, a condition in which the spectral structure of the intelligibility of speech signal is strongly affected, while the broadband temporal envelope is kept largely intact. In contrast, the effects of this distortion can be predicted -successfully by the spectro-temporal modulation...... suggest that the SNRenv might reflect a powerful decision metric, while some explicit across-frequency analysis seems crucial in some conditions. How such across-frequency analysis is "realized" in the auditory system remains unresolved....
Atmaja, Bagus Tris; Farid, Mifta Nur; Arifianto, Dhany
Speech enhancement is challenging task in audio signal processing to enhance the quality of targeted speech signal while suppress other noises. In the beginning, the speech enhancement algorithm growth rapidly from spectral subtraction, Wiener filtering, spectral amplitude MMSE estimator to Non-negative Matrix Factorization (NMF). Smartphone as revolutionary device now is being used in all aspect of life including journalism; personally and professionally. Although many smartphones have two microphones (main and rear) the only main microphone is widely used for voice recording. This is why the NMF algorithm widely used for this purpose of speech enhancement. This paper evaluate speech enhancement on smartphone voice recording by using some algorithms mentioned previously. We also extend the NMF algorithm to Kulback-Leibler NMF with supervised separation. The last algorithm shows improved result compared to others by spectrogram and PESQ score evaluation. (paper)
Full Text Available Huntington’s disease (HD has been described as a genetic condition caused by a mutation in the CAG (cytosine-adenine-guanine nucleotide sequence. Depending on the stage of the disease, people may have difficulties in speech, language and swallowing. The purpose of this paper is to describe these difficulties in detail, as well as to provide an account on speech and language therapy approach to this condition. Regarding speech, it is worth noticing that characteristics typical of hyperkinetic dysarthria can be found due to underlying choreic movements. The speech of people with HD tends to show shorter sentences, with much simpler syntactic structures, and difficulties in tasks that require complex cognitive processing. Moreover, swallowing may present dysphagia that progresses as the disease develops. A timely, comprehensive and effective speech-language intervention is essential to improve the quality of life of people and contribute to their communicative welfare.
Christopher W Bishop
Full Text Available Speech is the most important form of human communication but ambient sounds and competing talkers often degrade its acoustics. Fortunately the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech-cues interact with audiovisual spatial integration mechanisms. Here, we combine two well established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech-cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral 'what' and dorsal 'where' pathways.
) in conditions with nonlinearly processed speech. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting...... understanding speech when more than one person is talking, even when reduced audibility has been fully compensated for by a hearing aid. The reasons for these difficulties are not well understood. This presentation highlights recent concepts of the monaural and binaural signal processing strategies employed...... by the normal as well as impaired auditory system. Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII...
Faizel Amri, Umar; Nur Wahidah Nik Hashim, Nik; Hazrin Hany Mohamad Hanif, Noor
In the department of engineering, students are required to fulfil at least 80 percent of class attendance. Conventional method requires student to sign his/her initial on the attendance sheet. However, this method is prone to cheating by having another student signing for their fellow classmate that is absent. We develop our hypothesis according to a verse in the Holy Qur’an (95:4), “We have created men in the best of mould”. Based on the verse, we believe each psychological characteristic of human being is unique and thus, their speech characteristic should be unique. In this paper we present the development of speech biometric-based attendance system. The system requires user’s voice to be installed in the system as trained data and it is saved in the system for registration of the user. The following voice of the user will be the test data in order to verify with the trained data stored in the system. The system uses PSD (Power Spectral Density) and Transition Parameter as the method for feature extraction of the voices. Euclidean and Mahalanobis distances are used in order to verified the user’s voice. For this research, ten subjects of five females and five males were chosen to be tested for the performance of the system. The system performance in term of recognition rate is found to be 60% correct identification of individuals.
Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris
A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production
It is becoming crucial to accurately estimate and monitor speech quality in various ambient environments to guarantee high quality speech communication. This practical hands-on book shows speech intelligibility measurement methods so that the readers can start measuring or estimating speech intelligibility of their own system. The book also introduces subjective and objective speech quality measures, and describes in detail speech intelligibility measurement methods. It introduces a diagnostic rhyme test which uses rhyming word-pairs, and includes: An investigation into the effect of word familiarity on speech intelligibility. Speech intelligibility measurement of localized speech in virtual 3-D acoustic space using the rhyme test. Estimation of speech intelligibility using objective measures, including the ITU standard PESQ measures, and automatic speech recognizers.
There is a lack of agreement on the features used to differentiate Childhood Apraxia of Speech (CAS) from Phonological Disorders (PD). One criterion which has gained consensus is lexical inconsistency of speech (ASHA, 2007); however, no accepted measure of this feature has been defined. Although lexical assessment provides information about consistency of an item across repeated trials, it may not capture the magnitude of inconsistency within an item. In contrast, segmental analysis provides more extensive information about consistency of phoneme usage across multiple contexts and word-positions. The current research compared segmental and lexical inconsistency metrics in preschool-aged children with PD, CAS, and typical development (TD) to determine how inconsistency varies with age in typical and disordered speakers, and whether CAS and PD were differentiated equally well by both assessment levels. Whereas lexical and segmental analyses may be influenced by listener characteristics or speaker intelligibility, the acoustic signal is less vulnerable to these factors. In addition, the acoustic signal may reveal information which is not evident in the perceptual signal. A second focus of the current research was motivated by Blumstein et al.'s (1980) classic study on voice onset time (VOT) in adults with acquired apraxia of speech (AOS) which demonstrated a motor impairment underlying AOS. In the current study, VOT analyses were conducted to determine the relationship between age and group with the voicing distribution for bilabial and alveolar plosives. Findings revealed that 3-year-olds evidenced significantly higher inconsistency than 5-year-olds; segmental inconsistency approached 0% in 5-year-olds with TD, whereas it persisted in children with PD and CAS suggesting that for child in this age-range, inconsistency is a feature of speech disorder rather than typical development (Holm et al., 2007). Likewise, whereas segmental and lexical inconsistency were
Full Text Available In the past few years, several studies have been directed to understanding the complexity of functional interactions between different brain regions during various human behaviors. Among these, neuroimaging research installed the notion that speech and language require an orchestration of brain regions for comprehension, planning, and integration of a heard sound with a spoken word. However, these studies have been largely limited to mapping the neural correlates of separate speech elements and examining distinct cortical or subcortical circuits involved in different aspects of speech control. As a result, the complexity of the brain network machinery controlling speech and language remained largely unknown. Using graph theoretical analysis of functional MRI (fMRI data in healthy subjects, we quantified the large-scale speech network topology by constructing functional brain networks of increasing hierarchy from the resting state to motor output of meaningless syllables to complex production of real-life speech as well as compared to non-speech-related sequential finger tapping and pure tone discrimination networks. We identified a segregated network of highly connected local neural communities (hubs in the primary sensorimotor and parietal regions, which formed a commonly shared core hub network across the examined conditions, with the left area 4p playing an important role in speech network organization. These sensorimotor core hubs exhibited features of flexible hubs based on their participation in several functional domains across different networks and ability to adaptively switch long-range functional connectivity depending on task content, resulting in a distinct community structure of each examined network. Specifically, compared to other tasks, speech production was characterized by the formation of six distinct neural communities with specialized recruitment of the prefrontal cortex, insula, putamen, and thalamus, which collectively
Parbery-Clark, Alexandra; Strait, Dana L.; Anderson, Samira; Hittner, Emily; Kraus, Nina
Much of our daily communication occurs in the presence of background noise, compromising our ability to hear. While understanding speech in noise is a challenge for everyone, it becomes increasingly difficult as we age. Although aging is generally accompanied by hearing loss, this perceptual decline cannot fully account for the difficulties experienced by older adults for hearing in noise. Decreased cognitive skills concurrent with reduced perceptual acuity are thought to contribute to the difficulty older adults experience understanding speech in noise. Given that musical experience positively impacts speech perception in noise in young adults (ages 18–30), we asked whether musical experience benefits an older cohort of musicians (ages 45–65), potentially offsetting the age-related decline in speech-in-noise perceptual abilities and associated cognitive function (i.e., working memory). Consistent with performance in young adults, older musicians demonstrated enhanced speech-in-noise perception relative to nonmusicians along with greater auditory, but not visual, working memory capacity. By demonstrating that speech-in-noise perception and related cognitive function are enhanced in older musicians, our results imply that musical training may reduce the impact of age-related auditory decline. PMID:21589653
Bergmann, Christopher; Sprenger, Simone A; Schmid, Monika S
Fluent speech depends on the availability of well-established linguistic knowledge and routines for speech planning and articulation. A lack of speech fluency in late second-language (L2) learners may point to a deficiency of these representations, due to incomplete acquisition. Experiments on bilingual language processing have shown, however, that there are strong reasons to believe that multilingual speakers experience co-activation of the languages they speak. We have studied to what degree language co-activation affects fluency in the speech of bilinguals, comparing a monolingual German control group with two bilingual groups: 1) first-language (L1) attriters, who have fully acquired German before emigrating to an L2 English environment, and 2) immersed L2 learners of German (L1: English). We have analysed the temporal fluency and the incidence of disfluency markers (pauses, repetitions and self-corrections) in spontaneous film retellings. Our findings show that learners to speak more slowly than controls and attriters. Also, on each count, the speech of at least one of the bilingual groups contains more disfluency markers than the retellings of the control group. Generally speaking, both bilingual groups-learners and attriters-are equally (dis)fluent and significantly more disfluent than the monolingual speakers. Given that the L1 attriters are unaffected by incomplete acquisition, we interpret these findings as evidence for language competition during speech production. Copyright © 2015. Published by Elsevier B.V.
Full Text Available Much of our daily communication occurs in the presence of background noise, compromising our ability to hear. While understanding speech in noise is a challenge for everyone, it becomes increasingly difficult as we age. Although aging is generally accompanied by hearing loss, this perceptual decline cannot fully account for the difficulties experienced by older adults for hearing in noise. Decreased cognitive skills concurrent with reduced perceptual acuity are thought to contribute to the difficulty older adults experience understanding speech in noise. Given that musical experience positively impacts speech perception in noise in young adults (ages 18-30, we asked whether musical experience benefits an older cohort of musicians (ages 45-65, potentially offsetting the age-related decline in speech-in-noise perceptual abilities and associated cognitive function (i.e., working memory. Consistent with performance in young adults, older musicians demonstrated enhanced speech-in-noise perception relative to nonmusicians along with greater auditory, but not visual, working memory capacity. By demonstrating that speech-in-noise perception and related cognitive function are enhanced in older musicians, our results imply that musical training may reduce the impact of age-related auditory decline.
Full Text Available Textbooks play an important role in English Language Teaching (ELT, particularly in the English as a Foreign Language (EFL context where it provides the primary linguistic input. The present research was an attempt to comparatively evaluate the Touchstone series in terms of compliment and complaint speech acts. Four Touchstone textbooks (Book 1, Book 2, Book 3, and Book 4 were selected and content analysis was done using Olshtain and Weinbach’s (1993 complaint strategies and Wolfson and Manes’ (1980 classification of compliment. The frequencies and percentages of compliments and complaint speech acts were obtained. Data analysis showed that, first, the total frequency of the complaint speech act was higher in Touchstone, Book 4 than the other three textbooks; second, the frequency of complaint and compliment speech acts in the Writing section was quite low, but the Conversation section had a high frequency of compliment speech act in the Touchstone series; third, the expression of annoyance or disapproval complaint strategy was frequently used in the Touchstone series; fourth, the compliment strategy of ‘noun phrase + looks/is (intensifier adjective’ was very frequent in the Touchstone series; finally, there was a significant difference between the frequencies of the two speech acts, in general, in the four Touchstone textbooks. Considering the weaknesses and strengthens of Touchstone series, implications for teachers, material developers, and textbook writers are provided.
Nanda, Aditi; Koli, Dheeraj; Sharma, Sunanda; Suryavanshi, Shalini; Verma, Mahesh
Surgical resection of soft palate due to cancer affects the effective functioning of the velopharyngeal mechanism (speech and deglutition). With the loss of speech intelligibility, hyper resonance in voice and impaired function of swallowing (due to nasal regurgitation), there is a depreciation in the quality of life of such an individual. In a multidisciplinary setup, the role of a prosthodontist has been described to rehabilitate such patients by fabrication of speech aid prosthesis. The design and method of fabrication of the prosthesis are simple and easy to perform. The use of prosthesis, together with training (of speech) by a speech pathologist resulted in improvement in speech. Furthermore, an improvement in swallowing had been noted, resulting in an improved nutritional intake and general well-being of an individual. The take-home message is that in the treatment of oral cancer, feasible, and rapid rehabilitation should be endeavored in order to make the patient socially more acceptable. The onus lies on the prosthodontist to practise the same in a rapid manner before the moral of the patient becomes low due to the associated stigma of cancer.
Parbery-Clark, Alexandra; Strait, Dana L; Anderson, Samira; Hittner, Emily; Kraus, Nina
Much of our daily communication occurs in the presence of background noise, compromising our ability to hear. While understanding speech in noise is a challenge for everyone, it becomes increasingly difficult as we age. Although aging is generally accompanied by hearing loss, this perceptual decline cannot fully account for the difficulties experienced by older adults for hearing in noise. Decreased cognitive skills concurrent with reduced perceptual acuity are thought to contribute to the difficulty older adults experience understanding speech in noise. Given that musical experience positively impacts speech perception in noise in young adults (ages 18-30), we asked whether musical experience benefits an older cohort of musicians (ages 45-65), potentially offsetting the age-related decline in speech-in-noise perceptual abilities and associated cognitive function (i.e., working memory). Consistent with performance in young adults, older musicians demonstrated enhanced speech-in-noise perception relative to nonmusicians along with greater auditory, but not visual, working memory capacity. By demonstrating that speech-in-noise perception and related cognitive function are enhanced in older musicians, our results imply that musical training may reduce the impact of age-related auditory decline.
Saletta, Meredith; Goffman, Lisa; Ward, Caitlin; Oleson, Jacob
Children with specific language impairment (SLI) show particular deficits in the generation of sequenced action: the quintessential procedural task. Practiced imitation of a sequence may become rote and require reduced procedural memory. This study explored whether speech motor deficits in children with SLI occur generally or only in conditions of high linguistic load, whether speech motor deficits diminish with practice, and whether it is beneficial to incorporate conditions of high load to understand speech production. Children with SLI and typical development participated in a syntactic priming task during which they generated sentences (high linguistic load) and, then, practiced repeating a sentence (low load) across 3 sessions. We assessed phonetic accuracy, speech movement variability, and duration. Children with SLI produced more variable articulatory movements than peers with typical development in the high load condition. The groups converged in the low load condition. Children with SLI continued to show increased articulatory stability over 3 practice sessions. Both groups produced generated sentences with increased duration and variability compared with repeated sentences. Linguistic demands influence speech motor production. Children with SLI show reduced speech motor performance in tasks that require language generation but not when task demands are reduced in rote practice.
Huettig, Falk; Hartsuiker, Robert J.
Theories of verbal self-monitoring generally assume an internal (pre-articulatory) monitoring channel, but there is debate about whether this channel relies on speech perception or on production-internal mechanisms. Perception-based theories predict that listening to one's own inner speech has similar behavioural consequences as listening to…
Heinzen, Christina Carolyn
The current study employed behavioral and event-related potential (ERP) measures to investigate brain plasticity associated with second-language (L2) phonetic learning based on an adaptive computer training program. The program utilized the acoustic characteristics of Infant-Directed Speech (IDS) to train monolingual American English-speaking listeners to perceive Mandarin lexical tones. Behavioral identification and discrimination tasks were conducted using naturally recorded speech, carefully controlled synthetic speech, and non-speech control stimuli. The ERP experiments were conducted with selected synthetic speech stimuli in a passive listening oddball paradigm. Identical pre- and post- tests were administered on nine adult listeners, who completed two-to-three hours of perceptual training. The perceptual training sessions used pair-wise lexical tone identification, and progressed through seven levels of difficulty for each tone pair. The levels of difficulty included progression in speaker variability from one to four speakers and progression through four levels of acoustic exaggeration of duration, pitch range, and pitch contour. Behavioral results for the natural speech stimuli revealed significant training-induced improvement in identification of Tones 1, 3, and 4. Improvements in identification of Tone 4 generalized to novel stimuli as well. Additionally, comparison between discrimination of across-category and within-category stimulus pairs taken from a synthetic continuum revealed a training-induced shift toward more native-like categorical perception of the Mandarin lexical tones. Analysis of the Mismatch Negativity (MMN) responses in the ERP data revealed increased amplitude and decreased latency for pre-attentive processing of across-category discrimination as a result of training. There were also laterality changes in the MMN responses to the non-speech control stimuli, which could reflect reallocation of brain resources in processing pitch patterns
Farshid Tayari Ashtiani
Full Text Available The present study was an attempt to investigate the impact of English verbal songs on connected speech aspects of adult English learners’ speech production. 40 participants were selected based on the results of their performance in a piloted and validated version of NELSON test given to 60 intermediate English learners in a language institute in Tehran. Then they were equally distributed in two control and experimental groups and received a validated pretest of reading aloud and speaking in English. Afterward, the treatment was performed in 18 sessions by singing preselected songs culled based on some criteria such as popularity, familiarity, amount, and speed of speech delivery, etc. In the end, the posttests of reading aloud and speaking in English were administered. The results revealed that the treatment had statistically positive effects on the connected speech aspects of English learners’ speech production at statistical .05 level of significance. Meanwhile, the results represented that there was not any significant difference between the experimental group’s mean scores on the posttests of reading aloud and speaking. It was thus concluded that providing the EFL learners with English verbal songs could positively affect connected speech aspects of both modes of speech production, reading aloud and speaking. The Findings of this study have pedagogical implications for language teachers to be more aware and knowledgeable of the benefits of verbal songs to promote speech production of language learners in terms of naturalness and fluency. Keywords: English Verbal Songs, Connected Speech, Speech Production, Reading Aloud, Speaking
Alsius, Agnès; Navarra, Jordi; Campbell, Ruth; Soto-Faraco, Salvador
One of the most commonly cited examples of human multisensory integration occurs during exposure to natural speech, when the vocal and the visual aspects of the signal are integrated in a unitary percept. Audiovisual association of facial gestures and vocal sounds has been demonstrated in nonhuman primates and in prelinguistic children, arguing for a general basis for this capacity. One critical question, however, concerns the role of attention in such multisensory integration. Although both behavioral and neurophysiological studies have converged on a preattentive conceptualization of audiovisual speech integration, this mechanism has rarely been measured under conditions of high attentional load, when the observers' attention resources are depleted. We tested the extent to which audiovisual integration was modulated by the amount of available attentional resources by measuring the observers' susceptibility to the classic McGurk illusion in a dual-task paradigm. The proportion of visually influenced responses was severely, and selectively, reduced if participants were concurrently performing an unrelated visual or auditory task. In contrast with the assumption that crossmodal speech integration is automatic, our results suggest that these multisensory binding processes are subject to attentional demands.
Spierings, Michelle J; ten Cate, Carel
Variation in pitch, amplitude and rhythm adds crucial paralinguistic information to human speech. Such prosodic cues can reveal information about the meaning or emphasis of a sentence or the emotional state of the speaker. To examine the hypothesis that sensitivity to prosodic cues is language independent and not human specific, we tested prosody perception in a controlled experiment with zebra finches. Using a go/no-go procedure, subjects were trained to discriminate between speech syllables arranged in XYXY patterns with prosodic stress on the first syllable and XXYY patterns with prosodic stress on the final syllable. To systematically determine the salience of the various prosodic cues (pitch, duration and amplitude) to the zebra finches, they were subjected to five tests with different combinations of these cues. The zebra finches generalized the prosodic pattern to sequences that consisted of new syllables and used prosodic features over structural ones to discriminate between stimuli. This strong sensitivity to the prosodic pattern was maintained when only a single prosodic cue was available. The change in pitch was treated as more salient than changes in the other prosodic features. These results show that zebra finches are sensitive to the same prosodic cues known to affect human speech perception. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
Zheng, Yingjun; Wu, Chao; Li, Juanhua; Li, Ruikeng; Peng, Hongjun; She, Shenglin; Ning, Yuping; Li, Liang
Speech recognition under noisy "cocktail-party" environments involves multiple perceptual/cognitive processes, including target detection, selective attention, irrelevant signal inhibition, sensory/working memory, and speech production. Compared to health listeners, people with schizophrenia are more vulnerable to masking stimuli and perform worse in speech recognition under speech-on-speech masking conditions. Although the schizophrenia-related speech-recognition impairment under "cocktail-party" conditions is associated with deficits of various perceptual/cognitive processes, it is crucial to know whether the brain substrates critically underlying speech detection against informational speech masking are impaired in people with schizophrenia. Using functional magnetic resonance imaging (fMRI), this study investigated differences between people with schizophrenia (n = 19, mean age = 33 ± 10 years) and their matched healthy controls (n = 15, mean age = 30 ± 9 years) in intra-network functional connectivity (FC) specifically associated with target-speech detection under speech-on-speech-masking conditions. The target-speech detection performance under the speech-on-speech-masking condition in participants with schizophrenia was significantly worse than that in matched healthy participants (healthy controls). Moreover, in healthy controls, but not participants with schizophrenia, the strength of intra-network FC within the bilateral caudate was positively correlated with the speech-detection performance under the speech-masking conditions. Compared to controls, patients showed altered spatial activity pattern and decreased intra-network FC in the caudate. In people with schizophrenia, the declined speech-detection performance under speech-on-speech masking conditions is associated with reduced intra-caudate functional connectivity, which normally contributes to detecting target speech against speech masking via its functions of suppressing masking-speech signals.
Transcortical aphasia accompanied by echolalia occurs with malacias involving the postero-median part of the frontal lobe which includes the supplementary motor field of Penfield and is nourished by the anterior cerebral artery. The syndrome manifests itself in such cases even in fine detials in the same form as does in Pick's atrophy. The same also holds true for cases in which a tumour involves the region mentioned. Sentences or fragments of sentences are echolalised; tendency to perseveration is very marked. It is hardly, if at all, possible to evaluate the verbal understanding of these patients. Analysis of their behaviour supports the assumption that they have not lost the adaptation to some situations. Echolalia is often associated with forced grasping and other compulsory phenomena. Therefore, it may be interpreted as a sign of disinhibition of the acusticomotor reflex present during the development of the speech. Competition between the intentionality and the appearance of compulsory phenomena greatly depends on the general condition of the patient, particularly on the clarity of consciousness. The integrity of the postero-median part of the frontal lobe is indespensable for a normal reaction by speech to stimuli received from the sensory areas. The influence of the supplementary motor field on speech intention seems to be linked to the dominant hemisphere. In case lesions of the territory of the anterior cerebral artery and the cortico-bulbar neuron system are coexisting in the dominant hemisphere, the speech disturbance shifts to complete motor aphasia. In such cases the pathomechanism is analogous to that of the syndrome of Liepmann, i.e., right-sided hemiparesis with left-sided apraxia. So-called transcortical motor aphasia without echolalia can be caused by loss of stimuli from the sensory fields.
This article describes a method for constructing an automatic speech recognition system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals. The author provides algorithms which make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The author describes the specific components of the system and the procedures used to model and recognize speech. The author discusses problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on the performance of the system. The author presents different options for the choice of speech signal segments and their consequences for the ASR process. The author gives special attention to the use of lexical, syntactic, and semantic information for the purpose of improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS. The author discusses the results of experiments on the effect of noise on the performance of the ASR system and describes methods of constructing HMM's designed to operate in a noisy environment. The author also describes a language for human-robot communications which was defined as a complex multilevel network from an HMM model of speech sounds geared towards Polish inflections. The author also added mandatory lexical and syntactic rules to the system for its communications vocabulary.
Al-Talabani, Abdulbasit; Sellahewa, Harin; Jassim, Sabah A.
Human emotion recognition from speech is studied frequently for its importance in many applications, e.g. human-computer interaction. There is a wide diversity and non-agreement about the basic emotion or emotion-related states on one hand and about where the emotion related information lies in the speech signal on the other side. These diversities motivate our investigations into extracting Meta-features using the PCA approach, or using a non-adaptive random projection RP, which significantly reduce the large dimensional speech feature vectors that may contain a wide range of emotion related information. Subsets of Meta-features are fused to increase the performance of the recognition model that adopts the score-based LDC classifier. We shall demonstrate that our scheme outperform the state of the art results when tested on non-prompted databases or acted databases (i.e. when subjects act specific emotions while uttering a sentence). However, the huge gap between accuracy rates achieved on the different types of datasets of speech raises questions about the way emotions modulate the speech. In particular we shall argue that emotion recognition from speech should not be dealt with as a classification problem. We shall demonstrate the presence of a spectrum of different emotions in the same speech portion especially in the non-prompted data sets, which tends to be more "natural" than the acted datasets where the subjects attempt to suppress all but one emotion.
Kühn, Simone; Brass, Marcel; Gallinat, Jürgen
The so-called embodiment of communication has attracted considerable interest. Recently a growing number of studies have proposed a link between Broca's area's involvement in action processing and its involvement in speech. The present quantitative meta-analysis set out to test whether neuroimaging studies on imitation and overt speech show overlap within inferior frontal gyrus. By means of activation likelihood estimation (ALE), we investigated concurrence of brain regions activated by object-free hand imitation studies as well as overt speech studies including simple syllable and more complex word production. We found direct overlap between imitation and speech in bilateral pars opercularis (BA 44) within Broca's area. Subtraction analyses revealed no unique localization neither for speech nor for imitation. To verify the potential of ALE subtraction analysis to detect unique involvement within Broca's area, we contrasted the results of a meta-analysis on motor inhibition and imitation and found separable regions involved for imitation. This is the first meta-analysis to compare the neural correlates of imitation and overt speech. The results are in line with the proposed evolutionary roots of speech in imitation.
Ding, Nai; Patel, Aniruddh D; Chen, Lin; Butler, Henry; Luo, Cheng; Poeppel, David
Speech and music have structured rhythms. Here we discuss a major acoustic correlate of spoken and musical rhythms, the slow (0.25-32Hz) temporal modulations in sound intensity and compare the modulation properties of speech and music. We analyze these modulations using over 25h of speech and over 39h of recordings of Western music. We show that the speech modulation spectrum is highly consistent across 9 languages (including languages with typologically different rhythmic characteristics). A different, but similarly consistent modulation spectrum is observed for music, including classical music played by single instruments of different types, symphonic, jazz, and rock. The temporal modulations of speech and music show broad but well-separated peaks around 5 and 2Hz, respectively. These acoustically dominant time scales may be intrinsic features of speech and music, a possibility which should be investigated using more culturally diverse samples in each domain. Distinct modulation timescales for speech and music could facilitate their perceptual analysis and its neural processing. Copyright © 2017 Elsevier Ltd. All rights reserved.
Hardison, Debra M.
The majority of studies in second-language (L2) speech processing have involved unimodal (i.e., auditory) input; however, in many instances, speech communication involves both visual and auditory sources of information. Some researchers have argued that multimodal speech is the primary mode of speech perception (e.g., Rosenblum 2005). Research on…