WorldWideScience

Sample records for audiovisual speech perception

  1. Lip movements affect infants' audiovisual speech perception.

    Science.gov (United States)

    Yeung, H Henny; Werker, Janet F

    2013-05-01

    Speech is robustly audiovisual from early in infancy. Here we show that audiovisual speech perception in 4.5-month-old infants is influenced by sensorimotor information related to the lip movements they make while chewing or sucking. Experiment 1 consisted of a classic audiovisual matching procedure, in which two simultaneously displayed talking faces (visual [i] and [u]) were presented with a synchronous vowel sound (audio /i/ or /u/). Infants' looking patterns were selectively biased away from the audiovisual matching face when the infants were producing lip movements similar to those needed to produce the heard vowel. Infants' looking patterns returned to those of a baseline condition (no lip movements, looking longer at the audiovisual matching face) when they were producing lip movements that did not match the heard vowel. Experiment 2 confirmed that these sensorimotor effects interacted with the heard vowel, as looking patterns differed when infants produced these same lip movements while seeing and hearing a talking face producing an unrelated vowel (audio /a/). These findings suggest that the development of speech perception and speech production may be mutually informative.

  2. Prediction and constraint in audiovisual speech perception.

    Science.gov (United States)

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  3. Prediction and constraint in audiovisual speech perception

    Science.gov (United States)

    Peelle, Jonathan E.; Sommers, Mitchell S.

    2015-01-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported

  4. Ordinal models of audiovisual speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias

    2011-01-01

    Audiovisual information is integrated in speech perception. One manifestation of this is the McGurk illusion, in which watching the articulating face alters the auditory phonetic percept. Understanding this phenomenon fully requires a computational model with predictive power. Here, we describe ordinal models that can account for the McGurk illusion. We compare this type of model to the Fuzzy Logical Model of Perception (FLMP), in which the response categories are not ordered. While the FLMP generally fit the data better than the ordinal models, it also employs more free parameters in complex experiments when the number of response categories is high, as it is for speech perception in general. Testing the predictive power of the models using a form of cross-validation, we found that ordinal models perform better than the FLMP. Based on these findings, we suggest that ordinal models generally have...
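
    To make the model comparison above concrete, here is a minimal sketch of the FLMP's fusion rule, in which auditory and visual support values for each response category are multiplied and renormalized. The support values below are hypothetical, and the snippet illustrates only the fusion rule itself, not the ordinal models or the fitting and cross-validation procedure used in this record.

```python
import numpy as np

def flmp_predict(a_support, v_support):
    """FLMP fusion: multiply per-category auditory and visual support,
    then renormalize over the response categories."""
    fused = a_support * v_support
    return fused / fused.sum()

# Hypothetical support values for /b/, /d/, /g/ given an auditory /b/ paired
# with a visual /g/ (a classic McGurk-style stimulus).
a_support = np.array([0.55, 0.40, 0.05])   # audio favours /b/, with some /d/ support
v_support = np.array([0.05, 0.40, 0.55])   # video favours /g/, with some /d/ support
print(flmp_predict(a_support, v_support))  # the fused /d/ response dominates
```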

  5. Talker Variability in Audiovisual Speech Perception

    Directory of Open Access Journals (Sweden)

    Shannon Heald

    2014-07-01

    A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories, and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker-variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening conditions (e.g., noise or distortion) that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target-word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions than in multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition than in the audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred.

  6. The role of visual spatial attention in audiovisual speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias; Tiippana, K.; Laarni, J.

    2009-01-01

    Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre-attentive, but recent reports have challenged this view. Here we study the effect of visual spatial attention on the McGurk effect. By presenting a movie of two faces symmetrically displaced to each side of a central fixation point and dubbed with a single auditory speech track, we were able to discern the influences [...] integration did not change. Visual spatial attention was also able to select between the faces when lip reading. This suggests that visual spatial attention acts at the level of visual speech perception prior to audiovisual integration and that the effect propagates through audiovisual integration [...]

  7. The contribution of dynamic visual cues to audiovisual speech perception.

    Science.gov (United States)

    Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador

    2015-08-01

    Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues, two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this end, we measured word identification performance in noise using unimodal auditory stimuli and audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point light displays achieved via motion capture of the original talker. Point light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined contribution of dynamic configural cues and local motion cues to be assessed. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. Arrested Development of Audiovisual Speech Perception in Autism Spectrum Disorders

    Science.gov (United States)

    Stevenson, Ryan A.; Siemann, Justin K.; Woynaroski, Tiffany G.; Schneider, Brittany C.; Eberly, Haley E.; Camarata, Stephen M.; Wallace, Mark T.

    2013-01-01

    Atypical communicative abilities are a core marker of Autism Spectrum Disorders (ASD). A number of studies have shown that, in addition to auditory comprehension differences, individuals with autism frequently show atypical responses to audiovisual speech, suggesting a multisensory contribution to these communicative differences from their typically developing (TD) peers. To shed light on possible differences in the maturation of audiovisual speech integration, we tested younger (ages 6-12) and older (ages 13-18) children with and without ASD on a task indexing such multisensory integration. To do this, we used the McGurk effect, in which the pairing of incongruent auditory and visual speech tokens typically results in the perception of a fused percept distinct from the auditory and visual signals, indicative of active integration of the two channels conveying speech information. Whereas little difference was seen in audiovisual speech processing (i.e., reports of McGurk fusion) between the younger ASD and TD groups, there was a significant difference at the older ages. While TD controls exhibited an increased rate of fusion (i.e., integration) with age, children with ASD failed to show this increase. These data suggest arrested development of audiovisual speech integration in ASD. The results are discussed in light of the extant literature and necessary next steps in research. PMID:24218241

  9. Speech-specific audiovisual perception affects identification but not detection of speech

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual, as evidenced by the McGurk effect in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate this issue, Tuomainen et al. (2005) used sine-wave speech stimuli created from three time-varying sine waves tracking the formants of a natural speech signal. Naïve observers tend not to recognize sine wave speech as speech but become able to decode its phonetic content when informed of the speech [...] of audiovisual integration specific to speech perception. However, the results of Tuomainen et al. might have been influenced by another effect. When observers were naïve, they had little motivation to look at the face. When informed, they knew that the face was relevant for the task and this could increase [...]

  10. Modeling the Development of Audiovisual Cue Integration in Speech Perception.

    Science.gov (United States)

    Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C

    2017-03-21

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.
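
    As a rough illustration of the statistical-learning idea described above (not the authors' actual GMM implementation), the sketch below fits a two-component Gaussian mixture to hypothetical two-dimensional auditory-visual cue data and then queries it with a mismatched, McGurk-like token. All cue values, category locations, and sample sizes are invented for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical 2-D cue space: one auditory cue (e.g., a VOT-like value) and one
# visual cue (e.g., degree of lip closure), sampled for two phonological categories.
cat_a = rng.normal(loc=[-1.0, -1.0], scale=0.4, size=(500, 2))
cat_b = rng.normal(loc=[+1.0, +1.0], scale=0.4, size=(500, 2))
tokens = np.vstack([cat_a, cat_b])

# Unsupervised statistical learning: the mixture discovers the two categories
# and their cue covariance structure from the distribution alone.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(tokens)

# A mismatched (McGurk-like) token: auditory cue from one category, visual cue from the other.
mismatched = np.array([[-1.0, +1.0]])
print(gmm.predict_proba(mismatched))  # graded category assignment for the conflicting cues
```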

  11. Audiovisual speech perception development at varying levels of perceptual processing

    Science.gov (United States)

    Lalonde, Kaylah; Holt, Rachael Frush

    2016-01-01

    This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the level of perceptual processing required to complete them. Adults and children demonstrated visual speech influence at all levels of perceptual processing. Whereas children demonstrated the same visual speech influence at each level of perceptual processing, adults demonstrated greater visual speech influence on tasks requiring higher levels of perceptual processing. These results support previous research demonstrating multiple mechanisms of AV speech processing (general perceptual and speech-specific mechanisms) with independent maturational time courses. The results suggest that adults rely on both general perceptual mechanisms that apply to all levels of perceptual processing and speech-specific mechanisms that apply when making phonetic decisions and/or accessing the lexicon. Six- to eight-year-old children seem to rely only on general perceptual mechanisms across levels. As expected, developmental differences in AV benefit on this and other recognition tasks likely reflect immature speech-specific mechanisms and phonetic processing in children. PMID:27106318

  12. Talker variability in audio-visual speech perception.

    Science.gov (United States)

    Heald, Shannon L M; Nusbaum, Howard C

    2014-01-01

    A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories, and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to the audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred.

  13. Audiovisual Speech Perception and Eye Gaze Behavior of Adults with Asperger Syndrome

    Science.gov (United States)

    Saalasti, Satu; Katsyri, Jari; Tiippana, Kaisa; Laine-Hernandez, Mari; von Wendt, Lennart; Sams, Mikko

    2012-01-01

    Audiovisual speech perception was studied in adults with Asperger syndrome (AS), by utilizing the McGurk effect, in which conflicting visual articulation alters the perception of heard speech. The AS group perceived the audiovisual stimuli differently from age, sex and IQ matched controls. When a voice saying /p/ was presented with a face…

  14. Spatial Frequency Requirements and Gaze Strategy in Visual-Only and Audiovisual Speech Perception

    Science.gov (United States)

    Wilson, Amanda H.; Alsius, Agnès; Paré, Martin; Munhall, Kevin G.

    2016-01-01

    Purpose: The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. Method: We presented vowel-consonant-vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent…

  15. The organization and reorganization of audiovisual speech perception in the first year of life.

    Science.gov (United States)

    Danielson, D Kyle; Bruderer, Alison G; Kandhadai, Padmapriya; Vatikiotis-Bateson, Eric; Werker, Janet F

    2017-04-01

    The period between six and 12 months is a sensitive period for language learning during which infants undergo auditory perceptual attunement, and recent results indicate that this sensitive period may exist across sensory modalities. We tested infants at three stages of perceptual attunement (six, nine, and 11 months) to determine 1) whether they were sensitive to the congruence between heard and seen speech stimuli in an unfamiliar language, and 2) whether familiarization with congruent audiovisual speech could boost subsequent non-native auditory discrimination. Infants at six and nine months, but not at 11 months, detected audiovisual congruence of non-native syllables. Familiarization to incongruent, but not congruent, audiovisual speech changed auditory discrimination at test for six-month-olds but not nine- or 11-month-olds. These results advance the proposal that speech perception is audiovisual from early in ontogeny, and that the sensitive period for audiovisual speech perception may last somewhat longer than that for auditory perception alone.

  16. Gaze-direction-based MEG averaging during audiovisual speech perception

    Directory of Open Access Journals (Sweden)

    Lotta Hirvenkari

    2010-03-01

    To take a step towards real-life-like experimental setups, we simultaneously recorded magnetoencephalographic (MEG) signals and the subject's gaze direction during audiovisual speech perception. The stimuli were utterances of /apa/ dubbed onto two side-by-side female faces articulating /apa/ (congruent) and /aka/ (incongruent) in synchrony, repeated once every 3 s. Subjects (N = 10) were free to decide which face they viewed, and responses were averaged to two categories according to the gaze direction. The right-hemisphere 100-ms response to the onset of the second vowel (N100m') was a fifth smaller to incongruent than congruent stimuli. The results demonstrate the feasibility of realistic viewing conditions with gaze-based averaging of MEG signals.
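
    The gaze-based averaging itself reduces to sorting single-trial epochs by a per-trial gaze label before averaging. The sketch below shows that bookkeeping step on simulated data; the array shapes, labels, and trial counts are arbitrary placeholders, not the recording parameters of this study.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical single-trial epochs (n_trials x n_samples) and per-trial gaze labels.
epochs = rng.standard_normal((200, 600))
gaze = rng.choice(["congruent_face", "incongruent_face"], size=200)

# Gaze-direction-based averaging: average epochs separately by where the subject looked.
evoked = {label: epochs[gaze == label].mean(axis=0) for label in np.unique(gaze)}
print({label: waveform.shape for label, waveform in evoked.items()})
```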

  17. Audiovisual integration in speech perception: a multi-stage process

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2011-01-01

    Integration of speech signals from ear and eye is a well-known feature of speech perception. This is evidenced by the McGurk illusion in which visual speech alters auditory speech perception and by the advantage observed in auditory speech detection when a visual signal is present. Here we investigate...

  18. Self-organizing maps for measuring similarity of audiovisual speech percepts

    DEFF Research Database (Denmark)

    Bothe, Hans-Heinrich

    The goal of this work is to find a way to measure similarity of audiovisual speech percepts. Phoneme-related self-organizing maps (SOM) with a rectangular basis are trained with data material from a (labeled) video film. For the training, a combination of auditory speech features and corresponding [...] Dependent on the training data, these other units may also be contextually immediate neighboring units. The poster demonstrates the idea with text material spoken by one individual subject using a set of simple audio-visual features. The data material for the training process consists of 44 labeled [...] audio-visual speech percepts and to measure coarticulatory effects.
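
    For readers unfamiliar with the technique, the sketch below trains a small rectangular self-organizing map on random stand-in feature vectors; distances between best-matching units on the trained map can then serve as a similarity measure, which is the general idea the record describes. The grid size, feature dimensionality, and training schedule are arbitrary choices for illustration, not those of the poster.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in audio-visual feature vectors (e.g., a few spectral + lip-shape features).
features = rng.standard_normal((500, 6))

# A small rectangular SOM: an 8 x 8 grid of 6-dimensional prototype vectors.
grid_h, grid_w, dim = 8, 8, features.shape[1]
weights = rng.standard_normal((grid_h, grid_w, dim)) * 0.1
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij"), axis=-1)

n_epochs = 20
for epoch in range(n_epochs):
    lr = 0.5 * (1 - epoch / n_epochs)            # decaying learning rate
    sigma = 3.0 * (1 - epoch / n_epochs) + 0.5   # decaying neighbourhood radius
    for x in features:
        d = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(d), d.shape)      # best-matching unit
        grid_dist = np.linalg.norm(coords - np.array(bmu), axis=-1)
        h = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))   # neighbourhood function
        weights += lr * h[..., None] * (x - weights)

# Similarity of two percepts can be read off as the grid distance between their BMUs.
```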

  19. Audiovisual Speech Perception in Children with Developmental Language Disorder in Degraded Listening Conditions

    Science.gov (United States)

    Meronen, Auli; Tiippana, Kaisa; Westerholm, Jari; Ahonen, Timo

    2013-01-01

    Purpose: The effect of the signal-to-noise ratio (SNR) on the perception of audiovisual speech in children with and without developmental language disorder (DLD) was investigated by varying the noise level and the sound intensity of acoustic speech. The main hypotheses were that the McGurk effect (in which incongruent visual speech alters the…

  20. Audio-visual speech perception in noise: Implanted children and young adults versus normal hearing peers.

    Science.gov (United States)

    Taitelbaum-Swead, Riki; Fostick, Leah

    2017-01-01

    The purpose of the current study was to evaluate auditory, visual, and audiovisual speech perception abilities in two groups of cochlear implant (CI) users, prelingual children and long-term young adults, as compared to their normal-hearing (NH) peers. This prospective cohort study included 50 participants, divided into two groups of CI users (10 children and 10 adults) and two groups of normal-hearing peers (15 participants each). Speech stimuli included monosyllabic meaningful and nonsense words at a signal-to-noise ratio of 0 dB, presented via auditory, visual, and audiovisual modalities. (1) CI children and adults show lower speech perception accuracy in background noise in the audiovisual and auditory modalities than NH peers, but significantly higher visual speech perception scores. (2) CI children are superior to CI adults in speech perception in noise via the auditory modality, but inferior in the visual one. Both CI children and CI adults had similar audiovisual integration. These findings show that although the CI children were implanted bilaterally, at a very young age, and with advanced technology, they still have difficulty perceiving speech in adverse listening conditions, even when the visual modality is added. This suggests that audiovisual training might benefit this group by improving their audiovisual integration in difficult listening situations. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  1. Effect of attentional load on audiovisual speech perception: evidence from ERPs.

    Science.gov (United States)

    Alsius, Agnès; Möttönen, Riikka; Sams, Mikko E; Soto-Faraco, Salvador; Tiippana, Kaisa

    2014-01-01

    Seeing articulatory movements influences perception of auditory speech. This is often reflected in a shortened latency of auditory event-related potentials (ERPs) generated in the auditory cortex. The present study addressed whether this early neural correlate of audiovisual interaction is modulated by attention. We recorded ERPs in 15 subjects while they were presented with auditory, visual, and audiovisual spoken syllables. Audiovisual stimuli consisted of incongruent auditory and visual components known to elicit a McGurk effect, i.e., a visually driven alteration in the auditory speech percept. In a Dual task condition, participants were asked to identify spoken syllables whilst monitoring a rapid visual stream of pictures for targets, i.e., they had to divide their attention. In a Single task condition, participants identified the syllables without any other tasks, i.e., they were asked to ignore the pictures and focus their attention fully on the spoken syllables. The McGurk effect was weaker in the Dual task than in the Single task condition, indicating an effect of attentional load on audiovisual speech perception. Early auditory ERP components, N1 and P2, peaked earlier to audiovisual stimuli than to auditory stimuli when attention was fully focused on syllables, indicating neurophysiological audiovisual interaction. This latency decrement was reduced when attention was loaded, suggesting that attention influences early neural processing of audiovisual speech. We conclude that reduced attention weakens the interaction between vision and audition in speech.

  2. Effect of attentional load on audiovisual speech perception: Evidence from ERPs

    Directory of Open Access Journals (Sweden)

    Agnès Alsius

    2014-07-01

    Seeing articulatory movements influences perception of auditory speech. This is often reflected in a shortened latency of auditory event-related potentials (ERPs) generated in the auditory cortex. The present study addressed whether this early neural correlate of audiovisual interaction is modulated by attention. We recorded ERPs in 15 subjects while they were presented with auditory, visual and audiovisual spoken syllables. Audiovisual stimuli consisted of incongruent auditory and visual components known to elicit a McGurk effect, i.e., a visually driven alteration in the auditory speech percept. In a Dual task condition, participants were asked to identify spoken syllables whilst monitoring a rapid visual stream of pictures for targets, i.e., they had to divide their attention. In a Single task condition, participants identified the syllables without any other tasks, i.e., they were asked to ignore the pictures and focus their attention fully on the spoken syllables. The McGurk effect was weaker in the Dual task than in the Single task condition, indicating an effect of attentional load on audiovisual speech perception. Early auditory ERP components, N1 and P2, peaked earlier to audiovisual stimuli than to auditory stimuli when attention was fully focused on syllables, indicating neurophysiological audiovisual interaction. This latency decrement was reduced when attention was loaded, suggesting that attention influences early neural processing of audiovisual speech. We conclude that reduced attention weakens the interaction between vision and audition in speech.

  3. Electrophysiological assessment of audiovisual integration in speech perception

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Dau, Torsten

    [...] with human faces, which all are variations on a relatively constrained space of features, face perception is sensitive to manipulations of the structure of the face, the relation between its segments, and the properties of the segments. Does this sensitivity alter the influence of visual speech [...] the mismatch negativity response (MMN). MMN has the property of being evoked when an acoustic stimulus deviates from a learned pattern of stimuli. In three experimental studies, this effect is utilized to track when a coinciding visual signal alters auditory speech perception. Visual speech emanates from [...] clear. Another interesting property of speech perception is that it is relatively tolerant towards temporal shifts between acoustic and visual speech signals. Here, behavioral studies report that perception of speech exhibits far greater temporal tolerance than towards non-speech stimuli. Current [...]

  4. A Possible Neurophysiological Correlate of AudioVisual Binding and Unbinding in Speech Perception

    Directory of Open Access Journals (Sweden)

    Attigodu Chandrashekara Ganesh

    2014-11-01

    Audiovisual integration of auditory and visual speech streams generally ends in fusion into a single percept. One classical example is the McGurk effect, in which incongruent auditory and visual speech signals may lead to a fused percept different from either the visual or the auditory input. In a previous set of experiments, we showed that if a McGurk stimulus is preceded by an incongruent audiovisual context (composed of incongruent auditory and visual speech materials), the amount of McGurk fusion is largely decreased. We interpreted this result in the framework of a two-stage binding and fusion model of audiovisual speech perception, with an early audiovisual binding stage controlling the fusion/decision process and likely to produce unbinding, with less fusion, if the context is incoherent. In order to provide further electrophysiological evidence for this binding/unbinding stage, early auditory evoked N1/P2 responses were here compared during auditory, congruent, and incongruent audiovisual speech perception, following either a prior coherent or incoherent audiovisual context. Following the coherent context, in line with previous EEG/MEG studies, visual information in the congruent audiovisual condition was found to modify auditory evoked potentials, with a latency decrease of P2 responses compared to the auditory condition. Importantly, both P2 amplitude and latency in the congruent audiovisual condition increased from the coherent to the incoherent context. Although potential contamination by visual responses from the visual cortex cannot be discarded, our results might provide a possible neurophysiological correlate of an early binding/unbinding process applied to audiovisual interactions.

  5. Infant perception of audio-visual speech synchrony in familiar and unfamiliar fluent speech.

    Science.gov (United States)

    Pons, Ferran; Lewkowicz, David J

    2014-06-01

    We investigated the effects of linguistic experience and language familiarity on the perception of audio-visual (A-V) synchrony in fluent speech. In Experiment 1, we tested a group of monolingual Spanish- and Catalan-learning 8-month-old infants with a video clip of a person speaking Spanish. Following habituation to the audiovisually synchronous video, infants saw and heard desynchronized clips of the same video in which the audio stream now preceded the video stream by 366, 500, or 666 ms. In Experiment 2, monolingual Catalan and Spanish infants were tested with a video clip of a person speaking English. Results indicated that in both experiments infants detected a 666 ms and a 500 ms asynchrony. That is, their responsiveness to A-V synchrony was the same regardless of their specific linguistic experience or familiarity with the tested language. Compared to previous results from infant studies with isolated audiovisual syllables, these results show that infants are more sensitive to A-V temporal relations inherent in fluent speech. Furthermore, the absence of a language familiarity effect on the detection of A-V speech asynchrony at eight months of age is consistent with the broad perceptual tuning usually observed in infant response to linguistic input at this age. Copyright © 2014 Elsevier B.V. All rights reserved.

  6. Audio-Visual Speech Perception: A Developmental ERP Investigation

    Science.gov (United States)

    Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.

    2014-01-01

    Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…

  7. Language/Culture Modulates Brain and Gaze Processes in Audiovisual Speech Perception.

    Science.gov (United States)

    Hisanaga, Satoko; Sekiyama, Kaoru; Igasaki, Tomohiko; Murayama, Nobuki

    2016-10-13

    Several behavioural studies have shown that the interplay between voice and face information in audiovisual speech perception is not universal. Native English speakers (ESs) are influenced by visual mouth movement to a greater degree than native Japanese speakers (JSs) when listening to speech. However, the biological basis of these group differences is unknown. Here, we demonstrate the time-varying processes of group differences in terms of event-related brain potentials (ERP) and eye gaze for audiovisual and audio-only speech perception. On a behavioural level, while congruent mouth movement shortened the ESs' response time for speech perception, the opposite effect was observed in JSs. Eye-tracking data revealed a gaze bias to the mouth for the ESs but not the JSs, especially before the audio onset. Additionally, the ERP P2 amplitude indicated that ESs processed multisensory speech more efficiently than auditory-only speech; however, the JSs exhibited the opposite pattern. Taken together, the ESs' early visual attention to the mouth was likely to promote phonetic anticipation, which was not the case for the JSs. These results clearly indicate the impact of language and/or culture on multisensory speech processing, suggesting that linguistic/cultural experiences lead to the development of unique neural systems for audiovisual speech perception.

  8. Silent articulation modulates auditory and audiovisual speech perception.

    Science.gov (United States)

    Sato, Marc; Troille, Emilie; Ménard, Lucie; Cathiard, Marie-Agnès; Gracco, Vincent

    2013-06-01

    The concept of an internal forward model that internally simulates the sensory consequences of an action is a central idea in speech motor control. Consistent with this hypothesis, silent articulation has been shown to modulate activity of the auditory cortex and to improve the auditory identification of concordant speech sounds, when embedded in white noise. In the present study, we replicated and extended this behavioral finding by showing that silently articulating a syllable in synchrony with the presentation of a concordant auditory and/or visually ambiguous speech stimulus improves its identification. Our results further demonstrate that, even in the case of perfect perceptual identification, concurrent mouthing of a syllable speeds up the perceptual processing of a concordant speech stimulus. These results reflect multisensory-motor interactions during speech perception and provide new behavioral arguments for internally generated sensory predictions during silent speech production.

  9. Audiovisual Speech Perception in Infancy: The Influence of Vowel Identity and Infants' Productive Abilities on Sensitivity to (Mis)Matches between Auditory and Visual Speech Cues

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias

    2016-01-01

    Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…

  10. Nonnative audiovisual speech perception in noise: dissociable effects of the speaker and listener.

    Science.gov (United States)

    Xie, Zilong; Yi, Han-Gyol; Chandrasekaran, Bharath

    2014-01-01

    Nonnative speech poses a challenge to speech perception, especially in challenging listening environments. Audiovisual (AV) cues are known to improve native speech perception in noise. The extent to which AV cues benefit nonnative speech perception in noise, however, is much less well understood. Here, we examined native American English-speaking and native Korean-speaking listeners' perception of English sentences produced by a native American English speaker and a native Korean speaker across a range of signal-to-noise ratios (SNRs; -4 to -20 dB) in audio-only and audiovisual conditions. We employed psychometric function analyses to characterize the pattern of AV benefit across SNRs. For native English speech, the largest AV benefit occurred at an intermediate SNR (-12 dB), but for nonnative English speech, the largest AV benefit occurred at a higher SNR (-4 dB). The psychometric function analyses demonstrated that the AV benefit patterns differed between native and nonnative English speech. The nativeness of the listener exerted negligible effects on AV benefit across SNRs. However, the nonnative listeners' ability to gain AV benefit from native English speech was related to their proficiency in English. These findings suggest that the native language backgrounds of both the speaker and the listener modulate the optimal use of AV cues in speech recognition.
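
    The psychometric-function analysis described above can be approximated by fitting a logistic function to accuracy at each SNR for the audio-only and audiovisual conditions and reading off where their difference peaks. The sketch below does this with invented group accuracies; only the SNR range (-20 to -4 dB) is taken from the record.

```python
import numpy as np
from scipy.optimize import curve_fit

def psychometric(snr, midpoint, slope):
    """Logistic psychometric function: proportion correct as a function of SNR."""
    return 1.0 / (1.0 + np.exp(-slope * (snr - midpoint)))

# Hypothetical group accuracies (proportion correct) at each SNR.
snrs = np.array([-20, -16, -12, -8, -4], dtype=float)
audio_only  = np.array([0.05, 0.15, 0.40, 0.70, 0.90])
audiovisual = np.array([0.10, 0.35, 0.70, 0.88, 0.95])

(ao_mid, ao_slope), _ = curve_fit(psychometric, snrs, audio_only,  p0=[-10, 0.5])
(av_mid, av_slope), _ = curve_fit(psychometric, snrs, audiovisual, p0=[-10, 0.5])

# AV benefit = fitted AV accuracy minus fitted audio-only accuracy, evaluated on a fine SNR grid.
fine_snrs = np.linspace(-20, -4, 100)
benefit = psychometric(fine_snrs, av_mid, av_slope) - psychometric(fine_snrs, ao_mid, ao_slope)
print("largest AV benefit near SNR =", fine_snrs[np.argmax(benefit)], "dB")
```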

  11. Large scale functional brain networks underlying temporal integration of audio-visual speech perception: An EEG study

    OpenAIRE

    G. Vinodh Kumar; Tamesh Halder; Amit Kumar Jaiswal; Abhishek Mukherjee; Dipanjan Roy; Arpan Banerjee

    2016-01-01

    Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (the McGurk effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal and parietal regions, and of integrative brain sites in the vicinity of the superior temporal sulcus (STS), for multisensory speech perception. How...

  12. Timing in audiovisual speech perception: A mini review and new psychophysical data.

    Science.gov (United States)

    Venezia, Jonathan H; Thurman, Steven M; Matchin, William; George, Sahara E; Hickok, Gregory

    2016-02-01

    Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (~35 % identification of /apa/ compared to ~5 % in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content.
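
    The logic of the classification procedure can be sketched as follows: random per-frame visibility masks are correlated with the listener's responses, so frames whose visibility reliably changes the response are flagged as perceptually relevant. The simulation below uses invented trial counts, frame counts, and a made-up critical window purely to show the analysis step; it is not the study's stimulus set or design.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_frames = 2000, 45            # invented numbers, not the study's design

# Random per-frame visibility of the mouth region on each trial (1 = fully visible).
masks = rng.random((n_trials, n_frames))

# Simulated observer: obscuring an assumed critical window raises /apa/ ("auditory") reports.
critical = slice(5, 12)                  # hypothetical window of informative frames
p_apa = 0.05 + 0.6 * (1 - masks[:, critical].mean(axis=1))
apa_reported = rng.random(n_trials) < p_apa

# Classification image: mean visibility on /apa/ trials minus mean visibility on /ata/ trials.
c_image = masks[apa_reported].mean(axis=0) - masks[~apa_reported].mean(axis=0)
print(c_image.argmin())                  # the most negative frames fall in the critical window
```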

  13. The early maximum likelihood estimation model of audiovisual integration in speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias

    2015-01-01

    Speech perception is facilitated by seeing the articulatory mouth movements of the talker. This is due to perceptual audiovisual integration, which also causes the McGurk−MacDonald illusion, and for which a comprehensive computational account is still lacking. Decades of research have largely focused on the fuzzy logical model of perception (FLMP), which provides excellent fits to experimental observations but also has been criticized for being too flexible, post hoc and difficult to interpret. The current study introduces the early maximum likelihood estimation (MLE) model of audiovisual integration to speech perception along with three model variations. In early MLE, integration is based on a continuous internal representation before categorization, which can make the model more parsimonious by imposing constraints that reflect experimental designs. The study also shows that cross [...]
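
    The "early" integration idea rests on standard maximum likelihood cue combination: each modality contributes an estimate on a shared internal continuum, weighted by its reliability, before any categorization. The sketch below shows that textbook computation with made-up means and variances; it is not the full model or the fitting procedure introduced in this record.

```python
import numpy as np

def mle_integrate(x_a, var_a, x_v, var_v):
    """Reliability-weighted (maximum likelihood) fusion of an auditory and a visual
    estimate on a shared internal continuum, prior to categorization."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_v)
    x_av = w_a * x_a + (1 - w_a) * x_v
    var_av = 1 / (1 / var_a + 1 / var_v)   # fused estimate is more reliable than either cue
    return x_av, var_av

# Hypothetical values on an internal articulatory continuum (e.g., /b/-/d/-/g/).
x_av, var_av = mle_integrate(x_a=-1.0, var_a=0.5, x_v=+1.0, var_v=1.0)
print(x_av, var_av)  # the fused estimate lies closer to the more reliable (auditory) cue
```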

  14. Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?

    Directory of Open Access Journals (Sweden)

    Magnus Alm

    2015-07-01

    Gender and age have been found to affect adults' audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood of cognitive and sensory decline, which may confound positive effects of age-related AV experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently, both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20-30 years) and middle-aged (50-60 years) adults, with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. In contrast, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females' general AV perceptual strategy. Although young females' speech-reading proficiency may not readily contribute to greater visual influence, between young and middle adulthood recurrent confirmation of the contribution of visual cues, induced by speech-reading proficiency, may gradually shift females' AV perceptual strategy towards more visually dominated responses.

  15. Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

    Science.gov (United States)

    Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

    2015-05-01

    The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although a brain network is presumed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering, such as single linkage distance, to analyze the connected components of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual) speech cues or unimodal speech cues with counterpart irrelevant noise (auditory white noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationships, similar connected components were observed in the bimodal and unimodal speech conditions during filtration. However, during speech perception with congruent audiovisual stimuli, tighter coupling of a left anterior temporal gyrus-anterior insula component and of right premotor-visual components was observed than in the auditory and visual speech cue conditions, respectively. Interestingly, visual speech perceived under white noise was characterized by tight negative coupling among the left inferior frontal region, right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus and right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.
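
    For connected components, the filtration described above corresponds to single-linkage hierarchical clustering of a distance matrix derived from inter-regional correlations: as the threshold grows, components appear and merge. The sketch below reproduces that bookkeeping on random stand-in time series; the number of regions, the distance definition, and the thresholds are assumptions for illustration, not the study's analysis pipeline.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(2)

# Stand-in ROI time series (200 volumes x 10 regions) and their correlation matrix.
ts = rng.standard_normal((200, 10))
corr = np.corrcoef(ts, rowvar=False)
dist = 1 - corr                      # turn correlations into distances
np.fill_diagonal(dist, 0.0)

# Single-linkage clustering: merge heights trace how connected components
# appear and join as the filtration threshold increases.
Z = linkage(squareform(dist, checks=False), method="single")
for threshold in (0.9, 1.0, 1.1):
    labels = fcluster(Z, t=threshold, criterion="distance")
    print(threshold, "->", len(np.unique(labels)), "connected components")
```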

  16. Visual and Auditory Components in the Perception of Asynchronous Audiovisual Speech.

    Science.gov (United States)

    García-Pérez, Miguel A; Alcalá-Quintana, Rocío

    2015-12-01

    Research on asynchronous audiovisual speech perception manipulates experimental conditions to observe their effects on synchrony judgments. Probabilistic models establish a link between the sensory and decisional processes underlying such judgments and the observed data, via interpretable parameters that allow testing hypotheses and making inferences about how experimental manipulations affect such processes. Two models of this type have recently been proposed, one based on independent channels and the other using a Bayesian approach. Both models are fitted here to a common data set, with a subsequent analysis of the interpretation they provide about how experimental manipulations affected the processes underlying perceived synchrony. The data consist of synchrony judgments as a function of audiovisual offset in a speech stimulus, under four within-subjects manipulations of the quality of the visual component. The Bayesian model could not accommodate asymmetric data, was rejected by goodness-of-fit statistics for 8/16 observers, and was found to be nonidentifiable, which renders its parameter estimates uninterpretable. The independent-channels model captured asymmetric data, was rejected for only 1/16 observers, and identified how the sensory and decisional processes mediating asynchronous audiovisual speech perception are affected by manipulations that only alter the quality of the visual component of the speech signal.
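
    A toy version of the independent-channels idea can be simulated by giving each modality its own stochastic processing latency and reporting "synchronous" whenever the arrival-time difference falls inside a decision window. The parameter values and distributions below are illustrative assumptions, not the published model or its fitted estimates; note that unequal latency distributions already produce the kind of asymmetric synchrony function mentioned above.

```python
import numpy as np

rng = np.random.default_rng(3)

def prop_synchronous(soa_ms, n_trials=5000, tau_a=40.0, tau_v=70.0, window=120.0):
    """Toy independent-channels simulation. SOA is audio onset minus video onset (ms);
    'synchronous' is reported when the difference in internal arrival times falls
    within a decision window. All parameter values are illustrative assumptions."""
    arrival_a = soa_ms + rng.exponential(tau_a, n_trials)
    arrival_v = rng.exponential(tau_v, n_trials)
    return np.mean(np.abs(arrival_a - arrival_v) < window)

for soa in (-300, -100, 0, 100, 300):    # negative SOA: audio leads the video
    print(soa, round(prop_synchronous(soa), 2))
```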

  17. Specialization in audiovisual speech perception: a replication study

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    [...] participated in the experiment, which consisted of 3 conditions. In the non-speech condition, observers were trained and tested in their ability to categorize sine wave speech tokens in arbitrary categories. The natural speech condition was similar but used natural speech signals and observers categorized [...]

  18. Auditory Perceptual Learning for Speech Perception Can Be Enhanced by Audiovisual Training

    Directory of Open Access Journals (Sweden)

    Lynne E Bernstein

    2013-03-01

    Speech perception under audiovisual conditions is well known to confer benefits to perception, such as increased speed and accuracy. Here, we investigated how audiovisual training might benefit or impede auditory perceptual learning of speech degraded by vocoding. In Experiments 1 and 3, participants learned paired associations between vocoded spoken nonsense words and nonsense pictures in a protocol with a fixed number of trials. In Experiment 1, paired-associates (PA) audiovisual (AV) training of one group of participants was compared with audio-only (AO) training of another group. When tested under AO conditions, the AV-trained group was significantly more accurate than the AO-trained group. In addition, pre- and post-training AO forced-choice consonant identification with untrained nonsense words showed that AV-trained participants had learned significantly more than AO-trained participants. The pattern of results pointed to their having learned at the level of the auditory phonetic features of the vocoded stimuli. Experiment 2, a no-training control with testing and re-testing on the AO consonant identification, showed that the controls were as accurate as the AO-trained participants in Experiment 1 but less accurate than the AV-trained participants. In Experiment 3, PA training alternated AV and AO conditions on a list-by-list basis within participants, and training was to criterion (92% correct). PA training with AO stimuli was reliably more effective than training with AV stimuli. We explain these discrepant results in terms of the so-called "reverse hierarchy theory" of perceptual learning and in terms of the diverse multisensory and unisensory processing resources available to speech perception. We propose that early audiovisual speech integration can potentially impede auditory perceptual learning, but visual top-down access to relevant auditory features can promote auditory perceptual learning.

  19. Maladaptive connectivity of Broca's area in schizophrenia during audiovisual speech perception: an fMRI study.

    Science.gov (United States)

    Szycik, G R; Ye, Z; Mohammadi, B; Dillo, W; Te Wildt, B T; Samii, A; Frieling, H; Bleich, S; Münte, T F

    2013-12-03

    Speech comprehension relies on auditory as well as visual information and is enhanced in healthy subjects when audiovisual (AV) information is present. Patients with schizophrenia have been reported to have problems with this AV integration process, but little is known about which underlying neural processes are altered. Functional magnetic resonance imaging was performed in 15 schizophrenia patients (SP) and 15 healthy controls (HC) to study functional connectivity of Broca's area by means of a beta series correlation method during perception of audiovisually presented bisyllabic German nouns, in which audio and video either matched or did not match. Broca's area of SP showed stronger connectivity with the supplementary motor cortex for incongruent trials, whereas HC connectivity was stronger for congruent trials. The right posterior superior temporal sulcus (RpSTS) area showed differences in connectivity for congruent and incongruent trials in HC, in contrast to SP, where the connectivity was similar for both conditions. These smaller differences in connectivity in SP suggest a less adaptive processing of audiovisually congruent and incongruent speech. The findings imply that AV integration problems in schizophrenia are associated with maladaptive connectivity of Broca's area and the RpSTS area, in particular when confronted with incongruent stimuli. Results are discussed in light of recent AV speech perception models. Copyright © 2013 IBRO. Published by Elsevier Ltd. All rights reserved.

  20. Perception of the multisensory coherence of fluent audiovisual speech in infancy: its emergence and the role of experience.

    Science.gov (United States)

    Lewkowicz, David J; Minar, Nicholas J; Tift, Amy H; Brandon, Melissa

    2015-02-01

    To investigate the developmental emergence of the perception of the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8- to 10-, and 12- to 14-month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor 8- to 10-month-old infants exhibited audiovisual matching in that they did not look longer at the matching monologue. In contrast, the 12- to 14-month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, perceived the multisensory coherence of native-language monologues earlier in the test trials than that of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12- to 14-month-olds did not depend on audiovisual synchrony, whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audiovisual synchrony cues are more important in the perception of the multisensory coherence of non-native speech than that of native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing. Copyright © 2014 Elsevier Inc. All rights reserved.

  1. Audio-visual speech in noise perception in dyslexia

    NARCIS (Netherlands)

    van Laarhoven, T.; Keetels, M.N.; Schakel, L.; Vroomen, J.

    2017-01-01

    Individuals with developmental dyslexia (DD) may experience, besides reading problems, other speech-related processing deficits. Here, we examined the influence of visual articulatory information (lip-read speech) at various levels of background noise on auditory word recognition in children and [...]

  2. Effects of congenital hearing loss and cochlear implantation on audiovisual speech perception in infants and children.

    Science.gov (United States)

    Bergeson, Tonya R; Houston, Derek M; Miyamoto, Richard T

    2010-01-01

    Cochlear implantation has recently become available as an intervention strategy for young children with profound hearing impairment. In fact, infants as young as 6 months are now receiving cochlear implants (CIs), and even younger infants are being fitted with hearing aids (HAs). Because early audiovisual experience may be important for normal development of speech perception, it is important to investigate the effects of a period of auditory deprivation and amplification type on multimodal perceptual processes of infants and children. The purpose of this study was to investigate audiovisual perception skills in normal-hearing (NH) infants and children and deaf infants and children with CIs and HAs of similar chronological ages. We used an Intermodal Preferential Looking Paradigm to present the same woman's face articulating two words ("judge" and "back") in temporal synchrony on two sides of a TV monitor, along with an auditory presentation of one of the words. The results showed that NH infants and children spontaneously matched auditory and visual information in spoken words; deaf infants and children with HAs did not integrate the audiovisual information; and deaf infants and children with CIs did not initially integrate the audiovisual information but gradually matched the auditory and visual information in spoken words. These results suggest that a period of auditory deprivation affects multimodal perceptual processes that may begin to develop normally after several months of auditory experience.

  3. Perception of the Multisensory Coherence of Fluent Audiovisual Speech in Infancy: Its Emergence & the Role of Experience

    Science.gov (United States)

    Lewkowicz, David J.; Minar, Nicholas J.; Tift, Amy H.; Brandon, Melissa

    2014-01-01

    To investigate the developmental emergence of the ability to perceive the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8–10, and 12–14 month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor the 8–10 month-old infants exhibited audio-visual matching in that neither group exhibited greater looking at the matching monologue. In contrast, the 12–14 month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, they perceived the multisensory coherence of native-language monologues earlier in the test trials than that of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12–14 month olds did not depend on audio-visual synchrony whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audio-visual synchrony cues are more important in the perception of the multisensory coherence of non-native than native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing. PMID:25462038

  4. Electrophysiological evidence for differences between fusion and combination illusions in audiovisual speech perception

    DEFF Research Database (Denmark)

    Baart, Martijn; Lindborg, Alma Cornelia; Andersen, Tobias S

    2017-01-01

    Incongruent audiovisual speech stimuli can lead to perceptual illusions such as fusions or combinations. Here, we investigated the underlying audiovisual integration process by measuring ERPs. We observed that visual speech-induced suppression of P2 amplitude (which is generally taken as a measure...

  5. A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech.

    Directory of Open Access Journals (Sweden)

    John F Magnotti

    2017-02-01

    Audiovisual speech integration combines information from auditory speech (talker's voice) and visual speech (talker's mouth movements) to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk Effect, consists of incongruent audiovisual syllables, such as auditory "ba" + visual "ga" (AbaVga), that are integrated to produce a fused percept ("da"). This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba). We describe a simplified model of causal inference in multisensory speech perception (CIMS) that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others.

  6. A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech.

    Science.gov (United States)

    Magnotti, John F; Beauchamp, Michael S

    2017-02-01

    Audiovisual speech integration combines information from auditory speech (talker's voice) and visual speech (talker's mouth movements) to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk Effect, consists of incongruent audiovisual syllables, such as auditory "ba" + visual "ga" (AbaVga), that are integrated to produce a fused percept ("da"). This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba). We describe a simplified model of causal inference in multisensory speech perception (CIMS) that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others.
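
    The causal-inference step described in this record can be illustrated with a generic Bayesian sketch: the posterior probability of a common cause is computed from the discrepancy between the auditory and visual cues, and the final estimate averages the "integrate" and "segregate" solutions weighted by that posterior. The one-dimensional feature space and all parameter values below are illustrative assumptions, not the published CIMS implementation.

```python
# Generic sketch of Bayesian causal inference for one audiovisual trial.
# Assumptions (illustrative, not the CIMS parameters): cues live on a 1-D
# feature axis, both are Gaussian, and sources are uniform over prior_range.
import numpy as np

def causal_inference(x_a, x_v, sigma_a, sigma_v, p_common=0.7, prior_range=20.0):
    """Return p(common cause | cues) and the model-averaged feature estimate."""
    var_a, var_v = sigma_a**2, sigma_v**2

    # p(x_a, x_v | C=1): one source, uniform over prior_range; cues differ only by noise.
    like_c1 = (np.exp(-(x_a - x_v)**2 / (2 * (var_a + var_v)))
               / np.sqrt(2 * np.pi * (var_a + var_v)) / prior_range)
    # p(x_a, x_v | C=2): two independent sources, each uniform over prior_range.
    like_c2 = 1.0 / prior_range**2

    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))

    # Integrated estimate = reliability-weighted average; segregated = auditory cue alone.
    s_integrate = (x_a / var_a + x_v / var_v) / (1 / var_a + 1 / var_v)
    s_segregate = x_a
    return post_c1, post_c1 * s_integrate + (1 - post_c1) * s_segregate

# Small vs. large audiovisual discrepancy (arbitrary feature units).
for x_v in (1.0, 8.0):
    p_c1, s_hat = causal_inference(x_a=0.0, x_v=x_v, sigma_a=1.0, sigma_v=1.5)
    print(f"visual cue at {x_v:+.1f}: p(common) = {p_c1:.2f}, estimate = {s_hat:.2f}")
```

    With a small discrepancy the model infers a common cause and fuses the cues; with a large discrepancy the posterior collapses and the estimate falls back to the auditory cue, mirroring why McGurk-like stimuli fuse while grossly mismatched ones do not.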

  7. The effect of combined sensory and semantic components on audio-visual speech perception in older adults.

    Science.gov (United States)

    Maguinness, Corrina; Setti, Annalisa; Burke, Kate E; Kenny, Rose Anne; Newell, Fiona N

    2011-01-01

    Previous studies have found that perception in older people benefits from multisensory over unisensory information. As normal speech recognition is affected by both the auditory input and the visual lip movements of the speaker, we investigated the efficiency of audio and visual integration in an older population by manipulating the relative reliability of the auditory and visual information in speech. We also investigated the role of the semantic context of the sentence to assess whether audio-visual integration is affected by top-down semantic processing. We presented participants with audio-visual sentences in which the visual component was either blurred or not blurred. We found that there was a greater cost in recall performance for semantically meaningless speech in the audio-visual 'blur' compared to audio-visual 'no blur' condition and this effect was specific to the older group. Our findings have implications for understanding how aging affects efficient multisensory integration for the perception of speech and suggest that multisensory inputs may benefit speech perception in older adults when the semantic content of the speech is unpredictable.

  8. The effect of combined sensory and semantic components on audio-visual speech perception in older adults

    Directory of Open Access Journals (Sweden)

    Corrina eMaguinness

    2011-12-01

    Previous studies have found that perception in older people benefits from multisensory over uni-sensory information. As normal speech recognition is affected by both the auditory input and the visual lip-movements of the speaker, we investigated the efficiency of audio and visual integration in an older population by manipulating the relative reliability of the auditory and visual information in speech. We also investigated the role of the semantic context of the sentence to assess whether audio-visual integration is affected by top-down semantic processing. We presented participants with audio-visual sentences in which the visual component was either blurred or not blurred. We found that there was a greater cost in recall performance for semantically meaningless speech in the audio-visual blur compared to audio-visual no blur condition and this effect was specific to the older group. Our findings have implications for understanding how aging affects efficient multisensory integration for the perception of speech and suggest that multisensory inputs may benefit speech perception in older adults when the semantic content of the speech is unpredictable.

  9. A possible neurophysiological correlate of audiovisual binding and unbinding in speech perception.

    Science.gov (United States)

    Ganesh, Attigodu C; Berthommier, Frédéric; Vilain, Coriandre; Sato, Marc; Schwartz, Jean-Luc

    2014-01-01

    Audiovisual (AV) speech integration of auditory and visual streams generally results in fusion into a single percept. One classical example is the McGurk effect, in which incongruent auditory and visual speech signals may lead to a fused percept different from either the visual or the auditory input. In a previous set of experiments, we showed that if a McGurk stimulus is preceded by an incongruent AV context (composed of incongruent auditory and visual speech materials), the amount of McGurk fusion is largely decreased. We interpreted this result in the framework of a two-stage "binding and fusion" model of AV speech perception, with an early AV binding stage controlling the fusion/decision process and likely to produce "unbinding" with less fusion if the context is incoherent. In order to provide further electrophysiological evidence for this binding/unbinding stage, early auditory evoked N1/P2 responses were here compared during auditory, congruent and incongruent AV speech perception, according to either prior coherent or incoherent AV contexts. Following the coherent context, in line with previous electroencephalographic/magnetoencephalographic studies, visual information in the congruent AV condition was found to modify auditory evoked potentials, with a latency decrease of P2 responses compared to the auditory condition. Importantly, both P2 amplitude and latency in the congruent AV condition increased from the coherent to the incoherent context. Although potential contamination by visual responses from the visual cortex cannot be discarded, our results might provide a possible neurophysiological correlate of an early binding/unbinding process applied to AV interactions.

  10. Dynamic changes in superior temporal sulcus connectivity during perception of noisy audiovisual speech.

    Science.gov (United States)

    Nath, Audrey R; Beauchamp, Michael S

    2011-02-02

    Humans are remarkably adept at understanding speech, even when it is contaminated by noise. Multisensory integration may explain some of this ability: combining independent information from the auditory modality (vocalizations) and the visual modality (mouth movements) reduces noise and increases accuracy. Converging evidence suggests that the superior temporal sulcus (STS) is a critical brain area for multisensory integration, but little is known about its role in the perception of noisy speech. Behavioral studies have shown that perceptual judgments are weighted by the reliability of the sensory modality: more reliable modalities are weighted more strongly, even if the reliability changes rapidly. We hypothesized that changes in the functional connectivity of STS with auditory and visual cortex could provide a neural mechanism for perceptual reliability weighting. To test this idea, we performed five blood oxygenation level-dependent functional magnetic resonance imaging and behavioral experiments in 34 healthy subjects. We found increased functional connectivity between the STS and auditory cortex when the auditory modality was more reliable (less noisy) and increased functional connectivity between the STS and visual cortex when the visual modality was more reliable, even when the reliability changed rapidly during presentation of successive words. This finding matched the results of a behavioral experiment in which the perception of incongruent audiovisual syllables was biased toward the more reliable modality, even with rapidly changing reliability. Changes in STS functional connectivity may be an important neural mechanism underlying the perception of noisy speech.
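
    The reliability weighting referred to in this record follows the standard maximum-likelihood cue-combination rule, in which each modality's weight is proportional to its inverse variance. The sketch below illustrates only that normative rule with invented noise levels; it is not the connectivity analysis itself.

```python
# Sketch of reliability weighting: in the standard maximum-likelihood
# cue-combination model each modality is weighted by its inverse variance,
# so degrading one channel shifts the weight toward the other.
# The noise levels below are illustrative, not taken from the study.
import numpy as np

def reliability_weights(sigma_auditory, sigma_visual):
    """Normalized weights proportional to inverse variance (reliability)."""
    r_a, r_v = 1.0 / sigma_auditory**2, 1.0 / sigma_visual**2
    return r_a / (r_a + r_v), r_v / (r_a + r_v)

for label, sig_a, sig_v in [("clear audio, blurry video", 0.5, 2.0),
                            ("noisy audio, clear video", 2.0, 0.5)]:
    w_a, w_v = reliability_weights(sig_a, sig_v)
    print(f"{label}: auditory weight = {w_a:.2f}, visual weight = {w_v:.2f}")
```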

  11. Reduced audiovisual integration in synesthesia--evidence from bimodal speech perception.

    Science.gov (United States)

    Sinke, Christopher; Neufeld, Janina; Zedler, Markus; Emrich, Hinderk M; Bleich, Stefan; Münte, Thomas F; Szycik, Gregor R

    2014-03-01

    Recent research suggests that synesthesia results from a hypersensitive multimodal binding mechanism. To address the question of whether multimodal integration is altered in synesthetes in general, grapheme-colour and auditory-visual synesthetes were investigated using speech-related stimulation in two behavioural experiments. First, we used the McGurk illusion to test the strength and number of illusory perceptions in synesthesia. In a second step, we analysed the gain in speech perception coming from seen articulatory movements under acoustically noisy conditions. We used disyllabic nouns as stimulation and varied the signal-to-noise ratio of the auditory stream presented concurrently with a matching video of the speaker. We hypothesized that if synesthesia is due to a general hyperbinding mechanism, this group of subjects should be more susceptible to McGurk illusions and profit more from the visual information during audiovisual speech perception. The results indicate that there are differences between synesthetes and controls concerning multisensory integration, but in the opposite direction to that hypothesized. Synesthetes showed a reduced number of illusions and a smaller gain in comprehension from viewing matching articulatory movements than control subjects. Our results indicate that rather than having a hypersensitive binding mechanism, synesthetes show weaker integration of vision and audition. © 2012 The British Psychological Society.

  12. An fMRI Study of Audiovisual Speech Perception Reveals Multisensory Interactions in Auditory Cortex

    National Research Council Canada - National Science Library

    Okada, Kayoko; Venezia, Jonathan H; Matchin, William; Saberi, Kourosh; Hickok, Gregory

    2013-01-01

    .... While previous audiovisual speech studies demonstrate that high-level multisensory interactions occur in pSTS, what remains unclear is how early in the processing hierarchy these multisensory interactions may occur...

  13. An Assessment of Behavioral Dynamic Information Processing Measures in Audiovisual Speech Perception

    Directory of Open Access Journals (Sweden)

    Nicholas eAltieri

    2011-09-01

    Research has shown that visual speech perception can assist accuracy in identification of spoken words. However, little is known about the dynamics of the processing mechanisms involved in audiovisual integration. In particular, architecture and capacity, measured using response time methodologies, have not been investigated. An issue related to architecture concerns whether the auditory and visual sources of the speech signal are integrated early or late. We propose that early integration most naturally corresponds to coactive processing whereas late integration corresponds to separate-decisions parallel processing. We implemented the Double Factorial Paradigm (DFP) in two studies. First, we carried out a pilot study using a two-alternative forced-choice discrimination task to assess architecture, decision rule, and provide a preliminary assessment of capacity (integration efficiency). Next, Experiment 1 was designed to specifically assess audiovisual integration efficiency in an ecologically valid way by including lower auditory S/N ratios and a larger response set size. Results from the pilot study support a separate-decisions parallel, late integration model. Results from both studies showed that capacity was severely limited for high auditory S/N ratios. However, Experiment 1 demonstrated that capacity improved as the auditory signal became more degraded. This evidence strongly suggests that integration efficiency is vitally affected by the S/N ratio.
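
    The capacity measure used in the Double Factorial Paradigm can be sketched as a ratio of integrated hazards estimated from response-time survivor functions, C(t) = H_AV(t) / (H_A(t) + H_V(t)), with C(t) > 1 read as supercapacity and C(t) < 1 as limited capacity. The response times below are simulated purely for illustration and are not the study's data.

```python
# Minimal sketch of the workload capacity coefficient from the Double
# Factorial Paradigm: C(t) = H_AV(t) / (H_A(t) + H_V(t)), where
# H(t) = -log S(t) is the integrated hazard of the RT survivor function.
# Response times below are simulated, illustrative values in milliseconds.
import numpy as np

rng = np.random.default_rng(1)
rt_a  = rng.gamma(shape=8, scale=60, size=400)   # audio-only RTs
rt_v  = rng.gamma(shape=8, scale=70, size=400)   # visual-only RTs
rt_av = rng.gamma(shape=8, scale=45, size=400)   # audiovisual RTs

def integrated_hazard(rts, t_grid):
    """H(t) = -log S(t), with S(t) the empirical survivor function."""
    surv = np.array([(rts > t).mean() for t in t_grid])
    surv = np.clip(surv, 1e-6, 1.0)              # avoid log(0)
    return -np.log(surv)

t_grid = np.linspace(300, 700, 9)
capacity = integrated_hazard(rt_av, t_grid) / (
    integrated_hazard(rt_a, t_grid) + integrated_hazard(rt_v, t_grid) + 1e-12)

for t, c in zip(t_grid, capacity):
    print(f"t = {t:5.0f} ms  C(t) = {c:4.2f}  ({'super' if c > 1 else 'limited'} capacity)")
```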

  14. Auditory Perceptual Learning for Speech Perception Can be Enhanced by Audiovisual Training.

    Science.gov (United States)

    Bernstein, Lynne E; Auer, Edward T; Eberhardt, Silvio P; Jiang, Jintao

    2013-01-01

    Speech perception under audiovisual (AV) conditions is well known to confer benefits to perception such as increased speed and accuracy. Here, we investigated how AV training might benefit or impede auditory perceptual learning of speech degraded by vocoding. In Experiments 1 and 3, participants learned paired associations between vocoded spoken nonsense words and nonsense pictures. In Experiment 1, paired-associates (PA) AV training of one group of participants was compared with audio-only (AO) training of another group. When tested under AO conditions, the AV-trained group was significantly more accurate than the AO-trained group. In addition, pre- and post-training AO forced-choice consonant identification with untrained nonsense words showed that AV-trained participants had learned significantly more than AO participants. The pattern of results pointed to their having learned at the level of the auditory phonetic features of the vocoded stimuli. Experiment 2, a no-training control with testing and re-testing on the AO consonant identification, showed that the controls were as accurate as the AO-trained participants in Experiment 1 but less accurate than the AV-trained participants. In Experiment 3, PA training alternated AV and AO conditions on a list-by-list basis within participants, and training was to criterion (92% correct). PA training with AO stimuli was reliably more effective than training with AV stimuli. We explain these discrepant results in terms of the so-called "reverse hierarchy theory" of perceptual learning and in terms of the diverse multisensory and unisensory processing resources available to speech perception. We propose that early AV speech integration can potentially impede auditory perceptual learning; but visual top-down access to relevant auditory features can promote auditory perceptual learning.

  15. Auditory Perceptual Learning for Speech Perception Can be Enhanced by Audiovisual Training

    Science.gov (United States)

    Bernstein, Lynne E.; Auer, Edward T.; Eberhardt, Silvio P.; Jiang, Jintao

    2013-01-01

    Speech perception under audiovisual (AV) conditions is well known to confer benefits to perception such as increased speed and accuracy. Here, we investigated how AV training might benefit or impede auditory perceptual learning of speech degraded by vocoding. In Experiments 1 and 3, participants learned paired associations between vocoded spoken nonsense words and nonsense pictures. In Experiment 1, paired-associates (PA) AV training of one group of participants was compared with audio-only (AO) training of another group. When tested under AO conditions, the AV-trained group was significantly more accurate than the AO-trained group. In addition, pre- and post-training AO forced-choice consonant identification with untrained nonsense words showed that AV-trained participants had learned significantly more than AO participants. The pattern of results pointed to their having learned at the level of the auditory phonetic features of the vocoded stimuli. Experiment 2, a no-training control with testing and re-testing on the AO consonant identification, showed that the controls were as accurate as the AO-trained participants in Experiment 1 but less accurate than the AV-trained participants. In Experiment 3, PA training alternated AV and AO conditions on a list-by-list basis within participants, and training was to criterion (92% correct). PA training with AO stimuli was reliably more effective than training with AV stimuli. We explain these discrepant results in terms of the so-called “reverse hierarchy theory” of perceptual learning and in terms of the diverse multisensory and unisensory processing resources available to speech perception. We propose that early AV speech integration can potentially impede auditory perceptual learning; but visual top-down access to relevant auditory features can promote auditory perceptual learning. PMID:23515520

  16. Does audiovisual speech offer a fountain of youth for old ears? An event-related brain potential study of age differences in audiovisual speech perception.

    Science.gov (United States)

    Winneke, Axel H; Phillips, Natalie A

    2011-06-01

    The current study addressed the question of whether audiovisual (AV) speech can improve speech perception in older and younger adults in a noisy environment. Event-related potentials (ERPs) were recorded to investigate age-related differences in the processes underlying AV speech perception. Participants performed an object categorization task in three conditions, namely auditory-only (A), visual-only (V), and AV speech. Both age groups revealed an equivalent behavioral AV speech benefit over unisensory trials. ERP analyses revealed an amplitude reduction of the auditory P1 and N1 on AV speech trials relative to the summed unisensory (A + V) response in both age groups. These amplitude reductions are interpreted as an indication of multisensory efficiency, as fewer neural resources were recruited to achieve better performance. Of interest, the observed P1 amplitude reduction was larger in older adults. Younger and older adults also showed an earlier auditory N1 in AV speech relative to A and A + V trials, an effect that was again greater in the older adults. The degree of multisensory latency shift was predicted by basic auditory functioning (i.e., higher hearing thresholds were associated with larger latency shifts) in both age groups. Together, the results show that AV speech processing is not only intact in older adults, but that the facilitation of neural responses occurs earlier and to a greater extent in older than in younger adults. Thus, older adults appear to benefit more from additional visual speech cues than younger adults, possibly to compensate for more impoverished unisensory inputs because of sensory aging. (c) 2011 APA, all rights reserved.
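
    The additive comparison behind this kind of ERP analysis (the AV response versus the summed unisensory A + V response) can be sketched with simulated epochs: a smaller AV amplitude than A + V in the N1 window is read as multisensory facilitation. The sampling rate, component shapes, and amplitudes below are invented, not the study's recordings.

```python
# Sketch of the additive (AV vs. A+V) ERP comparison on simulated epochs.
# A smaller AV deflection than the summed unisensory response in the N1
# window is taken here as an index of multisensory facilitation.
import numpy as np

rng = np.random.default_rng(2)
fs = 500
t = np.arange(-0.1, 0.5, 1 / fs)                  # -100 to 500 ms

def simulate_erp(n1_amp, n_trials=100):
    """Trials containing an N1-like deflection at ~100 ms plus noise."""
    n1 = n1_amp * np.exp(-((t - 0.10) ** 2) / (2 * 0.015 ** 2))
    return n1 + rng.normal(0, 2.0, size=(n_trials, t.size))

erp_a  = simulate_erp(-5.0).mean(axis=0)          # auditory-only average
erp_v  = simulate_erp(-1.0).mean(axis=0)          # visual-only average
erp_av = simulate_erp(-4.0).mean(axis=0)          # audiovisual average (suppressed N1)

window = (t >= 0.08) & (t <= 0.12)                # N1 analysis window
amp_av  = erp_av[window].min()
amp_sum = (erp_a + erp_v)[window].min()
print(f"N1 amplitude, AV:    {amp_av:6.2f} µV")
print(f"N1 amplitude, A + V: {amp_sum:6.2f} µV")
print(f"N1 reduction (|A+V| - |AV|): {abs(amp_sum) - abs(amp_av):.2f} µV")
```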

  17. Audio-Visual Perception of Gender by Infants Emerges Earlier for Adult-Directed Speech.

    Directory of Open Access Journals (Sweden)

    Anne-Raphaëlle Richoz

    Early multisensory perceptual experiences shape the abilities of infants to perform socially-relevant visual categorization, such as the extraction of gender, age, and emotion from faces. Here, we investigated whether multisensory perception of gender is influenced by infant-directed (IDS) or adult-directed (ADS) speech. Six-, 9-, and 12-month-old infants saw side-by-side silent video-clips of talking faces (a male and a female) and heard either a soundtrack of a female or a male voice telling a story in IDS or ADS. Infants participated in only one condition, either IDS or ADS. Consistent with earlier work, infants displayed advantages in matching female relative to male faces and voices. Moreover, the new finding that emerged in the current study was that extraction of gender from face and voice was stronger at 6 months with ADS than with IDS, whereas at 9 and 12 months, matching did not differ for IDS versus ADS. The results indicate that the ability to perceive gender in audiovisual speech is influenced by speech manner. Our data suggest that infants may extract multisensory gender information developmentally earlier when looking at adults engaged in conversation with other adults (i.e., ADS) than when adults are directly talking to them (i.e., IDS). Overall, our findings imply that the circumstances of social interaction may shape early multisensory abilities to perceive gender.

  18. Audio-Visual Perception of Gender by Infants Emerges Earlier for Adult-Directed Speech.

    Science.gov (United States)

    Richoz, Anne-Raphaëlle; Quinn, Paul C; Hillairet de Boisferon, Anne; Berger, Carole; Loevenbruck, Hélène; Lewkowicz, David J; Lee, Kang; Dole, Marjorie; Caldara, Roberto; Pascalis, Olivier

    2017-01-01

    Early multisensory perceptual experiences shape the abilities of infants to perform socially-relevant visual categorization, such as the extraction of gender, age, and emotion from faces. Here, we investigated whether multisensory perception of gender is influenced by infant-directed (IDS) or adult-directed (ADS) speech. Six-, 9-, and 12-month-old infants saw side-by-side silent video-clips of talking faces (a male and a female) and heard either a soundtrack of a female or a male voice telling a story in IDS or ADS. Infants participated in only one condition, either IDS or ADS. Consistent with earlier work, infants displayed advantages in matching female relative to male faces and voices. Moreover, the new finding that emerged in the current study was that extraction of gender from face and voice was stronger at 6 months with ADS than with IDS, whereas at 9 and 12 months, matching did not differ for IDS versus ADS. The results indicate that the ability to perceive gender in audiovisual speech is influenced by speech manner. Our data suggest that infants may extract multisensory gender information developmentally earlier when looking at adults engaged in conversation with other adults (i.e., ADS) than when adults are directly talking to them (i.e., IDS). Overall, our findings imply that the circumstances of social interaction may shape early multisensory abilities to perceive gender.

  19. An fMRI Study of Audiovisual Speech Perception Reveals Multisensory Interactions in Auditory Cortex.

    Science.gov (United States)

    Okada, Kayoko; Venezia, Jonathan H; Matchin, William; Saberi, Kourosh; Hickok, Gregory

    2013-01-01

    Research on the neural basis of speech-reading implicates a network of auditory language regions involving inferior frontal cortex, premotor cortex and sites along superior temporal cortex. In audiovisual speech studies, neural activity is consistently reported in posterior superior temporal Sulcus (pSTS) and this site has been implicated in multimodal integration. Traditionally, multisensory interactions are considered high-level processing that engages heteromodal association cortices (such as STS). Recent work, however, challenges this notion and suggests that multisensory interactions may occur in low-level unimodal sensory cortices. While previous audiovisual speech studies demonstrate that high-level multisensory interactions occur in pSTS, what remains unclear is how early in the processing hierarchy these multisensory interactions may occur. The goal of the present fMRI experiment is to investigate how visual speech can influence activity in auditory cortex above and beyond its response to auditory speech. In an audiovisual speech experiment, subjects were presented with auditory speech with and without congruent visual input. Holding the auditory stimulus constant across the experiment, we investigated how the addition of visual speech influences activity in auditory cortex. We demonstrate that congruent visual speech increases the activity in auditory cortex.

  20. An fMRI Study of Audiovisual Speech Perception Reveals Multisensory Interactions in Auditory Cortex.

    Directory of Open Access Journals (Sweden)

    Kayoko Okada

    Research on the neural basis of speech-reading implicates a network of auditory language regions involving inferior frontal cortex, premotor cortex and sites along superior temporal cortex. In audiovisual speech studies, neural activity is consistently reported in posterior superior temporal sulcus (pSTS) and this site has been implicated in multimodal integration. Traditionally, multisensory interactions are considered high-level processing that engages heteromodal association cortices (such as STS). Recent work, however, challenges this notion and suggests that multisensory interactions may occur in low-level unimodal sensory cortices. While previous audiovisual speech studies demonstrate that high-level multisensory interactions occur in pSTS, what remains unclear is how early in the processing hierarchy these multisensory interactions may occur. The goal of the present fMRI experiment is to investigate how visual speech can influence activity in auditory cortex above and beyond its response to auditory speech. In an audiovisual speech experiment, subjects were presented with auditory speech with and without congruent visual input. Holding the auditory stimulus constant across the experiment, we investigated how the addition of visual speech influences activity in auditory cortex. We demonstrate that congruent visual speech increases the activity in auditory cortex.

  1. Large Scale Functional Brain Networks Underlying Temporal Integration of Audio-Visual Speech Perception: An EEG Study.

    Science.gov (United States)

    Kumar, G Vinodh; Halder, Tamesh; Jaiswal, Amit K; Mukherjee, Abhishek; Roy, Dipanjan; Banerjee, Arpan

    2016-01-01

    Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, whether and how the network across the whole brain participates in multisensory perceptual processing remains an open question. We posit that large-scale functional connectivity among neural populations situated in distributed brain sites may provide valuable insights into the processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent audio-visual (AV) speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs were computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha- and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception around a temporal window of 300-600 ms following onset of stimuli. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times, along with pre-stimulus decreases of lower-frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus, our
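
    As a rough illustration of summarizing many sensor-pair coherences into a single band-wise value, the sketch below computes magnitude-squared coherence for every pair of simulated "sensors" sharing a 40 Hz component and averages it within alpha, beta, and gamma bands. The study's measure is a time-frequency global coherence; this static, toy version conveys only the general idea and none of the study's parameters.

```python
# Toy illustration: pairwise magnitude-squared coherence across simulated
# "sensors" that share a common 40 Hz gamma component, summarized per band.
import numpy as np
from itertools import combinations
from scipy.signal import coherence

rng = np.random.default_rng(3)
fs, n_samples, n_sensors = 250, 5000, 8
shared_gamma = np.sin(2 * np.pi * 40 * np.arange(n_samples) / fs)
sensors = [0.8 * shared_gamma + rng.normal(0, 1.0, n_samples) for _ in range(n_sensors)]

# Coherence spectrum for every sensor pair, then a band-wise average.
freqs, pair_coh = None, []
for i, j in combinations(range(n_sensors), 2):
    freqs, c = coherence(sensors[i], sensors[j], fs=fs, nperseg=512)
    pair_coh.append(c)
pair_coh = np.array(pair_coh)

for band, (lo, hi) in {"alpha": (8, 12), "beta": (13, 30), "gamma": (30, 45)}.items():
    mask = (freqs >= lo) & (freqs <= hi)
    print(f"{band:>5}-band global coherence: {pair_coh[:, mask].mean():.2f}")
```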

  2. Large scale functional brain networks underlying temporal integration of audio-visual speech perception: An EEG study

    Directory of Open Access Journals (Sweden)

    G. Vinodh Kumar

    2016-10-01

    Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, whether and how the network across the whole brain participates in multisensory perceptual processing remains an open question. We posit that large-scale functional connectivity among neural populations situated in distributed brain sites may provide valuable insights into the processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent audio-visual (AV) speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs were computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha- and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception around a temporal window of 300-600 ms following onset of stimuli. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times, along with pre-stimulus decreases of lower-frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags

  3. Bimodal bilingualism as multisensory training?: Evidence for improved audiovisual speech perception after sign language exposure.

    Science.gov (United States)

    Williams, Joshua T; Darcy, Isabelle; Newman, Sharlene D

    2016-02-15

    The aim of the present study was to characterize effects of learning a sign language on the processing of a spoken language. Specifically, audiovisual phoneme comprehension was assessed before and after 13 weeks of sign language exposure. L2 ASL learners performed this task in the fMRI scanner. Results indicated that L2 American Sign Language (ASL) learners' behavioral classification of the speech sounds improved with time compared to hearing nonsigners. Imaging results indicated increased activation in the supramarginal gyrus (SMG) after sign language exposure, which suggests concomitant increased phonological processing of speech. A multiple regression analysis indicated that learners' ratings of co-sign speech use and lipreading ability were correlated with SMG activation. This pattern of results indicates that the increased use of mouthing and possibly lipreading during sign language acquisition may concurrently improve audiovisual speech processing in budding hearing bimodal bilinguals. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Brief Report: Arrested Development of Audiovisual Speech Perception in Autism Spectrum Disorders

    Science.gov (United States)

    Stevenson, Ryan A.; Siemann, Justin K.; Woynaroski, Tiffany G.; Schneider, Brittany C.; Eberly, Haley E.; Camarata, Stephen M.; Wallace, Mark T.

    2014-01-01

    Atypical communicative abilities are a core marker of Autism Spectrum Disorders (ASD). A number of studies have shown that, in addition to auditory comprehension differences, individuals with autism frequently show atypical responses to audiovisual speech, suggesting a multisensory contribution to these communicative differences from their…

  5. Audio-visual speech perception in prelingually deafened Japanese children following sequential bilateral cochlear implantation.

    Science.gov (United States)

    Yamamoto, Ryosuke; Naito, Yasushi; Tona, Risa; Moroto, Saburo; Tamaya, Rinko; Fujiwara, Keizo; Shinohara, Shogo; Takebayashi, Shinji; Kikuchi, Masahiro; Michida, Tetsuhiko

    2017-11-01

    An effect of audio-visual (AV) integration is observed when the auditory and visual stimuli are incongruent (the McGurk effect). In general, AV integration is helpful, especially in subjects wearing hearing aids or cochlear implants (CIs). However, the influence of AV integration on spoken word recognition in individuals with bilateral CIs (Bi-CIs) has not been fully investigated so far. In this study, we investigated AV integration in children with Bi-CIs. The study sample included thirty-one prelingually deafened children who underwent sequential bilateral cochlear implantation. We assessed their responses to congruent and incongruent AV stimuli with three CI-listening modes: only the 1st CI, only the 2nd CI, and Bi-CIs. The responses were assessed in the whole group as well as in two sub-groups: a proficient group (syllable intelligibility ≥80% with the 1st CI) and a non-proficient group (syllable intelligibility <80% with the 1st CI). The results suggested that prelingually deafened Japanese children who underwent sequential bilateral cochlear implantation exhibit AV integration abilities, both in monaural and in binaural listening. We also observed a higher influence of visual stimuli on speech perception with the 2nd CI in the non-proficient group, suggesting that Bi-CI listeners with poorer speech recognition rely more on visual information than proficient subjects to compensate for poorer auditory input. Nevertheless, poorer-quality auditory input with the 2nd CI did not interfere with AV integration in binaural listening (with Bi-CIs). Overall, the findings of this study might be used to inform future research to identify the best strategies for speech training using AV integration effectively in prelingually deafened children. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Audiovisual perception of natural speech is impaired in adult dyslexics: an ERP study.

    Science.gov (United States)

    Rüsseler, J; Gerth, I; Heldmann, M; Münte, T F

    2015-02-26

    The present study used event-related brain potentials (ERPs) to investigate audiovisual integration processes in the perception of natural speech in a group of German adult developmental dyslexic readers. Twelve dyslexic and twelve non-dyslexic adults viewed short videos of a male German speaker. Disyllabic German nouns served as stimulus material. The auditory and the visual stimulus streams were segregated to create four conditions: in the congruent condition, the visual word and the auditory word were identical. In the incongruent condition, the auditory and the visual word (i.e., the lip movements of the utterance) were different. Furthermore, on half of the trials, white noise (45 dB SPL) was superimposed on the auditory trace. Subjects had to say aloud the word they understood after they viewed the video. Behaviorally, dyslexic readers committed more errors than normal readers in the noise conditions, and this effect was particularly present for congruent trials. ERPs showed a distinct N170 component at temporo-parietal electrodes that was smaller in amplitude for dyslexic readers. Both normal and dyslexic readers showed a clear effect of noise at centro-parietal electrodes between 300 and 600 ms. An analysis of error trials reflecting audiovisual integration (verbal responses in the incongruent noise condition that are a mix of the visual and the auditory word) revealed more positive ERPs for dyslexic readers at temporo-parietal electrodes 200-500 ms poststimulus. For normal readers, no such effect was present. These findings are discussed as reflecting increased effort in dyslexics under circumstances of distorted acoustic input. The superimposition of noise leads dyslexics to rely more on the integration of auditory and visual input (lip reading). Furthermore, the smaller N170 amplitudes indicate deficits in the processing of moving faces in dyslexic adults. Copyright © 2014 IBRO. Published by Elsevier Ltd. All rights reserved.

  7. Atypical audio-visual speech perception and McGurk effects in children with specific language impairment.

    Science.gov (United States)

    Leybaert, Jacqueline; Macchi, Lucie; Huyse, Aurélie; Champoux, François; Bayard, Clémence; Colin, Cécile; Berthommier, Frédéric

    2014-01-01

    Audiovisual speech perception of children with specific language impairment (SLI) and children with typical language development (TLD) was compared in two experiments using /aCa/ syllables presented in the context of a masking release paradigm. Children had to repeat syllables presented in auditory-alone, visual-alone (speechreading), audiovisual congruent, and incongruent (McGurk) conditions. Stimuli were masked by either stationary (ST) or amplitude-modulated (AM) noise. Although children with SLI were less accurate in auditory and audiovisual speech perception, they showed a similar auditory masking release effect to that of children with TLD. Children with SLI also had fewer correct responses in speechreading than children with TLD, indicating impairment in the phonemic processing of visual speech information. In response to McGurk stimuli, children with TLD showed more fusions in AM noise than in ST noise, a consequence of the auditory masking release effect and of the influence of visual information. Children with SLI did not show this effect systematically, suggesting they were less influenced by visual speech. However, when the visual cues were easily identified, the profile of responses to McGurk stimuli was similar in both groups, suggesting that children with SLI do not suffer from an impairment of audiovisual integration. An analysis of the percentage of information transmitted revealed a deficit in the children with SLI, particularly for the place-of-articulation feature. Taken together, the data support the hypothesis of intact peripheral processing of auditory speech information, coupled with a supramodal deficit of phonemic categorization in children with SLI. Clinical implications are discussed.

  8. Deficits in audiovisual speech perception in normal aging emerge at the level of whole-word recognition

    Science.gov (United States)

    Stevenson, Ryan A.; Nelms, Caitlin; Baum, Sarah H.; Zurkovsky, Lilia; Barense, Morgan D.; Newhouse, Paul A.; Wallace, Mark T.

    2014-01-01

    Over the next two decades, a dramatic shift in the demographics of society will take place, with a rapid growth in the population of older adults. One of the most common complaints with healthy aging is a decreased ability to successfully perceive speech, particularly in noisy environments. In such noisy environments, the presence of visual speech cues (i.e., lip movements) provides striking benefits for speech perception and comprehension, but previous research suggests that older adults gain less from such audiovisual integration than their younger peers. To determine at what processing level these behavioral differences arise in healthy-aging populations, we administered a speech-in-noise task to younger and older adults. We compared the perceptual benefits of having speech information available in both the auditory and visual modalities and examined both phoneme and whole-word recognition across varying levels of signal-to-noise ratio (SNR). For whole-word recognition, older relative to younger adults showed greater multisensory gains at intermediate SNRs, but reduced benefit at low SNRs. By contrast, at the phoneme level both younger and older adults showed approximately equivalent increases in multisensory gain as SNR decreased. Collectively, the results provide important insights into both the similarities and differences in how older and younger adults integrate auditory and visual speech cues in noisy environments, and help explain some of the conflicting findings in previous studies of multisensory speech perception in healthy aging. These novel findings suggest that audiovisual processing is intact at more elementary levels of speech perception in healthy aging populations, and that deficits begin to emerge only at the more complex, word-recognition level of speech signals. PMID:25282337

  9. Deficits in audiovisual speech perception in normal aging emerge at the level of whole-word recognition.

    Science.gov (United States)

    Stevenson, Ryan A; Nelms, Caitlin E; Baum, Sarah H; Zurkovsky, Lilia; Barense, Morgan D; Newhouse, Paul A; Wallace, Mark T

    2015-01-01

    Over the next 2 decades, a dramatic shift in the demographics of society will take place, with a rapid growth in the population of older adults. One of the most common complaints with healthy aging is a decreased ability to successfully perceive speech, particularly in noisy environments. In such noisy environments, the presence of visual speech cues (i.e., lip movements) provides striking benefits for speech perception and comprehension, but previous research suggests that older adults gain less from such audiovisual integration than their younger peers. To determine at what processing level these behavioral differences arise in healthy-aging populations, we administered a speech-in-noise task to younger and older adults. We compared the perceptual benefits of having speech information available in both the auditory and visual modalities and examined both phoneme and whole-word recognition across varying levels of signal-to-noise ratio. For whole-word recognition, older adults relative to younger adults showed greater multisensory gains at intermediate SNRs but reduced benefit at low SNRs. By contrast, at the phoneme level both younger and older adults showed approximately equivalent increases in multisensory gain as signal-to-noise ratio decreased. Collectively, the results provide important insights into both the similarities and differences in how older and younger adults integrate auditory and visual speech cues in noisy environments and help explain some of the conflicting findings in previous studies of multisensory speech perception in healthy aging. These novel findings suggest that audiovisual processing is intact at more elementary levels of speech perception in healthy-aging populations and that deficits begin to emerge only at the more complex word-recognition level of speech signals. Copyright © 2015 Elsevier Inc. All rights reserved.
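
    One common way to express the multisensory benefit measured in studies like this is to normalize the audiovisual improvement by the room left above the best unisensory score. The accuracies below are invented solely to show how such a gain score can peak at intermediate SNRs; they are not the study's data.

```python
# Sketch of one common multisensory-gain metric for accuracy data:
# (AV - best unisensory) / (1 - best unisensory), i.e., the audiovisual
# benefit relative to the room left for improvement. Accuracies are invented.
def multisensory_gain(acc_a, acc_v, acc_av):
    """Normalized gain over the best unisensory accuracy."""
    best_uni = max(acc_a, acc_v)
    return (acc_av - best_uni) / (1.0 - best_uni)

snr_conditions = {          # hypothetical word-recognition accuracies (A, V, AV)
    "high SNR":         (0.95, 0.30, 0.97),
    "intermediate SNR": (0.55, 0.30, 0.85),
    "low SNR":          (0.10, 0.30, 0.40),
}
for label, (a, v, av) in snr_conditions.items():
    print(f"{label:>16}: gain = {multisensory_gain(a, v, av):.2f}")
```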

  10. Perception of audiovisual speech synchrony for native and non-native language.

    Science.gov (United States)

    Navarra, Jordi; Alsius, Agnès; Velasco, Ignacio; Soto-Faraco, Salvador; Spence, Charles

    2010-04-06

    To what extent does our prior experience with the correspondence between audiovisual stimuli influence how we subsequently bind them? We addressed this question by testing English and Spanish speakers (having little prior experience of Spanish and English, respectively) on a crossmodal simultaneity judgment (SJ) task with English or Spanish spoken sentences. The results revealed that the visual speech stream had to lead the auditory speech stream by a significantly larger interval in the participants' native language than in the non-native language for simultaneity to be perceived. Critically, the difference in temporal processing between perceiving native vs. non-native language tends to disappear as the amount of experience with the non-native language increases. We propose that this modulation of multisensory temporal processing as a function of prior experience is a consequence of the constraining role that visual information plays in the temporal alignment of audiovisual speech signals. Copyright 2010 Elsevier B.V. All rights reserved.

  11. Contributions of local speech encoding and functional connectivity to audio-visual speech perception.

    Science.gov (United States)

    Giordano, Bruno L; Ince, Robin A A; Gross, Joachim; Schyns, Philippe G; Panzeri, Stefano; Kayser, Christoph

    2017-06-07

    Seeing a speaker's face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR, speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR, strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker's face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments.

  12. Contributions of local speech encoding and functional connectivity to audio-visual speech perception

    Science.gov (United States)

    Giordano, Bruno L; Ince, Robin A A; Gross, Joachim; Schyns, Philippe G; Panzeri, Stefano; Kayser, Christoph

    2017-01-01

    Seeing a speaker’s face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR, speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR, strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker’s face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments. DOI: http://dx.doi.org/10.7554/eLife.24763.001 PMID:28590903

  13. Time course of multisensory interactions during audiovisual speech perception in humans: a magnetoencephalographic study.

    Science.gov (United States)

    Möttönen, Riikka; Schürmann, Martin; Sams, Mikko

    2004-06-10

    During social interaction speech is perceived simultaneously by audition and vision. We studied interactions in the processing of auditory (A) and visual (V) speech signals in the human brain by comparing neuromagnetic responses to phonetically congruent audiovisual (AV) syllables with the arithmetic sum of responses to A and V syllables. Differences between AV and A+V responses were found bilaterally in the auditory cortices 150-200 ms and in the right superior temporal sulcus (STS) 250-600 ms after stimulus onset, showing that both sensory-specific and multisensory regions of the human temporal cortices are involved in AV speech processing. Importantly, our results suggest that AV interaction in the auditory cortex precedes that in the multisensory STS region.

  14. Speech-specificity of two audiovisual integration effects

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2010-01-01

    Seeing the talker’s articulatory mouth movements can influence the auditory speech percept both in speech identification and detection tasks. Here we show that these audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers...

  15. Attention to touch weakens audiovisual speech integration.

    Science.gov (United States)

    Alsius, Agnès; Navarra, Jordi; Soto-Faraco, Salvador

    2007-11-01

    One of the classic examples of multisensory integration in humans occurs when speech sounds are combined with the sight of corresponding articulatory gestures. Despite the longstanding assumption that this kind of audiovisual binding operates in an attention-free mode, recent findings (Alsius et al. in Curr Biol, 15(9):839-843, 2005) suggest that audiovisual speech integration decreases when visual or auditory attentional resources are depleted. The present study addressed the generalization of this attention constraint by testing whether a similar decrease in multisensory integration is observed when attention demands are imposed on a sensory domain that is not involved in speech perception, such as touch. We measured the McGurk illusion in a dual-task paradigm involving a difficult tactile task. The results showed that the percentage of visually influenced responses to audiovisual stimuli was reduced when attention was diverted to a tactile task. This finding is attributed to a modulatory effect on audiovisual integration of speech mediated by supramodal attention limitations. We suggest that the interactions between the attentional system and crossmodal binding mechanisms may be much more extensive and dynamic than previously proposed.

  16. An fMRI Study of Audiovisual Speech Perception Reveals Multisensory Interactions in Auditory Cortex: e68959

    National Research Council Canada - National Science Library

    Kayoko Okada; Jonathan H Venezia; William Matchin; Kourosh Saberi; Gregory Hickok

    2013-01-01

    .... While previous audiovisual speech studies demonstrate that high-level multisensory interactions occur in pSTS, what remains unclear is how early in the processing hierarchy these multisensory interactions may occur...

  17. The natural statistics of audiovisual speech.

    Directory of Open Access Journals (Sweden)

    Chandramouli Chandrasekaran

    2009-07-01

    Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain where it can guide the selection of appropriate actions. To simplify this process, it has been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both the area of the mouth opening and the voice envelope are temporally modulated in the 2-7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver.
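
    Two of the quantities highlighted in this record, the acoustic amplitude envelope and its correlation with mouth-opening area, can be approximated in a few lines. The sketch below uses synthetic stand-ins for the audio and the mouth-area track (a 4 Hz "syllabic" modulation), so the numbers only illustrate the computation, not the reported statistics.

```python
# Sketch: wideband acoustic envelope (Hilbert magnitude, low-pass filtered,
# resampled to video rate) correlated with a mouth-opening time series.
# All signals are synthetic stand-ins, not real audiovisual recordings.
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

fs_audio, fs_video, duration = 16000, 30.0, 10.0
rng = np.random.default_rng(4)

# Synthetic "speech": carrier noise modulated at a syllabic ~4 Hz rate.
t_audio = np.arange(0, duration, 1 / fs_audio)
syllable_rate = 0.5 * (1 + np.sin(2 * np.pi * 4 * t_audio))
audio = syllable_rate * rng.normal(0, 1, t_audio.size)

# Acoustic envelope: Hilbert magnitude, 10 Hz low-pass, sampled at video frames.
sos = butter(4, 10, btype="low", fs=fs_audio, output="sos")
envelope = sosfiltfilt(sos, np.abs(hilbert(audio)))
frame_idx = (np.arange(0, duration, 1 / fs_video) * fs_audio).astype(int)
envelope_video_rate = envelope[frame_idx]

# Synthetic mouth-opening area at video rate, tracking the same syllabic rhythm.
t_video = np.arange(0, duration, 1 / fs_video)
mouth_area = 0.5 * (1 + np.sin(2 * np.pi * 4 * t_video)) + rng.normal(0, 0.1, t_video.size)

r = np.corrcoef(envelope_video_rate, mouth_area)[0, 1]
print(f"mouth area vs. acoustic envelope: r = {r:.2f}")
```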

  18. Audiovisual integration of speech in a patient with Broca's Aphasia.

    Science.gov (United States)

    Andersen, Tobias S; Starrfelt, Randi

    2015-01-01

    Lesions to Broca's area cause aphasia characterized by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca's area is also involved in speech perception. While these studies have focused on auditory speech perception, other studies have shown that Broca's area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca's aphasia did not experience the McGurk illusion, suggesting that an intact Broca's area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca's area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke's aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing that are not specific to Broca's aphasia.

  19. Perception of Audio-Visual Speech Synchrony in Spanish-Speaking Children with and without Specific Language Impairment

    Science.gov (United States)

    Pons, Ferran; Andreu, Llorenc; Sanz-Torrent, Monica; Buil-Legaz, Lucia; Lewkowicz, David J.

    2013-01-01

    Speech perception involves the integration of auditory and visual articulatory information, and thus requires the perception of temporal synchrony between this information. There is evidence that children with specific language impairment (SLI) have difficulty with auditory speech perception but it is not known if this is also true for the…

  20. Causal inference of asynchronous audiovisual speech

    Directory of Open Access Journals (Sweden)

    John F Magnotti

    2013-11-01

    Full Text Available During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
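
    As a rough illustration of the causal inference step described above, the following sketch (illustrative parameters, not the authors' fitted model) computes the posterior probability of a common cause from a measured audiovisual asynchrony, assuming a Gaussian likelihood of asynchrony under a common cause and a broad uniform likelihood under separate causes; this posterior can then be mapped onto the proportion of "synchronous" responses.

```python
# Hedged sketch of a causal-inference synchrony judgment. The parameter
# values below are illustrative assumptions, not fitted values from the study.
import numpy as np

def p_common_cause(delta_t_ms, p_common=0.7, mu_c=50.0, sigma_c=80.0,
                   window_ms=1000.0):
    """Posterior probability that voice and face share a common cause.

    delta_t_ms: measured audiovisual asynchrony. Under a common cause the
    asynchrony is modelled as Gaussian around a small natural offset; under
    separate causes it is uniform over a wide window (both are assumptions).
    """
    like_common = (np.exp(-0.5 * ((delta_t_ms - mu_c) / sigma_c) ** 2)
                   / (sigma_c * np.sqrt(2 * np.pi)))
    like_separate = 1.0 / window_ms
    return (like_common * p_common
            / (like_common * p_common + like_separate * (1 - p_common)))

# Predicted probability of a "synchronous" percept across asynchronies:
for dt in (-300, -150, 0, 150, 300):
    print(dt, round(p_common_cause(dt), 2))
```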

  1. Audiovisual Asynchrony Detection in Human Speech

    Science.gov (United States)

    Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

    2011-01-01

    Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

  2. Audiovisual Discrimination between Laughter and Speech

    NARCIS (Netherlands)

    Petridis, Stavros; Pantic, Maja

    Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audiovisual approach to distinguishing laughter from speech and we show that integrating the information from audio and video leads to an improved reliability of audiovisual approach in

  3. Infants' preference for native audiovisual speech dissociated from congruency preference.

    Directory of Open Access Journals (Sweden)

    Kathleen Shaw

    Full Text Available Although infant speech perception is often studied in isolated modalities, infants' experience with speech is largely multimodal (i.e., the speech sounds they hear are accompanied by articulating faces). Across two experiments, we tested infants' sensitivity to the relationship between the auditory and visual components of audiovisual speech in their native (English) and non-native (Spanish) language. In Experiment 1, infants' looking times were measured during a preferential looking task in which they saw two simultaneous visual speech streams articulating a story, one in English and the other in Spanish, while they heard either the English or the Spanish version of the story. In Experiment 2, looking times from another group of infants were measured as they watched single displays of congruent and incongruent combinations of English and Spanish audio and visual speech streams. Findings demonstrated an age-related increase in looking towards the native relative to the non-native visual speech stream when accompanied by the corresponding (native) auditory speech. This increase in native-language preference did not appear to be driven by a difference in preference for native vs. non-native audiovisual congruence, as we observed no difference in looking times at the audiovisual streams in Experiment 2.

  4. Neural development of networks for audiovisual speech comprehension.

    Science.gov (United States)

    Dick, Anthony Steven; Solodkin, Ana; Small, Steven L

    2010-08-01

    Everyday conversation is both an auditory and a visual phenomenon. While visual speech information enhances comprehension for the listener, evidence suggests that the ability to benefit from this information improves with development. A number of brain regions have been implicated in audiovisual speech comprehension, but the extent to which the neurobiological substrate in the child compares to the adult is unknown. In particular, developmental differences in the network for audiovisual speech comprehension could manifest through the incorporation of additional brain regions, or through different patterns of effective connectivity. In the present study we used functional magnetic resonance imaging and structural equation modeling (SEM) to characterize the developmental changes in network interactions for audiovisual speech comprehension. The brain response was recorded while 8- to 11-year-old children and adults passively listened to stories under audiovisual (AV) and auditory-only (A) conditions. Results showed that in children and adults, AV comprehension activated the same fronto-temporo-parietal network of regions known for their contribution to speech production and perception. However, the SEM network analysis revealed age-related differences in the functional interactions among these regions. In particular, the influence of the posterior inferior frontal gyrus/ventral premotor cortex on supramarginal gyrus differed across age groups during AV, but not A speech. This functional pathway might be important for relating motor and sensory information used by the listener to identify speech sounds. Further, its development might reflect changes in the mechanisms that relate visual speech information to articulatory speech representations through experience producing and perceiving speech. 2009 Elsevier Inc. All rights reserved.

  5. Audio-visual speech perception in infants and toddlers with Down syndrome, fragile X syndrome, and Williams syndrome.

    Science.gov (United States)

    D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette

    2016-08-01

    Typically-developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on Chronological and Mental Age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development. Copyright © 2016 Elsevier Inc. All rights reserved.

  6. Audiovisual integration for speech during mid-childhood: electrophysiological evidence.

    Science.gov (United States)

    Kaganovich, Natalya; Schumaker, Jennifer

    2014-12-01

    Previous studies have demonstrated that the presence of visual speech cues reduces the amplitude and latency of the N1 and P2 event-related potential (ERP) components elicited by speech stimuli. However, the developmental trajectory of this effect is not yet fully mapped. We examined ERP responses to auditory, visual, and audiovisual speech in two groups of school-age children (7-8-year-olds and 10-11-year-olds) and in adults. Audiovisual speech led to the attenuation of the N1 and P2 components in all groups of participants, suggesting that the neural mechanisms underlying these effects are functional by early school years. Additionally, while the reduction in N1 was largest over the right scalp, the P2 attenuation was largest over the left and midline scalp. The difference in the hemispheric distribution of the N1 and P2 attenuation supports the idea that these components index at least somewhat disparate neural processes within the context of audiovisual speech perception. Copyright © 2014 Elsevier Inc. All rights reserved.

  7. Rapid, generalized adaptation to asynchronous audiovisual speech.

    Science.gov (United States)

    Van der Burg, Erik; Goodbourn, Patrick T

    2015-04-07

    The brain is adaptive. The speed of propagation through air, and of low-level sensory processing, differs markedly between auditory and visual stimuli; yet the brain can adapt to compensate for the resulting cross-modal delays. Studies investigating temporal recalibration to audiovisual speech have used prolonged adaptation procedures, suggesting that adaptation is sluggish. Here, we show that adaptation to asynchronous audiovisual speech occurs rapidly. Participants viewed a brief clip of an actor pronouncing a single syllable. The voice was either advanced or delayed relative to the corresponding lip movements, and participants were asked to make a synchrony judgement. Although we did not use an explicit adaptation procedure, we demonstrate rapid recalibration based on a single audiovisual event. We find that the point of subjective simultaneity on each trial is highly contingent upon the modality order of the preceding trial. We find compelling evidence that rapid recalibration generalizes across different stimuli, and different actors. Finally, we demonstrate that rapid recalibration occurs even when auditory and visual events clearly belong to different actors. These results suggest that rapid temporal recalibration to audiovisual speech is primarily mediated by basic temporal factors, rather than higher-order factors such as perceived simultaneity and source identity. © 2015 The Author(s) Published by the Royal Society. All rights reserved.

  8. Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

    DEFF Research Database (Denmark)

    Andersen, Tobias; Starrfelt, Randi

    2015-01-01

    Lesions to Broca's area cause aphasia characterized by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca's area is also involved in speech perception. While these studies have focused on auditory speech perception other studies have shown that Broca's area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca's aphasia did not experience the McGurk illusion suggesting that an intact Broca's area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical...

  9. Multisensory Speech Perception in Children with Autism Spectrum Disorders

    OpenAIRE

    Woynaroski, Tiffany G.; Kwakye, Leslie D.; Foss-Feig, Jennifer H.; Stevenson, Ryan A.; Stone, Wendy L.; Wallace, Mark T.

    2013-01-01

    This study examined unisensory and multisensory speech perception in 8–17 year old children with autism spectrum disorders (ASD) and typically developing controls matched on chronological age, sex, and IQ. Consonant–vowel syllables were presented in visual only, auditory only, matched audio-visual, and mismatched audiovisual (“McGurk”) conditions. Participants with ASD displayed deficits in visual only and matched audiovisual speech perception. Additionally, children with ASD reported a visu...

  10. Neural Dynamics of Audiovisual Speech Integration under Variable Listening Conditions: An Individual Participant Analysis

    Directory of Open Access Journals (Sweden)

    Nicholas eAltieri

    2013-09-01

    Full Text Available Speech perception engages both auditory and visual modalities. Limitations of traditional accuracy-only approaches in the investigation of audiovisual speech perception have motivated the use of new methodologies. In an audiovisual speech identification task, we utilized capacity (Townsend & Nozawa, 1995), a dynamic measure of efficiency, to quantify audiovisual integration. Capacity was used to compare RT distributions from audiovisual trials to RT distributions from auditory-only and visual-only trials across three listening conditions: clear auditory signal, S/N ratio of -12 dB, and S/N ratio of -18 dB. The purpose was to obtain EEG recordings in conjunction with capacity to investigate how a late ERP co-varies with integration efficiency. Results showed efficient audiovisual integration for low auditory S/N ratios, but inefficient audiovisual integration when the auditory signal was clear. The ERP analyses showed evidence for greater audiovisual amplitude in lower auditory S/N ratios (higher capacity/efficiency) compared to the high S/N ratio (low capacity/inefficient integration). The data are consistent with an interactive framework of integration, where auditory recognition is influenced by speech-reading as a function of signal clarity.
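
    The capacity measure referred to above compares the audiovisual reaction-time distribution with the unisensory ones through integrated hazard functions, C(t) = H_AV(t) / [H_A(t) + H_V(t)] with H(t) = -log S(t). The sketch below is a minimal illustration of that computation on hypothetical RT data, not the authors' analysis code.

```python
# Hedged sketch of the capacity coefficient C(t) described above, computed
# from empirical RT distributions. The simulated RTs are hypothetical.
import numpy as np

def integrated_hazard(rts, t_grid):
    """H(t) = -log S(t), with S(t) the empirical survivor function."""
    rts = np.asarray(rts, dtype=float)
    surv = np.array([(rts > t).mean() for t in t_grid])
    surv = np.clip(surv, 1e-6, 1.0)          # avoid log(0) in the tail
    return -np.log(surv)

def capacity_coefficient(rt_av, rt_a, rt_v, t_grid):
    """C(t) > 1: efficient (super-capacity) integration; C(t) < 1: limited capacity."""
    h_av = integrated_hazard(rt_av, t_grid)
    h_sum = integrated_hazard(rt_a, t_grid) + integrated_hazard(rt_v, t_grid)
    return h_av / np.maximum(h_sum, 1e-6)

# Illustrative use with simulated RTs in seconds:
rng = np.random.default_rng(0)
rt_a = rng.gamma(4.0, 0.10, 200) + 0.3
rt_v = rng.gamma(4.0, 0.12, 200) + 0.3
rt_av = rng.gamma(4.0, 0.08, 200) + 0.3     # faster on redundant AV trials
t_grid = np.linspace(0.4, 1.2, 9)
print(np.round(capacity_coefficient(rt_av, rt_a, rt_v, t_grid), 2))
```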

  11. Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

    Directory of Open Access Journals (Sweden)

    Tobias Søren Andersen

    2015-04-01

    Full Text Available Lesions to Broca’s area cause aphasia characterised by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca’s area is also involved in speech perception. While these studies have focused on auditory speech perception other studies have shown that Broca’s area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca’s aphasia did not experience the McGurk illusion suggesting that an intact Broca’s area is necessary for audiovisual integration of speech. Here we describe a patient with Broca’s aphasia who experienced the McGurk illusion. This indicates that an intact Broca’s area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca’s area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke’s aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing that are not specific to Broca’s aphasia.

  12. A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography.

    Science.gov (United States)

    Ozker, Muge; Schepers, Inga M; Magnotti, John F; Yoshor, Daniel; Beauchamp, Michael S

    2017-06-01

    Human speech can be comprehended using only auditory information from the talker's voice. However, comprehension is improved if the talker's face is visible, especially if the auditory information is degraded as occurs in noisy environments or with hearing loss. We explored the neural substrates of audiovisual speech perception using electrocorticography, direct recording of neural activity using electrodes implanted on the cortical surface. We observed a double dissociation in the responses to audiovisual speech with clear and noisy auditory component within the superior temporal gyrus (STG), a region long known to be important for speech perception. Anterior STG showed greater neural activity to audiovisual speech with clear auditory component, whereas posterior STG showed similar or greater neural activity to audiovisual speech in which the speech was replaced with speech-like noise. A distinct border between the two response patterns was observed, demarcated by a landmark corresponding to the posterior margin of Heschl's gyrus. To further investigate the computational roles of both regions, we considered Bayesian models of multisensory integration, which predict that combining the independent sources of information available from different modalities should reduce variability in the neural responses. We tested this prediction by measuring the variability of the neural responses to single audiovisual words. Posterior STG showed smaller variability than anterior STG during presentation of audiovisual speech with noisy auditory component. Taken together, these results suggest that posterior STG but not anterior STG is important for multisensory integration of noisy auditory and visual speech.
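
    The Bayesian prediction invoked above, that combining independent cues reduces response variability, follows from the standard reliability-weighted fusion rule for Gaussian cues. The minimal sketch below illustrates that rule with made-up numbers; it is not the specific model tested in the study.

```python
# Minimal sketch of the standard Gaussian cue-combination prediction invoked
# above: fusing independent auditory and visual estimates reduces variance.
def fuse_gaussian_cues(mu_a, var_a, mu_v, var_v):
    """Reliability-weighted average; the fused variance is below either input."""
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_v)
    mu_av = w_a * mu_a + (1.0 - w_a) * mu_v
    var_av = 1.0 / (1.0 / var_a + 1.0 / var_v)
    return mu_av, var_av

# Noisy auditory cue (large variance) plus a clearer visual cue:
print(fuse_gaussian_cues(mu_a=0.0, var_a=4.0, mu_v=1.0, var_v=1.0))
# -> fused variance 0.8, smaller than both 4.0 and 1.0
```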

  13. Electrophysiological evidence for speech-specific audiovisual integration

    NARCIS (Netherlands)

    Baart, M.; Stekelenburg, J.J.; Vroomen, J.

    2014-01-01

    Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were

  14. Multistage audiovisual integration of speech: dissociating identification and detection

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2011-01-01

    signal. Here we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers...

  15. Multisensory and sensorimotor interactions in speech perception

    OpenAIRE

    Kaisa eTiippana; Riikka eMöttönen; Jean-Luc eSchwartz

    2015-01-01

    This research topic presents speech as a natural, well-learned, multisensory communication signal, processed by multiple mechanisms. Reflecting the general status of the field, most articles focus on audiovisual speech perception and many utilize the McGurk effect, which arises when discrepant visual and auditory speech stimuli are presented (McGurk and MacDonald, 1976). Tiippana (2014) argues that the McGurk effect can be used as a proxy for multisensory integration p...

  16. Can you McGurk yourself? Self-face and self-voice in audiovisual speech.

    Science.gov (United States)

    Aruffo, Christopher; Shore, David I

    2012-02-01

    We are constantly exposed to our own face and voice, and we identify our own faces and voices as familiar. However, the influence of self-identity upon self-speech perception is still uncertain. Speech perception is a synthesis of both auditory and visual inputs; although we hear our own voice when we speak, we rarely see the dynamic movements of our own face. If visual speech and identity are processed independently, no processing advantage would obtain in viewing one's own highly familiar face. In the present experiment, the relative contributions of facial and vocal inputs to speech perception were evaluated with an audiovisual illusion. Our results indicate that auditory self-speech conveys a processing advantage, whereas visual self-speech does not. The data thereby support a model of visual speech as dynamic movement processed separately from speaker recognition.

  17. The cortical representation of the speech envelope is earlier for audiovisual speech than audio speech.

    Science.gov (United States)

    Crosse, Michael J; Lalor, Edmund C

    2014-04-01

    Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that EEG and magnetoencephalographic data are usually analyzed in response to simple, discrete stimuli. Recent research has shown that neuronal activity in human auditory cortex tracks the envelope of natural speech. Here, we exploit this finding by estimating a linear forward-mapping between the speech envelope and EEG data and show that the latency at which the envelope of natural speech is represented in cortex is shortened by >10 ms when continuous audiovisual speech is presented compared with audio-only speech. In addition, we use a reverse-mapping approach to reconstruct an estimate of the speech stimulus from the EEG data and, by comparing the bimodal estimate with the sum of the unimodal estimates, find no evidence of any nonlinear additive effects in the audiovisual speech condition. These findings point to an underlying mechanism that could account for enhanced comprehension during audiovisual speech. Specifically, we hypothesize that low-level acoustic features that are temporally coherent with the preceding visual stream may be synthesized into a speech object at an earlier latency, which may provide an extended period of low-level processing before extraction of semantic information.
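
    A hedged sketch of the kind of linear forward mapping described above: regress multichannel EEG on time-lagged copies of the speech envelope with ridge regularization. The lag range, ridge value, and data shapes here are illustrative assumptions; a full analysis would additionally involve cross-validation and real recordings.

```python
# Hedged sketch of a linear envelope-to-EEG forward mapping (temporal response
# function). Lag range and ridge parameter are illustrative assumptions.
import numpy as np

def lagged_design(envelope, lags):
    """Stack time-lagged copies of the stimulus envelope as regressors."""
    n = len(envelope)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = envelope[:n - lag]
        else:
            X[:n + lag, j] = envelope[-lag:]
    return X

def fit_trf(envelope, eeg, fs, tmin_s=-0.1, tmax_s=0.4, ridge=1.0):
    """Ridge regression from envelope lags to each EEG channel."""
    lags = np.arange(int(tmin_s * fs), int(tmax_s * fs) + 1)
    X = lagged_design(envelope, lags)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    weights = np.linalg.solve(XtX, X.T @ eeg)   # shape: (n_lags, n_channels)
    return lags / fs, weights

# Illustrative call with random data standing in for real recordings:
fs = 64
rng = np.random.default_rng(1)
envelope = rng.standard_normal(fs * 60)          # 60 s of "speech envelope"
eeg = rng.standard_normal((fs * 60, 32))         # 32 "EEG" channels
lag_times, trf = fit_trf(envelope, eeg, fs)
print(trf.shape)   # (number of lags, number of channels)
```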

  18. Vision of tongue movements bias auditory speech perception.

    Science.gov (United States)

    D'Ausilio, Alessandro; Bartoli, Eleonora; Maffongelli, Laura; Berry, Jeffrey James; Fadiga, Luciano

    2014-10-01

    Audiovisual speech perception is likely based on the association between auditory and visual information into stable audiovisual maps. Conflicting audiovisual inputs generate perceptual illusions such as the McGurk effect. Audiovisual mismatch effects could be driven either by the detection of violations of standard audiovisual statistics or via the sensorimotor reconstruction of the distal articulatory event that generated the audiovisual ambiguity. In order to disambiguate between the two hypotheses, we exploit the fact that the tongue is hidden from vision. For this reason, tongue movement encoding can only be learned via speech production, not via perception of others' speech alone. Here we asked participants to identify speech sounds while matching or mismatching visual representations of tongue movements were shown. Vision of congruent tongue movements facilitated auditory speech identification with respect to incongruent trials. This result suggests that direct visual experience of an articulator's movement is not necessary for the generation of audiovisual mismatch effects. Furthermore, we suggest that audiovisual integration in speech may benefit from speech production learning. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. Atypical audiovisual speech integration in infants at risk for autism.

    Directory of Open Access Journals (Sweden)

    Jeanne A Guiraud

    Full Text Available The language difficulties often seen in individuals with autism might stem from an inability to integrate audiovisual information, a skill important for language development. We investigated whether 9-month-old siblings of older children with autism, who are at an increased risk of developing autism, are able to integrate audiovisual speech cues. We used an eye-tracker to record where infants looked when shown a screen displaying two faces of the same model, where one face is articulating /ba/ and the other /ga/, with one face congruent with the syllable sound being presented simultaneously, the other face incongruent. This method was successful in showing that infants at low risk can integrate audiovisual speech: they looked for the same amount of time at the mouths in both the fusible visual /ga/ - audio /ba/ and the congruent visual /ba/ - audio /ba/ displays, indicating that the auditory and visual streams fuse into a McGurk-type of syllabic percept in the incongruent condition. It also showed that low-risk infants could perceive a mismatch between auditory and visual cues: they looked longer at the mouth in the mismatched, non-fusible visual /ba/ - audio /ga/ display compared with the congruent visual /ga/ - audio /ga/ display, demonstrating that they perceive an uncommon, and therefore interesting, speech-like percept when looking at the incongruent mouth (repeated ANOVA: displays x fusion/mismatch conditions interaction: F(1,16) = 17.153, p = 0.001). The looking behaviour of high-risk infants did not differ according to the type of display, suggesting difficulties in matching auditory and visual information (repeated ANOVA, displays x conditions interaction: F(1,25) = 0.09, p = 0.767), in contrast to low-risk infants (repeated ANOVA: displays x conditions x low/high-risk groups interaction: F(1,41) = 4.466, p = 0.041). In some cases this reduced ability might lead to the poor communication skills characteristic of autism.

  20. Atypical audiovisual speech integration in infants at risk for autism.

    Science.gov (United States)

    Guiraud, Jeanne A; Tomalski, Przemyslaw; Kushnerenko, Elena; Ribeiro, Helena; Davies, Kim; Charman, Tony; Elsabbagh, Mayada; Johnson, Mark H

    2012-01-01

    The language difficulties often seen in individuals with autism might stem from an inability to integrate audiovisual information, a skill important for language development. We investigated whether 9-month-old siblings of older children with autism, who are at an increased risk of developing autism, are able to integrate audiovisual speech cues. We used an eye-tracker to record where infants looked when shown a screen displaying two faces of the same model, where one face is articulating /ba/ and the other /ga/, with one face congruent with the syllable sound being presented simultaneously, the other face incongruent. This method was successful in showing that infants at low risk can integrate audiovisual speech: they looked for the same amount of time at the mouths in both the fusible visual /ga/ - audio /ba/ and the congruent visual /ba/ - audio /ba/ displays, indicating that the auditory and visual streams fuse into a McGurk-type of syllabic percept in the incongruent condition. It also showed that low-risk infants could perceive a mismatch between auditory and visual cues: they looked longer at the mouth in the mismatched, non-fusible visual /ba/ - audio /ga/ display compared with the congruent visual /ga/ - audio /ga/ display, demonstrating that they perceive an uncommon, and therefore interesting, speech-like percept when looking at the incongruent mouth (repeated ANOVA: displays x fusion/mismatch conditions interaction: F(1,16) = 17.153, p = 0.001). The looking behaviour of high-risk infants did not differ according to the type of display, suggesting difficulties in matching auditory and visual information (repeated ANOVA, displays x conditions interaction: F(1,25) = 0.09, p = 0.767), in contrast to low-risk infants (repeated ANOVA: displays x conditions x low/high-risk groups interaction: F(1,41) = 4.466, p = 0.041). In some cases this reduced ability might lead to the poor communication skills characteristic of autism.

  1. Cross-modal matching of audio-visual German and French fluent speech in infancy.

    Science.gov (United States)

    Kubicek, Claudia; Hillairet de Boisferon, Anne; Dupierrix, Eve; Pascalis, Olivier; Lœvenbruck, Hélène; Gervain, Judit; Schwarzer, Gudrun

    2014-01-01

    The present study examined when and how the ability to cross-modally match audio-visual fluent speech develops in 4.5-, 6- and 12-month-old German-learning infants. In Experiment 1, 4.5- and 6-month-old infants' audio-visual matching ability of native (German) and non-native (French) fluent speech was assessed by presenting auditory and visual speech information sequentially, that is, in the absence of temporal synchrony cues. The results showed that 4.5-month-old infants were capable of matching native as well as non-native audio and visual speech stimuli, whereas 6-month-olds perceived the audio-visual correspondence of native language stimuli only. This suggests that intersensory matching narrows for fluent speech between 4.5 and 6 months of age. In Experiment 2, auditory and visual speech information was presented simultaneously, therefore, providing temporal synchrony cues. Here, 6-month-olds were found to match native as well as non-native speech indicating facilitation of temporal synchrony cues on the intersensory perception of non-native fluent speech. Intriguingly, despite the fact that audio and visual stimuli cohered temporally, 12-month-olds matched the non-native language only. Results were discussed with regard to multisensory perceptual narrowing during the first year of life.

  2. Cross-modal matching of audio-visual German and French fluent speech in infancy.

    Directory of Open Access Journals (Sweden)

    Claudia Kubicek

    Full Text Available The present study examined when and how the ability to cross-modally match audio-visual fluent speech develops in 4.5-, 6- and 12-month-old German-learning infants. In Experiment 1, 4.5- and 6-month-old infants' audio-visual matching ability of native (German) and non-native (French) fluent speech was assessed by presenting auditory and visual speech information sequentially, that is, in the absence of temporal synchrony cues. The results showed that 4.5-month-old infants were capable of matching native as well as non-native audio and visual speech stimuli, whereas 6-month-olds perceived the audio-visual correspondence of native language stimuli only. This suggests that intersensory matching narrows for fluent speech between 4.5 and 6 months of age. In Experiment 2, auditory and visual speech information was presented simultaneously, therefore, providing temporal synchrony cues. Here, 6-month-olds were found to match native as well as non-native speech indicating facilitation of temporal synchrony cues on the intersensory perception of non-native fluent speech. Intriguingly, despite the fact that audio and visual stimuli cohered temporally, 12-month-olds matched the non-native language only. Results were discussed with regard to multisensory perceptual narrowing during the first year of life.

  3. Audiovisual perception of congruent and incongruent Dutch front vowels.

    Science.gov (United States)

    Valkenier, Bea; Duyne, Jurriaan Y; Andringa, Tjeerd C; Baskent, Deniz

    2012-12-01

    Auditory perception of vowels in background noise is enhanced when combined with visually perceived speech features. The objective of this study was to investigate whether the influence of visual cues on vowel perception extends to incongruent vowels, in a manner similar to the McGurk effect observed with consonants. Identification of Dutch front vowels /i, y, e, Y/ that share all features other than height and lip-rounding was measured for congruent and incongruent audiovisual conditions. The audio channel was systematically degraded by adding noise, increasing the reliance on visual cues. The height feature was more robustly carried over through the auditory channel and the lip-rounding feature through the visual channel. Hence, congruent audiovisual presentation enhanced identification, while incongruent presentation led to perceptual fusions and thus decreased identification. Visual cues influence the identification of congruent as well as incongruent audiovisual vowels. Incongruent visual information results in perceptual fusions, demonstrating that the McGurk effect can be instigated by long phonemes such as vowels. This result extends to the incongruent presentation of the visually less reliably perceived height. The findings stress the importance of audiovisual congruency in communication devices, such as cochlear implants and videoconferencing tools, where the auditory signal could be degraded.

  4. Multisensory Speech Perception in Children with Autism Spectrum Disorders

    Science.gov (United States)

    Woynaroski, Tiffany G.; Kwakye, Leslie D.; Foss-Feig, Jennifer H.; Stevenson, Ryan A.; Stone, Wendy L.; Wallace, Mark T.

    2013-01-01

    This study examined unisensory and multisensory speech perception in 8-17 year old children with autism spectrum disorders (ASD) and typically developing controls matched on chronological age, sex, and IQ. Consonant-vowel syllables were presented in visual only, auditory only, matched audiovisual, and mismatched audiovisual ("McGurk")…

  5. Early and late beta-band power reflect audiovisual perception in the McGurk illusion.

    Science.gov (United States)

    Roa Romero, Yadira; Senkowski, Daniel; Keil, Julian

    2015-04-01

    The McGurk illusion is a prominent example of audiovisual speech perception and the influence that visual stimuli can have on auditory perception. In this illusion, a visual speech stimulus influences the perception of an incongruent auditory stimulus, resulting in a fused novel percept. In this high-density electroencephalography (EEG) study, we were interested in the neural signatures of the subjective percept of the McGurk illusion as a phenomenon of speech-specific multisensory integration. Therefore, we examined the role of cortical oscillations and event-related responses in the perception of congruent and incongruent audiovisual speech. We compared the cortical activity elicited by objectively congruent syllables with incongruent audiovisual stimuli. Importantly, the latter elicited a subjectively congruent percept: the McGurk illusion. We found that early event-related responses (N1) to audiovisual stimuli were reduced during the perception of the McGurk illusion compared with congruent stimuli. Most interestingly, our study showed a stronger poststimulus suppression of beta-band power (13-30 Hz) at short (0-500 ms) and long (500-800 ms) latencies during the perception of the McGurk illusion compared with congruent stimuli. Our study demonstrates that auditory perception is influenced by visual context and that the subsequent formation of a McGurk illusion requires stronger audiovisual integration even at early processing stages. Our results provide evidence that beta-band suppression at early stages reflects stronger stimulus processing in the McGurk illusion. Moreover, stronger late beta-band suppression in McGurk illusion indicates the resolution of incongruent physical audiovisual input and the formation of a coherent, illusory multisensory percept. Copyright © 2015 the American Physiological Society.

  6. Speech perception as categorization

    National Research Council Canada - National Science Library

    Holt, Lori L; Lotto, Andrew J

    2010-01-01

    Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words...

  7. Hearing Impairment and Audiovisual Speech Integration Ability: A Case Study Report

    Directory of Open Access Journals (Sweden)

    Nicholas eAltieri

    2014-07-01

    Full Text Available Research in audiovisual speech perception has demonstrated that sensory factors such as auditory and visual acuity are associated with a listener's ability to extract and combine auditory and visual speech cues. This case study report examined audiovisual integration using a newly developed measure of capacity in a sample of hearing-impaired listeners. Capacity assessments are unique because they examine the contribution of reaction time (RT) as well as accuracy to determine the extent to which a listener efficiently combines auditory and visual speech cues relative to independent race model predictions. Multisensory speech integration ability was examined in two experiments: an open-set sentence recognition study and a closed-set speeded-word recognition study that measured capacity. Most germane to our approach, capacity illustrated speed-accuracy tradeoffs that may be predicted by audiometric configuration. Results revealed that some listeners benefit from increased accuracy, but fail to benefit in terms of speed on audiovisual relative to unisensory trials. Conversely, other listeners may not benefit in the accuracy domain but instead show an audiovisual processing time benefit.

  8. Inverse effectiveness and multisensory interactions in visual event-related potentials with audiovisual speech.

    Science.gov (United States)

    Stevenson, Ryan A; Bushmakin, Maxim; Kim, Sunah; Wallace, Mark T; Puce, Aina; James, Thomas W

    2012-07-01

    In recent years, it has become evident that neural responses previously considered to be unisensory can be modulated by sensory input from other modalities. In this regard, visual neural activity elicited to viewing a face is strongly influenced by concurrent incoming auditory information, particularly speech. Here, we applied an additive-factors paradigm aimed at quantifying the impact that auditory speech has on visual event-related potentials (ERPs) elicited to visual speech. These multisensory interactions were measured across parametrically varied stimulus salience, quantified in terms of signal to noise, to provide novel insights into the neural mechanisms of audiovisual speech perception. First, we measured a monotonic increase of the amplitude of the visual P1-N1-P2 ERP complex during a spoken-word recognition task with increases in stimulus salience. ERP component amplitudes varied directly with stimulus salience for visual, audiovisual, and summed unisensory recordings. Second, we measured changes in multisensory gain across salience levels. During audiovisual speech, the P1 and P1-N1 components exhibited less multisensory gain relative to the summed unisensory components with reduced salience, while N1-P2 amplitude exhibited greater multisensory gain as salience was reduced, consistent with the principle of inverse effectiveness. The amplitude interactions were correlated with behavioral measures of multisensory gain across salience levels as measured by response times, suggesting that change in multisensory gain associated with unisensory salience modulations reflects an increased efficiency of visual speech processing.
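
    The gain analysis described above rests on comparing the audiovisual response with the sum of the unisensory responses at each salience level. The short sketch below illustrates that comparison with hypothetical component amplitudes; under inverse effectiveness the relative gain grows as salience decreases.

```python
# Minimal sketch of the AV-versus-summed-unisensory comparison used above.
# Amplitude values are hypothetical, chosen only to illustrate the computation.
def multisensory_gain(av, a, v):
    """Relative gain of the audiovisual response over the additive prediction."""
    additive = a + v
    return (av - additive) / abs(additive)

# Hypothetical component amplitudes (microvolts) at high vs. low salience:
for label, av, a, v in [("high salience", 6.0, 3.5, 3.0),
                        ("low salience",  4.0, 1.5, 1.2)]:
    print(label, round(multisensory_gain(av, a, v), 2))
# A larger relative gain at low salience is the inverse-effectiveness pattern.
```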

  9. On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech

    Directory of Open Access Journals (Sweden)

    Wesley Mattheyses

    2009-01-01

    Full Text Available Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either natural or synthesized speech. However, the perception of mismatches between these two information streams requires experimental exploration since it could degrade the quality of the output. In order to increase the intermodal coherence in synthetic 2D photorealistic speech, we extended the well-known unit selection audio synthesis technique to work with multimodal segments containing original combinations of audio and video. Subjective experiments confirm that the audiovisual signals created by our multimodal synthesis strategy are indeed perceived as being more synchronous than those of systems in which both modes are not intrinsically coherent. Furthermore, it is shown that the degree of coherence between the auditory mode and the visual mode has an influence on the perceived quality of the synthetic visual speech fragment. In addition, the audio quality was found to have only a minor influence on the perceived visual signal's quality.
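
    The unit-selection strategy described above can be thought of as a dynamic-programming search over stored audiovisual units that keeps each unit's original audio-video pairing intact. The sketch below is a simplified illustration of that idea; the feature representations, cost functions, and weights are assumptions, not the system's actual implementation.

```python
# Hedged sketch of multimodal unit selection: each database unit keeps its
# original audio+video pairing, and a Viterbi search minimizes target cost
# plus join cost. Costs and features are simplified assumptions.
import numpy as np

def select_units(target_feats, unit_feats, join_cost, w_join=1.0):
    """target_feats: (T, d) desired features per slot.
    unit_feats:   (N, d) features of stored audiovisual units.
    join_cost:    (N, N) cost of concatenating unit i before unit j.
    Returns the index of the chosen unit for each of the T slots."""
    T, N = len(target_feats), len(unit_feats)
    # Target cost: distance between requested and stored unit features.
    tc = np.linalg.norm(target_feats[:, None, :] - unit_feats[None, :, :], axis=2)
    best = tc[0].copy()
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        total = best[:, None] + w_join * join_cost + tc[t][None, :]
        back[t] = np.argmin(total, axis=0)          # best predecessor per unit
        best = total[back[t], np.arange(N)]
    path = [int(np.argmin(best))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Tiny illustrative example with random features and join costs:
rng = np.random.default_rng(2)
targets = rng.standard_normal((5, 8))
units = rng.standard_normal((20, 8))
joins = rng.random((20, 20))
print(select_units(targets, units, joins))
```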

  10. Dynamic Bayesian Networks for Audio-Visual Speech Recognition

    Directory of Open Access Journals (Sweden)

    Liang Luhong

    2002-01-01

    Full Text Available The use of visual features in audio-visual speech recognition (AVSR) is justified by both the speech generation mechanism, which is essentially bimodal in its audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare the performance of these models with the existing models used in speaker-dependent audio-visual isolated word recognition. The statistical properties of both the CHMM and the FHMM make it possible to model the state asynchrony of the audio and visual observation sequences while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming all the existing models and the FHMM.
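
    As a rough illustration of the coupled-HMM structure mentioned above, the sketch below builds a toy CHMM in which each chain's transition depends on the previous states of both chains, and scores an observation sequence with a forward pass over the joint state space. All dimensions, parameters, and emission likelihoods are toy values, not a trained AVSR model.

```python
# Hedged sketch of a coupled HMM: audio and video state chains with
# cross-dependent transitions, scored with a forward pass over the joint
# state space. All parameters are toy values, not a real AVSR model.
import numpy as np

rng = np.random.default_rng(3)
Na, Nv = 3, 3                      # audio / video states per chain

def random_stochastic(shape):
    m = rng.random(shape)
    return m / m.sum(axis=-1, keepdims=True)

# Coupled transitions: each chain conditions on both previous states.
A_audio = random_stochastic((Na, Nv, Na))   # P(a' | a, v)
A_video = random_stochastic((Na, Nv, Nv))   # P(v' | a, v)
pi = random_stochastic((Na * Nv,))          # initial joint-state distribution

def forward_loglik(log_b_audio, log_b_video):
    """log P(observations) given per-frame log-likelihoods of shape (T, Na) and (T, Nv)."""
    T = log_b_audio.shape[0]
    # Joint transition matrix over (a, v) source pairs and (a', v') targets.
    trans = np.einsum("avi,avj->avij", A_audio, A_video).reshape(Na * Nv, Na * Nv)
    log_alpha = np.log(pi) + (log_b_audio[0][:, None] + log_b_video[0][None, :]).ravel()
    for t in range(1, T):
        emit = (log_b_audio[t][:, None] + log_b_video[t][None, :]).ravel()
        log_alpha = emit + np.logaddexp.reduce(
            log_alpha[:, None] + np.log(trans), axis=0)
    return np.logaddexp.reduce(log_alpha)

# Toy emission log-likelihoods for a 10-frame utterance:
T = 10
print(forward_loglik(np.log(rng.random((T, Na))), np.log(rng.random((T, Nv)))))
```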

  11. Processing of Audiovisually Congruent and Incongruent Speech in School-Age Children with a History of Specific Language Impairment: A Behavioral and Event-Related Potentials Study

    Science.gov (United States)

    Kaganovich, Natalya; Schumaker, Jennifer; Macias, Danielle; Gustafson, Dana

    2015-01-01

    Previous studies indicate that at least some aspects of audiovisual speech perception are impaired in children with specific language impairment (SLI). However, whether audiovisual processing difficulties are also present in older children with a history of this disorder is unknown. By combining electrophysiological and behavioral measures, we…

  12. No, there is no 150 ms lead of visual speech on auditory speech, but a range of audiovisual asynchronies varying from small audio lead to large audio lag.

    Directory of Open Access Journals (Sweden)

    Jean-Luc Schwartz

    2014-07-01

    Full Text Available An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech would be typically 150 ms ahead of auditory speech. It happens that the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases, for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call "preparatory gestures". However, when syllables are chained in sequences, as they are typically in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call "comodulatory gestures" providing auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction.

  13. Audiovisual perception of communication problems

    NARCIS (Netherlands)

    Barkhuysen, P.; Krahmer, E.J.; Swerts, M.G.J.; Bell, B.; Marien, I.

    2004-01-01

    We describe three perception studies in which subjects are offered film fragments (without any dialogue context) of speakers interacting with a spoken dialogue system. In half of these fragments, the speaker is or becomes aware of a communication problem. Subjects have to determine by forced choice

  14. The natural statistics of audiovisual speech

    National Research Council Canada - National Science Library

    Chandrasekaran, Chandramouli; Trubanova, Andrea; Stillittano, Sébastien; Caplier, Alice; Ghazanfar, Asif A

    2009-01-01

    .... Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input space characterization is unavailable is human speech...

  15. On the Role of Crossmodal Prediction in Audiovisual Emotion Perception

    Directory of Open Access Journals (Sweden)

    Sarah eJessen

    2013-07-01

    Full Text Available Humans rely on multiple sensory modalities to determine the emotional state of others. In fact, such multisensory perception may be one of the mechanisms explaining the ease and efficiency by which others' emotions are recognized. But how and when exactly do the different modalities interact? One aspect of multisensory perception that has received increasing interest in recent years is the concept of crossmodal prediction. In emotion perception, as in most other settings, visual information precedes the auditory information. A lead in visual information can thereby facilitate subsequent auditory processing. While this mechanism has often been described in audiovisual speech perception, it has not been addressed so far in audiovisual emotion perception. Based on the current state of the art in (a) crossmodal prediction and (b) multisensory emotion perception research, we propose that it is essential to consider the former in order to fully understand the latter. Focusing on electroencephalographic (EEG) and magnetoencephalographic (MEG) studies, we provide a brief overview of the current research in both fields. In discussing these findings, we suggest that emotional visual information may allow for a more reliable prediction of auditory information compared to non-emotional visual information. In support of this hypothesis, we present a re-analysis of a previous data set that shows an inverse correlation between the N1 response in the EEG and the duration of visual emotional but not non-emotional information. If the assumption that emotional content allows for more reliable predictions can be corroborated in future studies, crossmodal prediction is a crucial factor in our understanding of multisensory emotion perception.

  16. Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech and music

    Directory of Open Access Journals (Sweden)

    Hwee Ling eLee

    2014-08-01

    Full Text Available This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogues of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past three years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and to a marginally significant degree to natural speech.
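
    A common way to quantify a temporal integration window from synchrony judgments of this kind is to fit a symmetric curve to the proportion of "synchronous" responses across SOAs and use its width as the window estimate. The sketch below does this with a Gaussian fit on hypothetical data; it is an illustration of the general approach, not the authors' analysis.

```python
# Hedged sketch of estimating a temporal integration window from synchrony
# judgments at several SOAs. The Gaussian form and the example proportions
# are illustrative assumptions, not the authors' exact method or data.
import numpy as np
from scipy.optimize import curve_fit

def synchrony_curve(soa_ms, peak, center_ms, width_ms):
    """Proportion of 'synchronous' responses as a Gaussian over SOA."""
    return peak * np.exp(-0.5 * ((soa_ms - center_ms) / width_ms) ** 2)

# The 13 SOAs used above, paired with made-up response proportions:
soas = np.array([-360, -300, -240, -180, -120, -60, 0, 60, 120, 180, 240, 300, 360])
p_sync = np.array([0.05, 0.10, 0.22, 0.45, 0.70, 0.90, 0.95,
                   0.88, 0.72, 0.50, 0.28, 0.12, 0.06])

params, _ = curve_fit(synchrony_curve, soas, p_sync, p0=[1.0, 0.0, 150.0])
peak, center, width = params
print("window centre %.0f ms, width (SD) %.0f ms" % (center, width))
```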

  17. Some Behavioral and Neurobiological Constraints on Theories of Audiovisual Speech Integration: A Review and Suggestions for New Directions

    Science.gov (United States)

    Altieri, Nicholas; Pisoni, David B.; Townsend, James T.

    2012-01-01

    Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield’s feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration. PMID:21968081

  18. Some behavioral and neurobiological constraints on theories of audiovisual speech integration: a review and suggestions for new directions.

    Science.gov (United States)

    Altieri, Nicholas; Pisoni, David B; Townsend, James T

    2011-01-01

    Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield's feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration.

  19. Speech perception and production

    Science.gov (United States)

    Casserly, Elizabeth D.; Pisoni, David B.

    2012-01-01

    Until recently, research in speech perception and speech production has largely focused on the search for psychological and phonetic evidence of discrete, abstract, context-free symbolic units corresponding to phonological segments or phonemes. Despite this common conceptual goal and intimately related objects of study, however, research in these two domains of speech communication has progressed more or less independently for more than 60 years. In this article, we present an overview of the foundational works and current trends in the two fields, specifically discussing the progress made in both lines of inquiry as well as the basic fundamental issues that neither has been able to resolve satisfactorily so far. We then discuss theoretical models and recent experimental evidence that point to the deep, pervasive connections between speech perception and production. We conclude that although research focusing on each domain individually has been vital in increasing our basic understanding of spoken language processing, the human capacity for speech communication is so complex that gaining a full understanding will not be possible until speech perception and production are conceptually reunited in a joint approach to problems shared by both modes. PMID:23946864

  20. BILINGUAL MULTIMODAL SYSTEM FOR TEXT-TO-AUDIOVISUAL SPEECH AND SIGN LANGUAGE SYNTHESIS

    OpenAIRE

    Karpov, A.A.; M. Zelezny

    2014-01-01

    We present a conceptual model, architecture, and software for a multimodal system that synthesizes audio-visual speech and sign language from input text. The main components of the developed multimodal synthesis system (signing avatar) are: an automatic text processor for input text analysis; a simulated 3D model of a human head; a computer text-to-speech synthesizer; a system for audio-visual speech synthesis; a simulated 3D model of human hands and upper body; and a multimodal user interface integrating ...

  1. Audio-visual speech timing sensitivity is enhanced in cluttered conditions.

    Directory of Open Access Journals (Sweden)

    Warrick Roseboom

    Full Text Available Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.

  2. Assessing the effect of physical differences in the articulation of consonants and vowels on audiovisual temporal perception

    Science.gov (United States)

    Vatakis, Argiro; Maragos, Petros; Rodomagoulakis, Isidoros; Spence, Charles

    2012-01-01

    We investigated how the physical differences associated with the articulation of speech affect the temporal aspects of audiovisual speech perception. Video clips of consonants and vowels uttered by three different speakers were presented. The video clips were analyzed using an auditory-visual signal saliency model in order to compare signal saliency and behavioral data. Participants made temporal order judgments (TOJs) regarding which speech-stream (auditory or visual) had been presented first. The sensitivity of participants' TOJs and the point of subjective simultaneity (PSS) were analyzed as a function of the place, manner of articulation, and voicing for consonants, and the height/backness of the tongue and lip-roundedness for vowels. We expected that in the case of the place of articulation and roundedness, where the visual-speech signal is more salient, temporal perception of speech would be modulated by the visual-speech signal. No such effect was expected for the manner of articulation or height. The results demonstrate that for place and manner of articulation, participants' temporal percept was affected (although not always significantly) by highly-salient speech-signals with the visual-signals requiring smaller visual-leads at the PSS. This was not the case when height was evaluated. These findings suggest that in the case of audiovisual speech perception, a highly salient visual-speech signal may lead to higher probabilities regarding the identity of the auditory-signal that modulate the temporal window of multisensory integration of the speech-stimulus. PMID:23060756
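
    For readers unfamiliar with TOJ analysis, the sketch below shows one common way to estimate the PSS and a sensitivity measure from such data by fitting a cumulative Gaussian; the SOA sign convention and response proportions are invented for illustration and are not the study's data.

    ```python
    # Hedged illustration of extracting PSS and JND from temporal order judgments
    # by fitting a cumulative Gaussian psychometric function. Data are made up.
    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.stats import norm

    soa = np.array([-300, -200, -100, 0, 100, 200, 300])   # ms; positive = visual stream leads (assumption)
    p_visual_first = np.array([0.05, 0.10, 0.30, 0.55, 0.80, 0.92, 0.97])

    def cum_gauss(x, pss, sigma):
        return norm.cdf(x, loc=pss, scale=sigma)

    (pss, sigma), _ = curve_fit(cum_gauss, soa, p_visual_first, p0=(0.0, 100.0))
    jnd = sigma * norm.ppf(0.75)        # SOA shift from 50% to 75% "visual first"
    print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")
    ```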

  3. Assessing the effect of physical differences in the articulation of consonants and vowels on audiovisual temporal perception

    Directory of Open Access Journals (Sweden)

    Argiro Vatakis

    2012-10-01

    Full Text Available We investigated how the physical differences associated with the articulation of speech affect the temporal aspects of audiovisual speech perception. Video clips of consonants and vowels uttered by three different speakers were presented. The video clips were analysed using an auditory-visual signal saliency model in order to compare signal saliency and behavioural data. Participants made temporal order judgments (TOJs) regarding which speech-stream (auditory or visual) had been presented first. The sensitivity of participants’ TOJs and the point of subjective simultaneity (PSS) were analyzed as a function of the place, manner of articulation, and voicing for consonants, and the height/backness of the tongue and lip-roundedness for vowels. We expected that in the case of the place of articulation and roundedness, where the visual-speech signal is more salient, temporal perception of speech would be modulated by the visual-speech signal. No such effect was expected for the manner of articulation or height. The results demonstrate that for place and manner of articulation, participants’ temporal percept was affected (although not always significantly) by highly-salient speech-signals with the visual-signals requiring smaller visual-leads at the PSS. This was not the case when height was evaluated. These findings suggest that in the case of audiovisual speech perception, a highly salient visual-speech signal may lead to higher probabilities regarding the identity of the auditory-signal that modulate the temporal window of multisensory integration of the speech-stream.

  4. Distinct cortical locations for integration of audiovisual speech and the McGurk effect

    Directory of Open Access Journals (Sweden)

    Laura C. Erickson

    2014-06-01

    Full Text Available Audiovisual (AV) speech integration is often studied using the McGurk effect, where the combination of specific incongruent auditory and visual speech cues produces the perception of a third illusory speech percept. Recently, several studies have implicated the posterior superior temporal sulcus (pSTS) in the McGurk effect; however, the exact roles of the pSTS and other brain areas in correcting differing AV sensory inputs remain unclear. Using functional magnetic resonance imaging (fMRI) in ten participants, we aimed to isolate brain areas specifically involved in processing congruent AV speech and the McGurk effect. Speech stimuli were composed of sounds and/or videos of consonant-vowel tokens resulting in four stimulus classes: congruent AV speech (AVCong), incongruent AV speech resulting in the McGurk effect (AVMcGurk), acoustic-only speech (AO), and visual-only speech (VO). In group- and single-subject-analyses, left pSTS exhibited significantly greater fMRI signal for congruent AV speech (i.e., AVCong trials) than for both AO and VO trials. Right superior temporal gyrus, medial prefrontal cortex, and cerebellum were also identified. For McGurk speech (i.e., AVMcGurk trials), two clusters in the left posterior superior temporal gyrus (pSTG), just posterior to Heschl’s gyrus or on its border, exhibited greater fMRI signal than both AO and VO trials. We propose that while some brain areas, such as left pSTS, may be more critical for the integration of AV speech, other areas, such as left pSTG, may generate the corrected or merged percept arising from conflicting auditory and visual cues (i.e., as in the McGurk effect). These findings are consistent with the concept that posterior superior temporal areas represent part of a dorsal auditory stream, which is involved in multisensory integration, sensorimotor control, and optimal state estimation (Rauschecker and Scott, 2009).

  5. Distinct cortical locations for integration of audiovisual speech and the McGurk effect

    Science.gov (United States)

    Erickson, Laura C.; Zielinski, Brandon A.; Zielinski, Jennifer E. V.; Liu, Guoying; Turkeltaub, Peter E.; Leaver, Amber M.; Rauschecker, Josef P.

    2014-01-01

    Audiovisual (AV) speech integration is often studied using the McGurk effect, where the combination of specific incongruent auditory and visual speech cues produces the perception of a third illusory speech percept. Recently, several studies have implicated the posterior superior temporal sulcus (pSTS) in the McGurk effect; however, the exact roles of the pSTS and other brain areas in “correcting” differing AV sensory inputs remain unclear. Using functional magnetic resonance imaging (fMRI) in ten participants, we aimed to isolate brain areas specifically involved in processing congruent AV speech and the McGurk effect. Speech stimuli were composed of sounds and/or videos of consonant–vowel tokens resulting in four stimulus classes: congruent AV speech (AVCong), incongruent AV speech resulting in the McGurk effect (AVMcGurk), acoustic-only speech (AO), and visual-only speech (VO). In group- and single-subject analyses, left pSTS exhibited significantly greater fMRI signal for congruent AV speech (i.e., AVCong trials) than for both AO and VO trials. Right superior temporal gyrus, medial prefrontal cortex, and cerebellum were also identified. For McGurk speech (i.e., AVMcGurk trials), two clusters in the left posterior superior temporal gyrus (pSTG), just posterior to Heschl’s gyrus or on its border, exhibited greater fMRI signal than both AO and VO trials. We propose that while some brain areas, such as left pSTS, may be more critical for the integration of AV speech, other areas, such as left pSTG, may generate the “corrected” or merged percept arising from conflicting auditory and visual cues (i.e., as in the McGurk effect). These findings are consistent with the concept that posterior superior temporal areas represent part of a “dorsal auditory stream,” which is involved in multisensory integration, sensorimotor control, and optimal state estimation (Rauschecker and Scott, 2009). PMID:24917840

  6. The sound of your lips: electrophysiological cross-modal interactions during hand-to-face and face-to-face speech perception

    National Research Council Canada - National Science Library

    Treille, Avril; Vilain, Coriandre; Sato, Marc

    2014-01-01

    Recent magneto-encephalographic and electro-encephalographic studies provide evidence for cross-modal integration during audio-visual and audio-haptic speech perception, with speech gestures viewed...

  7. The influence of infant-directed speech on 12-month-olds' intersensory perception of fluent speech.

    Science.gov (United States)

    Kubicek, Claudia; Gervain, Judit; Hillairet de Boisferon, Anne; Pascalis, Olivier; Lœvenbruck, Hélène; Schwarzer, Gudrun

    2014-11-01

    The present study examined whether infant-directed (ID) speech facilitates intersensory matching of audio-visual fluent speech in 12-month-old infants. German-learning infants' audio-visual matching ability for German and French fluent speech was assessed using a variant of the intermodal matching procedure, with auditory and visual speech information presented sequentially. In Experiment 1, the sentences were spoken in an adult-directed (AD) manner. Results showed that 12-month-old infants did not exhibit matching performance for either the native or the non-native language. However, Experiment 2 revealed that when ID speech stimuli were used, infants did perceive the relation between auditory and visual speech attributes, but only in response to their native language. Thus, the findings suggest that ID speech might have an influence on the intersensory perception of fluent speech and shed further light on multisensory perceptual narrowing. Copyright © 2014 Elsevier Inc. All rights reserved.

  8. Evaluating the influence of the 'unity assumption' on the temporal perception of realistic audiovisual stimuli.

    Science.gov (United States)

    Vatakis, Argiro; Spence, Charles

    2008-01-01

    Vatakis, A. and Spence, C. (in press) [Crossmodal binding: Evaluating the 'unity assumption' using audiovisual speech stimuli. Perception &Psychophysics] recently demonstrated that when two briefly presented speech signals (one auditory and the other visual) refer to the same audiovisual speech event, people find it harder to judge their temporal order than when they refer to different speech events. Vatakis and Spence argued that the 'unity assumption' facilitated crossmodal binding on the former (matching) trials by means of a process of temporal ventriloquism. In the present study, we investigated whether the 'unity assumption' would also affect the binding of non-speech stimuli (video clips of object action or musical notes). The auditory and visual stimuli were presented at a range of stimulus onset asynchronies (SOAs) using the method of constant stimuli. Participants made unspeeded temporal order judgments (TOJs) regarding which modality stream had been presented first. The auditory and visual musical and object action stimuli were either matched (e.g., the sight of a note being played on a piano together with the corresponding sound) or else mismatched (e.g., the sight of a note being played on a piano together with the sound of a guitar string being plucked). However, in contrast to the results of Vatakis and Spence's recent speech study, no significant difference in the accuracy of temporal discrimination performance for the matched versus mismatched video clips was observed. Reasons for this discrepancy are discussed.

  9. Musician advantage for speech-on-speech perception

    NARCIS (Netherlands)

    Başkent, Deniz; Gaudrain, Etienne

    Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level

  10. Face configuration affects speech perception: Evidence from a McGurk mismatch negativity study

    DEFF Research Database (Denmark)

    Eskelund, Kasper; MacDonald, Ewen; Andersen, Tobias

    2015-01-01

    We perceive identity, expression and speech from faces. While perception of identity and expression depends crucially on the configuration of facial features it is less clear whether this holds for visual speech perception. Facial configuration is poorly perceived for upside-down faces...... as demonstrated by the Thatcher illusion in which the orientation of the eyes and mouth with respect to the face is inverted (Thatcherization). This gives the face a grotesque appearance but this is only seen when the face is upright. Thatcherization can likewise disrupt visual speech perception but only when...... the face is upright indicating that facial configuration can be important for visual speech perception. This effect can propagate to auditory speech perception through audiovisual integration so that Thatcherization disrupts the McGurk illusion in which visual speech perception alters perception...

  11. Cued speech for enhancing speech perception and first language development of children with cochlear implants.

    Science.gov (United States)

    Leybaert, Jacqueline; LaSasso, Carol J

    2010-06-01

    Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We support our view that exposure to Cued Speech before or after the implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants.

  12. Cued Speech for Enhancing Speech Perception and First Language Development of Children With Cochlear Implants

    Science.gov (United States)

    Leybaert, Jacqueline; LaSasso, Carol J.

    2010-01-01

    Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We support our view that exposure to Cued Speech before or after the implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants. PMID:20724357

  13. [Speech audiometry, speech perception and cognitive functions. German version].

    Science.gov (United States)

    Meister, H

    2017-03-01

    Examination of cognitive functions in the framework of speech perception has recently gained increasing scientific and clinical interest. Especially against the background of age-related hearing impairment and cognitive decline, potential new perspectives in terms of better individualisation of auditory diagnosis and rehabilitation might arise. This review addresses the relationships between speech audiometry, speech perception and cognitive functions. It presents models of speech perception, discusses associations between neuropsychological and audiometric outcomes, and shows recent efforts to consider cognitive functions with speech audiometry.

  14. Eye Can Hear Clearly Now: Inverse Effectiveness in Natural Audiovisual Speech Processing Relies on Long-Term Crossmodal Temporal Integration.

    Science.gov (United States)

    Crosse, Michael J; Di Liberto, Giovanni M; Lalor, Edmund C

    2016-09-21

    Speech comprehension is improved by viewing a speaker's face, especially in adverse hearing conditions, a principle known as inverse effectiveness. However, the neural mechanisms that help to optimize how we integrate auditory and visual speech in such suboptimal conversational environments are not yet fully understood. Using human EEG recordings, we examined how visual speech enhances the cortical representation of auditory speech at a signal-to-noise ratio that maximized the perceptual benefit conferred by multisensory processing relative to unisensory processing. We found that the influence of visual input on the neural tracking of the audio speech signal was significantly greater in noisy than in quiet listening conditions, consistent with the principle of inverse effectiveness. Although envelope tracking during audio-only speech was greatly reduced by background noise at an early processing stage, it was markedly restored by the addition of visual speech input. In background noise, multisensory integration occurred at much lower frequencies and was shown to predict the multisensory gain in behavioral performance at a time lag of ∼250 ms. Critically, we demonstrated that inverse effectiveness, in the context of natural audiovisual (AV) speech processing, relies on crossmodal integration over long temporal windows. Our findings suggest that disparate integration mechanisms contribute to the efficient processing of AV speech in background noise. The behavioral benefit of seeing a speaker's face during conversation is especially pronounced in challenging listening environments. However, the neural mechanisms underlying this phenomenon, known as inverse effectiveness, have not yet been established. Here, we examine this in the human brain using natural speech-in-noise stimuli that were designed specifically to maximize the behavioral benefit of audiovisual (AV) speech. We find that this benefit arises from our ability to integrate multimodal information over
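
    As a concrete illustration of "envelope tracking" (a simplified stand-in, not the authors' regression-based analysis pipeline), the sketch below extracts the broadband speech amplitude envelope and correlates it with a single EEG channel across a range of lags.

    ```python
    # Simplified stand-in for envelope-tracking analyses (not the study's code):
    # Hilbert-transform the audio to get its amplitude envelope, resample it to
    # the EEG rate, and correlate it with one EEG channel at several lags.
    import numpy as np
    from scipy.signal import hilbert, resample

    def speech_envelope(audio, audio_fs, eeg_fs):
        env = np.abs(hilbert(audio))                  # broadband amplitude envelope
        return resample(env, int(len(env) * eeg_fs / audio_fs))

    def envelope_tracking(envelope, eeg_channel, eeg_fs, max_lag_ms=400):
        """Pearson correlation at each lag, with the EEG delayed relative to
        the envelope (arrays assumed to cover the same time span)."""
        n = min(len(envelope), len(eeg_channel))
        lags = range(int(max_lag_ms / 1000 * eeg_fs))
        return [np.corrcoef(envelope[:n - lag], eeg_channel[lag:n])[0, 1]
                for lag in lags]
    ```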

  15. Robot Command Interface Using an Audio-Visual Speech Recognition System

    Science.gov (United States)

    Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy

    In recent years audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents an automatic command recognition system using audio-visual information. The system is expected to control the laparoscopic robot da Vinci. The audio signal is processed using the Mel-frequency cepstral coefficient (MFCC) parametrization method. In addition, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used in order to extract the visual speech information.
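
    To make the two feature streams concrete, here is a hedged sketch of a typical front end for such a system: MFCCs for the audio channel (via librosa) and simple geometric measures of the outer lip contour for the visual channel. The lip landmarks are assumed to come from an external face tracker; file paths and array shapes are placeholders.

    ```python
    # Sketch of audio-visual feature extraction for a command recognizer
    # (an assumed front end, not the authors' implementation).
    import numpy as np
    import librosa

    def audio_features(wav_path, n_mfcc=13):
        """Mel-frequency cepstral coefficients, one vector per ~10 ms frame."""
        y, sr = librosa.load(wav_path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=160)
        return mfcc.T                                  # (n_audio_frames, n_mfcc)

    def visual_features(lip_points):
        """Simple geometry of the outer lip contour per video frame
        (width, height, area), loosely inspired by MPEG-4 mouth points."""
        lip_points = np.asarray(lip_points)            # (n_video_frames, n_points, 2)
        width = lip_points[:, :, 0].ptp(axis=1)
        height = lip_points[:, :, 1].ptp(axis=1)
        return np.stack([width, height, width * height], axis=1)

    def fuse(audio, video):
        """Early fusion: resample the video stream to the audio frame rate
        and concatenate the two feature streams."""
        idx = np.linspace(0, len(video) - 1, len(audio)).astype(int)
        return np.hstack([audio, video[idx]])
    ```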

  16. Alterations in audiovisual simultaneity perception in amblyopia.

    Science.gov (United States)

    Richards, Michael D; Goltz, Herbert C; Wong, Agnes M F

    2017-01-01

    Amblyopia is a developmental visual impairment that is increasingly recognized to affect higher-level perceptual and multisensory processes. To further investigate the audiovisual (AV) perceptual impairments associated with this condition, we characterized the temporal interval in which asynchronous auditory and visual stimuli are perceived as simultaneous 50% of the time (i.e., the AV simultaneity window). Adults with unilateral amblyopia (n = 17) and visually normal controls (n = 17) judged the simultaneity of a flash and a click presented with both eyes viewing. The signal onset asynchrony (SOA) varied from 0 ms to 450 ms for auditory-lead and visual-lead conditions. A subset of participants with amblyopia (n = 6) was tested monocularly. Compared to the control group, the auditory-lead side of the AV simultaneity window was widened by 48 ms (36%; p = 0.002), whereas that of the visual-lead side was widened by 86 ms (37%; p = 0.02). The overall mean window width was 500 ms, compared to 366 ms among controls (37% wider; p = 0.002). Among participants with amblyopia, the simultaneity window parameters were unchanged by viewing condition, but subgroup analysis revealed differential effects on the parameters by amblyopia severity, etiology, and foveal suppression status. Possible mechanisms to explain these findings include visual temporal uncertainty, interocular perceptual latency asymmetry, and disruption of normal developmental tuning of sensitivity to audiovisual asynchrony.
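
    For illustration only, the sketch below shows one simple way a simultaneity window of this kind can be summarized: interpolating the auditory-lead and visual-lead SOAs at which "simultaneous" responses fall to 50%. The response proportions are invented, not the study's data.

    ```python
    # Toy estimate of an audiovisual simultaneity window from proportion
    # "simultaneous" responses at each SOA (invented values).
    import numpy as np

    soa = np.array([-450, -300, -150, 0, 150, 300, 450])  # ms; negative = auditory leads
    p_simultaneous = np.array([0.05, 0.25, 0.70, 0.95, 0.85, 0.45, 0.10])

    def crossing(x, y, level=0.5):
        """Linearly interpolate the x value where y crosses `level`."""
        i = np.where(np.diff(np.sign(y - level)) != 0)[0][0]
        return x[i] + (level - y[i]) * (x[i + 1] - x[i]) / (y[i + 1] - y[i])

    left = crossing(soa[soa <= 0], p_simultaneous[soa <= 0])    # auditory-lead bound
    right = crossing(soa[soa >= 0], p_simultaneous[soa >= 0])   # visual-lead bound
    print(f"window: {left:.0f} ms to {right:.0f} ms (width {right - left:.0f} ms)")
    ```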

  17. Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation

    Directory of Open Access Journals (Sweden)

    Briony Banks

    2015-08-01

    Full Text Available Perceptual adaptation allows humans to understand a variety of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker’s facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants’ eye gaze was recorded to verify that they looked at the speaker’s face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, they do not improve perceptual adaptation.

  18. Voluntary stuttering suppresses true stuttering: a window on the speech perception-production link.

    Science.gov (United States)

    Saltuklaroglu, Tim; Kalinowski, Joseph; Dayalu, Vikram N; Stuart, Andrew; Rastatter, Michael P

    2004-02-01

    In accord with a proposed innate link between speech perception and production (e.g., motor theory), this study provides compelling evidence for the inhibition of stuttering events in people who stutter prior to the initiation of the intended speech act, via both the perception and the production of speech gestures. Stuttering frequency during reading was reduced in 10 adults who stutter by approximately 40% in three of four experimental conditions: (1) following passive audiovisual presentation (i.e., viewing and hearing) of another person producing pseudostuttering (stutter-like syllabic repetitions) and following active shadowing of both (2) pseudostuttered and (3) fluent speech. Stuttering was not inhibited during reading following passive audiovisual presentation of fluent speech. Syllabic repetitions can inhibit stuttering both when produced and when perceived, and we suggest that these elementary stuttering forms may serve as compensatory speech gestures for releasing involuntary stuttering blocks by engaging mirror neuronal systems that are predisposed for fluent gestural imitation.

  19. Audiovisual integration in the human perception of materials.

    Science.gov (United States)

    Fujisaki, Waka; Goda, Naokazu; Motoyoshi, Isamu; Komatsu, Hidehiko; Nishida, Shin'ya

    2014-04-17

    Interest in the perception of the material of objects has been growing. While material perception is a critical ability for animals to properly regulate behavioral interactions with surrounding objects (e.g., eating), little is known about its underlying processing. Vision and audition provide useful information for material perception; using only its visual appearance or impact sound, we can infer what an object is made from. However, what material is perceived when the visual appearance of one material is combined with the impact sound of another, and what are the rules that govern cross-modal integration of material information? We addressed these questions by asking 16 human participants to rate how likely it was that audiovisual stimuli (48 combinations of visual appearances of six materials and impact sounds of eight materials) along with visual-only stimuli and auditory-only stimuli fell into each of 13 material categories. The results indicated strong interactions between audiovisual material perceptions; for example, the appearance of glass paired with a pepper sound is perceived as transparent plastic. Ratings of material-category likelihood follow a multiplicative integration rule, in that the categories judged to be likely are consistent with both visual and auditory stimuli. On the other hand, ratings of material properties, such as roughness and hardness, follow a weighted average rule. Despite a difference in their integration calculations, both rules can be interpreted as optimal Bayesian integration of independent audiovisual estimations for the two types of material judgment, respectively.
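
    To make the two integration rules concrete, the small sketch below (with invented numbers, not the study's data) contrasts multiplicative combination of category likelihoods with a weighted average of continuous property ratings.

    ```python
    # Illustrative contrast of the two integration rules described above.
    import numpy as np

    def multiplicative_rule(p_visual, p_auditory):
        """Audiovisual category likelihoods proportional to the product of the
        unimodal likelihoods, renormalized to sum to one."""
        p_av = np.asarray(p_visual) * np.asarray(p_auditory)
        return p_av / p_av.sum()

    def weighted_average_rule(r_visual, r_auditory, w_visual=0.5):
        """Audiovisual property rating as a weighted average of unimodal ratings."""
        return w_visual * r_visual + (1 - w_visual) * r_auditory

    # Hypothetical example: visual evidence favors "glass", auditory favors "plastic".
    p_v = [0.7, 0.2, 0.1]   # glass, plastic, ceramic
    p_a = [0.1, 0.8, 0.1]
    print(multiplicative_rule(p_v, p_a))                    # plastic becomes most likely
    print(weighted_average_rule(6.0, 2.0, w_visual=0.4))    # e.g. a hardness rating
    ```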

  20. Design and realisation of an audiovisual speech activity detector

    NARCIS (Netherlands)

    Van Bree, K.C.

    2006-01-01

    For many speech telecommunication technologies a robust speech activity detector is important. An audio-only speech detector will givefalse positives when the interfering signal is speech or has speech characteristics. The modality video is suitable to solve this problem. In this report the approach

  1. How can audiovisual pathways enhance the temporal resolution of time-compressed speech in blind subjects?

    Directory of Open Access Journals (Sweden)

    Ingo Hertrich

    2013-08-01

    Full Text Available In blind people, the visual channel cannot assist face-to-face communication via lipreading or visual prosody. Nevertheless, the visual system may enhance the evaluation of auditory information due to its cross-links to (1) the auditory system, (2) supramodal representations, and (3) frontal action-related areas. Apart from feedback or top-down support of, for example, the processing of spatial or phonological representations, experimental data have shown that the visual system can impact auditory perception at more basic computational stages such as temporal resolution. For example, blind as compared to sighted subjects are more resistant against backward masking, and this ability appears to be associated with activity in visual cortex. Regarding the comprehension of continuous speech, blind subjects can learn to use accelerated text-to-speech systems for "reading" texts at ultra-fast speaking rates (> 16 syllables/s), exceeding by far the normal range of 6 syllables/s. An fMRI study has shown that this ability, among other brain regions, significantly covaries with BOLD responses in bilateral pulvinar, right visual cortex, and left supplementary motor area. Furthermore, magnetoencephalographic (MEG) measurements revealed a particular component in right occipital cortex phase-locked to the syllable onsets of accelerated speech. In sighted people, the "bottleneck" for understanding time-compressed speech seems related to a demand for buffering phonological material and is, presumably, linked to frontal brain structures. On the other hand, the neurophysiological correlates of functions overcoming this bottleneck seem to depend upon early visual cortex activity. The present Hypothesis and Theory paper outlines a model that aims at binding these data together, based on early cross-modal pathways that are already known from various audiovisual experiments considering cross-modal adjustments in space, time, and object recognition.

  2. How can audiovisual pathways enhance the temporal resolution of time-compressed speech in blind subjects?

    Science.gov (United States)

    Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

    2013-01-01

    In blind people, the visual channel cannot assist face-to-face communication via lipreading or visual prosody. Nevertheless, the visual system may enhance the evaluation of auditory information due to its cross-links to (1) the auditory system, (2) supramodal representations, and (3) frontal action-related areas. Apart from feedback or top-down support of, for example, the processing of spatial or phonological representations, experimental data have shown that the visual system can impact auditory perception at more basic computational stages such as temporal signal resolution. For example, blind as compared to sighted subjects are more resistant against backward masking, and this ability appears to be associated with activity in visual cortex. Regarding the comprehension of continuous speech, blind subjects can learn to use accelerated text-to-speech systems for "reading" texts at ultra-fast speaking rates (>16 syllables/s), exceeding by far the normal range of 6 syllables/s. A functional magnetic resonance imaging study has shown that this ability, among other brain regions, significantly covaries with BOLD responses in bilateral pulvinar, right visual cortex, and left supplementary motor area. Furthermore, magnetoencephalographic measurements revealed a particular component in right occipital cortex phase-locked to the syllable onsets of accelerated speech. In sighted people, the "bottleneck" for understanding time-compressed speech seems related to higher demands for buffering phonological material and is, presumably, linked to frontal brain structures. On the other hand, the neurophysiological correlates of functions overcoming this bottleneck, seem to depend upon early visual cortex activity. The present Hypothesis and Theory paper outlines a model that aims at binding these data together, based on early cross-modal pathways that are already known from various audiovisual experiments on cross-modal adjustments during space, time, and object recognition.

  3. Musician advantage for speech-on-speech perception.

    Science.gov (United States)

    Başkent, Deniz; Gaudrain, Etienne

    2016-03-01

    Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level auditory cognitive functions, such as attention. Indeed, despite the few non-musicians who performed as well as musicians, on a group level, there was a strong musician benefit for speech perception in a speech masker. This benefit does not seem to result from better voice processing and could instead be related to better stream segregation or enhanced cognitive functions.

  4. Multisensory speech perception without the left superior temporal sulcus.

    Science.gov (United States)

    Baum, Sarah H; Martin, Randi C; Hamilton, A Cris; Beauchamp, Michael S

    2012-09-01

    Converging evidence suggests that the left superior temporal sulcus (STS) is a critical site for multisensory integration of auditory and visual information during speech perception. We report a patient, SJ, who suffered a stroke that damaged the left temporo-parietal area, resulting in mild anomic aphasia. Structural MRI showed complete destruction of the left middle and posterior STS, as well as damage to adjacent areas in the temporal and parietal lobes. Surprisingly, SJ demonstrated preserved multisensory integration measured with two independent tests. First, she perceived the McGurk effect, an illusion that requires integration of auditory and visual speech. Second, her perception of morphed audiovisual speech with ambiguous auditory or visual information was significantly influenced by the opposing modality. To understand the neural basis for this preserved multisensory integration, blood-oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) was used to examine brain responses to audiovisual speech in SJ and 23 healthy age-matched controls. In controls, bilateral STS activity was observed. In SJ, no activity was observed in the damaged left STS, but in the right STS more cortex was active in SJ than in any of the normal controls. Further, the amplitude of the BOLD response in right STS to McGurk stimuli was significantly greater in SJ than in controls. The simplest explanation of these results is a reorganization of SJ's cortical language networks such that the right STS now subserves multisensory integration of speech. Copyright © 2012 Elsevier Inc. All rights reserved.

  5. The level of audiovisual print-speech integration deficits in dyslexia.

    Science.gov (United States)

    Kronschnabel, Jens; Brem, Silvia; Maurer, Urs; Brandeis, Daniel

    2014-09-01

    The classical phonological deficit account of dyslexia is increasingly linked to impairments in grapho-phonological conversion, and to dysfunctions in superior temporal regions associated with audiovisual integration. The present study investigates mechanisms of audiovisual integration in typical and impaired readers at the critical developmental stage of adolescence. Congruent and incongruent audiovisual as well as unimodal (visual only and auditory only) material was presented. Audiovisual presentations were single letters and three-letter (consonant-vowel-consonant) stimuli accompanied by matching or mismatching speech sounds. Three-letter stimuli exhibited fast phonetic transitions as in real-life language processing and reading. Congruency effects, i.e. different brain responses to congruent and incongruent stimuli were taken as an indicator of audiovisual integration at a phonetic level (grapho-phonological conversion). Comparisons of unimodal and audiovisual stimuli revealed basic, more sensory aspects of audiovisual integration. By means of these two criteria of audiovisual integration, the generalizability of audiovisual deficits in dyslexia was tested. Moreover, it was expected that the more naturalistic three-letter stimuli are superior to single letters in revealing group differences. Electrophysiological and hemodynamic (EEG and fMRI) data were acquired simultaneously in a simple target detection task. Applying the same statistical models to event-related EEG potentials and fMRI responses allowed comparing the effects detected by the two techniques at a descriptive level. Group differences in congruency effects (congruent against incongruent) were observed in regions involved in grapho-phonological processing, including the left inferior frontal and angular gyri and the inferotemporal cortex. Importantly, such differences also emerged in superior temporal key regions. Three-letter stimuli revealed stronger group differences than single letters. No

  6. The motor theory of speech perception revisited

    National Research Council Canada - National Science Library

    Massaro, Dominic W; Chen, Trevor H

    2008-01-01

    .... We make the counter argument that perceiving speech is not perceiving gestures, that the motor system is not recruited for perceiving speech, and that speech perception can be adequately described...

  7. Using multiple visual tandem streams in audio-visual speech recognition

    OpenAIRE

    Topkaya, İbrahim Saygın; Erdoğan, Hakan

    2011-01-01

    The so-called "tandem approach" in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a hidden Markov model. We study the effect of using visual tandem features in audio-visual speech recognition using a novel setup which uses multiple classifiers to obtain multiple visual tandem features. We adopt the approach of multi-stream hidden Markov models where visual tandem features from two different classifiers ...
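
    A minimal sketch of the tandem idea follows (an assumed implementation, not the authors' code): a frame-level classifier is trained on labelled speech frames and its log posterior probabilities become the observation features for a conventional HMM back end; in a multi-stream setup, each visual classifier would contribute its own such feature stream.

    ```python
    # Tandem feature extraction sketch: classifier posteriors as HMM observations.
    # Training arrays and label sets are placeholders.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def tandem_features(train_frames, train_labels, test_frames, eps=1e-8):
        """train_frames: (n_frames, n_dims); train_labels: frame-level class ids."""
        clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=200)
        clf.fit(train_frames, train_labels)
        posteriors = clf.predict_proba(test_frames)   # (n_frames, n_classes)
        return np.log(posteriors + eps)               # log-posteriors as features
    ```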

  8. Audiovisual Asynchrony Detection and Speech Intelligibility in Noise With Moderate to Severe Sensorineural Hearing Impairment

    NARCIS (Netherlands)

    Baskent, Deniz; Bazo, Danny

    2011-01-01

    Objective: The objective of this study is to explore the sensitivity to intermodal asynchrony in audiovisual speech with moderate to severe sensorineural hearing loss. Based on previous studies, two opposing expectations were an increase in sensitivity, as hearing-impaired listeners heavily rely on

  9. Audiovisual discrimination between speech and laughter: Why and when visual information might help

    NARCIS (Netherlands)

    Petridis, Stavros; Pantic, Maja

    Past research on automatic laughter classification/detection has focused mainly on audio-based approaches. Here we present an audiovisual approach to distinguishing laughter from speech, and we show that integrating the information from audio and video channels may lead to improved performance over

  10. Audiovisual Perception of Congruent and Incongruent Dutch Front Vowels

    NARCIS (Netherlands)

    Valkenier, Bea; Duyne, Jurriaan Y.; Andringa, Tjeerd C.; Başkent, Deniz

    2012-01-01

    Purpose: Auditory perception of vowels in background noise is enhanced when combined with visually perceived speech features. The objective of this study was to investigate whether the influence of visual cues on vowel perception extends to incongruent vowels, in a manner similar to the McGurk

  11. Audiovisual Perception of Congruent and Incongruent Dutch Front Vowels

    Science.gov (United States)

    Valkenier, Bea; Duyne, Jurriaan Y.; Andringa, Tjeerd C.; Baskent, Deniz

    2012-01-01

    Purpose: Auditory perception of vowels in background noise is enhanced when combined with visually perceived speech features. The objective of this study was to investigate whether the influence of visual cues on vowel perception extends to incongruent vowels, in a manner similar to the McGurk effect observed with consonants. Method:…

  12. Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

    Directory of Open Access Journals (Sweden)

    Yue Zhao

    2012-12-01

    Full Text Available Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi-stream Dynamic Bayesian Networks and coupled HMMs are widely used for audio-visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) to perform unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and build an accurate audio-visual speech recognition model without a frame-independence assumption. Experimental results on Tibetan speech data from real-world environments show that the proposed DDBN outperforms state-of-the-art methods in word recognition accuracy.

  13. Audiovisual Temporal Processing and Synchrony Perception in the Rat

    Science.gov (United States)

    Schormans, Ashley L.; Scott, Kaela E.; Vo, Albert M. Q.; Tyker, Anna; Typlt, Marei; Stolzberg, Daniel; Allman, Brian L.

    2017-01-01

    Extensive research on humans has improved our understanding of how the brain integrates information from our different senses, and has begun to uncover the brain regions and large-scale neural activity that contributes to an observer’s ability to perceive the relative timing of auditory and visual stimuli. In the present study, we developed the first behavioral tasks to assess the perception of audiovisual temporal synchrony in rats. Modeled after the parameters used in human studies, separate groups of rats were trained to perform: (1) a simultaneity judgment task in which they reported whether audiovisual stimuli at various stimulus onset asynchronies (SOAs) were presented simultaneously or not; and (2) a temporal order judgment task in which they reported whether they perceived the auditory or visual stimulus to have been presented first. Furthermore, using in vivo electrophysiological recordings in the lateral extrastriate visual (V2L) cortex of anesthetized rats, we performed the first investigation of how neurons in the rat multisensory cortex integrate audiovisual stimuli presented at different SOAs. As predicted, rats (n = 7) trained to perform the simultaneity judgment task could accurately (~80%) identify synchronous vs. asynchronous (200 ms SOA) trials. Moreover, the rats judged trials at 10 ms SOA to be synchronous, whereas the majority (~70%) of trials at 100 ms SOA were perceived to be asynchronous. During the temporal order judgment task, rats (n = 7) perceived the synchronous audiovisual stimuli to be “visual first” for ~52% of the trials, and calculation of the smallest timing interval between the auditory and visual stimuli that could be detected in each rat (i.e., the just noticeable difference (JND)) ranged from 77 ms to 122 ms. Neurons in the rat V2L cortex were sensitive to the timing of audiovisual stimuli, such that spiking activity was greatest during trials when the visual stimulus preceded the auditory by 20–40 ms. Ultimately

  14. Speech audiometry, speech perception, and cognitive functions : English version.

    Science.gov (United States)

    Meister, H

    2017-01-01

    Examination of cognitive functions in the framework of speech perception has recently gained increasing scientific and clinical interest. Especially against the background of age-related hearing impairment and cognitive decline, potential new perspectives in terms of a better individualization of auditory diagnosis and rehabilitation might arise. This review addresses the relationships between speech audiometry, speech perception, and cognitive functions. It presents models of speech perception, discusses associations of neuropsychological and audiometric outcomes, and shows examples of recent efforts undertaken in Germany to consider cognitive functions with speech audiometry.

  15. Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

    Directory of Open Access Journals (Sweden)

    Petar S. Aleksic

    2002-11-01

    Full Text Available We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs) supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. Principal component analysis (PCA) was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR) experiments. Both single-stream and multistream hidden Markov models (HMMs) were used to model the ASR system, integrate audio and visual information, and perform relatively large-vocabulary (approximately 1000 words) speech recognition experiments. The experiments performed use clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER) by 20% to 23% relative to audio-only speech recognition WERs at various SNRs (0–30 dB) with additive white Gaussian noise, and by 19% relative to the audio-only speech recognition WER under clean audio conditions.
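
    As a rough illustration of the visual front end described above (not the authors' code), the sketch below projects per-frame FAP vectors onto their leading principal components and returns the projection weights as low-dimensional visual features.

    ```python
    # PCA over per-frame MPEG-4 facial animation parameter (FAP) vectors;
    # the projection weights serve as visual features (assumed sketch).
    import numpy as np
    from sklearn.decomposition import PCA

    def fap_visual_features(fap_frames, n_components=6):
        """fap_frames: (n_frames, n_faps) array of FAP values."""
        pca = PCA(n_components=n_components)
        weights = pca.fit_transform(np.asarray(fap_frames, dtype=float))
        return weights, pca          # weights: (n_frames, n_components)
    ```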

  16. Classifying laughter and speech using audio-visual feature prediction

    NARCIS (Netherlands)

    Petridis, Stavros; Asghar, Ali; Pantic, Maja

    2010-01-01

    In this study, a system that discriminates laughter from speech by modelling the relationship between audio and visual features is presented. The underlying assumption is that this relationship is different between speech and laughter. Neural networks are trained which learn the audio-to-visual and

  17. Linguistic experience and audio-visual perception of non-native fricatives.

    Science.gov (United States)

    Wang, Yue; Behne, Dawn M; Jiang, Haisheng

    2008-09-01

    This study examined the effects of linguistic experience on audio-visual (AV) perception of non-native (L2) speech. Canadian English natives and Mandarin Chinese natives differing in degree of English exposure [long and short length of residence (LOR) in Canada] were presented with English fricatives of three visually distinct places of articulation: interdentals nonexistent in Mandarin and labiodentals and alveolars common in both languages. Stimuli were presented in quiet and in a cafe-noise background in four ways: audio only (A), visual only (V), congruent AV (AVc), and incongruent AV (AVi). Identification results showed that overall performance was better in the AVc than in the A or V condition and better in quiet than in cafe noise. While the Mandarin long LOR group approximated the native English patterns, the short LOR group showed poorer interdental identification, more reliance on visual information, and greater AV-fusion with the AVi materials, indicating the failure of L2 visual speech category formation with the short LOR non-natives and the positive effects of linguistic experience with the long LOR non-natives. These results point to an integrated network in AV speech processing as a function of linguistic background and provide evidence to extend auditory-based L2 speech learning theories to the visual domain.

  18. Neural bases of accented speech perception

    Directory of Open Access Journals (Sweden)

    Patti Adank

    2015-10-01

    Full Text Available The recognition of unfamiliar regional and foreign accents represents a challenging task for the speech perception system (Adank, Evans, Stuart-Smith, & Scott, 2009; Floccia, Goslin, Girard, & Konopczynski, 2006). Despite the frequency with which we encounter such accents, the neural mechanisms supporting successful perception of accented speech are poorly understood. Nonetheless, candidate neural substrates involved in processing speech in challenging listening conditions, including accented speech, are beginning to be identified. This review will outline neural bases associated with perception of accented speech in the light of current models of speech perception, and compare these data to brain areas associated with processing other speech distortions. We will subsequently evaluate competing models of speech processing with regards to neural processing of accented speech. See Cristia et al. (2012) for an in-depth overview of behavioural aspects of accent processing.

  19. Neural bases of accented speech perception.

    Science.gov (United States)

    Adank, Patti; Nuttall, Helen E; Banks, Briony; Kennedy-Higgins, Daniel

    2015-01-01

    The recognition of unfamiliar regional and foreign accents represents a challenging task for the speech perception system (Floccia et al., 2006; Adank et al., 2009). Despite the frequency with which we encounter such accents, the neural mechanisms supporting successful perception of accented speech are poorly understood. Nonetheless, candidate neural substrates involved in processing speech in challenging listening conditions, including accented speech, are beginning to be identified. This review will outline neural bases associated with perception of accented speech in the light of current models of speech perception, and compare these data to brain areas associated with processing other speech distortions. We will subsequently evaluate competing models of speech processing with regards to neural processing of accented speech. See Cristia et al. (2012) for an in-depth overview of behavioral aspects of accent processing.

  20. Production and Perception of Fast Speech

    NARCIS (Netherlands)

    Janse, E.

    2003-01-01

    This thesis reports on a series of experiments investigating how speakers produce and listeners perceive fast speech. The main research question is how the perception of naturally produced fast speech compares to the perception of artificially time-compressed speech. Research has shown that

  1. Neural bases of accented speech perception

    OpenAIRE

    Patti Adank; Nuttall, Helen E.; Briony Banks; Dan Kennedy-Higgins

    2015-01-01

    The recognition of unfamiliar regional and foreign accents represents a challenging task for the speech perception system (Adank, Evans, Stuart-Smith, & Scott, 2009; Floccia, Goslin, Girard, & Konopczynski, 2006). Despite the frequency with which we encounter such accents, the neural mechanisms supporting successful perception of accented speech are poorly understood. Nonetheless, candidate neural substrates involved in processing speech in challenging listening conditions, including accented...

  2. The shrink point: audiovisual integration of speech-gesture synchrony

    OpenAIRE

    Kirchhof, Carolin

    2017-01-01

    Up to now, the focus in gesture research has largely been on the production of speech-accompanying gestures and on how speech-gesture utterances contribute to communication. An issue that has mostly been neglected is to what extent listeners even perceive the gesture part of a multimodal utterance. For instance, there has been a major focus on the lexico-semiotic connection between spontaneously coproduced gestures and speech in gesture research (e.g., de Ruiter, 2007; Kita & Özyürek, 20...

  3. Infant Perception of Atypical Speech Signals

    Science.gov (United States)

    Vouloumanos, Athena; Gelfand, Hanna M.

    2013-01-01

    The ability to decode atypical and degraded speech signals as intelligible is a hallmark of speech perception. Human adults can perceive sounds as speech even when they are generated by a variety of nonhuman sources including computers and parrots. We examined how infants perceive the speech-like vocalizations of a parrot. Further, we examined how…

  4. ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION

    Directory of Open Access Journals (Sweden)

    D.V. Ivanko

    2016-05-01

    Full Text Available The paper provides an analytical review covering the latest achievements in the field of audio-visual (AV) fusion (integration of multimodal information). We discuss the main challenges and report on approaches to address them. One of the most important tasks of AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review we set out the basic principles of AV speech recognition and give a classification of the audio and visual features of speech. Special attention is paid to the systematization of the existing techniques and AV data fusion methods. In the second part, based on our analysis of the research area, we provide a consolidated list of tasks and applications that use AV fusion, and indicate the methods, techniques, and audio and video features used. We propose a classification of AV integration approaches and discuss the advantages and disadvantages of the different approaches. We draw conclusions and offer our assessment of the future of the field of AV fusion. In further research, we plan to implement a system for audio-visual Russian continuous speech recognition using advanced methods of multimodal fusion.
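
    To ground the terminology, the sketch below contrasts the two broad fusion families such reviews typically distinguish, feature-level (early) and decision-level (late) fusion; the inputs and the fixed stream weight are illustrative assumptions.

    ```python
    # Early vs. late audio-visual fusion, in schematic form (assumed inputs).
    import numpy as np

    def early_fusion(audio_feats, video_feats):
        """Feature-level fusion: concatenate synchronized feature vectors."""
        return np.hstack([audio_feats, video_feats])

    def late_fusion(audio_scores, video_scores, audio_weight=0.7):
        """Decision-level fusion: weighted log-linear combination of per-class
        model scores (scores must be positive); the weight can be tied to the
        acoustic signal-to-noise ratio."""
        return (audio_weight * np.log(audio_scores)
                + (1 - audio_weight) * np.log(video_scores))
    ```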

  5. Auditory, Visual and Audiovisual Speech Processing Streams in Superior Temporal Sulcus.

    Science.gov (United States)

    Venezia, Jonathan H; Vaden, Kenneth I; Rong, Feng; Maddox, Dale; Saberi, Kourosh; Hickok, Gregory

    2017-01-01

    The human superior temporal sulcus (STS) is responsive to visual and auditory information, including sounds and facial cues during speech recognition. We investigated the functional organization of STS with respect to modality-specific and multimodal speech representations. Twenty younger adult participants were instructed to perform an oddball detection task and were presented with auditory, visual, and audiovisual speech stimuli, as well as auditory and visual nonspeech control stimuli in a block fMRI design. Consistent with a hypothesized anterior-posterior processing gradient in STS, auditory, visual and audiovisual stimuli produced the largest BOLD effects in anterior, posterior and middle STS (mSTS), respectively, based on whole-brain, linear mixed effects and principal component analyses. Notably, the mSTS exhibited preferential responses to multisensory stimulation, as well as speech compared to nonspeech. Within the mid-posterior and mSTS regions, response preferences changed gradually from visual, to multisensory, to auditory moving posterior to anterior. Post hoc analysis of visual regions in the posterior STS revealed that a single subregion bordering the mSTS was insensitive to differences in low-level motion kinematics yet distinguished between visual speech and nonspeech based on multi-voxel activation patterns. These results suggest that auditory and visual speech representations are elaborated gradually within anterior and posterior processing streams, respectively, and may be integrated within the mSTS, which is sensitive to more abstract speech information within and across presentation modalities. The spatial organization of STS is consistent with processing streams that are hypothesized to synthesize perceptual speech representations from sensory signals that provide convergent information from visual and auditory modalities.

  6. Effect of hearing loss on semantic access by auditory and audiovisual speech in children.

    Science.gov (United States)

    Jerger, Susan; Tye-Murray, Nancy; Damian, Markus F; Abdi, Hervé

    2013-01-01

    This research studied whether the mode of input (auditory versus audiovisual) influenced semantic access by speech in children with sensorineural hearing impairment (HI). Participants, 31 children with HI and 62 children with normal hearing (NH), were tested with the authors' new multimodal picture word task. Children were instructed to name pictures displayed on a monitor and ignore auditory or audiovisual speech distractors. The semantic content of the distractors was varied to be related versus unrelated to the pictures (e.g., picture distractor of dog-bear versus dog-cheese, respectively). In children with NH, picture-naming times were slower in the presence of semantically related distractors. This slowing, called semantic interference, is attributed to the meaning-related picture-distractor entries competing for selection and control of the response (the lexical selection by competition hypothesis). Recently, a modification of the lexical selection by competition hypothesis, called the competition threshold (CT) hypothesis, proposed that (1) the competition between the picture-distractor entries is determined by a threshold, and (2) distractors with experimentally reduced fidelity cannot reach the CT. Thus, semantically related distractors with reduced fidelity do not produce the normal interference effect, but instead no effect or semantic facilitation (faster picture naming times for semantically related versus unrelated distractors). Facilitation occurs because the activation level of the semantically related distractor with reduced fidelity (1) is not sufficient to exceed the CT and produce interference but (2) is sufficient to activate its concept, which then strengthens the activation of the picture and facilitates naming. This research investigated whether the proposals of the CT hypothesis generalize to the auditory domain, to the natural degradation of speech due to HI, and to participants who are children. Our multimodal picture word task allowed us

  7. Audiovisual Integration in Children Listening to Spectrally Degraded Speech

    Science.gov (United States)

    Maidment, David W.; Kang, Hi Jee; Stewart, Hannah J.; Amitay, Sygal

    2015-01-01

    Purpose: The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Method: Children (n = 69) and adults (n = 15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in…

  8. Multisensory speech perception of young children with profound hearing loss.

    Science.gov (United States)

    Kishon-Rabin, L; Haras, N; Bergman, M

    1997-10-01

    The contribution of a two-channel vibrotactile aid (Trill VTA 2/3, AVR Communications LTD) to the audiovisual perception of speech was evaluated in four young children with profound hearing loss using words and speech pattern contrasts. An intensive, hierarchical, and systematic training program was provided. The results show that the addition of the tactile (T) modality to the auditory and visual (A+V) modalities enhanced speech perception performance significantly on all tests. Specifically, at the end of the training sessions, the tactile supplementation increased word recognition scores in a 44-word, closed-set task by 12 percentage points; detection of consonant in final position by 50 percentage points; detection of sibilant in final position by 30 percentage points; and detection of voicing in final position by 25 percentage points. Significant learning over time was evident for all test materials, in all modalities. As expected, fastest learning (i.e., smallest time constants) was found for the AVT condition. The results of this study provide further evidence that sensory information provided by the tactile modality can enhance speech perception in young children.

  9. Face configuration affects speech perception: Evidence from a McGurk mismatch negativity study.

    Science.gov (United States)

    Eskelund, Kasper; MacDonald, Ewen N; Andersen, Tobias S

    2015-01-01

    We perceive identity, expression and speech from faces. While perception of identity and expression depends crucially on the configuration of facial features, it is less clear whether this holds for visual speech perception. Facial configuration is poorly perceived for upside-down faces as demonstrated by the Thatcher illusion in which the orientation of the eyes and mouth with respect to the face is inverted (Thatcherization). This gives the face a grotesque appearance but this is only seen when the face is upright. Thatcherization can likewise disrupt visual speech perception but only when the face is upright, indicating that facial configuration can be important for visual speech perception. This effect can propagate to auditory speech perception through audiovisual integration so that Thatcherization disrupts the McGurk illusion in which visual speech perception alters perception of an incongruent acoustic phoneme. This is known as the McThatcher effect. Here we show that the McThatcher effect is reflected in the McGurk mismatch negativity (MMN). The MMN is an event-related potential elicited by a change in auditory perception. The McGurk-MMN can be elicited by a change in auditory perception due to the McGurk illusion without any change in the acoustic stimulus. We found that Thatcherization disrupted a strong McGurk illusion and a correspondingly strong McGurk-MMN only for upright faces. This confirms that facial configuration can be important for audiovisual speech perception. For inverted faces we found a weaker McGurk illusion but, surprisingly, no MMN. We also found no correlation between the strength of the McGurk illusion and the amplitude of the McGurk-MMN. We suggest that this may be due to a threshold effect so that a strong McGurk illusion is required to elicit the McGurk-MMN. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.

  10. Sensorimotor influences on speech perception in infancy.

    Science.gov (United States)

    Bruderer, Alison G; Danielson, D Kyle; Kandhadai, Padmapriya; Werker, Janet F

    2015-11-03

    The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception-production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants' speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants' tongues. With a looking-time procedure, we found that temporarily restraining infants' articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral-motor movements influence speech sound discrimination. Moreover, an experimentally induced "impairment" in articulator movement can compromise speech perception performance, raising the question of whether long-term oral-motor impairments may impact perceptual development.

  11. Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Georgios Mantokoudis

    Full Text Available OBJECTIVE: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. METHODS: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. RESULTS: Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). CONCLUSION: Webcameras have the potential to improve telecommunication of hearing-impaired individuals.

  12. Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

    Science.gov (United States)

    Mantokoudis, Georgios; Dähler, Claudia; Dubach, Patrick; Kompis, Martin; Caversaccio, Marco D; Senn, Pascal

    2013-01-01

    To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). Webcameras have the potential to improve telecommunication of hearing-impaired individuals.
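
    As a worked illustration of the thresholds reported in the two records above, the following sketch checks whether a given call configuration would be expected to support speech reading. The function and the example configurations are hypothetical and are not part of any real telephony API.

    ```python
    # Illustrative check of the thresholds reported above (frame rate above 7 fps,
    # resolution above 640 x 480 px, picture/sound delay below 100 ms). The
    # function and the example configurations are hypothetical.
    def supports_speechreading(width_px, height_px, fps, av_delay_ms):
        """True if a video-call configuration meets the reported thresholds."""
        return (fps > 7
                and width_px > 640 and height_px > 480
                and av_delay_ms < 100)

    print(supports_speechreading(640, 480, 10, 80))    # False: resolution too low
    print(supports_speechreading(1280, 720, 20, 50))   # True: all thresholds met
    ```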

  13. The motor theory of speech perception revisited.

    Science.gov (United States)

    Massaro, Dominic W; Chen, Trevor H

    2008-04-01

    Galantucci, Fowler, and Turvey (2006) have claimed that perceiving speech is perceiving gestures and that the motor system is recruited for perceiving speech. We make the counterargument that perceiving speech is not perceiving gestures, that the motor system is not recruited for perceiving speech, and that speech perception can be adequately described by a prototypical pattern recognition model, the fuzzy logical model of perception (FLMP). Empirical evidence taken as support for gesture and motor theory is reconsidered in more detail and in the framework of the FLMP. Additional theoretical and logical arguments are made to challenge gesture and motor theory.

  14. Localization of Sublexical Speech Perception Components

    Science.gov (United States)

    Turkeltaub, Peter E.; Coslett, H. Branch

    2010-01-01

    Models of speech perception are in general agreement with respect to the major cortical regions involved, but lack precision with regard to localization and lateralization of processing units. To refine these models we conducted two Activation Likelihood Estimation (ALE) meta-analyses of the neuroimaging literature on sublexical speech perception.…

  15. The sound of your lips: electrophysiological cross-modal interactions during hand-to-face and face-to-face speech perception

    OpenAIRE

    Avrill eTreille; Coriandre eVilain; Marc eSato

    2014-01-01

    Recent magneto-encephalographic and electro-encephalographic studies provide evidence for cross-modal integration during audio-visual and audio-haptic speech perception, with speech gestures viewed or felt from manual tactile contact with the speaker’s face. Given the temporal precedence of the haptic and visual signals on the acoustic signal in these studies, the observed modulation of N1/P2 auditory evoked responses during bimodal compared to unimodal speech perception suggest that relevant...

  16. Assessing variability in audiovisual speech integration skills using capacity and accuracy measures.

    Science.gov (United States)

    Altieri, Nicholas; Hudock, Daniel

    2014-10-01

    While most normal-hearing listeners rely on the auditory modality to obtain speech information, research has demonstrated the importance of non-auditory modalities for language recognition during face-to-face communication. The efficient utilization of the visual modality becomes increasingly important in difficult listening conditions, and especially for older and hearing-impaired listeners with sensory or cognitive decline. First, this report will quantify audiovisual integration skills using a recently developed capacity measure that incorporates speed and accuracy. Second, to investigate sensory factors contributing to integration ability, high- and low-frequency hearing thresholds will be correlated with capacity, as well as gain measures from sentence recognition. Integration scores were obtained from a within-subjects design using an open-set sentence speech recognition experiment and a closed-set speeded-word classification experiment, designed to examine integration (i.e., capacity). A sample of 44 adult listeners without a self-reported history of hearing loss was recruited. Results demonstrated a significant relationship between measures of audiovisual integration and hearing thresholds. Our data indicated that a listener's ability to integrate auditory and visual speech information in the domains of speed and accuracy is associated with auditory sensory capabilities and possibly other sensory and cognitive factors.
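
    Capacity measures of this kind are typically built by comparing the integrated hazard of audiovisual response times against the sum of the integrated hazards from the unimodal conditions. The sketch below illustrates that baseline comparison with made-up response times; it is a simplified illustration of the general approach, not the accuracy-weighted capacity measure used in the study.

    ```python
    # Simplified sketch of a workload-capacity comparison for audiovisual
    # integration based on integrated hazards of response times. The RTs below
    # are hypothetical (in seconds), and this is not the study's exact measure.
    import numpy as np

    def integrated_hazard(rts, t):
        """Empirical cumulative hazard H(t) = -log S(t) evaluated at times t."""
        rts = np.sort(np.asarray(rts))
        survival = 1.0 - np.searchsorted(rts, t, side="right") / len(rts)
        survival = np.clip(survival, 1e-6, 1.0)   # avoid log(0)
        return -np.log(survival)

    rng = np.random.default_rng(1)
    rt_av = rng.gamma(4.0, 0.10, 500)   # audiovisual trials (fastest)
    rt_a  = rng.gamma(4.0, 0.13, 500)   # auditory-only trials
    rt_v  = rng.gamma(4.0, 0.15, 500)   # visual-only trials

    t = np.linspace(0.2, 1.0, 9)
    capacity = integrated_hazard(rt_av, t) / (
        integrated_hazard(rt_a, t) + integrated_hazard(rt_v, t))
    print(np.round(capacity, 2))   # values > 1 suggest efficient integration
    ```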

  17. Differential gaze patterns on eyes and mouth during audiovisual speech segmentation

    Directory of Open Access Journals (Sweden)

    Laina G. Lusk

    2016-02-01

    Full Text Available Speech is inextricably multisensory: both auditory and visual components provide critical information for all aspects of speech processing, including speech segmentation, the visual components of which have been the target of a growing number of studies. In particular, a recent study (Mitchel & Weiss, 2014) established that adults can utilize facial cues (i.e., visual prosody) to identify word boundaries in fluent speech. The current study expanded upon these results, using an eye tracker to identify highly attended facial features of the audiovisual display used in Mitchel and Weiss (2014). Subjects spent the most time watching the eyes and mouth. A significant trend in gaze durations was found with the longest gaze duration on the mouth, followed by the eyes and then the nose. In addition, eye-gaze patterns changed across familiarization as subjects learned the word boundaries, showing decreased attention to the mouth in later blocks while attention on other facial features remained consistent. These findings highlight the importance of the visual component of speech processing and suggest that the mouth may play a critical role in visual speech segmentation.

  18. Differential Gaze Patterns on Eyes and Mouth During Audiovisual Speech Segmentation.

    Science.gov (United States)

    Lusk, Laina G; Mitchel, Aaron D

    2016-01-01

    Speech is inextricably multisensory: both auditory and visual components provide critical information for all aspects of speech processing, including speech segmentation, the visual components of which have been the target of a growing number of studies. In particular, a recent study (Mitchel and Weiss, 2014) established that adults can utilize facial cues (i.e., visual prosody) to identify word boundaries in fluent speech. The current study expanded upon these results, using an eye tracker to identify highly attended facial features of the audiovisual display used in Mitchel and Weiss (2014). Subjects spent the most time watching the eyes and mouth. A significant trend in gaze durations was found with the longest gaze duration on the mouth, followed by the eyes and then the nose. In addition, eye-gaze patterns changed across familiarization as subjects learned the word boundaries, showing decreased attention to the mouth in later blocks while attention on other facial features remained consistent. These findings highlight the importance of the visual component of speech processing and suggest that the mouth may play a critical role in visual speech segmentation.
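
    Gaze-duration comparisons of this kind reduce to assigning each eye-tracking sample to a facial region of interest and summarizing dwell time per region. The sketch below illustrates that bookkeeping with hypothetical region boundaries and gaze samples.

    ```python
    # Minimal sketch of summarizing gaze dwell time by facial region of interest
    # (ROI), as used to compare looking at the eyes, nose, and mouth. The ROI
    # rectangles and gaze samples below are hypothetical.
    from collections import Counter

    ROIS = {                      # (x_min, y_min, x_max, y_max) in screen pixels
        "eyes":  (300, 150, 500, 220),
        "nose":  (360, 220, 440, 300),
        "mouth": (340, 300, 460, 370),
    }

    def roi_of(x, y):
        for name, (x0, y0, x1, y1) in ROIS.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                return name
        return "other"

    # Hypothetical gaze samples recorded at a fixed sampling rate
    samples = [(350, 320), (360, 330), (400, 180), (410, 190), (390, 260), (355, 315)]
    counts = Counter(roi_of(x, y) for x, y in samples)
    total = sum(counts.values())
    for name, n in counts.most_common():
        print(f"{name}: {100 * n / total:.0f}% of samples")
    ```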

  19. Development of a test battery for evaluating speech perception in complex listening environments.

    Science.gov (United States)

    Brungart, Douglas S; Sheffield, Benjamin M; Kubli, Lina R

    2014-08-01

    In the real world, spoken communication occurs in complex environments that involve audiovisual speech cues, spatially separated sound sources, reverberant listening spaces, and other complicating factors that influence speech understanding. However, most clinical tools for assessing speech perception are based on simplified listening environments that do not reflect the complexities of real-world listening. In this study, speech materials from the QuickSIN speech-in-noise test by Killion, Niquette, Gudmundsen, Revit, and Banerjee [J. Acoust. Soc. Am. 116, 2395-2405 (2004)] were modified to simulate eight listening conditions spanning the range of auditory environments listeners encounter in everyday life. The standard QuickSIN test method was used to estimate 50% speech reception thresholds (SRT50) in each condition. A method of adjustment procedure was also used to obtain subjective estimates of the lowest signal-to-noise ratio (SNR) where the listeners were able to understand 100% of the speech (SRT100) and the highest SNR where they could detect the speech but could not understand any of the words (SRT0). The results show that the modified materials maintained most of the efficiency of the QuickSIN test procedure while capturing performance differences across listening conditions comparable to those reported in previous studies that have examined the effects of audiovisual cues, binaural cues, room reverberation, and time compression on the intelligibility of speech.
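
    The SRT50 values described above can be illustrated by fitting a psychometric function to proportion-correct scores across signal-to-noise ratios and reading off the 50% point. The sketch below uses made-up data and a generic logistic fit; it is not the QuickSIN scoring procedure itself.

    ```python
    # Generic sketch: estimate a 50% speech reception threshold (SRT50) by
    # fitting a logistic psychometric function to proportion-correct scores
    # across SNRs. The data are hypothetical, not QuickSIN scoring.
    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(snr, srt50, slope):
        """Proportion of key words correct as a function of SNR (dB)."""
        return 1.0 / (1.0 + np.exp(-slope * (snr - srt50)))

    snr_db = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 25.0])
    p_correct = np.array([0.05, 0.20, 0.55, 0.85, 0.95, 1.00])  # hypothetical

    (srt50, slope), _ = curve_fit(logistic, snr_db, p_correct, p0=[10.0, 0.5])
    print(f"estimated SRT50 = {srt50:.1f} dB SNR (slope {slope:.2f} per dB)")
    ```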

  20. BILINGUAL MULTIMODAL SYSTEM FOR TEXT-TO-AUDIOVISUAL SPEECH AND SIGN LANGUAGE SYNTHESIS

    Directory of Open Access Journals (Sweden)

    A. A. Karpov

    2014-09-01

    Full Text Available We present a conceptual model, architecture and software of a multimodal system for audio-visual speech and sign language synthesis from input text. The main components of the developed multimodal synthesis system (signing avatar) are: automatic text processor for input text analysis; simulation 3D model of human's head; computer text-to-speech synthesizer; a system for audio-visual speech synthesis; simulation 3D model of human's hands and upper body; multimodal user interface integrating all the components for generation of audio, visual and signed speech. The proposed system performs automatic translation of input textual information into speech (audio information) and gestures (video information), information fusion and its output in the form of multimedia information. A user can input any grammatically correct text in Russian or Czech languages to the system; it is analyzed by the text processor to detect sentences, words and characters. Then this textual information is converted into symbols of the sign language notation. We apply the international «Hamburg Notation System» (HamNoSys), which describes the main differential features of each manual sign: hand shape, hand orientation, place and type of movement. On this basis the 3D signing avatar displays the elements of the sign language. The virtual 3D model of human's head and upper body has been created using the VRML virtual reality modeling language, and it is controlled by software based on the OpenGL graphical library. The developed multimodal synthesis system is a universal one since it is oriented to both regular users and disabled people (in particular, the hard-of-hearing and visually impaired), and it serves for multimedia output (by audio and visual modalities) of input textual information.
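
    The data flow described above (input text, sign-notation symbols, avatar commands) can be sketched schematically. The lexicon entries and the fingerspelling fallback below are hypothetical placeholders, not the actual system or the HamNoSys notation.

    ```python
    # Schematic sketch of the pipeline described above: tokenize input text, map
    # each word to a sign described by notation-style features, and hand the
    # records to an avatar renderer. All entries below are illustrative only.
    SIGN_LEXICON = {
        # word -> (hand shape, orientation, location, movement) -- hypothetical
        "hello": ("flat", "palm-out", "forehead", "arc-outward"),
        "thanks": ("flat", "palm-up", "chin", "forward"),
    }

    def text_to_sign_commands(text):
        commands = []
        for word in text.lower().split():
            features = SIGN_LEXICON.get(word)
            if features is None:
                commands.append(("fingerspell", word))   # fallback for unknown words
            else:
                commands.append(("sign", word, features))
        return commands

    for cmd in text_to_sign_commands("Hello thanks"):
        print(cmd)   # each record would drive the 3D signing avatar
    ```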

  1. The sound of your lips: electrophysiological cross-modal interactions during hand-to-face and face-to-face speech perception.

    Science.gov (United States)

    Treille, Avril; Vilain, Coriandre; Sato, Marc

    2014-01-01

    Recent magneto-encephalographic and electro-encephalographic studies provide evidence for cross-modal integration during audio-visual and audio-haptic speech perception, with speech gestures viewed or felt from manual tactile contact with the speaker's face. Given the temporal precedence of the haptic and visual signals on the acoustic signal in these studies, the observed modulation of N1/P2 auditory evoked responses during bimodal compared to unimodal speech perception suggest that relevant and predictive visual and haptic cues may facilitate auditory speech processing. To further investigate this hypothesis, auditory evoked potentials were here compared during auditory-only, audio-visual and audio-haptic speech perception in live dyadic interactions between a listener and a speaker. In line with previous studies, auditory evoked potentials were attenuated and speeded up during both audio-haptic and audio-visual compared to auditory speech perception. Importantly, the observed latency and amplitude reduction did not significantly depend on the degree of visual and haptic recognition of the speech targets. Altogether, these results further demonstrate cross-modal interactions between the auditory, visual and haptic speech signals. Although they do not contradict the hypothesis that visual and haptic sensory inputs convey predictive information with respect to the incoming auditory speech input, these results suggest that, at least in live conversational interactions, systematic conclusions on sensory predictability in bimodal speech integration have to be taken with caution, with the extraction of predictive cues likely depending on the variability of the speech stimuli.

  2. Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition

    Directory of Open Access Journals (Sweden)

    Berthommier Frédéric

    2002-01-01

    Full Text Available It has been shown that integration of acoustic and visual information, especially in noisy conditions, yields improved speech recognition results. This raises the question of how to weight the two modalities in different noise conditions. Throughout this paper we develop a weighting process adaptive to various background noise situations. In the presented recognition system, audio and video data are combined following a Separate Integration (SI) architecture. A hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) system is used for the experiments. The neural networks were in all cases trained on clean data. Firstly, we evaluate the performance of different weighting schemes in a manually controlled recognition task with different types of noise. Next, we compare different criteria to estimate the reliability of the audio stream. Based on this, a mapping between the measurements and the free parameter of the fusion process is derived and its applicability is demonstrated. Finally, the possibilities and limitations of adaptive weighting are compared and discussed.
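
    Adaptive stream weighting of this kind is commonly implemented as a weighted combination of per-stream log-likelihoods, with the audio weight driven by an estimate of audio reliability such as SNR. The sketch below is a generic illustration under assumed values; the reliability-to-weight mapping is not the one derived in the paper.

    ```python
    # Generic sketch of noise-adaptive stream weighting for audio-visual speech
    # recognition: per-class log-likelihoods from the audio and video streams
    # are combined with a weight derived from estimated audio reliability.
    # The mapping and the scores below are hypothetical.
    import numpy as np

    def audio_weight(snr_db, snr_low=-5.0, snr_high=20.0):
        """Map estimated SNR to an audio stream weight in [0, 1] (illustrative)."""
        return float(np.clip((snr_db - snr_low) / (snr_high - snr_low), 0.0, 1.0))

    def combined_scores(log_p_audio, log_p_video, snr_db):
        lam = audio_weight(snr_db)
        return lam * np.asarray(log_p_audio) + (1.0 - lam) * np.asarray(log_p_video)

    # Hypothetical per-class log-likelihoods for three candidate words
    log_p_audio = [-4.0, -2.5, -3.0]
    log_p_video = [-2.0, -3.5, -2.8]

    for snr in (0, 10, 20):                          # noisy to clean conditions
        scores = combined_scores(log_p_audio, log_p_video, snr)
        print(snr, "dB ->", int(np.argmax(scores)))  # index of the winning class
    ```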

  3. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Koji Iwano

    2007-03-01

    Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.

  4. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Iwano Koji

    2007-01-01

    Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.
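
    The two lip feature types used above can be illustrated with a minimal sketch: geometric features computed from a lip contour and velocity features computed from frame-to-frame landmark motion. The landmark coordinates below are hypothetical, and the feature definitions are simplified stand-ins for those used in the paper.

    ```python
    # Minimal sketch of the two lip feature types: geometric features from a lip
    # contour (width, height, aspect ratio) and a motion-velocity feature from
    # frame-to-frame changes. Landmark coordinates below are hypothetical.
    import numpy as np

    def geometric_features(contour):
        """contour: (n_points, 2) array of lip-contour (x, y) coordinates."""
        xs, ys = contour[:, 0], contour[:, 1]
        width, height = xs.max() - xs.min(), ys.max() - ys.min()
        return np.array([width, height, height / width])

    def velocity_feature(prev_contour, contour, dt):
        """Mean landmark speed between consecutive frames."""
        return np.linalg.norm(contour - prev_contour, axis=1).mean() / dt

    frame0 = np.array([[10., 20.], [30., 18.], [50., 20.], [30., 30.]])
    frame1 = np.array([[10., 19.], [30., 15.], [50., 19.], [30., 33.]])

    print(geometric_features(frame1))             # [width, height, aspect ratio]
    print(velocity_feature(frame0, frame1, 1/30)) # pixels per second at 30 fps
    ```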

  5. Language Specific Speech Perception and the Onset of Reading.

    Science.gov (United States)

    Burnham, Denis

    2003-01-01

    Investigates the degree to which native speech perception is superior to non-native speech perception. Shows that language specific speech perception is a linguistic rather than an acoustic phenomenon. Discusses results in terms of early speech perception abilities, experience with oral communication, cognitive ability, alphabetic versus…

  6. Audiovisual speech integration does not rely on the motor system: evidence from articulatory suppression, the McGurk effect, and fMRI.

    Science.gov (United States)

    Matchin, William; Groulx, Kier; Hickok, Gregory

    2014-03-01

    Visual speech influences the perception of heard speech. A classic example of this is the McGurk effect, whereby an auditory /pa/ overlaid onto a visual /ka/ induces the fusion percept of /ta/. Recent behavioral and neuroimaging research has highlighted the importance of both articulatory representations and motor speech regions of the brain, particularly Broca's area, in audiovisual (AV) speech integration. Alternatively, AV speech integration may be accomplished by the sensory system through multisensory integration in the posterior STS. We assessed the claims regarding the involvement of the motor system in AV integration in two experiments: (i) examining the effect of articulatory suppression on the McGurk effect and (ii) determining if motor speech regions show an AV integration profile. The hypothesis regarding experiment (i) is that if the motor system plays a role in McGurk fusion, distracting the motor system through articulatory suppression should result in a reduction of McGurk fusion. The results of experiment (i) showed that articulatory suppression results in no such reduction, suggesting that the motor system is not responsible for the McGurk effect. The hypothesis of experiment (ii) was that if the brain activation to AV speech in motor regions (such as Broca's area) reflects AV integration, the profile of activity should reflect AV integration: AV > AO (auditory only) and AV > VO (visual only). The results of experiment (ii) demonstrate that motor speech regions do not show this integration profile, whereas the posterior STS does. Instead, activity in motor regions is task dependent. The combined results suggest that AV speech integration does not rely on the motor system.

  7. Individual differences in degraded speech perception

    Science.gov (United States)

    Carbonell, Kathy M.

    One of the lasting concerns in audiology is the unexplained individual differences in speech perception performance even for individuals with similar audiograms. One proposal is that there are cognitive/perceptual individual differences underlying this vulnerability and that these differences are present in normal-hearing (NH) individuals but do not reveal themselves in studies that use clear speech produced in quiet (because of a ceiling effect). However, previous studies have failed to uncover cognitive/perceptual variables that explain much of the variance in NH performance on more challenging degraded speech tasks. This lack of strong correlations may be due either to examining the wrong measures (e.g., working memory capacity) or to there being no reliable differences in degraded speech performance in NH listeners (i.e., variability in performance is due to measurement noise). The proposed project has three aims. The first is to establish whether there are reliable individual differences in degraded speech performance for NH listeners that are sustained both across degradation types (speech in noise, compressed speech, noise-vocoded speech) and across multiple testing sessions. The second aim is to establish whether there are reliable differences in NH listeners' ability to adapt their phonetic categories based on short-term statistics both across tasks and across sessions; and finally, to determine whether performance on degraded speech perception tasks is correlated with performance on phonetic adaptability tasks, thus establishing a possible explanatory variable for individual differences in speech perception for NH and hearing-impaired listeners.
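
    The reliability logic of the first aim can be sketched as correlating listeners' scores across sessions and across degradation types: high correlations would indicate stable individual differences rather than measurement noise. The scores below are simulated placeholders, not data from the project.

    ```python
    # Sketch of the reliability logic described above: correlate listeners'
    # scores across sessions and across degradation types. Scores are simulated.
    import numpy as np

    rng = np.random.default_rng(2)
    n_listeners = 30
    ability = rng.normal(0.7, 0.1, n_listeners)           # latent individual ability

    def scores(noise_sd=0.05):                            # one task or session
        return np.clip(ability + rng.normal(0, noise_sd, n_listeners), 0, 1)

    session1_noise, session2_noise = scores(), scores()   # speech-in-noise, twice
    session1_vocoded = scores()                           # noise-vocoded speech

    r_sessions = np.corrcoef(session1_noise, session2_noise)[0, 1]
    r_tasks = np.corrcoef(session1_noise, session1_vocoded)[0, 1]
    print(f"test-retest r = {r_sessions:.2f}, across-degradation r = {r_tasks:.2f}")
    # High correlations point to stable individual differences; correlations
    # near zero would suggest measurement noise dominates.
    ```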

  8. Neural pathways for visual speech perception.

    Science.gov (United States)

    Bernstein, Lynne E; Liebenthal, Einat

    2014-01-01

    This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.

  9. Neural pathways for visual speech perception

    Directory of Open Access Journals (Sweden)

    Lynne E Bernstein

    2014-12-01

    Full Text Available This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.

  10. Neural pathways for visual speech perception

    Science.gov (United States)

    Bernstein, Lynne E.; Liebenthal, Einat

    2014-01-01

    This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA. PMID:25520611

  11. Speech perception as an active cognitive process

    Directory of Open Access Journals (Sweden)

    Shannon eHeald

    2014-03-01

    Full Text Available One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process, which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing, such as masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception, including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or

  12. The Neural Substrates of Infant Speech Perception

    Science.gov (United States)

    Homae, Fumitaka; Watanabe, Hama; Taga, Gentaro

    2014-01-01

    Infants often pay special attention to speech sounds, and they appear to detect key features of these sounds. To investigate the neural foundation of speech perception in infants, we measured cortical activation using near-infrared spectroscopy. We presented the following three types of auditory stimuli while 3-month-old infants watched a silent…

  13. Audio-visual perception of new wind parks

    OpenAIRE

    Yu, T.; Behm, H.; Bill, R.; Kang, J.

    2017-01-01

    Previous studies have reported negative impacts of wind parks on the public. These studies considered the noise levels or visual levels separately but not audio-visual interactive factors. This study investigated the audio-visual impact of a new wind park using virtual technology that combined audio and visual features of the environment. Participants were immersed through Google Cardboard in an actual landscape without wind parks (ante operam) and in the same landscape with wind parks (post ...

  14. The Neural Basis of Speech Perception through Lipreading and Manual Cues: Evidence from Deaf Native Users of Cued Speech

    Science.gov (United States)

    Aparicio, Mario; Peigneux, Philippe; Charlier, Brigitte; Balériaux, Danielle; Kavec, Martin; Leybaert, Jacqueline

    2017-01-01

    We present here the first neuroimaging data for perception of Cued Speech (CS) by deaf adults who are native users of CS. CS is a visual mode of communicating a spoken language through a set of manual cues which accompany lipreading and disambiguate it. With CS, sublexical units of the oral language are conveyed clearly and completely through the visual modality without requiring hearing. The comparison of neural processing of CS in deaf individuals with processing of audiovisual (AV) speech in normally hearing individuals represents a unique opportunity to explore the similarities and differences in neural processing of an oral language delivered in a visuo-manual vs. an AV modality. The study included deaf adult participants who were early CS users and native hearing users of French who process speech audiovisually. Words were presented in an event-related fMRI design. Three conditions were presented to each group of participants. The deaf participants saw CS words (manual + lipread), words presented as manual cues alone, and words presented to be lipread without manual cues. The hearing group saw AV spoken words, audio-alone and lipread-alone. Three findings are highlighted. First, the middle and superior temporal gyrus (excluding Heschl’s gyrus) and left inferior frontal gyrus pars triangularis constituted a common, amodal neural basis for AV and CS perception. Second, integration was inferred in posterior parts of superior temporal sulcus for audio and lipread information in AV speech, but in the occipito-temporal junction, including MT/V5, for the manual cues and lipreading in CS. Third, the perception of manual cues showed a much greater overlap with the regions activated by CS (manual + lipreading) than lipreading alone did. This supports the notion that manual cues play a larger role than lipreading for CS processing. The present study contributes to a better understanding of the role of manual cues as support of visual speech perception in the framework

  15. Sensorimotor influences on speech perception in infancy

    Science.gov (United States)

    Bruderer, Alison G.; Danielson, D. Kyle; Kandhadai, Padmapriya; Werker, Janet F.

    2015-01-01

    The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception–production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants’ speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants’ tongues. With a looking-time procedure, we found that temporarily restraining infants’ articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral–motor movements influence speech sound discrimination. Moreover, an experimentally induced “impairment” in articulator movement can compromise speech perception performance, raising the question of whether long-term oral–motor impairments may impact perceptual development. PMID:26460030

  16. Effects of Audio-Visual Integration on the Detection of Masked Speech and Non-Speech Sounds

    Science.gov (United States)

    Eramudugolla, Ranmalee; Henderson, Rachel; Mattingley, Jason B.

    2011-01-01

    Integration of simultaneous auditory and visual information about an event can enhance our ability to detect that event. This is particularly evident in the perception of speech, where the articulatory gestures of the speaker's lips and face can significantly improve the listener's detection and identification of the message, especially when that…

  17. Portable Tactile Aids for Speech Perception.

    Science.gov (United States)

    Lynch, Michael P.; And Others

    1989-01-01

    Experiments using portable tactile aids in speech perception are reviewed, focusing on training studies, additive benefit studies, and device comparison studies (including the "Tactaid II,""Tactaid V,""Tacticon 1600," and "Tickle Talker"). The potential of tactual information in perception of the overall…

  18. Speech perception in children with speech output disorders.

    NARCIS (Netherlands)

    Nijland, L.

    2009-01-01

    Research in the field of speech production pathology is dominated by describing deficits in output. However, perceptual problems might underlie, precede, or interact with production disorders. The present study hypothesizes that the level of the production disorders is linked to level of perception

  19. Prediction-based classification for audiovisual discrimination between laughter and speech

    NARCIS (Netherlands)

    Petridis, Stavros; Pantic, Maja; Cohn, Jeffrey F.

    Recent evidence in neuroscience supports the theory that prediction of spatial and temporal patterns in the brain plays a key role in human actions and perception. Inspired by these findings, a system that discriminates laughter from speech by modeling the spatial and temporal relationship between

  20. Dissociating Cortical Activity during Processing of Native and Non-Native Audiovisual Speech from Early to Late Infancy

    Directory of Open Access Journals (Sweden)

    Eswen Fava

    2014-08-01

    Full Text Available Initially, infants are capable of discriminating phonetic contrasts across the world’s languages. Starting between seven and ten months of age, they gradually lose this ability through a process of perceptual narrowing. Although traditionally investigated with isolated speech sounds, such narrowing occurs in a variety of perceptual domains (e.g., faces, visual speech). Thus far, tracking the developmental trajectory of this tuning process has been focused primarily on auditory speech alone, and generally using isolated sounds. But infants learn from speech produced by people talking to them, meaning they learn from a complex audiovisual signal. Here, we use near-infrared spectroscopy to measure blood concentration changes in the bilateral temporal cortices of infants in three different age groups: 3-to-6 months, 7-to-10 months, and 11-to-14 months. Critically, all three groups of infants were tested with continuous audiovisual speech in both their native and another, unfamiliar language. We found that at each age range, infants showed different patterns of cortical activity in response to the native and non-native stimuli. Infants in the youngest group showed bilateral cortical activity that was greater overall in response to non-native relative to native speech; the oldest group showed left-lateralized activity in response to native relative to non-native speech. These results highlight perceptual tuning as a dynamic process that happens across modalities and at different levels of stimulus complexity.

  1. Electrophysiological Evidence for a Multisensory Speech-Specific Mode of Perception

    Science.gov (United States)

    Stekelenburg, Jeroen J.; Vroomen, Jean

    2012-01-01

    We investigated whether the interpretation of auditory stimuli as speech or non-speech affects audiovisual (AV) speech integration at the neural level. Perceptually ambiguous sine-wave replicas (SWS) of natural speech were presented to listeners who were either in "speech mode" or "non-speech mode". At the behavioral level, incongruent lipread…

  2. Brain responses and looking behaviour during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life.

    Directory of Open Access Journals (Sweden)

    Elena V Kushnerenko

    2013-07-01

    Full Text Available The use of visual cues during the processing of audiovisual speech is known to be less efficient in children and adults with language difficulties, and such difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6-9 months to 14-16 months of age. We used eye-tracking to examine whether individual differences in visual attention during audiovisual processing of speech in 6- to 9-month-old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6- to 9-month-old infants also participated in an event-related potential (ERP) audiovisual task within the same experimental session. Language development was then followed up at the age of 14-16 months, using two measures of language development, the Preschool Language Scale (PLS) and the Oxford Communicative Development Inventory (CDI). The results show that those infants who were less efficient in auditory speech processing at the age of 6-9 months had lower receptive language scores at 14-16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audio-visually incongruent stimuli at 6-9 months were both significantly associated with language development at 14-16 months. These findings add to the understanding of individual differences in neural signatures of audiovisual processing and associated looking behaviour in infants.

  3. The development of multisensory speech perception continues into the late childhood years.

    Science.gov (United States)

    Ross, Lars A; Molholm, Sophie; Blanco, Daniella; Gomez-Ramirez, Manuel; Saint-Amour, Dave; Foxe, John J

    2011-06-01

    Observing a speaker's articulations substantially improves the intelligibility of spoken speech, especially under noisy listening conditions. This multisensory integration of speech inputs is crucial to effective communication. Appropriate development of this ability has major implications for children in classroom and social settings, and deficits in it have been linked to a number of neurodevelopmental disorders, especially autism. It is clear from structural imaging studies that there is a prolonged maturational course within regions of the perisylvian cortex that persists into late childhood, and these regions have been firmly established as being crucial to speech and language functions. Given this protracted maturational timeframe, we reasoned that multisensory speech processing might well show a similarly protracted developmental course. Previous work in adults has shown that audiovisual enhancement in word recognition is most apparent within a restricted range of signal-to-noise ratios (SNRs). Here, we investigated when these properties emerge during childhood by testing multisensory speech recognition abilities in typically developing children aged between 5 and 14 years, and comparing them with those of adults. By parametrically varying SNRs, we found that children benefited significantly less from observing visual articulations, displaying considerably less audiovisual enhancement. The findings suggest that improvement in the ability to recognize speech-in-noise and in audiovisual integration during speech perception continues quite late into the childhood years. The implication is that a considerable amount of multisensory learning remains to be achieved during the later schooling years, and that explicit efforts to accommodate this learning may well be warranted. European Journal of Neuroscience © 2011 Federation of European Neuroscience Societies and Blackwell Publishing Ltd. No claim to original US government works.
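
    Audiovisual enhancement in such studies is often summarized, at each SNR, as the gain of audiovisual over auditory-alone recognition, either raw or normalized by the room for improvement. The sketch below computes both from hypothetical percent-correct values; smaller gains for children than adults at the same SNRs would reflect the developmental effect described above.

    ```python
    # Sketch of how audiovisual enhancement is commonly summarized at each SNR:
    # raw gain (AV - A) and gain relative to the room for improvement. The
    # percent-correct values below are hypothetical, not the study's data.
    snr_db        = [-12, -9, -6, -3, 0]
    auditory_only = [10,  25, 50, 75, 90]   # percent words correct, A condition
    audiovisual   = [35,  60, 80, 90, 95]   # percent words correct, AV condition

    for snr, a, av in zip(snr_db, auditory_only, audiovisual):
        raw_gain = av - a
        relative_gain = raw_gain / (100 - a)   # normalized by room for improvement
        print(f"{snr:+} dB SNR: gain {raw_gain} pts, relative gain {relative_gain:.2f}")
    ```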

  4. Effects of noise and audiovisual cues on speech processing in adults with and without ADHD.

    Science.gov (United States)

    Michalek, Anne M P; Watson, Silvana M; Ash, Ivan; Ringleb, Stacie; Raymer, Anastasia

    2014-03-01

    This study examined the interplay among internal (e.g., attention, working memory abilities) and external (e.g., background noise, visual information) factors in individuals with and without ADHD. A 2 × 2 × 6 mixed design with correlational analyses was used to compare participant results on a standardized listening-in-noise sentence repetition task (QuickSIN; Killion et al., 2004), presented in an auditory and an audiovisual condition as signal-to-noise ratio (SNR) varied from 25 to 0 dB, and to determine individual differences in working memory capacity and short-term recall. Participants were thirty-eight young adults without ADHD and twenty-five young adults with ADHD. Diagnosis, modality, and signal-to-noise ratio all affected the ability to process speech in noise. The interaction between the diagnosis of ADHD, the presence of visual cues, and the level of noise had an effect on a person's ability to process speech in noise. Conclusion: Young adults with ADHD benefited less from visual information during noise than young adults without ADHD, an effect influenced by working memory abilities.

  5. Speech perception: motoric contributions versus the motor theory.

    Science.gov (United States)

    Devlin, Joseph T; Aydelott, Jennifer

    2009-03-10

    Recent studies indicate that the motor cortex is involved not only in the production of speech, but also in its perception. These studies have sparked a renewed interest in gesture-based theories of speech perception.

  6. The Beginnings of Danish Speech Perception

    DEFF Research Database (Denmark)

    Østerbye, Torkil

    Little is known about the perception of speech sounds by native Danish listeners. However, the Danish sound system differs in several interesting ways from the sound systems of other languages. For instance, Danish is characterized, among other features, by a rich vowel inventory and by different...... to interesting differences in speech perception and acquisition of Danish adults and infants when compared to English. The book is useful for professionals as well as students of linguistics, psycholinguistics and phonetics/phonology, or anyone else who may be interested in language....

  7. Speech perception in medico-legal assessment of hearing disabilities.

    Science.gov (United States)

    Pedersen, Ellen Raben; Juhl, Peter Møller; Wetke, Randi; Andersen, Ture Dammann

    2016-10-01

    Examination of Danish data for medico-legal compensations regarding hearing disabilities. The study purposes are: (1) to investigate whether discrimination scores (DSs) relate to patients' subjective experience of their hearing and communication ability (the latter referring to audio-visual perception), (2) to compare DSs from different discrimination tests (auditory/audio-visual perception and without/with noise), and (3) to relate different handicap measures in the scaling used for compensation purposes in Denmark. Data from a 15 year period (1999-2014) were collected and analysed. The data set includes 466 patients, from which 50 were omitted due to suspicion of having exaggerated their hearing disabilities. The DSs relate well to the patients' subjective experience of their speech perception ability. By comparing DSs for different test setups it was found that adding noise entails a relatively more difficult listening condition than removing visual cues. The hearing and communication handicap degrees were found to agree, whereas the measured handicap degrees tended to be higher than the self-assessed handicap degrees. The DSs can be used to assess patients' hearing and communication abilities. The difference in the obtained handicap degrees emphasizes the importance of collecting self-assessed as well as measured handicap degrees.

  8. Neural correlates of quality during perception of audiovisual stimuli

    CERN Document Server

    Arndt, Sebastian

    2016-01-01

    This book presents a new approach to examining perceived quality of audiovisual sequences. It uses electroencephalography to understand how exactly user quality judgments are formed within a test participant, and what might be the physiologically based implications of being exposed to lower-quality media. The book redefines experimental paradigms of using EEG in the area of quality assessment so that they better suit the requirements of standard subjective quality testing. Therefore, experimental protocols and stimuli are adjusted accordingly.

  9. Developing an Audiovisual Notebook as a Self-Learning Tool in Histology: Perceptions of Teachers and Students

    Science.gov (United States)

    Campos-Sánchez, Antonio; López-Núñez, Juan-Antonio; Scionti, Giuseppe; Garzón, Ingrid; González-Andrades, Miguel; Alaminos, Miguel; Sola, Tomás

    2014-01-01

    Videos can be used as didactic tools for self-learning under several circumstances, including those cases in which students are responsible for the development of this resource as an audiovisual notebook. We compared students' and teachers' perceptions regarding the main features that an audiovisual notebook should include. Four…

  10. Speech Misperception: Speaking and Seeing Interfere Differently with Hearing

    OpenAIRE

    Takemi Mochida; Toshitaka Kimura; Sadao Hiroya; Norimichi Kitagawa; Hiroaki Gomi; Tadahisa Kondo

    2013-01-01

    Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phoneme...

  11. Speech-Perception-in-Noise Deficits in Dyslexia

    Science.gov (United States)

    Ziegler, Johannes C.; Pech-Georgel, Catherine; George, Florence; Lorenzi, Christian

    2009-01-01

    Speech perception deficits in developmental dyslexia were investigated in quiet and various noise conditions. Dyslexics exhibited clear speech perception deficits in noise but not in silence. "Place-of-articulation" was more affected than "voicing" or "manner-of-articulation." Speech-perception-in-noise deficits persisted when performance of…

  12. Syllable Congruency of Audio-Visual Speech Stimuli Facilitates the Spatial Ventriloquism Only with Bilateral Visual Presentations

    Directory of Open Access Journals (Sweden)

    Shoko Kanaya

    2011-10-01

    Full Text Available Spatial ventriloquism refers to a shift of the perceived location of a sound toward a synchronized visual stimulus. It has been assumed to reflect early processes uninfluenced by cognitive factors such as syllable congruency between audio-visual speech stimuli. Conventional experiments have examined compelling situations that typically involve a single audio-visual pair of stimuli to be bound. However, in natural environments our multisensory system must select the relevant sensory signals to be bound from among adjacent stimuli. This selection process may depend upon higher (cognitive) mechanisms. We investigated whether a cognitive factor affects the size of the ventriloquism when an additional visual stimulus is presented with a conventional audio-visual pair. Participants were presented with a set of audio-visual speech stimuli, comprising one or two bilateral movies of a person uttering single syllables together with recordings of this person speaking the same syllables. One of the movies and the speech sound were combined in either congruent or incongruent ways. Participants had to identify sound locations. Results show that syllable congruency affected the size of the ventriloquism only when two movies were presented simultaneously. The selection of a relevant stimulus pair among two or more candidates can be regulated by some higher processes.

  13. Commentary: Dichotomies in the perception of speech

    Indian Academy of Sciences (India)

    Commentary: Dichotomies in the perception of speech. Mohinish Shukla. Journal of Biosciences, Volume 27, Issue 3, June 2002, pp. 189-190. Full text: http://www.ias.ac.in/article/fulltext/jbsc/027/03/0189-0190

  14. Speech Perception Results: Audition and Lipreading Enhancement.

    Science.gov (United States)

    Geers, Ann; Brenner, Chris

    1994-01-01

    This paper describes changes in speech perception performance of deaf children using cochlear implants, tactile aids, or conventional hearing aids over a three-year period. Eleven of the 13 children with cochlear implants were able to identify words on the basis of auditory consonant cues. Significant lipreading enhancement was also achieved with…

  15. Phonological and Phonetic Biases in Speech Perception

    Science.gov (United States)

    Key, Michael Parrish

    2012-01-01

    This dissertation investigates how knowledge of phonological generalizations influences speech perception, with a particular focus on evidence that phonological processing is autonomous from (rather than interactive with) auditory processing. A model is proposed in which auditory cue constraints and markedness constraints interact to determine a…

  16. Multisensory speech perception in autism spectrum disorder: From phoneme to whole-word perception.

    Science.gov (United States)

    Stevenson, Ryan A; Baum, Sarah H; Segers, Magali; Ferber, Susanne; Barense, Morgan D; Wallace, Mark T

    2017-07-01

    Speech perception in noisy environments is boosted when a listener can see the speaker's mouth and integrate the auditory and visual speech information. Autistic children have a diminished capacity to integrate sensory information across modalities, which contributes to core symptoms of autism, such as impairments in social communication. We investigated the abilities of autistic and typically-developing (TD) children to integrate auditory and visual speech stimuli in various signal-to-noise ratios (SNR). Measurements of both whole-word and phoneme recognition were recorded. At the level of whole-word recognition, autistic children exhibited reduced performance in both the auditory and audiovisual modalities. Importantly, autistic children showed reduced behavioral benefit from multisensory integration with whole-word recognition, specifically at low SNRs. At the level of phoneme recognition, autistic children exhibited reduced performance relative to their TD peers in auditory, visual, and audiovisual modalities. However, and in contrast to their performance at the level of whole-word recognition, both autistic and TD children showed benefits from multisensory integration for phoneme recognition. In accordance with the principle of inverse effectiveness, both groups exhibited greater benefit at low SNRs relative to high SNRs. Thus, while autistic children showed typical multisensory benefits during phoneme recognition, these benefits did not translate to typical multisensory benefit of whole-word recognition in noisy environments. We hypothesize that sensory impairments in autistic children raise the SNR threshold needed to extract meaningful information from a given sensory input, resulting in subsequent failure to exhibit behavioral benefits from additional sensory information at the level of whole-word recognition. Autism Res 2017, 10: 1280-1290. © 2017 International Society for Autism Research, Wiley Periodicals, Inc.
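
    As a rough illustration of the kind of benefit measure discussed above, the sketch below computes a raw and a proportional multisensory gain from hypothetical group-level accuracies at two SNRs. The proportional measure, which normalizes by the headroom above the auditory-only score, is one common way of testing inverse effectiveness; it is not necessarily the exact measure used in the study, and all numbers and labels are invented placeholders.

```python
# Hedged sketch: multisensory gain from hypothetical word-recognition accuracies.
# All numbers are invented for illustration; they are not data from the study.

def multisensory_gain(av: float, a: float) -> dict:
    """Raw gain (AV - A) and proportional gain normalized by headroom (1 - A)."""
    raw = av - a
    proportional = raw / (1.0 - a) if a < 1.0 else 0.0
    return {"raw": raw, "proportional": proportional}

# Hypothetical proportion-correct scores for whole-word recognition.
accuracies = {
    ("TD", "low SNR"): {"A": 0.40, "AV": 0.70},
    ("TD", "high SNR"): {"A": 0.85, "AV": 0.92},
    ("ASD", "low SNR"): {"A": 0.35, "AV": 0.45},
    ("ASD", "high SNR"): {"A": 0.80, "AV": 0.86},
}

for (group, snr), scores in accuracies.items():
    gain = multisensory_gain(scores["AV"], scores["A"])
    print(f"{group:3s} {snr:8s}  raw gain = {gain['raw']:.2f}  "
          f"proportional gain = {gain['proportional']:.2f}")

# Inverse effectiveness predicts larger gains at low SNR than at high SNR;
# a reduced low-SNR gain in one group would mirror the whole-word finding above.
```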

  17. A causal test of the motor theory of speech perception: a case of impaired speech production and spared speech perception.

    Science.gov (United States)

    Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z

    2015-01-01

    The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.

  18. Explicit pattern recognition models for speech perception

    Science.gov (United States)

    Nearey, Terrance M.

    2003-10-01

    Optimal statistical classification of arbitrary input signals can be obtained, in principle, via a Bayesian classifier, given (perfect) knowledge of the distributions of signal properties for the set of target categories. At least for certain constrained problems, such as the perception of isolated vowels, simple (imperfect) statistical pattern recognition techniques can accurately predict human listeners' performance. This paper sketches several relatively successful case studies of the application of static pattern recognition techniques to speech perception. (Static techniques require inputs of a fixed length, e.g., F1 and F2 for isolated vowels.) Real speech clearly requires dynamic pattern recognition, allowing inputs of arbitrary length. Certain such methods, such as dynamic programming and hidden Markov models, have been widely exploited in automatic speech recognition. The present paper will describe initial attempts to apply variants of such methods to the data from a perception experiment [T. Nearey and R. Smits, J. Acoust. Soc. Am. 111 (2002)] involving the perception of three (VCV) or four (VCCV) segment strings. Practical and conceptual problems in the application of such techniques to human perception will be discussed. [Work supported by SSHRC.]
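
    As a minimal illustration of the static pattern-recognition approach described above, the sketch below fits one Gaussian per vowel category to (F1, F2) measurements and classifies new tokens by maximum posterior probability. The formant values are toy placeholders, not data from the cited experiments.

```python
# Hedged sketch of a static Bayesian classifier for isolated vowels using (F1, F2).
# Formant values below are invented placeholders, not data from the cited work.
import numpy as np

def fit_gaussians(tokens):
    """Estimate mean, covariance and prior probability for each vowel category."""
    params = {}
    n_total = sum(len(x) for x in tokens.values())
    for vowel, x in tokens.items():
        x = np.asarray(x, dtype=float)
        params[vowel] = (x.mean(axis=0), np.cov(x, rowvar=False), len(x) / n_total)
    return params

def log_posterior(f, mean, cov, prior):
    """Unnormalized log posterior of a category for formant vector f."""
    diff = f - mean
    return (np.log(prior)
            - 0.5 * np.log(np.linalg.det(cov))
            - 0.5 * diff @ np.linalg.inv(cov) @ diff)

def classify(f, params):
    f = np.asarray(f, dtype=float)
    return max(params, key=lambda v: log_posterior(f, *params[v]))

# Toy training tokens: (F1, F2) in Hz for three vowel categories.
training = {
    "i": [(280, 2250), (310, 2300), (300, 2150), (290, 2400)],
    "a": [(750, 1200), (800, 1300), (770, 1150), (820, 1250)],
    "u": [(320, 800), (350, 900), (300, 850), (340, 750)],
}
params = fit_gaussians(training)
print(classify((310, 2200), params))  # expected: "i"
print(classify((790, 1220), params))  # expected: "a"
```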

  19. The spatiotemporal characteristics of elementary audiovisual speech and music processing in musically untrained subjects.

    Science.gov (United States)

    Elmer, Stefan; Meyer, Martin; Jäncke, Lutz

    2012-03-01

    Previously, the EEG technique has been used to investigate the spatiotemporal properties of audiovisual (AV) processing by taking advantage of the violation of the "additive model", which is considered to be a very conservative approach. In the present work, we used a less conservative and novel approach than the criterion of superadditivity for estimating AV interactions. Hence, we estimated AV interaction patterns by comparing the responses to AV stimuli with the averaged responses to the unimodal visual and auditory stimuli in musically untrained subjects and by presenting syllables and piano tones coupled with flashlights. Our results suggest that the two AV objects elicited consistent interaction patterns within the time course of unisensory processing in the time range between 80 and 250ms post stimulus onset. The scalp topographies, as well as the source estimation approach we adopted, indicate that the first interaction pattern at around 100ms was partially driven by auditory-related cortical regions. Additionally, we found evidence for a second interaction pattern at around 200ms that was mainly associated with the responsiveness of extra-sensory brain regions. During this later processing stage, only the music condition was associated with putative responses that originated from auditory-related cortical fields. This study provides a novel approach to investigate the basic principles underlying elementary AV speech and music processing in subjects without formal musical education. Copyright © 2011 Elsevier B.V. All rights reserved.
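
    For readers unfamiliar with the two criteria contrasted above, the sketch below computes both from hypothetical single-trial EEG epochs: the conservative additive model compares the AV response with the sum of the unimodal responses, while the less conservative criterion compares it with their average. Array shapes, the sampling rate and all values are placeholders, not the study's data.

```python
# Hedged sketch: two criteria for audiovisual (AV) interaction in ERP data.
# Epoch arrays are random placeholders shaped (trials, channels, samples).
import numpy as np

rng = np.random.default_rng(0)
fs = 500                                        # sampling rate in Hz (assumed)
n_trials, n_channels, n_samples = 60, 32, 300   # 600 ms epochs
epochs = {c: rng.normal(size=(n_trials, n_channels, n_samples))
          for c in ("A", "V", "AV")}

# Average across trials to obtain the ERP for each condition.
erp = {c: e.mean(axis=0) for c, e in epochs.items()}

# Conservative additive model (superadditivity criterion): AV - (A + V).
interaction_additive = erp["AV"] - (erp["A"] + erp["V"])

# Less conservative criterion: AV versus the *average* of the unimodal ERPs.
interaction_mean = erp["AV"] - (erp["A"] + erp["V"]) / 2.0

# Inspect the 80-250 ms window mentioned in the abstract.
window = slice(int(0.080 * fs), int(0.250 * fs))
print("additive-model effect (window mean):", interaction_additive[:, window].mean())
print("averaged-unimodal effect (window mean):", interaction_mean[:, window].mean())
```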

  20. The sound of your lips: electrophysiological cross-modal interactions during hand-to-face and face-to-face speech perception

    Directory of Open Access Journals (Sweden)

    Avrill eTreille

    2014-05-01

    Full Text Available Recent magneto-encephalographic and electro-encephalographic studies provide evidence for cross-modal integration during audio-visual and audio-haptic speech perception, with speech gestures viewed or felt from manual tactile contact with the speaker’s face. Given the temporal precedence of the haptic and visual signals over the acoustic signal in these studies, the observed modulation of N1/P2 auditory evoked responses during bimodal compared to unimodal speech perception suggests that relevant and predictive visual and haptic cues may facilitate auditory speech processing. To further investigate this hypothesis, auditory evoked potentials were here compared during auditory-only, audio-visual and audio-haptic speech perception in live dyadic interactions between a listener and a speaker. In line with previous studies, auditory evoked potentials were attenuated and speeded up during both audio-haptic and audio-visual compared to auditory speech perception. Importantly, the observed latency and amplitude reduction did not significantly depend on the degree of visual and haptic recognition of the speech targets. Altogether, these results further demonstrate cross-modal interactions between the auditory, visual and haptic speech signals. Although they do not contradict the hypothesis that visual and haptic sensory inputs convey predictive information with respect to the incoming auditory speech input, these results suggest that, at least in live conversational interactions, systematic conclusions on sensory predictability in bimodal speech integration have to be taken with caution, with the extraction of predictive cues likely depending on the variability of the speech stimuli.

  1. A link between individual differences in multisensory speech perception and eye movements.

    Science.gov (United States)

    Gurler, Demet; Doyle, Nathan; Walker, Edgar; Magnotti, John; Beauchamp, Michael

    2015-05-01

    The McGurk effect is an illusion in which visual speech information dramatically alters the perception of auditory speech. However, there is a high degree of individual variability in how frequently the illusion is perceived: some individuals almost always perceive the McGurk effect, while others rarely do. Another axis of individual variability is the pattern of eye movements observers make while viewing a talking face: some individuals often fixate the mouth of the talker, while others rarely do. Since the talker's mouth carries the visual speech information necessary to induce the McGurk effect, we hypothesized that individuals who frequently perceive the McGurk effect should spend more time fixating the talker's mouth. We used infrared eye tracking to study eye movements as 40 participants viewed audiovisual speech. Frequent perceivers of the McGurk effect were more likely to fixate the mouth of the talker, and there was a significant correlation between McGurk frequency and mouth looking time. The noisy encoding of disparity model of McGurk perception showed that individuals who frequently fixated the mouth had lower sensory noise and higher disparity thresholds than those who rarely fixated the mouth. Differences in eye movements when viewing the talker's face may be an important contributor to interindividual differences in multisensory speech perception.
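
    The core idea of the noisy encoding of disparity model can be illustrated with a simplified Monte Carlo sketch: the fused (McGurk) percept is reported whenever the encoded audiovisual disparity falls below an individual's disparity threshold. The decision rule is a loose reading of the model, and the parameter values for the two hypothetical observers are invented, not fitted values from the study.

```python
# Hedged sketch: a simplified Monte Carlo reading of the noisy-encoding-of-disparity
# idea -- the McGurk (fused) percept occurs when the encoded audiovisual disparity
# falls below an individual's disparity threshold. Parameter values are invented.
import numpy as np

def mcgurk_rate(stimulus_disparity, sensory_noise, threshold, n_trials=10000, seed=0):
    """Proportion of trials on which the encoded disparity is below threshold."""
    rng = np.random.default_rng(seed)
    encoded = stimulus_disparity + rng.normal(0.0, sensory_noise, n_trials)
    return np.mean(encoded < threshold)

# Two hypothetical observers viewing the same incongruent stimulus (disparity = 1.0).
frequent_mouth_looker = mcgurk_rate(1.0, sensory_noise=0.3, threshold=1.2)
rare_mouth_looker = mcgurk_rate(1.0, sensory_noise=0.8, threshold=0.9)
print(f"frequent mouth-looker: {frequent_mouth_looker:.2f}")
print(f"rare mouth-looker:     {rare_mouth_looker:.2f}")
# Lower sensory noise plus a higher threshold yields more frequent fusion,
# mirroring the pattern reported for frequent mouth fixators.
```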

  2. Benefits of Music Training for Perception of Emotional Speech Prosody in Deaf Children With Cochlear Implants.

    Science.gov (United States)

    Good, Arla; Gordon, Karen A; Papsin, Blake C; Nespoli, Gabe; Hopyan, Talar; Peretz, Isabelle; Russo, Frank A

    Children who use cochlear implants (CIs) have characteristic pitch processing deficits leading to impairments in music perception and in understanding emotional intention in spoken language. Music training for normal-hearing children has previously been shown to benefit perception of emotional prosody. The purpose of the present study was to assess whether deaf children who use CIs obtain similar benefits from music training. We hypothesized that music training would lead to gains in auditory processing and that these gains would transfer to emotional speech prosody perception. Study participants were 18 child CI users (ages 6 to 15). Participants received either 6 months of music training (i.e., individualized piano lessons) or 6 months of visual art training (i.e., individualized painting lessons). Measures of music perception and emotional speech prosody perception were obtained pre-, mid-, and post-training. The Montreal Battery for Evaluation of Musical Abilities was used to measure five different aspects of music perception (scale, contour, interval, rhythm, and incidental memory). The emotional speech prosody task required participants to identify the emotional intention of a semantically neutral sentence under audio-only and audiovisual conditions. Music training led to improved performance on tasks requiring the discrimination of melodic contour and rhythm, as well as incidental memory for melodies. These improvements were predominantly found from mid- to post-training. Critically, music training also improved emotional speech prosody perception. Music training was most advantageous in audio-only conditions. Art training did not lead to the same improvements. Music training can lead to improvements in perception of music and emotional speech prosody, and thus may be an effective supplementary technique for supporting auditory rehabilitation following cochlear implantation.

  3. Benefits of Music Training for Perception of Emotional Speech Prosody in Deaf Children With Cochlear Implants

    Science.gov (United States)

    Gordon, Karen A.; Papsin, Blake C.; Nespoli, Gabe; Hopyan, Talar; Peretz, Isabelle; Russo, Frank A.

    2017-01-01

    Objectives: Children who use cochlear implants (CIs) have characteristic pitch processing deficits leading to impairments in music perception and in understanding emotional intention in spoken language. Music training for normal-hearing children has previously been shown to benefit perception of emotional prosody. The purpose of the present study was to assess whether deaf children who use CIs obtain similar benefits from music training. We hypothesized that music training would lead to gains in auditory processing and that these gains would transfer to emotional speech prosody perception. Design: Study participants were 18 child CI users (ages 6 to 15). Participants received either 6 months of music training (i.e., individualized piano lessons) or 6 months of visual art training (i.e., individualized painting lessons). Measures of music perception and emotional speech prosody perception were obtained pre-, mid-, and post-training. The Montreal Battery for Evaluation of Musical Abilities was used to measure five different aspects of music perception (scale, contour, interval, rhythm, and incidental memory). The emotional speech prosody task required participants to identify the emotional intention of a semantically neutral sentence under audio-only and audiovisual conditions. Results: Music training led to improved performance on tasks requiring the discrimination of melodic contour and rhythm, as well as incidental memory for melodies. These improvements were predominantly found from mid- to post-training. Critically, music training also improved emotional speech prosody perception. Music training was most advantageous in audio-only conditions. Art training did not lead to the same improvements. Conclusions: Music training can lead to improvements in perception of music and emotional speech prosody, and thus may be an effective supplementary technique for supporting auditory rehabilitation following cochlear implantation. PMID:28085739

  4. Aero-tactile integration in speech perception.

    Science.gov (United States)

    Gick, Bryan; Derrick, Donald

    2009-11-26

    Visual information from a speaker's face can enhance or interfere with accurate auditory perception. This integration of information across auditory and visual streams has been observed in functional imaging studies, and has typically been attributed to the frequency and robustness with which perceivers jointly encounter event-specific information from these two modalities. Adding the tactile modality has long been considered a crucial next step in understanding multisensory integration. However, previous studies have found an influence of tactile input on speech perception only under limited circumstances, either where perceivers were aware of the task or where they had received training to establish a cross-modal mapping. Here we show that perceivers integrate naturalistic tactile information during auditory speech perception without previous training. Drawing on the observation that some speech sounds produce tiny bursts of aspiration (such as English 'p'), we applied slight, inaudible air puffs on participants' skin at one of two locations: the right hand or the neck. Syllables heard simultaneously with cutaneous air puffs were more likely to be heard as aspirated (for example, causing participants to mishear 'b' as 'p'). These results demonstrate that perceivers integrate event-relevant tactile information in auditory perception in much the same way as they do visual information.

  5. How Our Own Speech Rate Influences Our Perception of Others

    Science.gov (United States)

    Bosker, Hans Rutger

    2017-01-01

    In conversation, our own speech and that of others follow each other in rapid succession. Effects of the surrounding context on speech perception are well documented but, despite the ubiquity of the sound of our own voice, it is unknown whether our own speech also influences our perception of other talkers. This study investigated context effects…

  6. Audiovisual alignment of co-speech gestures to speech supports word learning in 2-year-olds.

    Science.gov (United States)

    Jesse, Alexandra; Johnson, Elizabeth K

    2016-05-01

    Analyses of caregiver-child communication suggest that an adult tends to highlight objects in a child's visual scene by moving them in a manner that is temporally aligned with the adult's speech productions. Here, we used the looking-while-listening paradigm to examine whether 25-month-olds use audiovisual temporal alignment to disambiguate and learn novel word-referent mappings in a difficult word-learning task. Videos of two equally interesting and animated novel objects were simultaneously presented to children, but the movement of only one of the objects was aligned with an accompanying object-labeling audio track. No social cues (e.g., pointing, eye gaze, touch) were available to the children because the speaker was edited out of the videos. Immediately afterward, toddlers were presented with still images of the two objects and asked to look at one or the other. Toddlers looked reliably longer to the labeled object, demonstrating their acquisition of the novel word-referent mapping. A control condition showed that children's performance was not solely due to the single unambiguous labeling that had occurred at experiment onset. We conclude that the temporal link between a speaker's utterances and the motion they imposed on the referent object helps toddlers to deduce a speaker's intended reference in a difficult word-learning scenario. In combination with our previous work, these findings suggest that intersensory redundancy is a source of information used by language users of all ages. That is, intersensory redundancy is not just a word-learning tool used by young infants. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. Audiovisual associations alter the perception of low-level visual motion

    Directory of Open Access Journals (Sweden)

    Hulusi eKafaligonul

    2015-03-01

    Full Text Available Motion perception is a pervasive aspect of vision and is affected both by the immediate pattern of sensory inputs and by prior experiences acquired through associations. Recently, several studies reported that an association can be established quickly between directions of visual motion and static sounds of distinct frequencies. After the association is formed, sounds are able to change the perceived direction of visual motion. To determine whether such rapidly acquired audiovisual associations and their subsequent influences on visual motion perception depend on the involvement of higher-order attentive tracking mechanisms, we designed psychophysical experiments using regular and reverse-phi random dot motions isolating low-level pre-attentive motion processing. Our results show that an association between the directions of low-level visual motion and static sounds can be formed and that this audiovisual association alters the subsequent perception of low-level visual motion. These findings support the view that audiovisual associations are not restricted to the high-level, attention-based motion system and that early-level visual motion processing also plays a role.

  8. Are there interactive processes in speech perception?

    Science.gov (United States)

    McClelland, James L.; Mirman, Daniel; Holt, Lori L.

    2012-01-01

    Lexical information facilitates speech perception, especially when sounds are ambiguous or degraded. The interactive approach to understanding this effect posits that this facilitation is accomplished through bi-directional flow of information, allowing lexical knowledge to influence pre-lexical processes. Alternative autonomous theories posit feed-forward processing with lexical influence restricted to post-perceptual decision processes. We review evidence supporting the prediction of interactive models that lexical influences can affect pre-lexical mechanisms, triggering compensation, adaptation and retuning of phonological processes generally taken to be pre-lexical. We argue that these and other findings point to interactive processing as a fundamental principle for perception of speech and other modalities. PMID:16843037

  9. Speech-in-speech perception and executive function involvement.

    Science.gov (United States)

    Perrone-Bertolotti, Marcela; Tassin, Maxime; Meunier, Fanny

    2017-01-01

    This present study investigated the link between speech-in-speech perception capacities and four executive function components: response suppression, inhibitory control, switching and working memory. We constructed a cross-modal semantic priming paradigm using a written target word and a spoken prime word, implemented in one of two concurrent auditory sentences (cocktail party situation). The prime and target were semantically related or unrelated. Participants had to perform a lexical decision task on visual target words and simultaneously listen to only one of two pronounced sentences. The attention of the participant was manipulated: The prime was in the pronounced sentence listened to by the participant or in the ignored one. In addition, we evaluate the executive function abilities of participants (switching cost, inhibitory-control cost and response-suppression cost) and their working memory span. Correlation analyses were performed between the executive and priming measurements. Our results showed a significant interaction effect between attention and semantic priming. We observed a significant priming effect in the attended but not in the ignored condition. Only priming effects obtained in the ignored condition were significantly correlated with some of the executive measurements. However, no correlation between priming effects and working memory capacity was found. Overall, these results confirm, first, the role of attention for semantic priming effect and, second, the implication of executive functions in speech-in-noise understanding capacities.

  10. Oral Kinesthetic Sensitivity and the Perception of Speech

    Science.gov (United States)

    Larson, Stephen; Hudson, Floyd G.

    1973-01-01

    Studied the relationship between auditory ability and oral form discrimination in children with varying degrees of speech and language development. Results lend support to motor theory of speech perception. (ST)

  11. Audio-visual perception system for a humanoid robotic head.

    Science.gov (United States)

    Viciana-Abad, Raquel; Marfil, Rebeca; Perez-Lorenzo, Jose M; Bandera, Juan P; Romero-Garces, Adrian; Reche-Lopez, Pedro

    2014-05-28

    One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, the benefits of audio-visual attention mechanisms, compared to audio-only or visual-only approaches, have rarely been evaluated in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with a Bayes inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared with unimodal approaches, taking into account their technical limitations. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.
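
    The fusion idea mentioned above can be illustrated with a generic grid-based Bayes-rule combination of two directional cues. This is not the implementation described in the paper; the azimuth grid, the Gaussian likelihoods and all parameter values are illustrative assumptions.

```python
# Hedged sketch: generic Bayesian fusion of audio and visual azimuth estimates on a
# discretized grid. This illustrates the fusion idea only, not the system described
# in the paper; all parameter values are invented.
import numpy as np

def gaussian_likelihood(grid, estimate, sigma):
    """Unnormalized Gaussian likelihood of each candidate azimuth."""
    return np.exp(-0.5 * ((grid - estimate) / sigma) ** 2)

azimuth = np.arange(-90.0, 90.5, 0.5)          # candidate directions in degrees
prior = np.ones_like(azimuth) / azimuth.size   # flat prior over the grid

# Hypothetical unimodal estimates: audio localization is noisier than vision.
audio = gaussian_likelihood(azimuth, estimate=18.0, sigma=12.0)
visual = gaussian_likelihood(azimuth, estimate=10.0, sigma=4.0)

# Assuming conditional independence of the two cues given the true direction,
# the posterior is proportional to prior * audio likelihood * visual likelihood.
posterior = prior * audio * visual
posterior /= posterior.sum()

print("fused estimate (deg):", azimuth[np.argmax(posterior)])
# The fused estimate lands near the visual cue but is pulled toward the audio cue,
# weighted by the relative reliability of the two modalities.
```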

  12. Audio-Visual Perception System for a Humanoid Robotic Head

    Directory of Open Access Journals (Sweden)

    Raquel Viciana-Abad

    2014-05-01

    Full Text Available One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, the benefits of audio-visual attention mechanisms, compared to audio-only or visual-only approaches, have rarely been evaluated in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with a Bayes inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared with unimodal approaches, taking into account their technical limitations. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.

  13. Giving Speech a Hand: Gesture Modulates Activity in Auditory Cortex During Speech Perception

    OpenAIRE

    Hubbard, Amy L; Wilson, Stephen M.; Callan, Daniel E; Dapretto, Mirella

    2009-01-01

    Viewing hand gestures during face-to-face communication affects speech perception and comprehension. Despite the visible role played by gesture in social interactions, relatively little is known about how the brain integrates hand gestures with co-occurring speech. Here we used functional magnetic resonance imaging (fMRI) and an ecologically valid paradigm to investigate how beat gesture – a fundamental type of hand gesture that marks speech prosody – might impact speech perception at the neu...

  14. Monkey Lipsmacking Develops Like the Human Speech Rhythm

    Science.gov (United States)

    Morrill, Ryan J.; Paukner, Annika; Ferrari, Pier F.; Ghazanfar, Asif A.

    2012-01-01

    Across all languages studied to date, audiovisual speech exhibits a consistent rhythmic structure. This rhythm is critical to speech perception. Some have suggested that the speech rhythm evolved "de novo" in humans. An alternative account--the one we explored here--is that the rhythm of speech evolved through the modification of rhythmic facial…

  15. Emotion Recognition from Speech Signals and Perception of Music

    OpenAIRE

    Fernandez Pradier, Melanie

    2012-01-01

    This thesis deals with emotion recognition from speech signals. The feature extraction step shall be improved by looking at the perception of music. In music theory, different pitch intervals (consonant, dissonant) and chords are believed to invoke different feelings in listeners. The question is whether there is a similar mechanism between perception of music and perception of emotional speech. Our research will follow three stages. First, the relationship between speech and music at segment...

  16. Speech perception at the interface of neurobiology and linguistics

    National Research Council Canada - National Science Library

    David Poeppel; William J Idsardi; Virginie van Wassenhove

    2008-01-01

    Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations...

  17. Resonant cortical dynamics of speech perception

    Science.gov (United States)

    Grossberg, Stephen

    2003-04-01

    What is the neural representation of a speech code as it evolves in time? How do listeners integrate temporally distributed phonemic information into coherent representations of syllables and words? How does the brain extract invariant properties of variable-rate speech? This talk describes a neural model that suggests answers to these questions, while quantitatively simulating speech and word recognition data. The conscious speech and word recognition code is suggested to be a resonant wave, and a percept of silence a temporal discontinuity in the rate that resonance evolves. A resonant wave emerges when sequential activation and storage of phonemic items in working memory provides bottom-up input to list chunks that group together sequences of items of variable length. The list chunks compete and winning chunks activate top-down expectations that amplify and focus attention on consistent working memory items, while suppressing inconsistent ones. The ensuing resonance boosts activation levels of selected items and chunks. Because resonance occurs after working memory activation, it can incorporate information presented after intervening silence intervals, so future sounds can influence how we hear past sounds. The model suggests that resonant dynamics enable the brain to learn quickly without suffering catastrophic forgetting, as described within Adaptive Resonance Theory.

  18. Correlation between audio-visual enhancement of speech in different noise environments and SNR: a combined behavioral and electrophysiological study.

    Science.gov (United States)

    Liu, B; Lin, Y; Gao, X; Dang, J

    2013-09-05

    In the present study, we investigated two measures of multisensory gain in different noise environments: behaviorally, as the difference in speech recognition accuracy between the audio-visual (AV) and auditory-only (A) conditions, and electrophysiologically, as the difference between the event-related potentials (ERPs) evoked under the AV condition and the sum of the ERPs evoked under the A and visual-only (V) conditions. Videos of a female speaker articulating Chinese monosyllabic words, accompanied by different levels of pink noise, were used as the stimulus materials. The selected signal-to-noise ratios (SNRs) were -16, -12, -8, -4 and 0 dB. Under the A, V and AV conditions the accuracy of speech recognition was measured and the ERPs evoked under each condition were analyzed. The behavioral results showed that the maximum gain (the difference in speech recognition accuracy between the AV and A conditions) occurred at the -12 dB SNR. The ERP results showed that the multisensory gain (the difference between the ERPs evoked under the AV condition and the sum of the ERPs evoked under the A and V conditions) at the -12 dB SNR was significantly higher than at the other SNRs in the 130-200 ms time window over the frontal-to-central region. The multisensory gains in audio-visual speech recognition at different SNRs were not completely consistent with the principle of inverse effectiveness, but conformed to cross-modal stochastic resonance. Copyright © 2013 IBRO. Published by Elsevier Ltd. All rights reserved.
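
    The two gain definitions above translate directly into a per-SNR computation. The sketch below applies them to hypothetical data and locates the SNR with the largest behavioral gain; the sampling rate, channel selection, array shapes and values are all placeholders, not the study's data.

```python
# Hedged sketch: behavioral and ERP multisensory gains as defined above, computed
# for hypothetical data at several SNRs. Values and array shapes are placeholders.
import numpy as np

rng = np.random.default_rng(1)
snrs = [-16, -12, -8, -4, 0]                        # dB
fs = 500                                            # sampling rate in Hz (assumed)
window = slice(int(0.130 * fs), int(0.200 * fs))    # 130-200 ms window

behavioral_gain = {}
erp_gain = {}
for snr in snrs:
    # Hypothetical recognition accuracies and trial-averaged ERPs per condition.
    acc = {"A": rng.uniform(0.2, 0.8), "AV": rng.uniform(0.3, 0.9)}
    erp = {c: rng.normal(size=(32, 300)) for c in ("A", "V", "AV")}  # channels x samples

    behavioral_gain[snr] = acc["AV"] - acc["A"]
    # ERP gain: AV response minus the sum of the unimodal responses, averaged over
    # a fronto-central channel subset (here simply the first 10 channels) and the window.
    diff = erp["AV"] - (erp["A"] + erp["V"])
    erp_gain[snr] = diff[:10, window].mean()

peak_snr = max(behavioral_gain, key=behavioral_gain.get)
print("behavioral gain by SNR:", {k: round(v, 2) for k, v in behavioral_gain.items()})
print("ERP gain by SNR:", {k: round(v, 3) for k, v in erp_gain.items()})
print("SNR with maximal behavioral gain:", peak_snr, "dB")
```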

  19. Perception of Speech Sounds in School-Aged Children with Speech Sound Disorders.

    Science.gov (United States)

    Preston, Jonathan L; Irwin, Julia R; Turcios, Jacqueline

    2015-11-01

    Children with speech sound disorders may perceive speech differently than children with typical speech development. The nature of these speech differences is reviewed with an emphasis on assessing phoneme-specific perception for speech sounds that are produced in error. Category goodness judgment, or the ability to judge accurate and inaccurate tokens of speech sounds, plays an important role in phonological development. The software Speech Assessment and Interactive Learning System, which has been effectively used to assess preschoolers' ability to perform goodness judgments, is explored for school-aged children with residual speech errors (RSEs). However, data suggest that this particular task may not be sensitive to perceptual differences in school-aged children. The need for the development of clinical tools for assessment of speech perception in school-aged children with RSE is highlighted, and clinical suggestions are provided. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.

  20. The neurobiology of speech perception decline in aging.

    Science.gov (United States)

    Bilodeau-Mercure, Mylène; Lortie, Catherine L; Sato, Marc; Guitton, Matthieu J; Tremblay, Pascale

    2015-03-01

    Speech perception difficulties are common among older adults, yet the underlying neural mechanisms are still poorly understood. New empirical evidence suggesting that brain senescence may be an important contributor to these difficulties has challenged the traditional view that peripheral hearing loss was the main factor in their etiology. Here, we investigated the relationship between structural and functional brain senescence and speech perception skills in aging. Following audiometric evaluations, participants underwent MRI while performing a speech perception task at different intelligibility levels. As expected, speech perception declined with age, even after controlling for hearing sensitivity using an audiological measure (pure-tone averages) and a bioacoustical measure (DPOAE recordings). Our results reveal that the core speech network, centered on the supratemporal cortex and ventral motor areas bilaterally, decreased in spatial extent in older adults. Importantly, our results also show that speech skills in aging are affected by changes in cortical thickness and in brain functioning. Age-independent intelligibility effects were found in several motor and premotor areas, including the left ventral premotor cortex and the right supplementary motor area (SMA). Age-dependent intelligibility effects were also found, mainly in sensorimotor cortical areas, and in the left dorsal anterior insula. In this region, changes in BOLD signal modulated the relationship between age and speech perception skills, suggesting a role for this region in maintaining speech perception at older ages. These results provide important new insights into the neurobiology of speech perception in aging.

  1. The Relationship between Speech Production and Speech Perception Deficits in Parkinson's Disease

    Science.gov (United States)

    De Keyser, Kim; Santens, Patrick; Bockstael, Annelies; Botteldooren, Dick; Talsma, Durk; De Vos, Stefanie; Van Cauwenberghe, Mieke; Verheugen, Femke; Corthals, Paul; De Letter, Miet

    2016-01-01

    Purpose: This study investigated the possible relationship between hypokinetic speech production and speech intensity perception in patients with Parkinson's disease (PD). Method: Participants included 14 patients with idiopathic PD and 14 matched healthy controls (HCs) with normal hearing and cognition. First, speech production was objectified…

  2. Lexical and sublexical units in speech perception.

    Science.gov (United States)

    Giroux, Ibrahima; Rey, Arnaud

    2009-03-01

    Saffran, Newport, and Aslin (1996a) found that human infants are sensitive to statistical regularities corresponding to lexical units when hearing an artificial spoken language. Two sorts of segmentation strategies have been proposed to account for this early word-segmentation ability: bracketing strategies, in which infants are assumed to insert boundaries into continuous speech, and clustering strategies, in which infants are assumed to group certain speech sequences together into units (Swingley, 2005). In the present study, we test the predictions of two computational models instantiating each of these strategies (i.e., Serial Recurrent Networks: Elman, 1990; and Parser: Perruchet & Vinter, 1998) in an experiment where we compare the lexical and sublexical recognition performance of adults after hearing 2 or 10 min of an artificial spoken language. The results are consistent with Parser's predictions and the clustering approach, showing that performance on words is better than performance on part-words only after 10 min. This result suggests that word segmentation abilities are not merely due to stronger associations between sublexical units but to the emergence of stronger lexical representations during the development of speech perception processes. Copyright © 2009, Cognitive Science Society, Inc.
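
    As a rough, hedged sketch of the clustering idea behind Parser (not the published model's actual implementation), the code below builds weighted chunks from a continuous syllable stream: on each step it reads one to three perceptual units, reinforces the resulting chunk, and lets unused chunks decay. The gain, decay and percept-size parameters, the toy three-word language, and the lookup of only two- and three-syllable chunks are all simplifying assumptions.

```python
# Hedged sketch of a clustering-style segmentation mechanism inspired by Parser.
# This is a simplification for illustration, not the published model; parameter
# values (gain, decay, percept size) are invented.
import random

def parse_stream(syllables, steps=2000, gain=1.0, decay=0.05, seed=0):
    """Very simplified Parser-like chunking of a syllable stream."""
    rng = random.Random(seed)
    lexicon = {}                      # chunk (tuple of syllables) -> weight
    pos = 0
    for _ in range(steps):
        if pos > len(syllables) - 10:
            pos = 0                   # wrap around near the end of the stream
        percept = []
        for _ in range(rng.randint(1, 3)):     # attend to 1-3 perceptual units
            # A unit is the strongest known chunk of two or three syllables
            # starting here, otherwise a single syllable.
            unit = (syllables[pos],)
            for length in (3, 2):
                candidate = tuple(syllables[pos:pos + length])
                if lexicon.get(candidate, 0.0) > 1.0:
                    unit = candidate
                    break
            percept.extend(unit)
            pos += len(unit)
        chunk = tuple(percept)
        lexicon[chunk] = lexicon.get(chunk, 0.0) + gain       # reinforcement
        for other in list(lexicon):                           # forgetting
            if other != chunk:
                lexicon[other] -= decay
                if lexicon[other] <= 0.0:
                    del lexicon[other]
    return lexicon

# Toy language: three bisyllabic words concatenated in random order.
words = [("ba", "da"), ("ku", "pi"), ("ti", "go")]
rng = random.Random(42)
stream = [syl for _ in range(800) for syl in rng.choice(words)]
lexicon = parse_stream(stream)
top = sorted(lexicon.items(), key=lambda kv: -kv[1])[:5]
print([("".join(chunk), round(weight, 1)) for chunk, weight in top])
# Word-like chunks such as ("ba", "da") should end up among the strongest entries.
```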

  3. [Pathophysiology of auditory and speech perception].

    Science.gov (United States)

    Dauman, René

    2009-05-20

    Auditory perception or hearing can be defined as the interpretation of sensory evidence, produced by the ears in response to sound, in terms of the events that caused the sound. We do not hear a window but we may hear a window closing. We do not hear a dog but we may hear a dog barking. And we do not hear a person but we may hear a person talking. Hearing impairment can result in anxiety or stress in everyday life. Pure-tone hearing loss (or threshold shift) is a measure of hearing impairment. Aging and excessive noise are the main causes of hearing impairment. Speech perception is another concept. The difference with the former is best illustrated by the disabled individual declaring "I can hear that someone is talking to me, but I don't understand what she says". Being unable to understand easily and clearly significant others, especially in understanding speech in a noisy environment, can give rise to considerable psychosocial and professional consequences (disability). Presbycusis is the decline in hearing sensitivity caused by the aging process at different levels of the auditory system. However, it is difficult to isolate age effects from other contributors to age-related hearing loss such as noise damage, genetic susceptibility, inflammatory otologic disorders, and ototoxic agents. Therefore, presbycusis and age-related hearing loss are often used synonymously. In this report pathophysiology is mostly described with regard to presbycusis, and the main peripheral types of presbycusis (sensory or Corti organ-related, strial, and neural) are summarized. An original experimental model of strial presbycusis, based on chronic application of furosemide at the round window, is further described. Central presbycusis is mainly determined by degeneration secondary to peripheral impairment (concept of deafferentation). Central auditory changes typically affect speed of processing and result in poorer speech understanding in noise or with rapid or degraded speech. Last

  4. Exploring Australian speech-language pathologists' use and perceptions of non-speech oral motor exercises.

    Science.gov (United States)

    Rumbach, Anna F; Rose, Tanya A; Cheah, Mynn

    2018-01-29

    To explore Australian speech-language pathologists' use of non-speech oral motor exercises, and rationales for using/not using non-speech oral motor exercises in clinical practice. A total of 124 speech-language pathologists practising in Australia, working with paediatric and/or adult clients with speech sound difficulties, completed an online survey. The majority of speech-language pathologists reported that they did not use non-speech oral motor exercises when working with paediatric or adult clients with speech sound difficulties. However, more than half of the speech-language pathologists working with adult clients who have dysarthria reported using non-speech oral motor exercises with this population. The most frequently reported rationale for using non-speech oral motor exercises in speech sound difficulty management was to improve awareness/placement of articulators. The majority of speech-language pathologists agreed there is no clear clinical or research evidence base to support non-speech oral motor exercise use with clients who have speech sound difficulties. This study provides an overview of Australian speech-language pathologists' reported use and perceptions of non-speech oral motor exercises' applicability and efficacy in treating paediatric and adult clients who have speech sound difficulties. The research findings provide speech-language pathologists with insight into how and why non-speech oral motor exercises are currently used, and adds to the knowledge base regarding Australian speech-language pathology practice of non-speech oral motor exercises in the treatment of speech sound difficulties. Implications for Rehabilitation Non-speech oral motor exercises refer to oral motor activities which do not involve speech, but involve the manipulation or stimulation of oral structures including the lips, tongue, jaw, and soft palate. Non-speech oral motor exercises are intended to improve the function (e.g., movement, strength) of oral structures. The

  5. Cognitive control factors in speech perception at 11 months

    OpenAIRE

    Conboy, Barbara T.; Sommerville, Jessica A.; Kuhl, Patricia K.

    2008-01-01

    The development of speech perception during the first year reflects increasing attunement to native language features, but the mechanisms underlying this development are not completely understood. One previous study linked reductions in nonnative speech discrimination to performance on nonlinguistic tasks, while other studies have shown associations between speech perception and vocabulary growth. The present study examined relationships among these abilities in 11-month-old infants using a c...

  6. Musical expertise and foreign speech perception

    Directory of Open Access Journals (Sweden)

    Eduardo eMartínez-Montes

    2013-11-01

    Full Text Available The aim of this experiment was to investigate the influence of musical expertise on the automatic perception of foreign syllables and harmonic sounds. Participants were Cuban students with a high level of expertise in music or in visual arts and with the same level of general education and socio-economic background. We used a multi-feature Mismatch Negativity (MMN) design with sequences of either syllables in Mandarin Chinese or harmonic sounds, both comprising deviants in pitch contour, duration and Voice Onset Time (VOT) or equivalent, that were either far from (Large deviants) or close to (Small deviants) the standard. For both Mandarin syllables and harmonic sounds, results were clear-cut in showing larger MMNs to pitch contour deviants in musicians than in visual artists. Results were less clear for duration and VOT deviants, possibly because of the specific characteristics of the stimuli. Results are interpreted as reflecting similar processing of pitch contour in speech and non-speech sounds. The implications of these results for understanding the influence of intense musical training from childhood to adulthood, and of genetic predispositions for music, on foreign language perception are discussed.

  7. Production-perception relationships during speech development

    Science.gov (United States)

    Menard, Lucie; Schwartz, Jean-Luc; Boe, Louis-Jean; Aubin, Jerome

    2005-04-01

    It has been shown that nonuniform growth of the supraglottal cavities, motor control development, and perceptual refinement shape the vowel systems during speech development. In this talk, we propose to investigate the role of perceptual constraints as a guide to the speaker's task from birth to adulthood. Simulations with an articulatory-to-acoustic model, acoustic analyses of natural vowels, and results of perceptual tests provide evidence that the production-perception relationships evolve with age. At the perceptual level, results show that (i) linear combinations of spectral peaks are good predictors of vowel targets, and (ii) focalization, defined as an acoustic pattern with close neighboring formants [J.-L. Schwartz, L.-J. Boe, N. Vallee, and C. Abry, J. Phonetics 25, 255-286 (1997)], is part of the speech task. At the production level, we propose that (i) frequently produced vowels in the baby's early sound inventory can in part be explained by perceptual templates, (ii) the achievement of these perceptual templates may require adaptive articulatory strategies for the child, compared with adults, to cope with morphological differences. Results are discussed in the light of a perception-for-action control theory. [Work supported by the Social Sciences and Humanities Research Council of Canada.]
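
    The notion of focalization above refers to vowels whose neighboring formants lie close together. As a loose illustration (not the exact formulation of Schwartz et al., 1997), the sketch below converts formants to the Bark scale and scores a vowel by the inverse squared distances between adjacent formants, so that focal vowels such as /i/ or /y/ score higher than a mid central vowel; the formant values are rough textbook-style approximations.

```python
# Hedged sketch: a simple focalization-style score based on the proximity of
# adjacent formants on the Bark scale. This is an illustration, not the exact
# formulation of the Dispersion-Focalization Theory; formant values are rough
# approximations.

def hz_to_bark(f_hz: float) -> float:
    """Traunmueller's approximation of the Bark scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def focalization_score(formants_hz) -> float:
    """Sum of inverse squared distances between adjacent formants (in Bark)."""
    z = [hz_to_bark(f) for f in formants_hz]
    return sum(1.0 / (z[i + 1] - z[i]) ** 2 for i in range(len(z) - 1))

vowels = {
    "i (focal: F3-F4 close)": (270, 2300, 3000, 3300),
    "y (focal: F2-F3 close)": (270, 1800, 2100, 3300),
    "schwa (non-focal)": (500, 1500, 2500, 3500),
}
for label, formants in vowels.items():
    print(f"{label:26s} score = {focalization_score(formants):.2f}")
# Focal vowels, with close neighboring formants, receive higher scores than the
# evenly spaced schwa-like configuration.
```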

  8. Perception drives production across sensory modalities: A network for sensorimotor integration of visual speech

    Science.gov (United States)

    Venezia, Jonathan H.; Fillmore, Paul; Matchin, William; Isenberg, A. Lisette; Hickok, Gregory; Fridriksson, Julius

    2015-01-01

    Sensory information is critical for movement control, both for defining the targets of actions and providing feedback during planning or ongoing movements. This holds for speech motor control as well, where both auditory and somatosensory information have been shown to play a key role. Recent clinical research demonstrates that individuals with severe speech production deficits can show a dramatic improvement in fluency during online mimicking of an audiovisual speech signal suggesting the existence of a visuomotor pathway for speech motor control. Here we used fMRI in healthy individuals to identify this new visuomotor circuit for speech production. Participants were asked to perceive and covertly rehearse nonsense syllable sequences presented auditorily, visually, or audiovisually. The motor act of rehearsal, which is prima facie the same whether or not it is cued with a visible talker, produced different patterns of sensorimotor activation when cued by visual or audiovisual speech (relative to auditory speech). In particular, a network of brain regions including the left posterior middle temporal gyrus and several frontoparietal sensorimotor areas activated more strongly during rehearsal cued by a visible talker versus rehearsal cued by auditory speech alone. Some of these brain regions responded exclusively to rehearsal cued by visual or audiovisual speech. This result has significant implications for models of speech motor control, for the treatment of speech output disorders, and for models of the role of speech gesture imitation in development. PMID:26608242

  9. Perception drives production across sensory modalities: A network for sensorimotor integration of visual speech.

    Science.gov (United States)

    Venezia, Jonathan H; Fillmore, Paul; Matchin, William; Isenberg, A Lisette; Hickok, Gregory; Fridriksson, Julius

    2016-02-01

    Sensory information is critical for movement control, both for defining the targets of actions and providing feedback during planning or ongoing movements. This holds for speech motor control as well, where both auditory and somatosensory information have been shown to play a key role. Recent clinical research demonstrates that individuals with severe speech production deficits can show a dramatic improvement in fluency during online mimicking of an audiovisual speech signal suggesting the existence of a visuomotor pathway for speech motor control. Here we used fMRI in healthy individuals to identify this new visuomotor circuit for speech production. Participants were asked to perceive and covertly rehearse nonsense syllable sequences presented auditorily, visually, or audiovisually. The motor act of rehearsal, which is prima facie the same whether or not it is cued with a visible talker, produced different patterns of sensorimotor activation when cued by visual or audiovisual speech (relative to auditory speech). In particular, a network of brain regions including the left posterior middle temporal gyrus and several frontoparietal sensorimotor areas activated more strongly during rehearsal cued by a visible talker versus rehearsal cued by auditory speech alone. Some of these brain regions responded exclusively to rehearsal cued by visual or audiovisual speech. This result has significant implications for models of speech motor control, for the treatment of speech output disorders, and for models of the role of speech gesture imitation in development. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. Contribution of Prosody in Audio-Visual Integration to Emotional Perception of Virtual Characters

    Directory of Open Access Journals (Sweden)

    Ekaterina Volkova

    2011-10-01

    Full Text Available Recent technology provides us with realistic looking virtual characters. Motion capture and elaborate mathematical models supply data for natural looking, controllable facial and bodily animations. With the help of computational linguistics and artificial intelligence, we can automatically assign emotional categories to appropriate stretches of text for a simulation of those social scenarios where verbal communication is important. All this makes virtual characters a valuable tool for creation of versatile stimuli for research on the integration of emotion information from different modalities. We conducted an audio-visual experiment to investigate the differential contributions of emotional speech and facial expressions on emotion identification. We used recorded and synthesized speech as well as dynamic virtual faces, all enhanced for seven emotional categories. The participants were asked to recognize the prevalent emotion of paired faces and audio. Results showed that when the voice was recorded, the vocalized emotion influenced participants' emotion identification more than the facial expression. However, when the voice was synthesized, facial expression influenced participants' emotion identification more than vocalized emotion. Additionally, individuals did worse on identifying either the facial expression or vocalized emotion when the voice was synthesized. Our experimental method can help to determine how to improve synthesized emotional speech.

  11. Alfasecuencialización: la enseñanza del cine en la era del audiovisual (Sequential literacy: the teaching of cinema in the audiovisual age)

    Directory of Open Access Journals (Sweden)

    José Antonio Palao Errando

    2007-10-01

    Full Text Available In the so-called «information society», film studies have been diluted into a pragmatic and technological treatment of audiovisual discourse, just as the enjoyment of cinema itself has been caught in the web of the DVD and hypertext. Cinema itself reacts to this through complex narrative structures that distance it from standard audiovisual discourse. The role of film studies, and of their teaching at university, should be to reintroduce the subject rejected by informational knowledge through the interpretation of the filmic text.

  12. Psychophysics of the McGurk and Other Audiovisual Speech Integration Effects

    Science.gov (United States)

    Jiang, Jintao; Bernstein, Lynne E.

    2011-01-01

    When the auditory and visual components of spoken audiovisual nonsense syllables are mismatched, perceivers produce four different types of perceptual responses, auditory correct, visual correct, fusion (the so-called "McGurk effect"), and combination (i.e., two consonants are reported). Here, quantitative measures were developed to account for…

  13. Voice and Speech Quality Perception Assessment and Evaluation

    CERN Document Server

    Jekosch, Ute

    2005-01-01

    Foundations of Voice and Speech Quality Perception starts out with the fundamental question of: "How do listeners perceive voice and speech quality and how can these processes be modeled?" Any quantitative answers require measurements. This is natural for physical quantities but harder to imagine for perceptual measurands. This book approaches the problem by actually identifying major perceptual dimensions of voice and speech quality perception, defining units wherever possible and offering paradigms to position these dimensions into a structural skeleton of perceptual speech and voice quality. The emphasis is placed on voice and speech quality assessment of systems in artificial scenarios. Many scientific fields are involved. This book bridges the gap between two quite diverse fields, engineering and humanities, and establishes the new research area of Voice and Speech Quality Perception.

  14. Temporal regularity in speech perception: Is regularity beneficial or deleterious?

    Science.gov (United States)

    Geiser, Eveline; Shattuck-Hufnagel, Stefanie

    2012-04-01

    Speech rhythm has been proposed to be of crucial importance for correct speech perception and language learning. This study investigated the influence of speech rhythm in second language processing. German pseudo-sentences were presented to participants in two conditions: 'naturally regular speech rhythm' and an 'emphasized regular rhythm'. Nine expert English speakers with 3.5±1.6 years of German training repeated each sentence after hearing it once over headphones. Responses were transcribed using the International Phonetic Alphabet and analyzed for the number of correct, false and missing consonants as well as for consonant additions. The over-all number of correct reproductions of consonants did not differ between the two experimental conditions. However, speech rhythmicization significantly affected the serial position curve of correctly reproduced syllables. The results of this pilot study are consistent with the view that speech rhythm is important for speech perception.

  15. Review of Visual Speech Perception by Hearing and Hearing-Impaired People: Clinical Implications

    Science.gov (United States)

    Woodhouse, Lynn; Hickson, Louise; Dodd, Barbara

    2009-01-01

    Background: Speech perception is often considered specific to the auditory modality, despite convincing evidence that speech processing is bimodal. The theoretical and clinical roles of speech-reading for speech perception, however, have received little attention in speech-language therapy. Aims: The role of speech-read information for speech…

  16. Comparison of two cochlear implant coding strategies on speech perception.

    Science.gov (United States)

    Dillon, Margaret T; Buss, Emily; King, English R; Deres, Ellen J; Obarowski, Sarah N; Anderson, Meredith L; Adunka, Marcia C

    2016-11-01

    Assess whether differences in speech perception are observed after exclusive listening experience with high-definition continuous interleaved sampling (HDCIS) versus fine structure processing (FSP) coding strategies. Subjects were randomly assigned at initial activation of the external speech processor to receive the HDCIS or FSP coding strategy. Frequency filter assignments were consistent across subjects. The speech perception test battery included CNC words in quiet, HINT sentences in quiet and steady noise (+10 dB SNR), AzBio sentences in quiet and a 10-talker babble (+10 dB SNR), and BKB-SIN. Assessment intervals included 1, 3, and 6 months post-activation. Data from 22 subjects (11 with HDCIS and 11 with FSP) were assessed over time. Speech perception performance was not significantly different between groups. Speech perception performance was not significantly different after 6 months of listening experience with the HDCIS or FSP coding strategy.

  17. Giving speech a hand: gesture modulates activity in auditory cortex during speech perception.

    Science.gov (United States)

    Hubbard, Amy L; Wilson, Stephen M; Callan, Daniel E; Dapretto, Mirella

    2009-03-01

    Viewing hand gestures during face-to-face communication affects speech perception and comprehension. Despite the visible role played by gesture in social interactions, relatively little is known about how the brain integrates hand gestures with co-occurring speech. Here we used functional magnetic resonance imaging (fMRI) and an ecologically valid paradigm to investigate how beat gesture-a fundamental type of hand gesture that marks speech prosody-might impact speech perception at the neural level. Subjects underwent fMRI while listening to spontaneously-produced speech accompanied by beat gesture, nonsense hand movement, or a still body; as additional control conditions, subjects also viewed beat gesture, nonsense hand movement, or a still body all presented without speech. Validating behavioral evidence that gesture affects speech perception, bilateral nonprimary auditory cortex showed greater activity when speech was accompanied by beat gesture than when speech was presented alone. Further, the left superior temporal gyrus/sulcus showed stronger activity when speech was accompanied by beat gesture than when speech was accompanied by nonsense hand movement. Finally, the right planum temporale was identified as a putative multisensory integration site for beat gesture and speech (i.e., here activity in response to speech accompanied by beat gesture was greater than the summed responses to speech alone and beat gesture alone), indicating that this area may be pivotally involved in synthesizing the rhythmic aspects of both speech and gesture. Taken together, these findings suggest a common neural substrate for processing speech and gesture, likely reflecting their joint communicative role in social interactions.

  18. Treating visual speech perception to improve speech production in nonfluent aphasia.

    Science.gov (United States)

    Fridriksson, Julius; Baker, Julie M; Whiteside, Janet; Eoute, David; Moser, Dana; Vesselinov, Roumen; Rorden, Chris

    2009-03-01

    Several recent studies have revealed modulation of the left frontal lobe speech areas not only during speech production but also for speech perception. Crucially, the frontal lobe areas highlighted in these studies are the same ones that are involved in nonfluent aphasia. Based on these findings, this study examined the utility of targeting visual speech perception to improve speech production in nonfluent aphasia. Ten patients with chronic nonfluent aphasia underwent computerized language treatment utilizing picture-word matching. To examine the effect of visual speech perception on picture naming, 2 treatment phases were compared-one that included matching pictures to heard words and another in which pictures were matched to heard words accompanied by a video of the speaker's mouth presented on the computer screen. The results revealed significantly improved picture naming of both trained and untrained items after treatment when it included a visual speech component (ie, seeing the speaker's mouth). In contrast, the treatment phase in which pictures were only matched to heard words did not result in statistically significant improvement of picture naming. The findings suggest that focusing on visual speech perception can significantly improve speech production in nonfluent aphasia and may provide an alternative approach to treat a disorder in which speech production seldom improves much in the chronic phase of stroke.

  19. Loudness perception and speech intensity control in Parkinson's disease.

    Science.gov (United States)

    Clark, Jenna P; Adams, Scott G; Dykstra, Allyson D; Moodie, Shane; Jog, Mandar

    2014-01-01

    The aim of this study was to examine loudness perception in individuals with hypophonia and Parkinson's disease. The participants included 17 individuals with hypophonia related to Parkinson's disease (PD) and 25 age-equivalent controls. The three loudness perception tasks included a magnitude estimation procedure involving a sentence spoken at 60, 65, 70, 75 and 80 dB SPL, an imitation task involving a sentence spoken at 60, 65, 70, 75 and 80 dB SPL, and a magnitude production procedure involving the production of a sentence at five different loudness levels (habitual, two and four times louder and two and four times quieter). The participants with PD produced a significantly different pattern and used a more restricted range than the controls in their perception of speech loudness, imitation of speech intensity, and self-generated estimates of speech loudness. The results support a speech loudness perception deficit in PD involving an abnormal perception of externally generated and self-generated speech intensity. Readers will recognize that individuals with hypophonia related to Parkinson's disease may demonstrate a speech loudness perception deficit involving the abnormal perception of externally generated and self-generated speech intensity. Copyright © 2014 Elsevier Inc. All rights reserved.

  20. Sound frequency affects speech emotion perception: Results from congenital amusia

    Directory of Open Access Journals (Sweden)

    Sydney eLolli

    2015-09-01

    Full Text Available Congenital amusics, or tone-deaf individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying band-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody (MBEP) were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under band-pass and unfiltered speech conditions. Results showed a significant correlation between pitch discrimination threshold and emotion identification accuracy for band-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold > 16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between band-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation.
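    The band-pass manipulation described above removes spectral detail outside a fixed passband so that listeners must rely more heavily on the remaining cues. A minimal sketch of such filtering is given below; the cutoff frequencies, filter order, and file names are assumptions for illustration, not the settings used in the study.

```python
# Hypothetical band-pass filtering of an emotional speech recording.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

def bandpass_speech(wav_in, wav_out, low_hz=300.0, high_hz=3400.0, order=4):
    """Apply a zero-phase Butterworth band-pass filter to a mono speech file."""
    fs, x = wavfile.read(wav_in)
    x = x.astype(np.float64)
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    y = sosfiltfilt(sos, x)
    y = y / (np.max(np.abs(y)) + 1e-12) * 0.9          # normalize before writing
    wavfile.write(wav_out, fs, (y * 32767).astype(np.int16))

# Usage (hypothetical file names):
# bandpass_speech("statement_angry.wav", "statement_angry_bp.wav")
```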

  1. How does cognitive load influence speech perception? An encoding hypothesis.

    Science.gov (United States)

    Mitterer, Holger; Mattys, Sven L

    2017-01-01

    Two experiments investigated the conditions under which cognitive load exerts an effect on the acuity of speech perception. These experiments extend earlier research by using a different speech perception task (four-interval oddity task) and by implementing cognitive load through a task often thought to be modular, namely, face processing. In the cognitive-load conditions, participants were required to remember two faces presented before the speech stimuli. In Experiment 1, performance in the speech-perception task under cognitive load was not impaired in comparison to a no-load baseline condition. In Experiment 2, we modified the load condition minimally such that it required encoding of the two faces simultaneously with the speech stimuli. As a reference condition, we also used a visual search task that in earlier experiments had led to poorer speech perception. Both concurrent tasks led to decrements in the speech task. The results suggest that speech perception is affected even by loads thought to be processed modularly, and that, critically, encoding in working memory might be the locus of interference.

  2. Perception of Sung Speech in Bimodal Cochlear Implant Users

    Directory of Open Access Journals (Sweden)

    Joseph D. Crew

    2016-11-01

    Full Text Available Combined use of a hearing aid (HA) and cochlear implant (CI) has been shown to improve CI users’ speech and music performance. However, different hearing devices, test stimuli, and listening tasks may interact and obscure bimodal benefits. In this study, speech and music perception were measured in bimodal listeners for CI-only, HA-only, and CI + HA conditions, using the Sung Speech Corpus, a database of monosyllabic words produced at different fundamental frequencies. Sentence recognition was measured using sung speech in which pitch was held constant or varied across words, as well as for spoken speech. Melodic contour identification (MCI) was measured using sung speech in which the words were held constant or varied across notes. Results showed that sentence recognition was poorer with sung speech relative to spoken, with little difference between sung speech with a constant or variable pitch; mean performance was better with CI-only relative to HA-only, and best with CI + HA. MCI performance was better with constant words versus variable words; mean performance was better with HA-only than with CI-only and was best with CI + HA. Relative to CI-only, a strong bimodal benefit was observed for speech and music perception. Relative to the better ear, bimodal benefits remained strong for sentence recognition but were marginal for MCI. While variations in pitch and timbre may negatively affect CI users’ speech and music perception, bimodal listening may partially compensate for these deficits.

  3. Inverse Effectiveness and Multisensory Interactions in Visual Event-Related Potentials with Audiovisual Speech

    OpenAIRE

    Stevenson, Ryan A.; Bushmakin, Maxim; Kim, Sunah; Wallace, Mark T.; Puce, Aina; James, Thomas W

    2012-01-01

    In recent years, it has become evident that neural responses previously considered to be unisensory can be modulated by sensory input from other modalities. In this regard, visual neural activity elicited to viewing a face is strongly influenced by concurrent incoming auditory information, particularly speech. Here, we applied an additive-factors paradigm aimed at quantifying the impact that auditory speech has on visual event-related potentials (ERPs) elicited to visual speech. These multise...

  4. Brain-speech alignment enhances auditory cortical responses and speech perception.

    Science.gov (United States)

    Saoud, Houda; Josse, Goulven; Bertasi, Eric; Truy, Eric; Chait, Maria; Giraud, Anne-Lise

    2012-01-04

    Asymmetry in auditory cortical oscillations could play a role in speech perception by fostering hemispheric triage of information across the two hemispheres. Due to this asymmetry, fast speech temporal modulations relevant for phonemic analysis could be best perceived by the left auditory cortex, while slower modulations conveying vocal and paralinguistic information would be better captured by the right one. It is unclear, however, whether and how early oscillation-based selection influences speech perception. Using a dichotic listening paradigm in human participants, where we provided different parts of the speech envelope to each ear, we show that word recognition is facilitated when the temporal properties of speech match the rhythmic properties of auditory cortices. We further show that the interaction between speech envelope and auditory cortices rhythms translates in their level of neural activity (as measured with fMRI). In the left auditory cortex, the neural activity level related to stimulus-brain rhythm interaction predicts speech perception facilitation. These data demonstrate that speech interacts with auditory cortical rhythms differently in right and left auditory cortex, and that in the latter, the interaction directly impacts speech perception performance.

  5. Elderly perception of speech from a computer

    Science.gov (United States)

    Black, Alan; Eskenazi, Maxine; Simmons, Reid

    2002-05-01

    An aging population still needs to access information, such as bus schedules. It is evident that they will be doing so using computers and especially interfaces using speech input and output. This is a preliminary study to the use of synthetic speech for the elderly. In it twenty persons between the ages of 60 and 80 were asked to listen to speech emitted by a robot (CMU's VIKIA) and to write down what they heard. All of the speech was natural prerecorded speech (not synthetic) read by one female speaker. There were four listening conditions: (a) only speech emitted, (b) robot moves before emitting speech, (c) face has lip movement during speech, (d) both (b) and (c). There were very few errors for conditions (b), (c), and (d), but errors existed for condition (a). The presentation will discuss experimental conditions, show actual figures and try to draw conclusions for speech communication between computers and the elderly.

  6. Music training and speech perception: a gene–environment interaction

    National Research Council Canada - National Science Library

    Schellenberg, E. Glenn

    2015-01-01

    Claims of beneficial side effects of music training are made for many different abilities, including verbal and visuospatial abilities, executive functions, working memory, IQ, and speech perception in particular...

  7. Speech Perception as a Cognitive Process: The Interactive Activation Model.

    Science.gov (United States)

    Elman, Jeffrey L.; McClelland, James L.

    Research efforts to model speech perception in terms of a processing system in which knowledge and processing are distributed over large numbers of highly interactive--but computationally primitive--elements are described in this report. After discussing the properties of speech that demand a parallel interactive processing system, the report…

  8. Individual Differences in Premotor and Motor Recruitment during Speech Perception

    Science.gov (United States)

    Szenkovits, Gayaneh; Peelle, Jonathan E.; Norris, Dennis; Davis, Matthew H.

    2012-01-01

    Although activity in premotor and motor cortices is commonly observed in neuroimaging studies of spoken language processing, the degree to which this activity is an obligatory part of everyday speech comprehension remains unclear. We hypothesised that rather than being a unitary phenomenon, the neural response to speech perception in motor regions…

  9. Cognitive Control Factors in Speech Perception at 11 Months

    Science.gov (United States)

    Conboy, Barbara T.; Sommerville, Jessica A.; Kuhl, Patricia K.

    2008-01-01

    The development of speech perception during the 1st year reflects increasing attunement to native language features, but the mechanisms underlying this development are not completely understood. One previous study linked reductions in nonnative speech discrimination to performance on nonlinguistic tasks, whereas other studies have shown…

  10. Beat Gestures Modulate Auditory Integration in Speech Perception

    Science.gov (United States)

    Biau, Emmanuel; Soto-Faraco, Salvador

    2013-01-01

    Spontaneous beat gestures are an integral part of the paralinguistic context during face-to-face conversations. Here we investigated the time course of beat-speech integration in speech perception by measuring ERPs evoked by words pronounced with or without an accompanying beat gesture, while participants watched a spoken discourse. Words…

  11. Speech perception of noise with binary gains

    DEFF Research Database (Denmark)

    Wang, DeLiang; Kjems, Ulrik; Pedersen, Michael Syskind

    2008-01-01

    For a given mixture of speech and noise, an ideal binary time-frequency mask is constructed by comparing speech energy and noise energy within local time-frequency units. It is observed that listeners achieve nearly perfect speech recognition from gated noise with binary gains prescribed by the ideal binary mask.
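    The construction described above can be made concrete with a short sketch: compute time-frequency representations of the separately available speech and noise, compare their energies in each unit, and apply the resulting binary gains to the noise. The 0 dB local criterion and STFT settings below are assumed values for illustration.

```python
# Minimal ideal-binary-mask sketch: binary gains from local speech/noise energy.
import numpy as np
from scipy.signal import stft, istft

def gated_noise_with_ibm(speech, noise, fs, lc_db=0.0, nperseg=512):
    """Return noise gated by the ideal binary mask, plus the mask itself."""
    _, _, S = stft(speech, fs=fs, nperseg=nperseg)
    _, _, N = stft(noise, fs=fs, nperseg=nperseg)
    local_snr_db = 10 * np.log10((np.abs(S) ** 2 + 1e-12) / (np.abs(N) ** 2 + 1e-12))
    mask = (local_snr_db > lc_db).astype(float)   # 1 where speech energy dominates
    _, gated = istft(N * mask, fs=fs, nperseg=nperseg)
    return gated, mask
```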

  12. Caveat Emptor: The Meaning of Perception and Integration in Speech Perception

    OpenAIRE

    Dominic Massaro

    2009-01-01

    A recent letter [1] claimed integration of auditory and tactile information in speech perception. Although I have been an advocate of multisensory integration, neither perception nor integration was sufficiently formalized, operationalized, and tested to support this claim.

  13. Plasticity in the human speech motor system drives changes in speech perception.

    Science.gov (United States)

    Lametti, Daniel R; Rochet-Capellan, Amélie; Neufeld, Emily; Shiller, Douglas M; Ostry, David J

    2014-07-30

    Recent studies of human speech motor learning suggest that learning is accompanied by changes in auditory perception. But what drives the perceptual change? Is it a consequence of changes in the motor system? Or is it a result of sensory inflow during learning? Here, subjects participated in a speech motor-learning task involving adaptation to altered auditory feedback and they were subsequently tested for perceptual change. In two separate experiments, involving two different auditory perceptual continua, we show that changes in the speech motor system that accompany learning drive changes in auditory speech perception. Specifically, we obtained changes in speech perception when adaptation to altered auditory feedback led to speech production that fell into the phonetic range of the speech perceptual tests. However, a similar change in perception was not observed when the auditory feedback that subjects received during learning fell into the phonetic range of the perceptual tests. This indicates that the central motor outflow associated with vocal sensorimotor adaptation drives changes to the perceptual classification of speech sounds. Copyright © 2014 the authors 0270-6474/14/3410339-08$15.00/0.

  14. Visual speech influences speech perception immediately but not automatically.

    Science.gov (United States)

    Mitterer, Holger; Reinisch, Eva

    2017-02-01

    Two experiments examined the time course of the use of auditory and visual speech cues to spoken word recognition using an eye-tracking paradigm. Results of the first experiment showed that the use of visual speech cues from lipreading is reduced if concurrently presented pictures require a division of attentional resources. This reduction was evident even when listeners' eye gaze was on the speaker rather than the (static) pictures. Experiment 2 used a deictic hand gesture to foster attention to the speaker. At the same time, the visual processing load was reduced by keeping the visual display constant over a fixed number of successive trials. Under these conditions, the visual speech cues from lipreading were used. Moreover, the eye-tracking data indicated that visual information was used immediately and even earlier than auditory information. In combination, these data indicate that visual speech cues are not used automatically, but if they are used, they are used immediately.

  15. Exploring the role of brain oscillations in speech perception in noise: Intelligibility of isochronously retimed speech

    Directory of Open Access Journals (Sweden)

    Vincent Aubanel

    2016-08-01

    Full Text Available A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
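    One of the two anchor-point types described above, amplitude envelope peaks, can be located with standard signal-processing steps: compute a broadband envelope, smooth it, and pick its local maxima. The smoothing cutoff and minimum peak spacing below are assumed values, not taken from the study.

```python
# Illustrative extraction of amplitude-envelope peaks as retiming anchor points.
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt, find_peaks

def envelope_peak_anchors(x, fs, lp_hz=10.0, min_interval_s=0.1):
    env = np.abs(hilbert(x))                                   # broadband amplitude envelope
    sos = butter(4, lp_hz, btype="low", fs=fs, output="sos")
    env_smooth = sosfiltfilt(sos, env)                         # smooth the envelope
    peaks, _ = find_peaks(env_smooth, distance=int(min_interval_s * fs))
    return peaks / fs                                          # anchor times in seconds
```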

  16. Perception of Speech by Individuals with Parkinson's Disease: A Review

    Science.gov (United States)

    Kwan, Lorinda C.; Whitehill, Tara L.

    2011-01-01

    A few clinical reports and empirical studies have suggested a possible deficit in the perception of speech in individuals with Parkinson's disease. In this paper, these studies are reviewed in an attempt to support clinical anecdotal observations by relevant empirical research findings. The combined evidence suggests a possible deficit in patients' perception of their own speech loudness. Other research studies on the perception of speech in this population were reviewed, in a broader scope of the perception of emotional prosody. These studies confirm that Parkinson's disease specifically impairs patients' perception of verbal emotions. However, explanations of the nature and causes of this perceptual deficit are still limited. Future research directions are suggested. PMID:21961077

  17. Perception of Speech by Individuals with Parkinson's Disease: A Review

    Directory of Open Access Journals (Sweden)

    Lorinda C. Kwan

    2011-01-01

    Full Text Available A few clinical reports and empirical studies have suggested a possible deficit in the perception of speech in individuals with Parkinson's disease. In this paper, these studies are reviewed in an attempt to support clinical anecdotal observations by relevant empirical research findings. The combined evidence suggests a possible deficit in patients' perception of their own speech loudness. Other research studies on the perception of speech in this population were reviewed, in a broader scope of the perception of emotional prosody. These studies confirm that Parkinson's disease specifically impairs patients' perception of verbal emotions. However, explanations of the nature and causes of this perceptual deficit are still limited. Future research directions are suggested.

  18. Assessing the effects of audiovisual semantic congruency on the perception of a bistable figure.

    Science.gov (United States)

    Hsiao, Jhih-Yun; Chen, Yi-Chuan; Spence, Charles; Yeh, Su-Ling

    2012-06-01

    Bistable figures provide a fascinating window through which to explore human visual awareness. Here we demonstrate for the first time that the semantic context provided by a background auditory soundtrack (the voice of a young or old female) can modulate an observer's predominant percept while watching the bistable "my wife or my mother-in-law" figure (Experiment 1). The possibility of a response-bias account-that participants simply reported the percept that happened to be congruent with the soundtrack that they were listening to-was excluded in Experiment 2. We further demonstrate that this crossmodal semantic effect was additive with the manipulation of participants' visual fixation (Experiment 3), while it interacted with participants' voluntary attention (Experiment 4). These results indicate that audiovisual semantic congruency constrains the visual processing that gives rise to the conscious perception of bistable visual figures. Crossmodal semantic context therefore provides an important mechanism contributing to the emergence of visual awareness. Copyright © 2012 Elsevier Inc. All rights reserved.

  19. Development and validation of the Mandarin speech perception test.

    Science.gov (United States)

    Fu, Qian-Jie; Zhu, Meimei; Wang, Xiaosong

    2011-06-01

    Currently there are few standardized speech testing materials for Mandarin-speaking cochlear implant (CI) listeners. In this study, Mandarin speech perception (MSP) sentence test materials were developed and validated in normal-hearing subjects listening to acoustic simulations of CI processing. Percent distribution of vowels, consonants, and tones within each MSP sentence list was similar to that observed across commonly used Chinese characters. There was no significant difference in sentence recognition across sentence lists. Given the phonetic balancing within lists and the validation with spectrally degraded speech, the present MSP test materials may be useful for assessing speech performance of Mandarin-speaking CI listeners. © 2011 Acoustical Society of America

  20. Listeners' perceptions of speech and language disorders.

    Science.gov (United States)

    Allard, Emily R; Williams, Dale F

    2008-01-01

    Using semantic differential scales with nine trait pairs, 445 adults rated five audio-taped speech samples, one depicting an individual without a disorder and four portraying communication disorders. Statistical analyses indicated that the no disorder sample was rated higher with respect to the trait of employability than were the articulation, voice, and language disorder conditions; and higher in self-esteem than the fluency, voice, and language disorders. In addition, there were differences among the disorders. Most notably, the language disordered condition was rated significantly lower in decisiveness and reliability and higher in stress level than all other conditions. Within-subject analyses indicated that the variables of age, gender, exposure to individuals with communication disorders, and urban versus rural residency did not affect ratings. These results support previous research indicating the existence of negative stereotypes toward individuals with communication disorders. In addition, they reveal differences in how various disorders were perceived. Participants will be able to: (1) identify the different methods investigators have used to examine perceptions toward individuals with communicative disorder, (2) recognize that there are differences in how the various communicative disorders are perceived, and (3) discuss the need for public education in order to dispel stereotypes associated with communicative disorders.

  1. Individual differences in speech-in-noise perception parallel neural speech processing and attention in preschoolers.

    Science.gov (United States)

    Thompson, Elaine C; Woodruff Carr, Kali; White-Schwoch, Travis; Otto-Meyer, Sebastian; Kraus, Nina

    2017-02-01

    From bustling classrooms to unruly lunchrooms, school settings are noisy. To learn effectively in the unwelcome company of numerous distractions, children must clearly perceive speech in noise. In older children and adults, speech-in-noise perception is supported by sensory and cognitive processes, but the correlates underlying this critical listening skill in young children (3-5 year olds) remain undetermined. Employing a longitudinal design (two evaluations separated by ∼12 months), we followed a cohort of 59 preschoolers, ages 3.0-4.9, assessing word-in-noise perception, cognitive abilities (intelligence, short-term memory, attention), and neural responses to speech. Results reveal changes in word-in-noise perception parallel changes in processing of the fundamental frequency (F0), an acoustic cue known for playing a role central to speaker identification and auditory scene analysis. Four unique developmental trajectories (speech-in-noise perception groups) confirm this relationship, in that improvements and declines in word-in-noise perception couple with enhancements and diminishments of F0 encoding, respectively. Improvements in word-in-noise perception also pair with gains in attention. Word-in-noise perception does not relate to strength of neural harmonic representation or short-term memory. These findings reinforce previously-reported roles of F0 and attention in hearing speech in noise in older children and adults, and extend this relationship to preschool children. Copyright © 2016 Elsevier B.V. All rights reserved.

  2. Neural correlates of quality perception for complex speech signals

    CERN Document Server

    Antons, Jan-Niklas

    2015-01-01

    This book interconnects two essential disciplines to study the perception of speech: Neuroscience and Quality of Experience, which to date have rarely been used together for the purposes of research on speech quality perception. In five key experiments, the book demonstrates the application of standard clinical methods in neurophysiology on the one hand, and of methods used in fields of research concerned with speech quality perception on the other. Using this combination, the book shows that speech stimuli with different lengths and different quality impairments are accompanied by physiological reactions related to quality variations, e.g., a positive peak in an event-related potential. Furthermore, it demonstrates that – in most cases – quality impairment intensity has an impact on the intensity of physiological reactions.

  3. Evidence for Cerebellar Contributions to Adaptive Plasticity in Speech Perception.

    Science.gov (United States)

    Guediche, Sara; Holt, Lori L; Laurent, Patryk; Lim, Sung-Joo; Fiez, Julie A

    2015-07-01

    Human speech perception rapidly adapts to maintain comprehension under adverse listening conditions. For example, with exposure listeners can adapt to heavily accented speech produced by a non-native speaker. Outside the domain of speech perception, adaptive changes in sensory and motor processing have been attributed to cerebellar functions. The present functional magnetic resonance imaging study investigates whether adaptation in speech perception also involves the cerebellum. Acoustic stimuli were distorted using a vocoding plus spectral-shift manipulation and presented in a word recognition task. Regions in the cerebellum that showed differences before versus after adaptation were identified, and the relationship between activity during adaptation and subsequent behavioral improvements was examined. These analyses implicated the right Crus I region of the cerebellum in adaptive changes in speech perception. A functional correlation analysis with the right Crus I as a seed region probed for cerebral cortical regions with covarying hemodynamic responses during the adaptation period. The results provided evidence of a functional network between the cerebellum and language-related regions in the temporal and parietal lobes of the cerebral cortex. Consistent with known cerebellar contributions to sensorimotor adaptation, cerebro-cerebellar interactions may support supervised learning mechanisms that rely on sensory prediction error signals in speech perception. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  4. Children's perception of their synthetically corrected speech production.

    Science.gov (United States)

    Strömbergsson, Sofia; Wengelin, Asa; House, David

    2014-06-01

    We explore children's perception of their own speech - in its online form, in its recorded form, and in synthetically modified forms. Children with phonological disorder (PD) and children with typical speech and language development (TD) performed tasks of evaluating accuracy of the different types of speech stimuli, either immediately after having produced the utterance or after a delay. In addition, they performed a task designed to assess their ability to detect synthetic modification. Both groups showed high performance in tasks involving evaluation of other children's speech, whereas in tasks of evaluating one's own speech, the children with PD were less accurate than their TD peers. The children with PD were less sensitive to misproductions in immediate conjunction with their production of an utterance, and more accurate after a delay. Within-category modification often passed undetected, indicating a satisfactory quality of the generated speech. Potential clinical benefits of using corrective re-synthesis are discussed.

  5. Speech and music perception with the new fine structure speech coding strategy: preliminary results.

    Science.gov (United States)

    Arnoldner, Christoph; Riss, Dominik; Brunner, Markus; Durisin, Martin; Baumgartner, Wolf-Dieter; Hamzavi, Jafar-Sasan

    2007-12-01

    Taking into account the excellent results with significant improvements in the speech tests and the very high satisfaction of the patients using the new strategy, this first implementation of a fine structure strategy could offer a new quality of hearing with cochlear implants (CIs). This study consisted of an intra-individual comparison of speech recognition, music perception and patient preference when subjects used two different speech coding strategies with a MedEl Pulsar CI: continuous interleaved sampling (CIS) and the new fine structure processing (FSP) strategy. In contrast to envelope-based strategies, the FSP strategy also delivers subtle pitch and timing differences of sound to the user and is thereby supposed to enhance speech perception in noise and increase the quality of music perception. This was a prospective study assessing performance with two different speech coding strategies. The setting was a CI programme at an academic tertiary referral centre. Fourteen post-lingually deaf patients using a MedEl Pulsar CI with a mean CI experience of 0.98 years were supplied with the new FSP speech coding strategy. Subjects consecutively used the two different speech coding strategies. Speech and music tests were performed with the previously fitted CIS strategy, immediately after fitting with the new FSP strategy and 4, 8 and 12 weeks later. The main outcome measures were individual performance and subjective assessment of two different speech processors. Speech and music test scores improved statistically significantly after conversion from CIS to FSP strategy. Twelve of 14 patients preferred the new FSP speech processing strategy over the CIS strategy.

  6. Bilingualism affects audiovisual phoneme identification.

    Science.gov (United States)

    Burfin, Sabine; Pascalis, Olivier; Ruiz Tada, Elisa; Costa, Albert; Savariaux, Christophe; Kandel, Sonia

    2014-01-01

    We all go through a process of perceptual narrowing for phoneme identification. As we become experts in the languages we hear in our environment we lose the ability to identify phonemes that do not exist in our native phonological inventory. This research examined how linguistic experience-i.e., the exposure to a double phonological code during childhood-affects the visual processes involved in non-native phoneme identification in audiovisual speech perception. We conducted a phoneme identification experiment with bilingual and monolingual adult participants. It was an ABX task involving a Bengali dental-retroflex contrast that does not exist in any of the participants' languages. The phonemes were presented in audiovisual (AV) and audio-only (A) conditions. The results revealed that in the audio-only condition monolinguals and bilinguals had difficulties in discriminating the retroflex non-native phoneme. They were phonologically "deaf" and assimilated it to the dental phoneme that exists in their native languages. In the audiovisual presentation instead, both groups could overcome the phonological deafness for the retroflex non-native phoneme and identify both Bengali phonemes. However, monolinguals were more accurate and responded quicker than bilinguals. This suggests that bilinguals do not use the same processes as monolinguals to decode visual speech.

  7. Bilingualism affects audiovisual phoneme identification

    Directory of Open Access Journals (Sweden)

    Sabine eBurfin

    2014-10-01

    Full Text Available We all go through a process of perceptual narrowing for phoneme identification. As we become experts in the languages we hear in our environment we lose the ability to identify phonemes that do not exist in our native phonological inventory. This research examined how linguistic experience –i.e., the exposure to a double phonological code during childhood– affects the visual processes involved in non-native phoneme identification in audiovisual speech perception. We conducted a phoneme identification experiment with bilingual and monolingual adult participants. It was an ABX task involving a Bengali dental-retroflex contrast that does not exist in any of the participants’ languages. The phonemes were presented in audiovisual (AV) and audio-only (A) conditions. The results revealed that in the audio-only condition monolinguals and bilinguals had difficulties in discriminating the retroflex non-native phoneme. They were phonologically deaf and assimilated it to the dental phoneme that exists in their native languages. In the audiovisual presentation instead, both groups could overcome the phonological deafness for the retroflex non-native phoneme and identify both Bengali phonemes. However, monolinguals were more accurate and responded quicker than bilinguals. This suggests that bilinguals do not use the same processes as monolinguals to decode visual speech.

  8. Classifying Schizotypy Using an Audiovisual Emotion Perception Test and Scalp Electroencephalography

    Directory of Open Access Journals (Sweden)

    Ji Woon Jeong

    2017-09-01

    Full Text Available Schizotypy refers to the personality trait of experiencing “psychotic” symptoms and can be regarded as a predisposition of schizophrenia-spectrum psychopathology (Raine, 1991). Cumulative evidence has revealed that individuals with schizotypy, as well as schizophrenia patients, have emotional processing deficits. In the present study, we investigated multimodal emotion perception in schizotypy and implemented the machine learning technique to find out whether a schizotypy group (ST) is distinguishable from a control group (NC), using electroencephalogram (EEG) signals. Forty-five subjects (30 ST and 15 NC) were divided into two groups based on their scores on a Schizotypal Personality Questionnaire. All participants performed an audiovisual emotion perception test while EEG was recorded. After the preprocessing stage, the discriminatory features were extracted using a mean subsampling technique. For an accurate estimation of covariance matrices, the shrinkage linear discriminant algorithm was used. The classification attained over 98% accuracy and zero rate of false-positive results. This method may have important clinical implications in discriminating those among the general population who have a subtle risk for schizotypy, requiring intervention in advance.
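    The classification pipeline described above (mean-subsampled EEG features followed by a shrinkage-regularized linear discriminant) can be sketched as follows. Epoch shapes, the number of subsampling windows, and the simulated data are assumptions for illustration; only the group sizes (30 ST, 15 NC) follow the abstract.

```python
# Sketch of shrinkage-LDA classification of EEG epochs (simulated stand-in data).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def mean_subsample_features(epochs, n_windows=10):
    """epochs: (n_trials, n_channels, n_samples) -> (n_trials, n_channels * n_windows)."""
    n_trials, n_channels, n_samples = epochs.shape
    windows = np.array_split(np.arange(n_samples), n_windows)
    feats = np.stack([epochs[:, :, w].mean(axis=2) for w in windows], axis=2)
    return feats.reshape(n_trials, n_channels * n_windows)

# Simulated data: 45 participants, 32 channels, 500 samples per epoch (assumed shapes).
rng = np.random.default_rng(0)
X = mean_subsample_features(rng.standard_normal((45, 32, 500)))
y = np.array([1] * 30 + [0] * 15)          # 30 schizotypy (ST), 15 controls (NC)

clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")  # shrinkage LDA
print(cross_val_score(clf, X, y, cv=5).mean())
```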

  9. Perception of words and pitch patterns in song and speech

    Directory of Open Access Journals (Sweden)

    Julia eMerrill

    2012-03-01

    Full Text Available This fMRI study examines shared and distinct cortical areas involved in the auditory perception of song and speech at the level of their underlying constituents: words, pitch and rhythm. Univariate and multivariate analyses were performed on the brain activity patterns of six conditions, arranged in a subtractive hierarchy: sung sentences including words, pitch and rhythm; hummed speech prosody and song melody containing only pitch patterns and rhythm; as well as the pure musical or speech rhythm. Systematic contrasts between these balanced conditions following their hierarchical organization showed a great overlap between song and speech at all levels in the bilateral temporal lobe, but suggested a differential role of the inferior frontal gyrus (IFG) and intraparietal sulcus (IPS) in processing song and speech. The left IFG was involved in word- and pitch-related processing in speech, the right IFG in processing pitch in song. Furthermore, the IPS showed sensitivity to discrete pitch relations in song as opposed to the gliding pitch in speech. Finally, the superior temporal gyrus and premotor cortex coded for general differences between words and pitch patterns, irrespective of whether they were sung or spoken. Thus, song and speech share many features which are reflected in a fundamental similarity of brain areas involved in their perception. However, fine-grained acoustic differences on word and pitch level are reflected in the activity of IFG and IPS.

  10. Speech perception at the interface of neurobiology and linguistics.

    Science.gov (United States)

    Poeppel, David; Idsardi, William J; van Wassenhove, Virginie

    2008-03-12

    Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects that are recognized by the speech perception system enter into subsequent linguistic computation, the format that is used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue and provide neurobiological and psychophysical evidence for the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms, approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon in terms of sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.

  11. Development and validation of the Mandarin speech perception test

    OpenAIRE

    Fu, Qian-Jie; Zhu, Meimei; Wang, Xiaosong

    2011-01-01

    Currently there are few standardized speech testing materials for Mandarin-speaking cochlear implant (CI) listeners. In this study, Mandarin speech perception (MSP) sentence test materials were developed and validated in normal-hearing subjects listening to acoustic simulations of CI processing. Percent distribution of vowels, consonants, and tones within each MSP sentence list was similar to that observed across commonly used Chinese characters. There was no significant difference in sentenc...

  12. Speech Acquisition in Meetings with an Audio-Visual Sensor Array

    OpenAIRE

    McCowan, Iain A.; Krishna, Maganti Hari; Gatica-Perez, Daniel; Moore, Darren; Ba, Silèye O.

    2005-01-01

    Close-talk headset microphones have been traditionally used for speech acquisition in a number of applications, as they naturally provide a higher signal-to-noise ratio -needed for recognition tasks- than single distant microphones. However, in multi-party conversational settings like meetings, microphone arrays represent an important alternative to close-talking microphones, as they allow for localisation and tracking of speakers and signal-independent enhancement, while providing a non-intr...

  13. Categorical perception of speech sounds via the tactile mode.

    Science.gov (United States)

    Collins, M J; Hurtig, R R

    1985-12-01

    The usefulness of tactile devices as aids to lipreading has been established. However, maximum usefulness in reducing the ambiguity of lipreading cues and/or use of tactile devices as a substitute for audition may be dependent on phonemic recognition via tactile signals alone. In the present study, a categorical perception paradigm was used to evaluate tactile perception of speech sounds in comparison to auditory perception. The results show that speech signals delivered by tactile stimulation can be categorically perceived on a voice-onset time (VOT) continuum. The boundary for the voiced-voiceless distinction falls at longer VOTs for tactile than for auditory perception. It is concluded that the procedure is useful for determining characteristics of tactile perception and for prosthesis evaluation.

  14. The effects of noise vocoding on speech quality perception.

    Science.gov (United States)

    Anderson, Melinda C; Arehart, Kathryn H; Kates, James M

    2014-03-01

    Speech perception depends on access to spectral and temporal acoustic cues. Temporal cues include slowly varying amplitude changes (i.e. temporal envelope, TE) and quickly varying amplitude changes associated with the center frequency of the auditory filter (i.e. temporal fine structure, TFS). This study quantifies the effects of TFS randomization through noise vocoding on the perception of speech quality by parametrically varying the amount of original TFS available above 1500 Hz. The two research aims were: 1) to establish the role of TFS in quality perception, and 2) to determine if the role of TFS in quality perception differs between subjects with normal hearing and subjects with sensorineural hearing loss. Ratings were obtained from 20 subjects (10 with normal hearing and 10 with hearing loss) using an 11-point quality scale. Stimuli were processed in three different ways: 1) a 32-channel noise-excited vocoder with random envelope fluctuations in the noise carrier, 2) a 32-channel noise-excited vocoder with the noise-carrier envelope smoothed, and 3) removal of high-frequency bands. Stimuli were presented in quiet and in babble noise at 18 dB and 12 dB signal-to-noise ratios. TFS randomization had a measurable detrimental effect on quality ratings for speech in quiet and a smaller effect for speech in background babble. Subjects with normal hearing and subjects with sensorineural hearing loss provided similar quality ratings for noise-vocoded speech. Copyright © 2013 Elsevier B.V. All rights reserved.
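    The noise-excited vocoding described above replaces the temporal fine structure in each analysis band with a noise carrier while retaining the band's temporal envelope. The sketch below is a much-simplified version (8 log-spaced channels, assumed band edges and envelope cutoff) rather than the 32-channel processing used in the study.

```python
# Simplified noise-excited vocoder: band envelopes modulate band-limited noise carriers.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=8, f_lo=80.0, f_hi=7000.0, env_lp_hz=50.0):
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)        # log-spaced band edges
    env_sos = butter(4, env_lp_hz, btype="low", fs=fs, output="sos")
    rng = np.random.default_rng(0)
    out = np.zeros(len(x), dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, np.asarray(x, dtype=float))
        env = sosfiltfilt(env_sos, np.abs(hilbert(band)))    # temporal envelope (TE)
        carrier = sosfiltfilt(band_sos, rng.standard_normal(len(x)))  # noise replaces TFS
        out += np.clip(env, 0.0, None) * carrier
    return out / (np.max(np.abs(out)) + 1e-12)
```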

  15. Speech perception in noise in unilateral hearing loss.

    Science.gov (United States)

    Mondelli, Maria Fernanda Capoani Garcia; Dos Santos, Marina de Marchi; José, Maria Renata

    2016-01-01

    Unilateral hearing loss is characterized by a decrease of hearing in one ear only. In the presence of ambient noise, individuals with unilateral hearing loss are faced with greater difficulties understanding speech than normal listeners. The aim was to evaluate the speech perception of individuals with unilateral hearing loss, with and without competing noise, before and after the hearing aid fitting process. The study included 30 adults of both genders diagnosed with moderate or severe sensorineural unilateral hearing loss, evaluated using the Hearing In Noise Test-Brazil in the following scenarios: silence, frontal noise, noise to the right, and noise to the left, before and after the hearing aid fitting process. The study participants had a mean age of 41.9 years and most of them presented right unilateral hearing loss. In all conditions evaluated with the Hearing In Noise Test, better performance in speech perception was observed with the use of hearing aids. In the Hearing In Noise Test-Brazil evaluation, individuals with unilateral hearing loss demonstrated better performance in speech perception when using hearing aids, both in silence and in situations with competing noise. Copyright © 2015 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.

  16. Temporal cortex activation to audiovisual speech in normal-hearing and cochlear implant users measured with functional near-infrared spectroscopy

    Directory of Open Access Journals (Sweden)

    Luuk P.H. van de Rijt

    2016-02-01

    Full Text Available Background: Speech understanding may rely not only on auditory, but also on visual information. Non-invasive functional neuroimaging techniques can expose the neural processes underlying the integration of multisensory processes required for speech understanding in humans. Nevertheless, noise (from fMRI) limits the usefulness in auditory experiments, and electromagnetic artefacts caused by electronic implants worn by subjects can severely distort the scans (EEG, fMRI). Therefore, we assessed audio-visual activation of temporal cortex with a silent, optical neuroimaging technique: functional near-infrared spectroscopy (fNIRS). Methods: We studied temporal cortical activation as represented by concentration changes of oxy- and deoxy-hemoglobin in four, easy-to-apply fNIRS optical channels of 33 normal-hearing adult subjects and 5 post-lingually deaf cochlear implant (CI) users in response to supra-threshold unisensory auditory and visual, as well as to congruent auditory-visual speech stimuli. Results: Activation effects were not visible from single fNIRS channels. However, by discounting physiological noise through reference channel subtraction, auditory, visual and audiovisual speech stimuli evoked concentration changes for all sensory modalities in both cohorts (p<0.001). Auditory stimulation evoked larger concentration changes than visual stimuli (p<0.001). A saturation effect was observed for the audiovisual condition. Conclusions: Physiological, systemic noise can be removed from fNIRS signals by reference channel subtraction. The observed multisensory enhancement of an auditory cortical channel can be plausibly described by a simple addition of the auditory and visual signals with saturation.
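    Reference channel subtraction, the denoising step highlighted above, amounts to regressing a systemic-physiology reference trace out of each signal channel. The sketch below uses a simple least-squares scaling; array shapes and channel layout are assumptions for illustration.

```python
# Least-squares reference-channel subtraction for fNIRS concentration traces.
import numpy as np

def subtract_reference(signal_channels, reference):
    """signal_channels: (n_channels, n_samples); reference: (n_samples,)."""
    ref = reference - reference.mean()
    cleaned = np.empty_like(signal_channels, dtype=float)
    for i, ch in enumerate(signal_channels):
        chan = ch - ch.mean()
        beta = np.dot(chan, ref) / np.dot(ref, ref)   # scaling of the systemic component
        cleaned[i] = chan - beta * ref                # remove scaled reference
    return cleaned
```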

  17. A New Development in Audiovisual Translation Studies: Focus on Target Audience Perception

    Directory of Open Access Journals (Sweden)

    John Denton

    2013-03-01

    Full Text Available Audiovisual translation is now a well-established sub-discipline of Translation Studies (TS): a position that it has reached over the last twenty years or so. Italian scholars and professionals in the field have made a substantial contribution to this successful development, a brief overview of which will be given in the first part of this article, inevitably concentrating on dubbing in the Italian context. Special attention will be devoted to the question of target audience perception, an area where researchers in the University of Bologna at Forlì have excelled. The second part of the article applies the methodology followed by the above mentioned researchers in a case study of how Italian end users perceive the dubbed version of the British film The History Boys (2006), which contains a plethora of culture-specific verbal and visual references to the English education system. The aim of the study was to ascertain: (a) whether translation/adaptation allows the transmission in this admittedly constrained medium of all the intended culture-bound issues, only too well known to the source audience, and, if so, to what extent, and (b) whether the target audience respondents to the e-questionnaire used were aware that they were missing information. The linked, albeit controversial, issue of quality assessment will also be addressed.

  18. Investigating Speech Perception in Children with Dyslexia: Is There Evidence of a Consistent Deficit in Individuals?

    Science.gov (United States)

    Messaoud-Galusi, Souhila; Hazan, Valerie; Rosen, Stuart

    2011-01-01

    Purpose: The claim that speech perception abilities are impaired in dyslexia was investigated in a group of 62 children with dyslexia and 51 average readers matched in age. Method: To test whether there was robust evidence of speech perception deficits in children with dyslexia, speech perception in noise and quiet was measured using 8 different…

  19. Speech perception in noise by monolingual, bilingual and trilingual listeners.

    Science.gov (United States)

    Tabri, Dollen; Abou Chacra, Kim Michelle Smith; Pring, Tim

    2011-01-01

    There is strong evidence that bilinguals have a deficit in speech perception for their second language compared with monolingual speakers under unfavourable listening conditions (e.g., noise or reverberation), despite performing similarly to monolingual speakers under quiet conditions. This deficit persists for speakers highly proficient in their second language and is greater in those who learned the language later in life. These findings have important educational implications because the number of multilingual children is increasing worldwide, and many of these children are being taught in their non-native language under poor classroom acoustic conditions. The performance of monolingual, bilingual and trilingual speakers on an English speech perception task was examined in both quiet and noisy conditions. Trilingual performance was compared with that of monolingual and bilingual speakers. Monolingual speakers of English and early bilingual and trilingual speakers (i.e., acquired English as a second/third language before the age of 6 years) were recruited. Their fluency in English was tested by interview and by a questionnaire assessing their knowledge and use of the language. Audiological evaluation confirmed normal hearing in all participants. English speech perception was tested in quiet and in different levels of noise (50, 55, 60, 65 and 70 dB SPL) using the Speech Perception in Noise (SPIN) Test. Bilingual and trilingual listeners performed similarly to monolingual listeners in quiet conditions, but their performance declined more rapidly in noise and was significantly poorer at 65 and 70 dB SPL. Trilingual listeners performed less well than bilinguals at these noise levels, but not significantly so. A subgroup of five bilingual speakers who learned Arabic and English simultaneously since birth were poorer at higher levels of noise than monolinguals, but not significantly so. The results replicate previous findings of poorer speech perception in noise with

  20. Visual Cues Contribute Differentially to Audiovisual Perception of Consonants and Vowels in Improving Recognition and Reducing Cognitive Demands in Listeners with Hearing Impairment Using Hearing Aids

    Science.gov (United States)

    Moradi, Shahram; Lidestam, Bjorn; Danielsson, Henrik; Ng, Elaine Hoi Ning; Ronnberg, Jerker

    2017-01-01

    Purpose: We sought to examine the contribution of visual cues in audiovisual identification of consonants and vowels--in terms of isolation points (the shortest time required for correct identification of a speech stimulus), accuracy, and cognitive demands--in listeners with hearing impairment using hearing aids. Method: The study comprised 199…

  1. An interactive model of auditory-motor speech perception.

    Science.gov (United States)

    Liebenthal, Einat; Möttönen, Riikka

    2017-12-18

    Mounting evidence indicates a role in perceptual decoding of speech for the dorsal auditory stream connecting between temporal auditory and frontal-parietal articulatory areas. The activation time course in auditory, somatosensory and motor regions during speech processing is seldom taken into account in models of speech perception. We critically review the literature with a focus on temporal information, and contrast between three alternative models of auditory-motor speech processing: parallel, hierarchical, and interactive. We argue that electrophysiological and transcranial magnetic stimulation studies support the interactive model. The findings reveal that auditory and somatomotor areas are engaged almost simultaneously, before 100 ms. There is also evidence of early interactions between auditory and motor areas. We propose a new interactive model of auditory-motor speech perception in which auditory and articulatory somatomotor areas are connected from early stages of speech processing. We also discuss how attention and other factors can affect the timing and strength of auditory-motor interactions and propose directions for future research. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. Heartbeat perception in social anxiety before and during speech anticipation.

    Science.gov (United States)

    Stevens, Stephan; Gerlach, Alexander L; Cludius, Barbara; Silkens, Anna; Craske, Michelle G; Hermann, Christiane

    2011-02-01

    According to current cognitive models of social phobia, individuals with social anxiety create a distorted image of themselves in social situations, relying, at least partially, on interoceptive cues. We investigated differences in heartbeat perception as a proxy of interoception in 48 individuals high and low in social anxiety at baseline and while anticipating a public speech. Results revealed lower error scores for high fearful participants both at baseline and during speech anticipation. Speech anticipation improved heartbeat perception in both groups only marginally. Eight of nine accurate perceivers as determined using a criterion of maximum difference between actual and counted beats were high socially anxious. Higher interoceptive accuracy might increase the risk of misinterpreting physical symptoms as visible signs of anxiety which then trigger negative evaluation by others. Treatment should take into account that in socially anxious individuals perceived physical arousal is likely to be accurate rather than false alarm. Copyright © 2010 Elsevier Ltd. All rights reserved.

  3. Do temporal processes underlie left hemisphere dominance in speech perception?

    Science.gov (United States)

    Scott, Sophie K; McGettigan, Carolyn

    2013-10-01

    It is not unusual to find it stated as a fact that the left hemisphere is specialized for the processing of rapid, or temporal aspects of sound, and that the dominance of the left hemisphere in the perception of speech can be a consequence of this specialization. In this review we explore the history of this claim and assess the weight of this assumption. We will demonstrate that instead of a supposed sensitivity of the left temporal lobe for the acoustic properties of speech, it is the right temporal lobe which shows a marked preference for certain properties of sounds, for example longer durations, or variations in pitch. We finish by outlining some alternative factors that contribute to the left lateralization of speech perception. Copyright © 2013. Published by Elsevier Inc.

  4. Performance of children with mental retardation after cochlear implantation: speech perception, speech intelligibility, and language development.

    Science.gov (United States)

    Lee, Young-Mee; Kim, Lee-Suk; Jeong, Sung-Wook; Kim, Jeong-Seo; Chung, Seung-Hyun

    2010-08-01

    Children with mental retardation (MR) obtain demonstrable benefit from cochlear implantation, and their postoperative performance was tempered by the degree of MR. The purpose of this study was to investigate the performance of children with MR after implantation, and to explore their progress according to the degree of MR. Fifteen implanted children with MR were included. Progress in speech perception, speech intelligibility, and language was measured using Categories of Auditory Performance, monosyllabic word test, Speech Intelligibility Rating, and Language Scale before and after implantation. We retrospectively examined outcomes and explored the association between the progress and the degree of MR after implantation. We compared monosyllabic word test scores using repeated-measures ANOVA. Speech perception and speech intelligibility for children with mild MR improved consistently after implantation. After implantation, monosyllabic word test scores did not differ significantly between children with mild MR and children with no additional disabilities. Although language development of children with mild MR was slow, they could communicate verbally 3 years after implantation. Children with moderate MR progressed more slowly and had limitations in speech and language development, and these children could communicate by vocalization and gesture 3 years after implantation.

  5. Visual speech acts differently than lexical context in supporting speech perception.

    Science.gov (United States)

    Samuel, Arthur G; Lieblich, Jerrold

    2014-08-01

    The speech signal is often badly articulated, and heard under difficult listening conditions. To deal with these problems, listeners make use of various types of context. In the current study, we examine a type of context that in previous work has been shown to affect how listeners report what they hear: visual speech (i.e., the visible movements of the speaker's articulators). Despite the clear utility of this type of context under certain conditions, prior studies have shown that visually driven phonetic percepts (via the "McGurk" effect) are not "real" enough to affect perception of later-occurring speech; such percepts have not produced selective adaptation effects. This failure contrasts with successful adaptation by sounds that are generated by lexical context-the word that a sound occurs within. We demonstrate here that this dissociation is robust, leading to the conclusion that visual and lexical contexts operate differently. We suggest that the dissociation reflects the dual nature of speech as both a perceptual object and a linguistic object. Visual speech seems to contribute directly to the computations of the perceptual object but not the linguistic one, while lexical context is used in both types of computations.

  6. Multisensory Speech Perception by Profoundly Hearing-Impaired Children.

    Science.gov (United States)

    Lynch, Michael P.; And Others

    1989-01-01

    Eight profoundly hearing-impaired children, aged 5-11, received tactual word recognition training with tactual speech perception aids. Following training, subjects were tested on trained words and new words. Performance was significantly better on both sets of words when words were presented with a combined condition of tactual aid and aided…

  7. Speech Perception Deficits by Chinese Children with Phonological Dyslexia

    Science.gov (United States)

    Liu, Wenli; Shu, Hua; Yang, Yufang

    2009-01-01

    Findings concerning the relation between dyslexia and speech perception deficits are inconsistent in the literature. This study examined the relation in Chinese children using a more homogeneous sample--children with phonological dyslexia. Two experimental tasks were administered to a group of Chinese children with phonological dyslexia, a group…

  8. Vocabulary Facilitates Speech Perception in Children with Hearing Aids

    Science.gov (United States)

    Klein, Kelsey E.; Walker, Elizabeth A.; Kirby, Benjamin; McCreery, Ryan W.

    2017-01-01

    Purpose: We examined the effects of vocabulary, lexical characteristics (age of acquisition and phonotactic probability), and auditory access (aided audibility and daily hearing aid [HA] use) on speech perception skills in children with HAs. Method: Participants included 24 children with HAs and 25 children with normal hearing (NH), ages 5-12…

  9. Cross-language and second language speech perception

    DEFF Research Database (Denmark)

    Bohn, Ocke-Schwen

    2017-01-01

    This chapter provides an overview of the main research questions and findings in the areas of second language and cross-language speech perception research, and of the most widely used models that have guided this research. The overview is structured in a way that addresses three overarching topi...

  10. Are there really interactive processes in speech perception?

    NARCIS (Netherlands)

    McQueen, J.M.; Norris, D.G.; Cutler, A.

    2006-01-01

    On both empirical and theoretical grounds, we argue that the affirmative answer of McClelland et al. [1] is premature. Contrary to the predictions of the TRACE model, which postulates interactive processing in speech perception, there is no lexically mediated compensation for coarticulation when

  11. Auditory Sensitivity, Speech Perception, and Reading Development and Impairment

    Science.gov (United States)

    Zhang, Juan; McBride-Chang, Catherine

    2010-01-01

    While the importance of phonological sensitivity for understanding reading acquisition and impairment across orthographies is well documented, what underlies deficits in phonological sensitivity is not well understood. Some researchers have argued that speech perception underlies variability in phonological representations. Others have…

  12. Visual Influences on Speech Perception in Children with Autism

    Science.gov (United States)

    Iarocci, Grace; Rombough, Adrienne; Yager, Jodi; Weeks, Daniel J.; Chua, Romeo

    2010-01-01

    The bimodal perception of speech sounds was examined in children with autism as compared to mental age--matched typically developing (TD) children. A computer task was employed wherein only the mouth region of the face was displayed and children reported what they heard or saw when presented with consonant-vowel sounds in unimodal auditory…

  13. The Role of the Listener's State in Speech Perception

    Science.gov (United States)

    Viswanathan, Navin

    2009-01-01

    Accounts of speech perception disagree on whether listeners perceive the acoustic signal (Diehl, Lotto, & Holt, 2004) or the vocal tract gestures that produce the signal (e.g., Fowler, 1986). In this dissertation, I outline a research program using a phenomenon called "perceptual compensation for coarticulation" (Mann, 1980) to examine this…

  14. Prosody Production and Perception with Conversational Speech

    Science.gov (United States)

    Mo, Yoonsook

    2010-01-01

    Speech utterances are more than the linear concatenation of individual phonemes or words. They are organized by prosodic structures comprising phonological units of different sizes (e.g., syllable, foot, word, and phrase) and the prominence relations among them. As the linguistic structure of spoken languages, prosody serves an important function…

  15. Listeners' Perceptions of Speech and Language Disorders

    Science.gov (United States)

    Allard, Emily R.; Williams, Dale F.

    2008-01-01

    Using semantic differential scales with nine trait pairs, 445 adults rated five audio-taped speech samples, one depicting an individual without a disorder and four portraying communication disorders. Statistical analyses indicated that the no disorder sample was rated higher with respect to the trait of employability than were the articulation,…

  16. Visual prosody and speech intelligibility: head movement improves auditory speech perception.

    Science.gov (United States)

    Munhall, K G; Jones, Jeffery A; Callan, Daniel E; Kuratate, Takaaki; Vatikiotis-Bateson, Eric

    2004-02-01

    People naturally move their heads when they speak, and our study shows that this rhythmic head motion conveys linguistic information. Three-dimensional head and face motion and the acoustics of a talker producing Japanese sentences were recorded and analyzed. The head movement correlated strongly with the pitch (fundamental frequency) and amplitude of the talker's voice. In a perception study, Japanese subjects viewed realistic talking-head animations based on these movement recordings in a speech-in-noise task. The animations allowed the head motion to be manipulated without changing other characteristics of the visual or acoustic speech. Subjects correctly identified more syllables when natural head motion was present in the animation than when it was eliminated or distorted. These results suggest that nonverbal gestures such as head movements play a more direct role in the perception of speech than previously known.

  17. Practical evaluation procedure to assess and remediate speech perception skills.

    Science.gov (United States)

    Vergara, K C; Miskiel, L W; Oller, D K; Eilers, R E

    1997-01-01

    The University of Miami/Dade County Public Schools Model Program for the Deaf and Hard of Hearing is a research and training effort dedicated to the utilization of sensory aids including hearing aids, tactual vocoders, and cochlear implants. The program's teachers and clinicians follow the Miami Cochlear Implant, Auditory, and Tactile Skills (CHATS) Curriculum for the development of individualized speech perception and production goals. A series of speech perception tests has been used for the past five years to evaluate the children's progress. The test battery, administered at six-month intervals, is extensive and impractical for school clinicians and teachers to administer to their students. To assist teachers and clinicians in the process of selecting appropriate goals and objectives for sensory aid training, a speech perception test has been developed to accompany the curriculum. This paper includes a discussion of the test design as it correlates with the curriculum.

  18. [Speech perception test in Italian language for profoundly deaf children].

    Science.gov (United States)

    Genovese, E; Orzan, E; Turrini, M; Babighian, G; Arslan, E

    1995-10-01

    Speech perception tests are an important part of procedures for diagnosing pre-verbal hearing loss. Merely establishing a child's hearing threshold with and without a hearing aid is not sufficient to ensure an adequate evaluation with a view to selecting cases suitable for cochlear implants because it fails to indicate the real benefit obtained from using a conventional hearing aid reliably. Speech perception tests have proved useful not only for patient selection, but also for subsequent evaluation of the efficacy of new hearing aids, such as tactile devices and cochlear implants. In clinical practice, the tests most commonly adopted with small children are: the Auditory Comprehension Test (ACT), Discrimination After Training (DAT), the Monosyllable, Trochee, Spondee test (MTS), the Glendonald Auditory Screening Procedure (GASP), and the Early Speech Perception Test (ESP). Rather than considering specific results achieved in individual cases, reference is generally made to the four speech perception classes proposed by Moog and Geers of the CID of St. Louis. The purpose of this classification, made on the results obtained with suitably differentiated tests according to the child's age and language ability, is to detect differences in perception of a spoken message in ideal listening conditions. To date, no Italian-language speech perception test has been designed to assess the level of speech perception in children with profound hearing impairment. We attempted, therefore, to adapt the existing English tests to the Italian language, taking into consideration the differences between the two languages. Our attention focused on the ESP test since it can be applied to even very small children (2 years old). The ESP is proposed in a standard version for hearing-impaired children over the age of 6 years and in a simplified version for younger children. The rationale we used for selecting Italian words reflects the rationale established for the original version, but the

  19. Perception of co-speech gestures in aphasic patients: a visual exploration study during the observation of dyadic conversations.

    Science.gov (United States)

    Preisig, Basil C; Eggenberger, Noëmi; Zito, Giuseppe; Vanbellingen, Tim; Schumacher, Rahel; Hopfner, Simone; Nyffeler, Thomas; Gutbrod, Klemens; Annoni, Jean-Marie; Bohlhalter, Stephan; Müri, René M

    2015-03-01

    Co-speech gestures are part of nonverbal communication during conversations. They either support the verbal message or provide the interlocutor with additional information. Furthermore, they prompt as nonverbal cues the cooperative process of turn taking. In the present study, we investigated the influence of co-speech gestures on the perception of dyadic dialogue in aphasic patients. In particular, we analysed the impact of co-speech gestures on gaze direction (towards speaker or listener) and fixation of body parts. We hypothesized that aphasic patients, who are restricted in verbal comprehension, adapt their visual exploration strategies. Sixteen aphasic patients and 23 healthy control subjects participated in the study. Visual exploration behaviour was measured by means of a contact-free infrared eye-tracker while subjects were watching videos depicting spontaneous dialogues between two individuals. Cumulative fixation duration and mean fixation duration were calculated for the factors co-speech gesture (present and absent), gaze direction (to the speaker or to the listener), and region of interest (ROI), including hands, face, and body. Both aphasic patients and healthy controls mainly fixated the speaker's face. We found a significant co-speech gesture × ROI interaction, indicating that the presence of a co-speech gesture encouraged subjects to look at the speaker. Further, there was a significant gaze direction × ROI × group interaction revealing that aphasic patients showed reduced cumulative fixation duration on the speaker's face compared to healthy controls. Co-speech gestures guide the observer's attention towards the speaker, the source of semantic input. It is discussed whether an underlying semantic processing deficit or a deficit to integrate audio-visual information may cause aphasic patients to explore less the speaker's face. Copyright © 2014 Elsevier Ltd. All rights reserved.

  20. The effects of speech motor preparation on auditory perception

    Science.gov (United States)

    Myers, John

    Perception and action are coupled via bidirectional relationships between sensory and motor systems. Motor systems influence sensory areas by imparting a feedforward influence on sensory processing termed "motor efference copy" (MEC). MEC is suggested to occur in humans because speech preparation and production modulate neural measures of auditory cortical activity. However, it is not known if MEC can affect auditory perception. We tested the hypothesis that during speech preparation auditory thresholds will increase relative to a control condition, and that the increase would be most evident for frequencies that match the upcoming vocal response. Participants performed trials in a speech condition that contained a visual cue indicating a vocal response to prepare (one of two frequencies), followed by a go signal to speak. To determine threshold shifts, voice-matched or -mismatched pure tones were presented at one of three time points between the cue and target. The control condition was the same except the visual cues did not specify a response and subjects did not speak. For each participant, we measured f0 thresholds in isolation from the task in order to establish baselines. Results indicated that auditory thresholds were highest during speech preparation, relative to baselines and a non-speech control condition, especially at suprathreshold levels. Thresholds for tones that matched the frequency of planned responses gradually increased over time, but sharply declined for the mismatched tones shortly before targets. Findings support the hypothesis that MEC influences auditory perception by modulating thresholds during speech preparation, with some specificity relative to the planned response. The threshold increase in tasks vs. baseline may reflect attentional demands of the tasks.

  1. Theta Brain Rhythms Index Perceptual Narrowing in Infant Speech Perception

    Directory of Open Access Journals (Sweden)

    Alexis Bosseler

    2013-10-01

    Full Text Available The development of speech perception shows a dramatic transition between infancy and adulthood. Between 6 and 12 months, infants' initial ability to discriminate all phonetic units across the world's languages narrows—native discrimination increases while nonnative discrimination shows a steep decline. We used magnetoencephalography (MEG) to examine whether brain oscillations in the theta band (4-8 Hz), reflecting increases in attention and cognitive effort, would provide a neural measure of the perceptual narrowing phenomenon in speech. Using an oddball paradigm, we varied speech stimuli in two dimensions, stimulus frequency (frequent vs. infrequent) and language (native vs. nonnative speech syllables), and tested 6-month-old infants, 12-month-old infants, and adults. We hypothesized that 6-month-old infants would show increased relative theta power (RTP) for frequent syllables, regardless of their status as native or nonnative syllables, reflecting young infants' attention and cognitive effort in response to highly frequent stimuli (statistical learning). In adults, we hypothesized increased RTP for nonnative stimuli, regardless of their presentation frequency, reflecting increased cognitive effort for nonnative phonetic categories. The 12-month-old infants were expected to show a pattern in transition, but one more similar to adults than to 6-month-old infants. The MEG brain rhythm results supported these hypotheses. We suggest that perceptual narrowing in speech perception is governed by an implicit learning process. This learning process involves an implicit shift in attention from frequent events (infants) to learned categories (adults). Theta brain oscillatory activity may provide an index of perceptual narrowing beyond speech, and would offer a test of whether the early speech learning process is governed by domain-general or domain-specific processes.
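
    The relative theta power (RTP) measure referred to above is, at its core, the share of 4-8 Hz power relative to a broader band. The following is a rough single-channel sketch using a Welch spectrum; the band limits and spectral settings are illustrative assumptions, not the authors' MEG pipeline.

```python
import numpy as np
from scipy.signal import welch

def relative_theta_power(x, fs, theta=(4.0, 8.0), total=(1.0, 40.0)):
    """Fraction of power in the theta band relative to a broad reference band.

    `x` is a single-channel time series sampled at `fs` Hz. The band limits
    are illustrative defaults, not the values used in the study.
    """
    f, pxx = welch(x, fs=fs, nperseg=int(2 * fs))  # roughly 0.5 Hz resolution

    def band_power(lo, hi):
        sel = (f >= lo) & (f <= hi)
        return np.trapz(pxx[sel], f[sel])

    return band_power(*theta) / band_power(*total)
```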

  2. Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm

    Directory of Open Access Journals (Sweden)

    Oded Ghitza

    2011-06-01

    Full Text Available The premise of this study is that current models of speech perception, which are driven by acoustic features alone, are incomplete, and that the role of decoding time during memory access must be incorporated to account for the patterns of observed recognition phenomena. It is postulated that decoding time is governed by a cascade of neuronal oscillators, which guide template-matching operations at a hierarchy of temporal scales. Cascaded cortical oscillations in the theta, beta and gamma frequency bands are argued to be crucial for speech intelligibility. Intelligibility is high so long as these oscillations remain phase-locked to the auditory input rhythm. A model (Tempo) is presented which is capable of emulating recent psychophysical data on the intelligibility of speech sentences as a function of packaging rate (Ghitza and Greenberg, 2009). The data show that intelligibility of speech that is time-compressed by a factor of 3 (i.e., a high syllabic rate) is poor (above 50% word error rate), but is substantially restored when the information stream is re-packaged by the insertion of silence gaps in between successive compressed-signal intervals – a counterintuitive finding, difficult to explain using classical models of speech perception, but emerging naturally from the Tempo architecture.
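
    The re-packaging manipulation described above (time-compress the speech, then insert silent gaps between successive compressed-signal intervals) can be approximated as in the sketch below. The segment and gap durations are placeholders rather than the values used by Ghitza and Greenberg (2009), and librosa's phase-vocoder time stretching stands in for whatever compression method was actually employed.

```python
import numpy as np
import librosa

def repackage(y, sr, compress=3.0, seg_ms=40.0, gap_ms=80.0):
    """Time-compress speech, then reinsert silence between short segments.

    `compress=3.0` triples the speaking rate; `seg_ms` and `gap_ms` are
    illustrative segment and silence durations, not the study's values.
    """
    fast = librosa.effects.time_stretch(y, rate=compress)  # phase-vocoder compression
    seg = int(sr * seg_ms / 1000)
    gap = np.zeros(int(sr * gap_ms / 1000), dtype=fast.dtype)
    pieces = []
    for start in range(0, len(fast), seg):
        pieces.append(fast[start:start + seg])
        pieces.append(gap)
    return np.concatenate(pieces)
```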

  3. Towards an understanding of speech and song perception.

    Science.gov (United States)

    van Besouw, Rachel M; Howard, David M; Ternström, Sten

    2005-01-01

    The human singing voice plays an important role in music of all societies. It is an extremely flexible instrument and is capable of producing a tremendous range of sounds. As such, the human voice can be hard to classify and poses a major challenge for automatic audio discrimination and classification systems. Speech/song discrimination is an implicit goal of speech/music discrimination, where a division is sought between speech and song, such that the singing voice can be grouped together with other musical instruments in the same category. However, the division between speech and song is unclear and even human attempts at speech/song discrimination can be highly subjective and open to discussion. In this paper we present the results of a test that was designed to investigate differences in auditory perception for speech and song. Twenty-four subjects were instructed to attend to either the words or pitch, or both words and pitch of context-free spoken and sung phrases. After presentation of each phrase, subjects were asked to either type the words that they recalled, or select the correct pitch contour from a choice of four graphical representations, or do both, depending on the task specified before presentation of the phrase. The results of the experiment show a decrease in the amount of linguistic information retained by subjects for sung phrases and also a decrease in accuracy of response for the sung phrases when subjects attended to both words and pitch instead of words or pitch alone.

  4. Spatial and temporal modifications of multitalker speech can improve speech perception in older adults.

    Science.gov (United States)

    Gygi, Brian; Shafiro, Valeriy

    2014-04-01

    Speech perception in multitalker environments often requires listeners to divide attention among several concurrent talkers before focusing on one talker with pertinent information. Such attentionally demanding tasks are particularly difficult for older adults due both to age-related hearing loss (presbycusis) and general declines in attentional processing and associated cognitive abilities. This study investigated two signal-processing techniques that have been suggested as a means of improving speech perception accuracy of older adults: time stretching and spatial separation of target talkers. Stimuli in each experiment comprised 2-4 fixed-form utterances in which listeners were asked to consecutively 1) detect concurrently spoken keywords in the beginning of the utterance (divided attention); and 2) identify additional keywords from only one talker at the end of the utterance (selective attention). In Experiment 1, the overall tempo of each utterance was unaltered or slowed down by 25%; in Experiment 2 the concurrent utterances were spatially coincident or separated across a 180-degree hemifield. Both manipulations improved performance for elderly adults with age-appropriate hearing on both tasks. Increasing the divided attention load by attending to more concurrent keywords had a marked negative effect on performance of the selective attention task only when the target talker was identified by a keyword, but not by spatial location. These findings suggest that the temporal and spatial modifications of multitalker speech improved perception primarily by reducing competition among cognitive resources required to perform attentionally demanding tasks. Published by Elsevier B.V.
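
    Spatial separation of concurrent talkers, as used in Experiment 2, can be crudely simulated over headphones with interaural time and level differences. The sketch below is such a simplification and is not the study's spatialization method; realistic placement across a 180-degree hemifield would require HRTF-based rendering, and the ITD/ILD values shown are arbitrary.

```python
import numpy as np

def lateralize(x, fs, itd_us=400.0, ild_db=6.0, side="left"):
    """Return a stereo signal with a simple ITD/ILD favoring one ear.

    `itd_us` (interaural time difference, microseconds) and `ild_db`
    (interaural level difference) are illustrative values only.
    """
    delay = int(round(fs * itd_us * 1e-6))
    atten = 10 ** (-ild_db / 20)
    near = np.concatenate([x, np.zeros(delay)])            # leading, full-level ear
    far = atten * np.concatenate([np.zeros(delay), x])     # delayed, attenuated ear
    left, right = (near, far) if side == "left" else (far, near)
    return np.stack([left, right], axis=-1)

# Two talkers on opposite sides (assuming equal-length signals):
# scene = lateralize(talker_a, fs, side="left") + lateralize(talker_b, fs, side="right")
```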

  5. Electrophysiological correlates of predictive coding of auditory location in the perception of natural audiovisual events

    Directory of Open Access Journals (Sweden)

    Jeroen Stekelenburg

    2012-05-01

    Full Text Available In many natural audiovisual events (e.g., a clap of the two hands), the visual signal precedes the sound and thus allows observers to predict when, where, and which sound will occur. Previous studies have already reported that there are distinct neural correlates of temporal (when) versus phonetic/semantic (which) content on audiovisual integration. Here we examined the effect of visual prediction of auditory location (where) in audiovisual biological motion stimuli by varying the spatial congruency between the auditory and visual part of the audiovisual stimulus. Visual stimuli were presented centrally, whereas auditory stimuli were presented either centrally or at 90° azimuth. Typical subadditive amplitude reductions (AV – V < A) were found for the auditory N1 and P2 for spatially congruent and incongruent conditions. The new finding is that the N1 suppression was larger for spatially congruent stimuli. A very early audiovisual interaction was also found at 30-50 ms in the spatially congruent condition, while no effect of congruency was found on the suppression of the P2. This indicates that visual prediction of auditory location can be coded very early in auditory processing.
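
    The subadditivity criterion mentioned above (AV - V < A) is typically evaluated by subtracting the visual-only ERP from the audiovisual ERP and comparing the result with the auditory-only ERP within a component window. A bare-bones sketch over trial-averaged waveforms follows; the array names and the 80-120 ms N1 window are assumptions, not the authors' analysis parameters.

```python
import numpy as np

def n1_window_means(erp_av, erp_v, erp_a, times, window=(0.08, 0.12)):
    """Mean amplitudes of (AV - V) and A within an N1-like latency window.

    All ERPs are 1-D trial-averaged waveforms sharing the `times` axis
    (seconds). For the negative-going N1, "AV - V < A" refers to a smaller
    (less negative) deflection for the combined response.
    """
    sel = (times >= window[0]) & (times <= window[1])
    av_minus_v = float(np.mean(erp_av[sel] - erp_v[sel]))
    a_only = float(np.mean(erp_a[sel]))
    return av_minus_v, a_only
```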

  6. The influence of non-native language proficiency on speech perception performance

    NARCIS (Netherlands)

    Kilman, L.; Zekveld, A.A.; Hallgren, M.; Ronnberg, J.

    2014-01-01

    The present study examined to what extent proficiency in a non-native language influences speech perception in noise. We explored how English proficiency affected native (Swedish) and non-native (English) speech perception in four speech reception threshold (SRT) conditions, including two energetic

  7. Speech Perception among School-Aged Skilled and Less Skilled Readers

    Science.gov (United States)

    Wayland, Ratree P.; Eckhouse, Erin; Lombardino, Linda; Roberts, Rosalyn

    2010-01-01

    This study investigated the relationship between speech perception, phonological processing and reading skills among school-aged children classified as "skilled" and "less skilled" readers based on their ability to read words, decode non-words, and comprehend short passages. Three speech perception tasks involving categorization of speech continua…

  8. Whether long-term tracking of speech rate affects perception depends on who is talking

    NARCIS (Netherlands)

    Maslowski, M.; Meyer, A.S.; Bosker, H.R.

    2017-01-01

    Speech rate is known to modulate perception of temporally ambiguous speech sounds. For instance, a vowel may be perceived as short when the immediate speech context is slow, but as long when the context is fast. Yet, effects of long-term tracking of speech rate are largely unexplored. Two

  9. [Speech perception with hearing aids in comparison to pure-tone hearing loss].

    Science.gov (United States)

    Hoppe, U; Hast, A; Hocke, T

    2014-06-01

    Speech perception is the most important social task of the auditory system. Consequently, speech audiometry is essential to evaluate hearing aid benefit. The aim of the study was to describe the correlation between pure-tone hearing loss and speech perception. In particular, pure-tone audiogram, speech audiogram, and speech perception with hearing aids were compared. In a retrospective study, 102 hearing aid users with bilateral sensorineural hearing loss were included. Pure-tone loss (PTA) was correlated to monosyllabic perception at 65 dB with hearing aid and with maximum monosyllabic perception with headphones. Speech perception as a function of hearing loss can be represented by a sigmoid function. However, for higher degrees of hearing loss, substantial deviations are observed. Maximum monosyllabic perception with headphones is usually not achieved with hearing aids at standard speech levels of 65 dB. For larger groups, average pure-tone hearing loss and speech perception correlate significantly. However, prognosis for individuals is not possible. In particular for higher degrees of hearing loss substantial deviations could be observed. Speech performance with hearing aids cannot be predicted sufficiently from speech audiograms. Above the age of 80, speech perception is significantly worse.
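
    The sigmoid relation between hearing loss and speech perception described above can be captured with a standard logistic fit. The sketch below assumes arrays of pure-tone averages and percent-correct monosyllable scores; it is a generic psychometric-function fit, not the authors' model.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(pta, top, midpoint, slope):
    """Percent correct as a decreasing logistic function of pure-tone average (dB HL)."""
    return top / (1.0 + np.exp(slope * (pta - midpoint)))

# pta: pure-tone averages (dB HL); score: monosyllable scores (% correct) -- assumed arrays
# params, _ = curve_fit(logistic, pta, score, p0=[100.0, 60.0, 0.1])
# top, midpoint, slope = params
```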

  10. [Audiovisual speech perception in postlingually deaf cochlear implant users and normal-hearing subjects: a longitudinal psychophysical and neurofunctional study]

    OpenAIRE

    Rouger, Julien

    2007-01-01

    Our work studied the perceptual and neuronal mechanisms involved in audiovisual speech perception in postlingually deaf cochlear-implant patients and normally hearing controls. To this end, we tested the audiovisual performance of implanted patients during longitudinal follow-up using behavioral methods and positron emission tomography neuroimaging, as well as incongruent audiovisual speech (McGurk paradigm). In order to achieve appropriate comparisons, control su...

  11. The effect of handedness in tactile speech perception.

    Science.gov (United States)

    Sarant, J Z; Cowan, R S; Blamey, P J; Galvin, K L; Clark, G M

    1993-01-01

    This study examined differential performance of normally hearing subjects using a tactile device on the dominant versus non-dominant hand. The study evaluated whether tactual sensitivity for non-speech stimuli was greater for the dominant hand as compared with the non-dominant hand, and secondly, whether there was an advantage for speech presented tactually to the dominant hand, resulting from a preferential pathway to the language processing area in the left cerebral hemisphere. Evaluations of threshold pulse width, dynamic ranges, paired electrode identification, and a closed-set tactual pattern discrimination test battery showed no difference in tactual sensitivity measures between the two hands. Speech perception was assessed with closed sets of vowels and consonants and with open-set Harvey Gardner (HG) words and Arthur Boothroyd (AB) words. Group mean scores were higher in each of the tactually aided conditions as compared with the unaided conditions for speech tests, with the exception of AB words in the tactile plus lip-reading plus audition/lip-reading plus audition condition on the right hand. Overall mean scores on the closed-set vowel test and on open-set HG and AB words were significantly higher for the tactually aided condition as compared with the unaided condition. Comparison of performance between the dominant and non-dominant hand showed a significant advantage for the dominant hand on the closed-set vowel test only. No significant differences between hands in either tactually aided or unaided conditions were evident for any of the other speech perception tests. Factors influencing this result could have been variations in degree of difficulty of the tests, the amount of training subjects received, or the training strategy employed. Although an advantage to presenting speech through the dominant hand may exist, it is unlikely to be great enough to outweigh possible restrictions on everyday use.

  12. How the demographic makeup of our community influences speech perception.

    Science.gov (United States)

    Lev-Ari, Shiri; Peperkamp, Sharon

    2016-06-01

    Speech perception is known to be influenced by listeners' expectations of the speaker. This paper tests whether the demographic makeup of individuals' communities can influence their perception of foreign sounds by influencing their expectations of the language. Using online experiments with participants from all across the U.S. and matched census data on the proportion of Spanish and other foreign language speakers in participants' communities, this paper shows that the demographic makeup of individuals' communities influences their expectations of foreign languages to have an alveolar trill versus a tap (Experiment 1), as well as their consequent perception of these sounds (Experiment 2). Thus, the paper shows that while individuals' expectations of foreign language to have a trill occasionally lead them to misperceive a tap in a foreign language as a trill, a higher proportion of non-trill language speakers in one's community decreases this likelihood. These results show that individuals' environment can influence their perception by shaping their linguistic expectations.

  13. The perception of visible speech: estimation of speech rate and detection of time reversals.

    Science.gov (United States)

    Viviani, Paolo; Figliozzi, Francesca; Lacquaniti, Francesco

    2011-11-01

    Four experiments investigated the perception of visible speech. Experiment 1 addressed the perception of speech rate. Observers were shown video-clips of the lower face of actors speaking at their spontaneous rate. Then, they were shown muted versions of the video-clips, which were either accelerated or decelerated. The task (scaling) was to compare visually the speech rate of the stimulus to the spontaneous rate of the actor being shown. Rate estimates were accurate when the video-clips were shown in the normal direction (forward mode). In contrast, speech rate was underestimated when the video-clips were shown in reverse (backward mode). Experiments 2-4 (2AFC) investigated how accurately one discriminates forward and backward speech movements. Unlike in Experiment 1, observers were never exposed to the sound track of the video-clips. Performance was well above chance when playback mode was crossed with rate modulation, and the number of repetitions of the stimuli allowed some amount of speechreading to take place in forward mode (Experiment 2). In Experiment 3, speechreading was made much more difficult by using a different and larger set of muted video-clips. Yet, accuracy decreased only slightly with respect to Experiment 2. Thus, kinematic rather than speechreading cues are most important for discriminating movement direction. Performance worsened, but remained above chance level when the same stimuli of Experiment 3 were rotated upside down (Experiment 4). We argue that the results are in keeping with the hypothesis that visual perception taps into implicit motor competence. Thus, lawful instances of biological movements (forward stimuli) are processed differently from backward stimuli representing movements that the observer cannot perform.

  14. Speech perception with tactile support in adverse listening conditions

    Science.gov (United States)

    Drullman, Rob; Bronkhorst, Adelbert W.

    2002-05-01

    For a long time, different methods of vibrotactile stimulation have been used as an aid for speech perception by some people with severe hearing impairment. The fact that experiments have shown (limited) benefits proves that tactile information can indeed give some support. In our research program on multimodal interfaces, we wondered if normal-hearing listeners could benefit from tactile information when speech was presented in adverse listening conditions. Therefore, we set up a pilot experiment with a male speaker against a background of one, two, four or eight competing male speakers or speech noise. Sound was presented diotically to the subjects and the speech-reception threshold (SRT) for short sentences was measured. The temporal envelope (0-30 Hz) of the speech signal was computed in real time and led to the tactile transducer (MiniVib), which was fixed to the index finger. First results show a significant drop in SRT of about 3 dB when using tactile stimulation in the condition of one competing speaker. In the other conditions no significant effects were found, but there is a trend toward a decrease in SRT when tactile information is given. We will discuss the results of further experiments.
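
    The 0-30 Hz temporal envelope that drove the tactile transducer can be approximated offline by rectifying the waveform and low-pass filtering it. A minimal sketch, assuming a Butterworth filter and zero-phase filtering (the original real-time implementation is not described in detail):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def temporal_envelope(x, fs, cutoff=30.0, order=4):
    """Half-wave rectify and low-pass filter to obtain a 0-30 Hz envelope.

    `x` is a mono speech waveform at `fs` Hz. Zero-phase (offline) filtering
    is used here for simplicity; a causal filter would be needed in real time.
    """
    rectified = np.maximum(x, 0.0)
    b, a = butter(order, cutoff / (fs / 2.0), btype="low")
    return filtfilt(b, a, rectified)
```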

  15. Aero-tactile integration in speech perception

    OpenAIRE

    Gick, Bryan; Derrick, Donald

    2009-01-01

    Visual information from a speaker’s face can enhance [1] or interfere with [2] accurate auditory perception. This integration of information across auditory and visual streams has been observed in functional imaging studies [3,4], and has typically been attributed to the frequency and robustness with which perceivers jointly encounter event-specific information from these two modalities [5]. Adding the tactile modality has long been considered a crucial next step in understanding multisensory integration...

  16. Haptic and visual information speed up the neural processing of auditory speech in live dyadic interactions.

    Science.gov (United States)

    Treille, Avril; Cordeboeuf, Camille; Vilain, Coriandre; Sato, Marc

    2014-05-01

    Speech can be perceived not only by the ear and by the eye but also by the hand, with speech gestures felt from manual tactile contact with the speaker's face. In the present electro-encephalographic study, early cross-modal interactions were investigated by comparing auditory evoked potentials during auditory, audio-visual and audio-haptic speech perception in dyadic interactions between a listener and a speaker. In line with previous studies, early auditory evoked responses were attenuated and speeded up during audio-visual compared to auditory speech perception. Crucially, shortened latencies of early auditory evoked potentials were also observed during audio-haptic speech perception. Altogether, these results suggest early bimodal interactions during live face-to-face and hand-to-face speech perception in dyadic interactions. Copyright © 2014. Published by Elsevier Ltd.

  17. Multisensory integration of speech signals: the relationship between space and time.

    Science.gov (United States)

    Jones, Jeffery A; Jarick, Michelle

    2006-10-01

    Integrating audiovisual cues for simple events is affected when sources are separated in space and time. By contrast, audiovisual perception of speech appears resilient when either spatial or temporal disparities exist. We investigated whether speech perception is sensitive to the combination of spatial and temporal inconsistencies. Participants heard the bisyllable /aba/ while seeing a face produce the incongruent bisyllable /ava/. We tested the level of visual influence over auditory perception when the sound was asynchronous with respect to facial motion (from -360 to +360 ms) and emanated from five locations equidistant to the participant. Although an interaction was observed, it was not related to participants' perception of synchrony, nor did it indicate a linear relationship between the effect of spatial and temporal discrepancies. We conclude that either the complexity of the signal or the nature of the task reduces reliance on spatial and temporal contiguity for audiovisual speech perception.

  18. Speech perception studies using a multichannel electrotactile speech processor, residual hearing, and lipreading.

    Science.gov (United States)

    Cowan, R S; Alcantara, J I; Whitford, L A; Blamey, P J; Clark, G M

    1989-06-01

    Three studies are reported on the speech perception of normally hearing and hearing-impaired adults using combinations of visual, auditory, and tactile input. In study 1, mean scores for four normally hearing subjects showed that addition of tactile information, provided through the multichannel electrotactile speech processor, to either audition alone (300-Hz low-pass-filtered speech) or lipreading plus audition resulted in significant improvements in phoneme and word discrimination scores. Information transmission analyses demonstrated the effectiveness of the tactile aid in providing cues to duration, F1 and F2 features for vowels, and manner of articulation features for consonants, especially features requiring detection and discrimination of high-frequency information. In study 2, six different cutoff frequencies were used for a low-pass-filtered auditory signal. Mean scores for vowel and consonant identification were significantly higher with the addition of tactile input to audition alone at each cutoff frequency up to 1500 Hz. The mean speechtracking rate was also significantly increased by the additional tactile input up to 1500 Hz. Study 3 examined speech discrimination of three hearing-impaired adults. Additional information available through the tactile aid was shown to improve speech discrimination scores; however, the degree of increase was inversely related to the level of residual hearing. Results indicate that the electrotactile aid may be useful for patients with little residual hearing and for the severely to profoundly hearing impaired, who could benefit from the high-frequency information presented through the tactile modality, but unavailable through hearing aids.
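
    The information transmission analyses mentioned above follow the general logic of Miller and Nicely's procedure: build a stimulus-response confusion matrix for a feature (e.g., voicing or manner), then express the mutual information between stimulus and response relative to the stimulus entropy. A generic sketch of that computation, not the authors' exact scoring, is given below.

```python
import numpy as np

def relative_information_transmitted(confusions):
    """Relative information transmitted from a confusion-count matrix.

    `confusions[i, j]` counts how often stimulus category i was heard as
    category j. Returns T(X;Y) / H(X), from 0 (chance) to 1 (perfect).
    """
    p = confusions / confusions.sum()
    px = p.sum(axis=1, keepdims=True)   # stimulus marginal
    py = p.sum(axis=0, keepdims=True)   # response marginal
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log2(p / (px * py)), 0.0)
    mutual_info = terms.sum()
    entropy_x = -np.sum(np.where(px > 0, px * np.log2(px), 0.0))
    return mutual_info / entropy_x
```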

  19. Trainable Videorealistic Speech Animation

    Science.gov (United States)

    2006-01-01

    in movies; virtual avatars in chatrooms; very low bitrate coding schemes (such as MPEG4); and studies of visual speech production and perception. The ... audiovisual corpus of a human subject uttering various utterances was recorded. Recording was performed at a TV studio against a blue “chroma-key” ... lighting conditions, and 3) changes in viewpoint. All these limitations can be alleviated by extending our approach from 2D to 3D. It is possible to

  20. Perception and Temporal Properties of Speech

    Science.gov (United States)

    1990-07-26

    Peter C. Gordon ... My philosophy professors con/STRUCT treehouses in their spare time. The physics papers ab/STRACT new predictions from Einstein's laws. Ambiguous ... conch shell. Dunkin Donut's RE/fills of coffee are free. My philosophy professor's CON/struct is inherently flawed. The physics paper's AB/stract was hard

  1. Critique: auditory form and gestural topology in the perception of speech.

    Science.gov (United States)

    Remez, R E

    1996-03-01

    Some influential accounts of speech perception have asserted that the goal of perception is to recover the articulatory gestures that create the acoustic signal, while others have proposed that speech perception proceeds by a method of acoustic categorization of signal elements. These accounts have been frustrated by difficulties in identifying a set of primitive articulatory constituents underlying speech production, and a set of primitive acoustic-auditory elements underlying speech perception. An argument by Lindblom favors an account of production and perception based on the auditory form of speech and its cognitive elaboration, rejecting the aim of defining a set of articulatory primitives by appealing to theoretical principle, while recognizing the empirical difficulty of identifying a set of acoustic or auditory primitives. An examination of this thesis found opportunities to defend some of its conclusions with independent evidence, but favors a characterization of the constituents of speech perception as linguistic rather than as articulatory or acoustic.

  2. The relationship of speech intelligibility with hearing sensitivity, cognition, and perceived hearing difficulties varies for different speech perception tests

    Directory of Open Access Journals (Sweden)

    Antje Heinrich

    2015-06-01

    Full Text Available Listeners vary in their ability to understand speech in noisy environments. Hearing sensitivity, as measured by pure-tone audiometry, can only partly explain these results, and cognition has emerged as another key concept. Although cognition relates to speech perception, the exact nature of the relationship remains to be fully understood. This study investigates how different aspects of cognition, particularly working memory and attention, relate to speech intelligibility for various tests. Perceptual accuracy of speech perception represents just one aspect of functioning in a listening environment. Activity and participation limits imposed by hearing loss, in addition to the demands of a listening environment, are also important and may be better captured by self-report questionnaires. Understanding how speech perception relates to self-reported aspects of listening forms the second focus of the study. Forty-four listeners aged between 50 and 74 years with mild SNHL were tested on speech perception tests differing in complexity from low (phoneme discrimination in quiet), to medium (digit triplet perception in speech-shaped noise) to high (sentence perception in modulated noise); cognitive tests of attention, memory, and nonverbal IQ; and self-report questionnaires of general health-related and hearing-specific quality of life. Hearing sensitivity and cognition related to intelligibility differently depending on the speech test: neither was important for phoneme discrimination, hearing sensitivity alone was important for digit triplet perception, and hearing and cognition together played a role in sentence perception. Self-reported aspects of auditory functioning were correlated with speech intelligibility to different degrees, with digit triplets in noise showing the richest pattern. The results suggest that intelligibility tests can vary in their auditory and cognitive demands and their sensitivity to the challenges that auditory environments pose on

  3. The relationship of speech intelligibility with hearing sensitivity, cognition, and perceived hearing difficulties varies for different speech perception tests

    Science.gov (United States)

    Heinrich, Antje; Henshaw, Helen; Ferguson, Melanie A.

    2015-01-01

    Listeners vary in their ability to understand speech in noisy environments. Hearing sensitivity, as measured by pure-tone audiometry, can only partly explain these results, and cognition has emerged as another key concept. Although cognition relates to speech perception, the exact nature of the relationship remains to be fully understood. This study investigates how different aspects of cognition, particularly working memory and attention, relate to speech intelligibility for various tests. Perceptual accuracy of speech perception represents just one aspect of functioning in a listening environment. Activity and participation limits imposed by hearing loss, in addition to the demands of a listening environment, are also important and may be better captured by self-report questionnaires. Understanding how speech perception relates to self-reported aspects of listening forms the second focus of the study. Forty-four listeners aged between 50 and 74 years with mild sensorineural hearing loss were tested on speech perception tests differing in complexity from low (phoneme discrimination in quiet), to medium (digit triplet perception in speech-shaped noise) to high (sentence perception in modulated noise); cognitive tests of attention, memory, and non-verbal intelligence quotient; and self-report questionnaires of general health-related and hearing-specific quality of life. Hearing sensitivity and cognition related to intelligibility differently depending on the speech test: neither was important for phoneme discrimination, hearing sensitivity alone was important for digit triplet perception, and hearing and cognition together played a role in sentence perception. Self-reported aspects of auditory functioning were correlated with speech intelligibility to different degrees, with digit triplets in noise showing the richest pattern. The results suggest that intelligibility tests can vary in their auditory and cognitive demands and their sensitivity to the challenges that

  4. Spectrotemporal Modulation Detection and Speech Perception by Cochlear Implant Users.

    Directory of Open Access Journals (Sweden)

    Jong Ho Won

    Full Text Available Spectrotemporal modulation (STM) detection performance was examined for cochlear implant (CI) users. The test involved discriminating between an unmodulated steady noise and a modulated stimulus. The modulated stimulus presents frequency modulation patterns that change in frequency over time. In order to examine STM detection performance for different modulation conditions, two different temporal modulation rates (5 and 10 Hz) and three different spectral modulation densities (0.5, 1.0, and 2.0 cycles/octave) were employed, producing a total of 6 different STM stimulus conditions. In order to explore how electric hearing constrains STM sensitivity for CI users differently from acoustic hearing, normal-hearing (NH) and hearing-impaired (HI) listeners were also tested on the same tasks. STM detection performance was best in NH subjects, followed by HI subjects. On average, CI subjects showed poorest performance, but some CI subjects showed high levels of STM detection performance that was comparable to acoustic hearing. Significant correlations were found between STM detection performance and speech identification performance in quiet and in noise. In order to understand the relative contribution of spectral and temporal modulation cues to speech perception abilities for CI users, spectral and temporal modulation detection was performed separately and related to STM detection and speech perception performance. The results suggest that slow spectral modulation rather than slow temporal modulation may be important for determining speech perception capabilities for CI users. Lastly, test-retest reliability for STM detection was good with no learning. The present study demonstrates that STM detection may be a useful tool to evaluate the ability of CI sound processing strategies to deliver clinically pertinent acoustic modulation information.

  5. Spectrotemporal Modulation Detection and Speech Perception by Cochlear Implant Users.

    Science.gov (United States)

    Won, Jong Ho; Moon, Il Joon; Jin, Sunhwa; Park, Heesung; Woo, Jihwan; Cho, Yang-Sun; Chung, Won-Ho; Hong, Sung Hwa

    2015-01-01

    Spectrotemporal modulation (STM) detection performance was examined for cochlear implant (CI) users. The test involved discriminating between an unmodulated steady noise and a modulated stimulus. The modulated stimulus presents frequency modulation patterns that change in frequency over time. In order to examine STM detection performance for different modulation conditions, two different temporal modulation rates (5 and 10 Hz) and three different spectral modulation densities (0.5, 1.0, and 2.0 cycles/octave) were employed, producing a total of 6 different STM stimulus conditions. In order to explore how electric hearing constrains STM sensitivity for CI users differently from acoustic hearing, normal-hearing (NH) and hearing-impaired (HI) listeners were also tested on the same tasks. STM detection performance was best in NH subjects, followed by HI subjects. On average, CI subjects showed poorest performance, but some CI subjects showed high levels of STM detection performance that was comparable to acoustic hearing. Significant correlations were found between STM detection performance and speech identification performance in quiet and in noise. In order to understand the relative contribution of spectral and temporal modulation cues to speech perception abilities for CI users, spectral and temporal modulation detection was performed separately and related to STM detection and speech perception performance. The results suggest that slow spectral modulation rather than slow temporal modulation may be important for determining speech perception capabilities for CI users. Lastly, test-retest reliability for STM detection was good with no learning. The present study demonstrates that STM detection may be a useful tool to evaluate the ability of CI sound processing strategies to deliver clinically pertinent acoustic modulation information.
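
    The STM stimuli described in the two records above are, in essence, moving ripples: broadband carriers whose amplitude is modulated sinusoidally over time (rate in Hz) and over log-frequency (density in cycles/octave). The generator below is a simplified illustration built from log-spaced random-phase tones; the carrier range, tone count, and modulation depth are assumptions rather than the study's parameters.

```python
import numpy as np

def moving_ripple(dur=1.0, fs=44100, f_lo=400.0, f_hi=6400.0, n_tones=200,
                  rate_hz=5.0, density_cyc_per_oct=1.0, depth=0.9):
    """Generate a simplified spectrotemporally modulated (moving-ripple) stimulus."""
    t = np.arange(int(dur * fs)) / fs
    freqs = np.logspace(np.log2(f_lo), np.log2(f_hi), n_tones, base=2.0)
    octaves = np.log2(freqs / f_lo)
    rng = np.random.default_rng(0)
    phases = rng.uniform(0, 2 * np.pi, n_tones)
    y = np.zeros_like(t)
    for f, oct_pos, phi in zip(freqs, octaves, phases):
        # Ripple envelope drifting across frequency over time.
        env = 1.0 + depth * np.sin(2 * np.pi * (rate_hz * t + density_cyc_per_oct * oct_pos))
        y += env * np.sin(2 * np.pi * f * t + phi)
    return y / np.max(np.abs(y))

# Example: 10 Hz rate, 2.0 cycles/octave density
# stim = moving_ripple(rate_hz=10.0, density_cyc_per_oct=2.0)
```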

  6. Visual Feedback of Tongue Movement for Novel Speech Sound Learning

    OpenAIRE

    Katz, William F.; Sonya Mehta

    2015-01-01

    Pronunciation training studies have yielded important information concerning the processing of audiovisual (AV) information. Second language (L2) learners show increased reliance on bottom-up, multimodal input for speech perception (compared to monolingual individuals). However, little is known about the role of viewing one's own speech articulation processes during speech training. The current study investigated whether real-time, visual feedback for tongue movement can improve a speaker's l...

  7. Subcortical Differentiation of Stop Consonants Relates to Reading and Speech-in-Noise Perception

    National Research Council Canada - National Science Library

    Jane Hornickel; Erika Skoe; Trent Nicol; Steven Zecker; Nina Kraus; Michael M. Merzenich

    2009-01-01

    Children with reading impairments have deficits in phonological awareness, phonemic categorization, speech-in-noise perception, and psychophysical tasks such as frequency and temporal discrimination...

  8. Bridging music and speech rhythm: rhythmic priming and audio-motor training affect speech perception.

    Science.gov (United States)

    Cason, Nia; Astésano, Corine; Schön, Daniele

    2015-02-01

    Following findings that musical rhythmic priming enhances subsequent speech perception, we investigated whether rhythmic priming for spoken sentences can enhance phonological processing - the building blocks of speech - and whether audio-motor training enhances this effect. Participants heard a metrical prime followed by a sentence (with a matching/mismatching prosodic structure), for which they performed a phoneme detection task. Behavioural (RT) data was collected from two groups: one who received audio-motor training, and one who did not. We hypothesised that 1) phonological processing would be enhanced in matching conditions, and 2) audio-motor training with the musical rhythms would enhance this effect. Indeed, providing a matching rhythmic prime context resulted in faster phoneme detection, thus revealing a cross-domain effect of musical rhythm on phonological processing. In addition, our results indicate that rhythmic audio-motor training enhances this priming effect. These results have important implications for rhythm-based speech therapies, and suggest that metrical rhythm in music and speech may rely on shared temporal processing brain resources. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. Deficits in speech perception predict language learning impairment.

    Science.gov (United States)

    Ziegler, Johannes C; Pech-Georgel, Catherine; George, Florence; Alario, F-Xavier; Lorenzi, Christian

    2005-09-27

    Specific language impairment (SLI) is one of the most common childhood disorders, affecting 7% of children. These children experience difficulties in understanding and producing spoken language despite normal intelligence, normal hearing, and normal opportunities to learn language. The causes of SLI are still hotly debated, ranging from nonlinguistic deficits in auditory perception to high-level deficits in grammar. Here, we show that children with SLI have poorer-than-normal consonant identification when measured in ecologically valid conditions of stationary or fluctuating masking noise. The deficits persisted even in comparison with a younger group of normally developing children who were matched for language skills. This finding points to a fundamental deficit. Information transmission of all phonetic features (voicing, place, and manner) was impaired, although the deficits were strongest for voicing (e.g., the difference between /b/ and /p/). Children with SLI experienced perfectly normal "release from masking" (better identification in fluctuating than in stationary noise), which indicates a central deficit in feature extraction rather than deficits in low-level, temporal, and spectral auditory capacities. We further showed that speech identification in noise predicted language impairment to a great extent within the group of children with SLI and across all participants. Previous research might have underestimated this important link, possibly because speech perception has typically been investigated in optimal listening conditions using non-speech material. The present study suggests that children with SLI learn language deviantly because they inefficiently extract and manipulate speech features, in particular, voicing. This result offers new directions for the fast diagnosis and remediation of SLI.
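
    The feature-level analysis mentioned above (transmission of voicing, place, and manner information) is conventionally computed from a consonant confusion matrix collapsed onto feature classes, in the style of Miller and Nicely. The sketch below shows one way this relative transmitted information could be computed; the toy confusion counts and four-consonant set are invented for illustration and are not the study's data.

```python
# Minimal sketch of feature information transmission from a confusion matrix
# (Miller & Nicely style). Counts and labels below are invented examples.
import numpy as np

def relative_transmitted_info(confusions, labels):
    """confusions[i, j]: count of stimulus i reported as response j.
    labels: feature class of each consonant (e.g., 'voiced'/'voiceless').
    Returns I(stimulus class; response class) / H(stimulus class)."""
    classes = sorted(set(labels))
    idx = {c: k for k, c in enumerate(classes)}
    m = np.zeros((len(classes), len(classes)))
    for i, li in enumerate(labels):
        for j, lj in enumerate(labels):
            m[idx[li], idx[lj]] += confusions[i, j]     # collapse onto feature classes
    p = m / m.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    mutual = np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz]))
    h_x = -np.sum(px[px > 0] * np.log2(px[px > 0]))
    return mutual / h_x

# Toy example: /b d p t/ with the voicing feature
conf = np.array([[30, 10, 8, 2],
                 [12, 28, 3, 7],
                 [9, 4, 27, 10],
                 [2, 8, 11, 29]])
voicing = ["voiced", "voiced", "voiceless", "voiceless"]
print(f"relative voicing information transmitted: {relative_transmitted_info(conf, voicing):.2f}")
```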

  10. Only Behavioral But Not Self-Report Measures of Speech Perception Correlate with Cognitive Abilities

    Science.gov (United States)

    Heinrich, Antje; Henshaw, Helen; Ferguson, Melanie A.

    2016-01-01

    Good speech perception and communication skills in everyday life are crucial for participation and well-being, and are therefore an overarching aim of auditory rehabilitation. Both behavioral and self-report measures can be used to assess these skills. However, correlations between behavioral and self-report speech perception measures are often low. One possible explanation is that there is a mismatch between the specific situations used in the assessment of these skills in each method, and a more careful matching across situations might improve consistency of results. The role that cognition plays in specific speech situations may also be important for understanding communication, as speech perception tests vary in their cognitive demands. In this study, the role of executive function, working memory (WM) and attention in behavioral and self-report measures of speech perception was investigated. Thirty existing hearing aid users with mild-to-moderate hearing loss aged between 50 and 74 years completed a behavioral test battery with speech perception tests ranging from phoneme discrimination in modulated noise (easy) to words in multi-talker babble (medium) and keyword perception in a carrier sentence against a distractor voice (difficult). In addition, a self-report measure of aided communication, residual disability from the Glasgow Hearing Aid Benefit Profile, was obtained. Correlations between speech perception tests and self-report measures were higher when specific speech situations across both were matched. Cognition correlated with behavioral speech perception test results but not with self-report. Only the most difficult speech perception test, keyword perception in a carrier sentence with a competing distractor voice, engaged executive functions in addition to WM. In conclusion, any relationship between behavioral and self-report speech perception is not mediated by a shared correlation with cognition. PMID:27242564

  11. Speech Perception Deficits in Mandarin-Speaking School-Aged Children with Poor Reading Comprehension

    Directory of Open Access Journals (Sweden)

    Huei-Mei Liu

    2017-12-01

    Full Text Available Previous studies have shown that children learning alphabetic writing systems who have language impairment or dyslexia exhibit speech perception deficits. However, whether such deficits exist in children learning logographic writing systems who have poor reading comprehension remains uncertain. To further explore this issue, the present study examined speech perception deficits in Mandarin-speaking children with poor reading comprehension. Two self-designed tasks, a consonant categorical perception task and a lexical tone discrimination task, were used to compare speech perception performance in children (n = 31, age range = 7;4–10;2) with poor reading comprehension and an age-matched typically developing group (n = 31, age range = 7;7–9;10). Results showed that the children with poor reading comprehension were less accurate in consonant and lexical tone discrimination tasks and perceived speech contrasts less categorically than the matched group. The correlations between speech perception skills (i.e., consonant and lexical tone discrimination sensitivities and slope of consonant identification curve) and individuals’ oral language and reading comprehension were stronger than the correlations between speech perception ability and word recognition ability. In conclusion, the results revealed that Mandarin-speaking children with poor reading comprehension exhibit less-categorized speech perception, suggesting that imprecise speech perception, especially lexical tone perception, is essential to account for reading learning difficulties in Mandarin-speaking children.

  12. Speech Perception Deficits in Mandarin-Speaking School-Aged Children with Poor Reading Comprehension

    Science.gov (United States)

    Liu, Huei-Mei; Tsao, Feng-Ming

    2017-01-01

    Previous studies have shown that children learning alphabetic writing systems who have language impairment or dyslexia exhibit speech perception deficits. However, whether such deficits exist in children learning logographic writing systems who have poor reading comprehension remains uncertain. To further explore this issue, the present study examined speech perception deficits in Mandarin-speaking children with poor reading comprehension. Two self-designed tasks, a consonant categorical perception task and a lexical tone discrimination task, were used to compare speech perception performance in children (n = 31, age range = 7;4–10;2) with poor reading comprehension and an age-matched typically developing group (n = 31, age range = 7;7–9;10). Results showed that the children with poor reading comprehension were less accurate in consonant and lexical tone discrimination tasks and perceived speech contrasts less categorically than the matched group. The correlations between speech perception skills (i.e., consonant and lexical tone discrimination sensitivities and slope of consonant identification curve) and individuals’ oral language and reading comprehension were stronger than the correlations between speech perception ability and word recognition ability. In conclusion, the results revealed that Mandarin-speaking children with poor reading comprehension exhibit less-categorized speech perception, suggesting that imprecise speech perception, especially lexical tone perception, is essential to account for reading learning difficulties in Mandarin-speaking children.
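
    The "slope of the consonant identification curve" used here as an index of how categorically children perceive a contrast is typically obtained by fitting a logistic function to identification proportions along the stimulus continuum. The short sketch below shows one way such a fit could be done; the seven-step continuum and response proportions are invented purely for illustration, not the study's data or fitting code.

```python
# Minimal sketch (illustrative data): logistic fit of an identification curve
# along a consonant continuum; the fitted slope indexes categorical perception.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, boundary, slope):
    return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

steps = np.arange(1, 8)                                         # 7-step continuum
p_resp = np.array([0.02, 0.05, 0.15, 0.55, 0.88, 0.96, 0.99])   # invented proportions
(boundary, slope), _ = curve_fit(logistic, steps, p_resp, p0=[4.0, 1.0])
# A steeper slope means a sharper category boundary (more categorical perception)
print(f"category boundary at step {boundary:.2f}, identification slope {slope:.2f}")
```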

  13. Tactile Aids for Speech Perception and Production by Hearing-Impaired People.

    Science.gov (United States)

    Weisenberger, Janet

    1989-01-01

    Laboratory results are presented which suggest that hearing-impaired individuals' speech perception can be enhanced through use of tactile aids with a number of tactile transducers conveying information about the spectral content of the speech signal, and speech production can be improved through experience using a multichannel tactile aid.…

  14. Children's Perception of Speech Produced in a Two-Talker Background

    Science.gov (United States)

    Baker, Mallory; Buss, Emily; Jacks, Adam; Taylor, Crystal; Leibold, Lori J.

    2014-01-01

    Purpose: This study evaluated the degree to which children benefit from the acoustic modifications made by talkers when they produce speech in noise. Method: A repeated measures design compared the speech perception performance of children (5-11 years) and adults in a 2-talker masker. Target speech was produced in a 2-talker background or in…

  15. Mapping the speech code: Cortical responses linking the perception and production of vowels

    NARCIS (Netherlands)

    Schuerman, W.L.; Meyer, A.S.; McQueen, J.M.

    2017-01-01

    The acoustic realization of speech is constrained by the physical mechanisms by which it is produced. Yet for speech perception, the degree to which listeners utilize experience derived from speech production has long been debated. In the present study, we examined how sensorimotor adaptation during

  16. Early Language Development of Children at Familial Risk of Dyslexia: Speech Perception and Production

    Science.gov (United States)

    Gerrits, Ellen; de Bree, Elise

    2009-01-01

    Speech perception and speech production were examined in 3-year-old Dutch children at familial risk of developing dyslexia. Their performance in speech sound categorisation and their production of words was compared to that of age-matched children with specific language impairment (SLI) and typically developing controls. We found that speech…

  17. Cholinergic Potentiation and Audiovisual Repetition-Imitation Therapy Improve Speech Production and Communication Deficits in a Person with Crossed Aphasia by Inducing Structural Plasticity in White Matter Tracts.

    Science.gov (United States)

    Berthier, Marcelo L; De-Torres, Irene; Paredes-Pacheco, José; Roé-Vellvé, Núria; Thurnhofer-Hemsi, Karl; Torres-Prioris, María J; Alfaro, Francisco; Moreno-Torres, Ignacio; López-Barroso, Diana; Dávila, Guadalupe

    2017-01-01

    Donepezil (DP), a cognitive-enhancing drug targeting the cholinergic system, combined with massed sentence repetition training augmented and speeded up recovery of speech production deficits in patients with chronic conduction aphasia and extensive left hemisphere infarctions (Berthier et al., 2014). Nevertheless, a still unsettled question is whether such improvements correlate with restorative structural changes in gray matter and white matter pathways mediating speech production. In the present study, we used pharmacological magnetic resonance imaging to study treatment-induced brain changes in gray matter and white matter tracts in a right-handed male with chronic conduction aphasia and a right subcortical lesion (crossed aphasia). A single-patient, open-label multiple-baseline design incorporating two different treatments and two post-treatment evaluations was used. The patient received an initial dose of DP (5 mg/day) which was maintained during 4 weeks and then titrated up to 10 mg/day and administered alone (without aphasia therapy) during 8 weeks (Endpoint 1). Thereafter, the drug was combined with an audiovisual repetition-imitation therapy (Look-Listen-Repeat, LLR) during 3 months (Endpoint 2). Language evaluations, diffusion weighted imaging (DWI), and voxel-based morphometry (VBM) were performed at baseline and at both endpoints in JAM and once in 21 healthy control males. Treatment with DP alone and combined with LLR therapy induced marked improvement in aphasia and communication deficits as well as in selected measures of connected speech production, and phrase repetition. The obtained gains in speech production remained well-above baseline scores even 4 months after ending combined therapy. Longitudinal DWI showed structural plasticity in the right frontal aslant tract and direct segment of the arcuate fasciculus with both interventions. VBM revealed no structural changes in other white matter tracts nor in cortical areas linked by these tracts. In

  18. Endogenous cortical rhythms determine cerebral specialization for speech perception and production

    DEFF Research Database (Denmark)

    Giraud, Anne-Lise; Kleinschmidt, Andreas; Poeppel, David

    2007-01-01

    Across multiple timescales, acoustic regularities of speech match rhythmic properties of both the auditory and motor systems. Syllabic rate corresponds to natural jaw-associated oscillatory rhythms, and phonemic length could reflect endogenous oscillatory auditory cortical properties. Hemispheric...... lateralization for speech could result from an asymmetry of cortical tuning, with left and right auditory areas differentially sensitive to spectro-temporal features of speech. Using simultaneous electroencephalographic (EEG) and functional magnetic resonance imaging (fMRI) recordings from humans, we show......, indicating coupling between temporal properties of speech perception and production. These data show that endogenous cortical rhythms provide temporal and spatial constraints on the neuronal mechanisms underlying speech perception and production....

  19. Audiovisual perception in adults with amblyopia: a study using the McGurk effect.

    Science.gov (United States)

    Narinesingh, Cindy; Wan, Michael; Goltz, Herbert C; Chandrakumar, Manokaraananthan; Wong, Agnes M F

    2014-04-24

    The effects on multisensory integration have rarely been examined in amblyopia. The McGurk effect is a well-established audiovisual illusion that is manifested when an auditory phoneme is presented concurrently with an incongruent visual phoneme. Visually healthy viewers will hear a phoneme that does not match the actual auditory stimulus, having been perceptually influenced by the visual phoneme. This study examines audiovisual integration in adults with amblyopia. Twenty-two subjects with amblyopia and 25 visually healthy controls participated. Participants viewed videos of combinations of visual and auditory phonemes, and were asked to report what they heard. Some videos had congruent video and audio (control), whereas others had incongruent video and audio (McGurk). The McGurk effect is strongest when the visual phoneme dominates over the audio phoneme, resulting in low auditory accuracy on the task. Adults with amblyopia demonstrated a weaker McGurk effect than visually healthy controls (P = 0.01). The difference was greatest when viewing monocularly with the amblyopic eye, and it was also evident when viewing binocularly or monocularly with the fellow eye. No correlations were found between the strength of the McGurk effect and either visual acuity or stereoacuity in subjects with amblyopia. Subjects with amblyopia and controls showed a similar response pattern to different speakers and syllables, and subjects with amblyopia consistently demonstrated a weaker effect than controls. Abnormal visual experience early in life can have negative consequences for audiovisual integration that persists into adulthood in people with amblyopia. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.
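
    As the record notes, a strong McGurk effect shows up as low accuracy in reporting the auditory phoneme on incongruent trials, so illusion strength can be summarized as one minus that accuracy and compared between groups. The sketch below illustrates that comparison with simulated scores; the group means, spreads, and the use of an independent-samples t test are assumptions for illustration, not the study's analysis.

```python
# Minimal sketch with simulated scores: quantify McGurk strength as
# 1 - auditory accuracy on incongruent trials and compare two groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Proportion of incongruent trials on which the auditory phoneme was reported
amblyopia = np.clip(rng.normal(0.55, 0.12, 22), 0, 1)   # weaker illusion -> higher accuracy
controls = np.clip(rng.normal(0.40, 0.12, 25), 0, 1)    # stronger illusion -> lower accuracy

strength_amblyopia = 1 - amblyopia
strength_controls = 1 - controls
t, p = stats.ttest_ind(strength_amblyopia, strength_controls)
print(f"McGurk strength: amblyopia {strength_amblyopia.mean():.2f}, "
      f"controls {strength_controls.mean():.2f} (t = {t:.2f}, p = {p:.3f})")
```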

  20. On the Perception of Speech Sounds as Biologically Significant Signals

    Science.gov (United States)

    Pisoni, David B.

    2012-01-01

    This paper reviews some of the major evidence and arguments currently available to support the view that human speech perception may require the use of specialized neural mechanisms for perceptual analysis. Experiments using synthetically produced speech signals with adults are briefly summarized and extensions of these results to infants and other organisms are reviewed with an emphasis towards detailing those aspects of speech perception that may require some need for specialized species-specific processors. Finally, some comments on the role of early experience in perceptual development are provided as an attempt to identify promising areas of new research in speech perception. PMID:399200

  1. The Effect of Technology and Testing Environment on Speech Perception Using Telehealth with Cochlear Implant Recipients

    Science.gov (United States)

    Goehring, Jenny L.; Hughes, Michelle L.; Baudhuin, Jacquelyn L.; Valente, Daniel L.; McCreery, Ryan W.; Diaz, Gina R.; Sanford, Todd; Harpster, Roger

    2012-01-01

    Purpose: In this study, the authors evaluated the effect of remote system and acoustic environment on speech perception via telehealth with cochlear implant recipients. Method: Speech perception was measured in quiet and in noise. Systems evaluated were Polycom visual concert (PVC) and a hybrid presentation system (HPS). Each system was evaluated…

  2. The Role of Broca's Area in Speech Perception: Evidence from Aphasia Revisited

    Science.gov (United States)

    Hickok, Gregory; Costanzo, Maddalena; Capasso, Rita; Miceli, Gabriele

    2011-01-01

    Motor theories of speech perception have been re-vitalized as a consequence of the discovery of mirror neurons. Some authors have even promoted a strong version of the motor theory, arguing that the motor speech system is critical for perception. Part of the evidence that is cited in favor of this claim is the observation from the early 1980s that…

  3. Speech Perception in Noise Deficits in Japanese Children with Reading Difficulties: Effects of Presentation Rate

    Science.gov (United States)

    Inoue, Tomohiro; Higashibara, Fumiko; Okazaki, Shinji; Maekawa, Hisao

    2011-01-01

    We examined the effects of presentation rate on speech perception in noise and its relation to reading in 117 typically developing (TD) children and 10 children with reading difficulties (RD) in Japan. Responses in a speech perception task were measured for speed, accuracy, and stability in two conditions that varied stimulus presentation rate:…

  4. Noise on, Voicing off: Speech Perception Deficits in Children with Specific Language Impairment

    Science.gov (United States)

    Ziegler, Johannes C.; Pech-Georgel, Catherine; George, Florence; Lorenzi, Christian

    2011-01-01

    Speech perception of four phonetic categories (voicing, place, manner, and nasality) was investigated in children with specific language impairment (SLI) (n=20) and age-matched controls (n=19) in quiet and various noise conditions using an AXB two-alternative forced-choice paradigm. Children with SLI exhibited robust speech perception deficits in…

  5. Speech Perception, Metalinguistic Awareness, Reading, and Vocabulary in Chinese-English Bilingual Children

    Science.gov (United States)

    Cheung, Him; Chung, Kevin Kien Hoa; Wong, Simpson Wai Lap; McBride-Chang, Catherine; Penney, Trevor Bruce; Ho, Connie Suk-Han

    2010-01-01

    In this study, we examined the intercorrelations among speech perception, metalinguistic (i.e., phonological and morphological) awareness, word reading, and vocabulary in a 1st language (L1) and a 2nd language (L2). Results from 3 age groups of Chinese-English bilingual children showed that speech perception was more predictive of reading and…

  6. The Development of the Mealings, Demuth, Dillon, and Buchholz Classroom Speech Perception Test

    Science.gov (United States)

    Mealings, Kiri T.; Demuth, Katherine; Buchholz, Jörg; Dillon, Harvey

    2015-01-01

    Purpose: Open-plan classroom styles are increasingly being adopted in Australia despite evidence that their high intrusive noise levels adversely affect learning. The aim of this study was to develop a new Australian speech perception task (the Mealings, Demuth, Dillon, and Buchholz Classroom Speech Perception Test) and use it in an open-plan…

  7. Speech perception using combinations of auditory, visual, and tactile information.

    Science.gov (United States)

    Blamey, P J; Cowan, R S; Alcantara, J I; Whitford, L A; Clark, G M

    1989-01-01

    Four normally-hearing subjects were trained and tested with all combinations of a highly-degraded auditory input, a visual input via lipreading, and a tactile input using a multichannel electrotactile speech processor. The speech perception of the subjects was assessed with closed sets of vowels, consonants, and multisyllabic words; with open sets of words and sentences, and with speech tracking. When the visual input was added to any combination of other inputs, a significant improvement occurred for every test. Similarly, the auditory input produced a significant improvement for all tests except closed-set vowel recognition. The tactile input produced scores that were significantly greater than chance in isolation, but combined less effectively with the other modalities. The addition of the tactile input did produce significant improvements for vowel recognition in the auditory-tactile condition, for consonant recognition in the auditory-tactile and visual-tactile conditions, and in open-set word recognition in the visual-tactile condition. Information transmission analysis of the features of vowels and consonants indicated that the information from auditory and visual inputs was integrated much more effectively than information from the tactile input. The less effective combination might be due to lack of training with the tactile input, or to more fundamental limitations in the processing of multimodal stimuli.

  8. Cross-Cultural Variation of Politeness Orientation & Speech Act Perception

    Directory of Open Access Journals (Sweden)

    Nisreen Naji Al-Khawaldeh

    2013-05-01

    Full Text Available This paper presents the findings of an empirical study which compares Jordanian and English native speakers’ perceptions about the speech act of thanking. The forty interviews conducted revealed some similarities but also remarkable cross-cultural differences relating to the significance of thanking, the variables affecting it, and the appropriate linguistic and paralinguistic choices, as well as their impact on the interpretation of thanking behaviour. The most important theoretical finding is that the data, while consistent with many views found in the existing literature, do not support Brown and Levinson’s (1987) claim that thanking is a speech act which intrinsically threatens the speaker’s negative face because it involves overt acceptance of an imposition on the speaker. Rather, thanking should be viewed as a means of establishing and sustaining social relationships. The study findings suggest that cultural variation in thanking is due to the high degree of sensitivity of this speech act to the complex interplay of a range of social and contextual variables, and point to some promising directions for further research.

  9. Tactile enhancement of auditory and visual speech perception in untrained perceivers

    OpenAIRE

    Gick, Bryan; Jóhannsdóttir, Kristín M.; Gibraiel, Diana; Mühlbauer, Jeff

    2008-01-01

    A single pool of untrained subjects was tested for interactions across two bimodal perception conditions: audio-tactile, in which subjects heard and felt speech, and visual-tactile, in which subjects saw and felt speech. Identifications of English obstruent consonants were compared in bimodal and no-tactile baseline conditions. Results indicate that tactile information enhances speech perception by about 10 percent, regardless of which other mode (auditory or visual) is active. However, withi...

  10. Oscillation encoding of individual differences in speech perception.

    Science.gov (United States)

    Jin, Yu; Díaz, Begoña; Colomer, Marc; Sebastián-Gallés, Núria

    2014-01-01

    Individual differences in second language (L2) phoneme perception (within the normal population) have been related to speech perception abilities, also observed in the native language, in studies assessing the electrophysiological response mismatch negativity (MMN). Here, we investigate the brain oscillatory dynamics in the theta band, the spectral correlate of the MMN, that underpin success in phoneme learning. Using previous data obtained in an MMN paradigm, the dynamics of cortical oscillations while perceiving native and unknown phonemes and nonlinguistic stimuli were studied in two groups of participants classified as good and poor perceivers (GPs and PPs), according to their L2 phoneme discrimination abilities. The results showed that for GPs, as compared to PPs, processing of a native phoneme change produced a significant increase in theta power. Stimulus time-locked analysis event-related spectral perturbation (ERSP) showed differences for the theta band within the MMN time window (between 70 and 240 ms) for the native deviant phoneme. No other significant difference between the two groups was observed for the other phoneme or nonlinguistic stimuli. The dynamic patterns in the theta-band may reflect early automatic change detection for familiar speech sounds in the brain. The behavioral differences between the two groups may reflect individual variations in activating brain circuits at a perceptual level.

  11. Oscillation encoding of individual differences in speech perception.

    Directory of Open Access Journals (Sweden)

    Yu Jin

    Full Text Available Individual differences in second language (L2) phoneme perception (within the normal population) have been related to speech perception abilities, also observed in the native language, in studies assessing the electrophysiological response mismatch negativity (MMN). Here, we investigate the brain oscillatory dynamics in the theta band, the spectral correlate of the MMN, that underpin success in phoneme learning. Using previous data obtained in an MMN paradigm, the dynamics of cortical oscillations while perceiving native and unknown phonemes and nonlinguistic stimuli were studied in two groups of participants classified as good and poor perceivers (GPs and PPs), according to their L2 phoneme discrimination abilities. The results showed that for GPs, as compared to PPs, processing of a native phoneme change produced a significant increase in theta power. Stimulus time-locked analysis event-related spectral perturbation (ERSP) showed differences for the theta band within the MMN time window (between 70 and 240 ms) for the native deviant phoneme. No other significant difference between the two groups was observed for the other phoneme or nonlinguistic stimuli. The dynamic patterns in the theta-band may reflect early automatic change detection for familiar speech sounds in the brain. The behavioral differences between the two groups may reflect individual variations in activating brain circuits at a perceptual level.
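
    The theta-band effect reported here was quantified with event-related spectral perturbation in a 70-240 ms window. As a rough illustration of the underlying computation, the sketch below estimates the theta power change in that window using a band-pass filter plus Hilbert envelope on simulated single-trial EEG; the sampling rate, filter settings, and baseline choice are assumptions, and because the data are random noise, no effect is expected.

```python
# Minimal sketch (simulated data): theta-band power change in the 70-240 ms
# window relative to a pre-stimulus baseline, via band-pass + Hilbert envelope.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 500.0                                       # assumed sampling rate (Hz)
t = np.arange(-0.1, 0.5, 1 / fs)                 # epoch from -100 to +500 ms
trials = np.random.default_rng(0).normal(size=(100, t.size))   # simulated EEG

b, a = butter(4, [4 / (fs / 2), 8 / (fs / 2)], btype="band")    # theta: ~4-8 Hz
theta = filtfilt(b, a, trials, axis=1)
power = np.abs(hilbert(theta, axis=1)) ** 2      # instantaneous theta power

win = (t >= 0.070) & (t <= 0.240)                # the MMN window reported above
base = t < 0
change_db = 10 * np.log10(power[:, win].mean() / power[:, base].mean())
print(f"theta power change in 70-240 ms window: {change_db:.2f} dB")
```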

  12. Music training and speech perception: a gene-environment interaction.

    Science.gov (United States)

    Schellenberg, E Glenn

    2015-03-01

    Claims of beneficial side effects of music training are made for many different abilities, including verbal and visuospatial abilities, executive functions, working memory, IQ, and speech perception in particular. Such claims assume that music training causes the associations even though children who take music lessons are likely to differ from other children in music aptitude, which is associated with many aspects of speech perception. Music training in childhood is also associated with cognitive, personality, and demographic variables, and it is well established that IQ and personality are determined largely by genetics. Recent evidence also indicates that the role of genetics in music aptitude and music achievement is much larger than previously thought. In short, music training is an ideal model for the study of gene-environment interactions but far less appropriate as a model for the study of plasticity. Children seek out environments, including those with music lessons, that are consistent with their predispositions; such environments exaggerate preexisting individual differences. © 2015 New York Academy of Sciences.

  13. Relations between psychophysical data and speech perception for hearing-impaired subjects. II

    NARCIS (Netherlands)

    Dreschler, W. A.; Plomp, R.

    1985-01-01

    Twenty-one sensorineurally hearing-impaired adolescents were studied with an extensive battery of tone-perception, phoneme-perception, and speech-perception tests. Tests on loudness perception, frequency selectivity, and temporal resolution at the test frequencies of 500, 1000, and 2000 Hz were

  14. The Role of Categorical Speech Perception and Phonological Processing in Familial Risk Children with and without Dyslexia

    Science.gov (United States)

    Hakvoort, Britt; de Bree, Elise; van der Leij, Aryan; Maassen, Ben; van Setten, Ellie; Maurits, Natasha; van Zuijen, Titia L.

    2016-01-01

    Purpose: This study assessed whether a categorical speech perception (CP) deficit is associated with dyslexia or familial risk for dyslexia, by exploring a possible cascading relation from speech perception to phonology to reading and by identifying whether speech perception distinguishes familial risk (FR) children with dyslexia (FRD) from those…

  15. Early Postimplant Speech Perception and Language Skills Predict Long-Term Language and Neurocognitive Outcomes Following Pediatric Cochlear Implantation

    Science.gov (United States)

    Hunter, Cynthia R.; Kronenberger, William G.; Castellanos, Irina; Pisoni, David B.

    2017-01-01

    Purpose: We sought to determine whether speech perception and language skills measured early after cochlear implantation in children who are deaf, and early postimplant growth in speech perception and language skills, predict long-term speech perception, language, and neurocognitive outcomes. Method: Thirty-six long-term users of cochlear…

  16. Functional overlap between regions involved in speech perception and in monitoring one's own voice during speech production.

    Science.gov (United States)

    Zheng, Zane Z; Munhall, Kevin G; Johnsrude, Ingrid S

    2010-08-01

    The fluency and the reliability of speech production suggest a mechanism that links motor commands and sensory feedback. Here, we examined the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or not and by examining the overlap with the network recruited during passive listening to speech sounds. We used real-time signal processing to compare brain activity when participants whispered a consonant-vowel-consonant word ("Ted") and either heard this clearly or heard voice-gated masking noise. We compared this to when they listened to yoked stimuli (identical recordings of "Ted" or noise) without speaking. Activity along the STS and superior temporal gyrus bilaterally was significantly greater if the auditory stimulus was (a) processed as the auditory concomitant of speaking and (b) did not match the predicted outcome (noise). The network exhibiting this Feedback Type x Production/Perception interaction includes a superior temporal gyrus/middle temporal gyrus region that is activated more when listening to speech than to noise. This is consistent with speech production and speech perception being linked in a control system that predicts the sensory outcome of speech acts and that processes an error signal in speech-sensitive regions when this and the sensory data do not match.

  17. Evaluation of Speech-Perception Training for Hearing Aid Users: A Multisite Study in Progress

    OpenAIRE

    Miller, James D.; Watson, Charles S.; Dubno, Judy R.; Leek, Marjorie R.

    2015-01-01

    Following an overview of theoretical issues in speech-perception training and of previous efforts to enhance hearing aid use through training, a multisite study, designed to evaluate the efficacy of two types of computerized speech-perception training for adults who use hearing aids, is described. One training method focuses on the identification of 109 syllable constituents (45 onsets, 28 nuclei, and 36 codas) in quiet and in noise, and on the perception of words in sentences presented in va...

  18. Adaptation to delayed auditory feedback induces the temporal recalibration effect in both speech perception and production.

    Science.gov (United States)

    Yamamoto, Kosuke; Kawabata, Hideaki

    2014-12-01

    We ordinarily speak fluently, even though our perceptions of our own voices are disrupted by various environmental acoustic properties. The underlying mechanism of speech is supposed to monitor the temporal relationship between speech production and the perception of auditory feedback, as suggested by a reduction in speech fluency when the speaker is exposed to delayed auditory feedback (DAF). While many studies have reported that DAF influences speech motor processing, its relationship to the temporal tuning effect on multimodal integration, or temporal recalibration, remains unclear. We investigated whether the temporal aspects of both speech perception and production change due to adaptation to the delay between the motor sensation and the auditory feedback. This is a well-used method of inducing temporal recalibration. Participants continually read texts with specific DAF times in order to adapt to the delay. Then, they judged the simultaneity between the motor sensation and the vocal feedback. We measured the rates of speech with which participants read the texts in both the exposure and re-exposure phases. We found that exposure to DAF changed both the rate of speech and the simultaneity judgment, that is, participants' speech gained fluency. Although we also found that a delay of 200 ms appeared to be most effective in decreasing the rates of speech and shifting the distribution on the simultaneity judgment, there was no correlation between these measurements. These findings suggest that both speech motor production and multimodal perception are adaptive to temporal lag but are processed in distinct ways.

  19. The relationship of phonological ability, speech perception, and auditory perception in adults with dyslexia

    OpenAIRE

    Law, Jeremy M.; Vandermosten, Maaike; Ghesquiere, Pol; Wouters, Jan

    2014-01-01

    This study investigated whether auditory, speech perception, and phonological skills are tightly interrelated or independently contributing to reading. We assessed each of these three skills in 36 adults with a past diagnosis of dyslexia and 54 matched normal reading adults. Phonological skills were tested by the typical threefold tasks, i.e., rapid automatic naming, verbal short-term memory and phonological awareness. Dynamic auditory processing skills were assessed by means of a frequency m...

  20. The relationship of phonological ability, speech perception and auditory perception in adults with dyslexia.

    OpenAIRE

    Jeremy Law; Maaike Vandermosten; Pol Ghesquiere; Jan Wouters

    2014-01-01

    This study investigated whether auditory, speech perception and phonological skills are tightly interrelated or independently contributing to reading. We assessed each of these three skills in 36 adults with a past diagnosis of dyslexia and 54 matched normal reading adults. Phonological skills were tested by the typical threefold tasks, i.e. rapid automatic naming, verbal short term memory and phonological awareness. Dynamic auditory processing skills were assessed by means of a frequency mod...

  1. No evidence of somatotopic place of articulation feature mapping in motor cortex during passive speech perception.

    Science.gov (United States)

    Arsenault, Jessica S; Buchsbaum, Bradley R

    2016-08-01

    The motor theory of speech perception has experienced a recent revival due to a number of studies implicating the motor system during speech perception. In a key study, Pulvermüller et al. (2006) showed that premotor/motor cortex differentially responds to the passive auditory perception of lip and tongue speech sounds. However, no study has yet attempted to replicate this important finding from nearly a decade ago. The objective of the current study was to replicate the principal finding of Pulvermüller et al. (2006) and generalize it to a larger set of speech tokens while applying a more powerful statistical approach using multivariate pattern analysis (MVPA). Participants performed an articulatory localizer as well as a speech perception task where they passively listened to a set of eight syllables while undergoing fMRI. Both univariate and multivariate analyses failed to find evidence for somatotopic coding in motor or premotor cortex during speech perception. Positive evidence for the null hypothesis was further confirmed by Bayesian analyses. Results consistently show that while the lip and tongue areas of the motor cortex are sensitive to movements of the articulators, they do not appear to preferentially respond to labial and alveolar speech sounds during passive speech perception.
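
    The multivariate pattern analysis described here asks whether voxel patterns in motor cortex discriminate labial from alveolar speech sounds better than chance. The sketch below shows a generic cross-validated decoding analysis of that kind with scikit-learn; the trial counts, region size, and classifier are assumptions rather than the authors' pipeline, and because the simulated patterns are pure noise, accuracy should sit near the 50% chance level (mirroring a null result).

```python
# Minimal sketch (simulated data, generic pipeline): cross-validated decoding
# of place of articulation (labial vs. alveolar) from voxel patterns.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 96, 200
X = rng.normal(size=(n_trials, n_voxels))        # simulated motor-ROI patterns
y = np.repeat([0, 1], n_trials // 2)             # 0 = labial, 1 = alveolar

cv = StratifiedKFold(n_splits=8, shuffle=True, random_state=0)
acc = cross_val_score(LinearSVC(max_iter=5000), X, y, cv=cv)
print(f"mean decoding accuracy: {acc.mean():.2f} (chance = 0.50)")
```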

  2. Perceived synchrony for realistic and dynamic audiovisual events

    Directory of Open Access Journals (Sweden)

    Ragnhild Eg

    2015-06-01

    Full Text Available In well-controlled laboratory experiments, researchers have found that humans can perceive delays between auditory and visual signals as short as 20 ms. Conversely, other experiments have shown that humans can tolerate audiovisual asynchrony that exceeds 200 ms. This seeming contradiction in human temporal sensitivity can be attributed to a number of factors such as experimental approaches and precedence of the asynchronous signals, along with the nature, duration, location, complexity and repetitiveness of the audiovisual stimuli, and even individual differences. In order to better understand how temporal integration of audiovisual events occurs in the real world, we need to close the gap between the experimental setting and the complex setting of everyday life. With this work, we aimed to contribute one brick to the bridge that will close this gap. We compared perceived synchrony for long-running and eventful audiovisual sequences to shorter sequences that contain a single audiovisual event, for three types of content: action, music, and speech. The resulting windows of temporal integration showed that participants were better at detecting asynchrony for the longer stimuli, possibly because the long-running sequences contain multiple corresponding events that offer audiovisual timing cues. Moreover, the points of subjective simultaneity differ between content types, suggesting that the nature of a visual scene could influence the temporal perception of events. An expected outcome from this type of experiment was the rich variation among participants' distributions and the derived points of subjective simultaneity. Hence, the designs of similar experiments call for more participants than traditional psychophysical studies. Heeding this caution, we conclude that existing theories on multisensory perception are ready to be tested on more natural and representative stimuli.
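
    The points of subjective simultaneity and windows of temporal integration discussed above are typically derived by fitting a function to the proportion of "in sync" judgments across audiovisual offsets. The sketch below fits a Gaussian to invented synchrony-judgment data and reads off the PSS (the peak) and a window width (full width at half maximum); the offsets, proportions, and choice of a Gaussian are illustrative assumptions, not the study's fitting procedure.

```python
# Minimal sketch (invented data): estimate the point of subjective simultaneity
# (PSS) and a temporal integration window from synchrony judgments.
import numpy as np
from scipy.optimize import curve_fit

def gauss(soa, peak, pss, width):
    return peak * np.exp(-0.5 * ((soa - pss) / width) ** 2)

# Negative = audio leads, positive = audio lags (ms); proportion judged "in sync"
soa = np.array([-300, -200, -100, -50, 0, 50, 100, 200, 300, 400])
p_sync = np.array([0.10, 0.30, 0.70, 0.88, 0.95, 0.97, 0.90, 0.60, 0.30, 0.12])

(peak, pss, width), _ = curve_fit(gauss, soa, p_sync, p0=[1.0, 30.0, 150.0])
fwhm = 2 * np.sqrt(2 * np.log(2)) * width        # window as full width at half maximum
print(f"PSS = {pss:.0f} ms (tolerance skewed toward audio lag), window = {fwhm:.0f} ms")
```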

  3. Hierarchical organization of speech perception in human auditory cortex

    Directory of Open Access Journals (Sweden)

    Colin Humphries

    2014-12-01

    Full Text Available Human speech consists of a variety of articulated sounds that vary dynamically in spectral composition. We investigated the neural activity associated with the perception of two types of speech segments: (a) the period of rapid spectral transition occurring at the beginning of a stop-consonant vowel (CV) syllable and (b) the subsequent spectral steady-state period occurring during the vowel segment of the syllable. Functional magnetic resonance imaging (fMRI) was recorded while subjects listened to series of synthesized CV syllables and non-phonemic control sounds. Adaptation to specific sound features was measured by varying either the transition or steady-state periods of the synthesized sounds. Two spatially distinct brain areas in the superior temporal cortex were found that were sensitive to either the type of adaptation or the type of stimulus. In a relatively large section of the bilateral dorsal superior temporal gyrus (STG), activity varied as a function of adaptation type regardless of whether the stimuli were phonemic or non-phonemic. Immediately adjacent to this region in a more limited area of the ventral STG, increased activity was observed for phonemic trials compared to non-phonemic trials; however, no adaptation effects were found. In addition, a third area in the bilateral medial superior temporal plane showed increased activity to non-phonemic compared to phonemic sounds. The results suggest a multi-stage hierarchical stream for speech sound processing extending ventrolaterally from the superior temporal plane to the superior temporal sulcus. At successive stages in this hierarchy, neurons code for increasingly more complex spectrotemporal features. At the same time, these representations become more abstracted from the original acoustic form of the sound.

  4. Aided and unaided speech perception by older hearing impaired listeners.

    Directory of Open Access Journals (Sweden)

    David L Woods

    Full Text Available The most common complaint of older hearing impaired (OHI) listeners is difficulty understanding speech in the presence of noise. However, tests of consonant-identification and sentence reception threshold (SeRT) provide different perspectives on the magnitude of impairment. Here we quantified speech perception difficulties in 24 OHI listeners in unaided and aided conditions by analyzing (1) consonant-identification thresholds and consonant confusions for 20 onset and 20 coda consonants in consonant-vowel-consonant (CVC) syllables presented at consonant-specific signal-to-noise (SNR) levels, and (2) SeRTs obtained with the Quick Speech in Noise Test (QSIN) and the Hearing in Noise Test (HINT). Compared to older normal hearing (ONH) listeners, nearly all unaided OHI listeners showed abnormal consonant-identification thresholds, abnormal consonant confusions, and reduced psychometric function slopes. Average elevations in consonant-identification thresholds exceeded 35 dB, correlated strongly with impairments in mid-frequency hearing, and were greater for hard-to-identify consonants. Advanced digital hearing aids (HAs) improved average consonant-identification thresholds by more than 17 dB, with significant HA benefit seen in 83% of OHI listeners. HAs partially normalized consonant-identification thresholds, reduced abnormal consonant confusions, and increased the slope of psychometric functions. Unaided OHI listeners showed much smaller elevations in SeRTs (mean 6.9 dB) than in consonant-identification thresholds, and SeRTs in unaided listening conditions correlated strongly (r = 0.91) with identification thresholds of easily identified consonants. HAs produced minimal SeRT benefit (2.0 dB), with only 38% of OHI listeners showing significant improvement. HA benefit on SeRTs was accurately predicted (r = 0.86) by HA benefit on easily identified consonants. Consonant-identification tests can accurately predict sentence processing deficits and HA benefit in OHI

  5. Early speech perception in Mandarin-speaking children at one-year post cochlear implantation.

    Science.gov (United States)

    Chen, Yuan; Wong, Lena L N; Zhu, Shufeng; Xi, Xin

    2016-01-01

    The aim in this study was to examine early speech perception outcomes in Mandarin-speaking children during the first year of cochlear implant (CI) use. A hierarchical early speech perception battery was administered to 80 children before and 3, 6, and 12 months after implantation. Demographic information was obtained to evaluate its relationship with these outcomes. Regardless of dialect exposure and whether a hearing aid was trialed before implantation, implant recipients were able to attain similar pre-lingual auditory skills after 12 months of CI use. Children speaking Mandarin developed early Mandarin speech perception faster than those with greater exposure to other Chinese dialects. In addition, children with better pre-implant hearing levels and younger age at implantation attained significantly better speech perception scores after 12 months of CI use. Better pre-implant hearing levels and higher maternal education level were also associated with a significantly steeper growth in early speech perception ability. Mandarin-speaking children with CIs are able to attain early speech perception results comparable to those of their English-speaking counterparts. In addition, consistent single language input via CI probably enhances early speech perception development at least during the first-year of CI use. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. The relationship of phonological ability, speech perception and auditory perception in adults with dyslexia.

    Directory of Open Access Journals (Sweden)

    Jeremy Law

    2014-07-01

    Full Text Available This study investigated whether auditory, speech perception and phonological skills are tightly interrelated or independently contributing to reading. We assessed each of these three skills in 36 adults with a past diagnosis of dyslexia and 54 matched normal reading adults. Phonological skills were tested by the typical threefold tasks, i.e. rapid automatic naming, verbal short term memory and phonological awareness. Dynamic auditory processing skills were assessed by means of a frequency modulation (FM) and an amplitude rise time (RT); an intensity discrimination task (ID) was included as a non-dynamic control task. Speech perception was assessed by means of sentences and words in noise tasks. Group analysis revealed significant group differences in auditory tasks (i.e. RT and ID) and in phonological processing measures, yet no differences were found for speech perception. In addition, performance on RT discrimination correlated with reading but this relation was mediated by phonological processing and not by speech in noise. Finally, inspection of the individual scores revealed that the dyslexic readers showed an increased proportion of deviant subjects on the slow-dynamic auditory and phonological tasks, yet each individual dyslexic reader does not display a clear pattern of deficiencies across the levels of processing skills. Although our results support phonological and slow-rate dynamic auditory deficits which relate to literacy, they suggest that at the individual level, problems in reading and writing cannot be explained by the cascading auditory theory. Instead, dyslexic adults seem to vary considerably in the extent to which each of the auditory and phonological factors are expressed and interact with environmental and higher-order cognitive influences.

  7. The relationship of phonological ability, speech perception, and auditory perception in adults with dyslexia.

    Science.gov (United States)

    Law, Jeremy M; Vandermosten, Maaike; Ghesquiere, Pol; Wouters, Jan

    2014-01-01

    This study investigated whether auditory, speech perception, and phonological skills are tightly interrelated or independently contributing to reading. We assessed each of these three skills in 36 adults with a past diagnosis of dyslexia and 54 matched normal reading adults. Phonological skills were tested by the typical threefold tasks, i.e., rapid automatic naming, verbal short-term memory and phonological awareness. Dynamic auditory processing skills were assessed by means of a frequency modulation (FM) and an amplitude rise time (RT); an intensity discrimination task (ID) was included as a non-dynamic control task. Speech perception was assessed by means of sentences and words-in-noise tasks. Group analyses revealed significant group differences in auditory tasks (i.e., RT and ID) and in phonological processing measures, yet no differences were found for speech perception. In addition, performance on RT discrimination correlated with reading but this relation was mediated by phonological processing and not by speech-in-noise. Finally, inspection of the individual scores revealed that the dyslexic readers showed an increased proportion of deviant subjects on the slow-dynamic auditory and phonological tasks, yet each individual dyslexic reader does not display a clear pattern of deficiencies across the processing skills. Although our results support phonological and slow-rate dynamic auditory deficits which relate to literacy, they suggest that at the individual level, problems in reading and writing cannot be explained by the cascading auditory theory. Instead, dyslexic adults seem to vary considerably in the extent to which each of the auditory and phonological factors are expressed and interact with environmental and higher-order cognitive influences.
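
    The claim that the rise-time-reading relation is "mediated by phonological processing" can be illustrated with a simple regression-based mediation check: the auditory predictor's coefficient should shrink once the mediator is added to the model. The sketch below runs that check on simulated variables built with a mediation structure; the variable names, effect sizes, and the basic Baron-and-Kenny-style logic are illustrative assumptions, not the authors' statistical procedure.

```python
# Minimal sketch (simulated variables): does the rise-time -> reading effect
# shrink once phonological processing is entered as a predictor?
import numpy as np

rng = np.random.default_rng(0)
n = 90
rise_time = rng.normal(size=n)                               # auditory measure
phonology = 0.6 * rise_time + rng.normal(scale=0.8, size=n)  # assumed mediator
reading = 0.7 * phonology + rng.normal(scale=0.8, size=n)    # outcome

def slopes(predictors, y):
    X = np.column_stack([np.ones(len(y))] + list(predictors))  # add intercept
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]             # drop intercept term

total = slopes([rise_time], reading)[0]                # rise time -> reading
direct = slopes([rise_time, phonology], reading)[0]    # controlling for phonology
print(f"total effect = {total:.2f}, direct effect = {direct:.2f}, "
      f"mediated share ~ {(total - direct) / total:.0%}")
```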

  8. Functional correlates of the speech-in-noise perception impairment in dyslexia: an MRI study.

    Science.gov (United States)

    Dole, Marjorie; Meunier, Fanny; Hoen, Michel

    2014-07-01

    Dyslexia is a language-based neurodevelopmental disorder. It is characterized as a persistent deficit in reading and spelling. These difficulties have been shown to result from an underlying impairment of the phonological component of language, possibly also affecting speech perception. Although there is little evidence for such a deficit under optimal, quiet listening conditions, speech perception difficulties in adults with dyslexia are often reported under more challenging conditions, such as when speech is masked by noise. Previous studies have shown that these difficulties are more pronounced when the background noise is speech and when little spatial information is available to facilitate differentiation between target and background sound sources. In this study, we investigated the neuroimaging correlates of speech-in-speech perception in typical readers and participants with dyslexia, focusing on the effects of different listening configurations. Fourteen adults with dyslexia and 14 matched typical readers performed a subjective intelligibility rating test with single words presented against concurrent speech during functional magnetic resonance imaging (fMRI) scanning. Target words were always presented with a four-talker background in one of three listening configurations: Dichotic, Binaural or Monaural. The results showed that in the Monaural configuration, in which no spatial information was available and energetic masking was maximal, intelligibility was severely decreased in all participants, and this effect was particularly strong in participants with dyslexia. Functional imaging revealed that in this configuration, participants partially compensate for their poorer listening abilities by recruiting several areas in the cerebral networks engaged in speech perception. In the Binaural configuration, participants with dyslexia achieved the same performance level as typical readers, suggesting that they were able to use spatial information when available

  9. Auditory processing disorder and speech perception problems in noise: finding the underlying origin.

    Science.gov (United States)

    Lagacé, Josée; Jutras, Benoît; Gagné, Jean-Pierre

    2010-06-01

    A hallmark listening problem of individuals presenting with auditory processing disorder (APD) is their poor recognition of speech in noise. The underlying perceptual problem of the listening difficulties in unfavorable listening conditions is unknown. The objective of this article was to demonstrate theoretically how to determine whether the speech recognition problems are related to an auditory dysfunction, a language-based dysfunction, or a combination of both. Tests such as the Speech Perception in Noise (SPIN) test allow the exploration of the auditory and language-based functions involved in speech perception in noise, which is not possible with most other speech-in-noise tests. Psychometric functions illustrating results from hypothetical groups of individuals with APD on the SPIN test are presented. This approach makes it possible to postulate about the origin of the speech perception problems in noise. APD is a complex and heterogeneous disorder for which the underlying deficit is currently unclear. Because of their design, SPIN-like tests can potentially be used to identify the nature of the deficits underlying problems with speech perception in noise for this population. A better understanding of the difficulties with speech perception in noise experienced by many listeners with APD should lead to more efficient intervention programs.

  10. Cholinergic Potentiation and Audiovisual Repetition-Imitation Therapy Improve Speech Production and Communication Deficits in a Person with Crossed Aphasia by Inducing Structural Plasticity in White Matter Tracts

    Science.gov (United States)

    Berthier, Marcelo L.; De-Torres, Irene; Paredes-Pacheco, José; Roé-Vellvé, Núria; Thurnhofer-Hemsi, Karl; Torres-Prioris, María J.; Alfaro, Francisco; Moreno-Torres, Ignacio; López-Barroso, Diana; Dávila, Guadalupe

    2017-01-01

    Donepezil (DP), a cognitive-enhancing drug targeting the cholinergic system, combined with massed sentence repetition training augmented and speeded up recovery of speech production deficits in patients with chronic conduction aphasia and extensive left hemisphere infarctions (Berthier et al., 2014). Nevertheless, a still unsettled question is whether such improvements correlate with restorative structural changes in gray matter and white matter pathways mediating speech production. In the present study, we used pharmacological magnetic resonance imaging to study treatment-induced brain changes in gray matter and white matter tracts in a right-handed male with chronic conduction aphasia and a right subcortical lesion (crossed aphasia). A single-patient, open-label multiple-baseline design incorporating two different treatments and two post-treatment evaluations was used. The patient received an initial dose of DP (5 mg/day) which was maintained during 4 weeks and then titrated up to 10 mg/day and administered alone (without aphasia therapy) during 8 weeks (Endpoint 1). Thereafter, the drug was combined with an audiovisual repetition-imitation therapy (Look-Listen-Repeat, LLR) during 3 months (Endpoint 2). Language evaluations, diffusion weighted imaging (DWI), and voxel-based morphometry (VBM) were performed at baseline and at both endpoints in JAM and once in 21 healthy control males. Treatment with DP alone and combined with LLR therapy induced marked improvement in aphasia and communication deficits as well as in selected measures of connected speech production, and phrase repetition. The obtained gains in speech production remained well-above baseline scores even 4 months after ending combined therapy. Longitudinal DWI showed structural plasticity in the right frontal aslant tract and direct segment of the arcuate fasciculus with both interventions. VBM revealed no structural changes in other white matter tracts nor in cortical areas linked by these tracts. In

  11. Cholinergic Potentiation and Audiovisual Repetition-Imitation Therapy Improve Speech Production and Communication Deficits in a Person with Crossed Aphasia by Inducing Structural Plasticity in White Matter Tracts

    Directory of Open Access Journals (Sweden)

    Marcelo L. Berthier

    2017-06-01

    Full Text Available Donepezil (DP), a cognitive-enhancing drug targeting the cholinergic system, combined with massed sentence repetition training augmented and speeded up recovery of speech production deficits in patients with chronic conduction aphasia and extensive left hemisphere infarctions (Berthier et al., 2014). Nevertheless, a still unsettled question is whether such improvements correlate with restorative structural changes in gray matter and white matter pathways mediating speech production. In the present study, we used pharmacological magnetic resonance imaging to study treatment-induced brain changes in gray matter and white matter tracts in a right-handed male with chronic conduction aphasia and a right subcortical lesion (crossed aphasia). A single-patient, open-label multiple-baseline design incorporating two different treatments and two post-treatment evaluations was used. The patient received an initial dose of DP (5 mg/day) which was maintained during 4 weeks and then titrated up to 10 mg/day and administered alone (without aphasia therapy) during 8 weeks (Endpoint 1). Thereafter, the drug was combined with an audiovisual repetition-imitation therapy (Look-Listen-Repeat, LLR) during 3 months (Endpoint 2). Language evaluations, diffusion weighted imaging (DWI), and voxel-based morphometry (VBM) were performed at baseline and at both endpoints in JAM and once in 21 healthy control males. Treatment with DP alone and combined with LLR therapy induced marked improvement in aphasia and communication deficits as well as in selected measures of connected speech production, and phrase repetition. The obtained gains in speech production remained well-above baseline scores even 4 months after ending combined therapy. Longitudinal DWI showed structural plasticity in the right frontal aslant tract and direct segment of the arcuate fasciculus with both interventions. VBM revealed no structural changes in other white matter tracts nor in cortical areas linked by these

  12. A role for the inferior colliculus in multisensory speech integration.

    Science.gov (United States)

    Champoux, François; Tremblay, Corinne; Mercier, Claude; Lassonde, Maryse; Lepore, Franco; Gagné, Jean-Pierre; Théoret, Hugo

    2006-10-23

    Multisensory integration can occur at relatively low levels within the central nervous system. Recent evidence suggests that multisensory audio-visual integration for speech may have a subcortical component, as acoustic processing in the human brainstem is influenced by lipreading during speech perception. Here, stimuli depicting the McGurk illusion (a demonstration of auditory-visual integration using speech stimuli) were presented to a 12-year-old child (FX) with a circumscribed unilateral lesion of the right inferior colliculus. When McGurk-type stimuli were presented in the contralesional hemifield, illusory perception reflecting bimodal integration was significantly reduced compared with the ipsilesional hemifield and a group of age-matched controls. These data suggest a functional role for the inferior colliculus in the audio-visual integration of speech stimuli.

  13. The role of abstraction in non-native speech perception.

    Science.gov (United States)

    Pajak, Bozena; Levy, Roger

    2014-09-01

    The end-result of perceptual reorganization in infancy is currently viewed as a reconfigured perceptual space, "warped" around native-language phonetic categories, which then acts as a direct perceptual filter on any non-native sounds: naïve-listener discrimination of non-native sounds is determined by their mapping onto native-language phonetic categories that are acoustically/articulatorily most similar. We report results that suggest another factor in non-native speech perception: some perceptual sensitivities cannot be attributed to listeners' warped perceptual space alone, but rather to enhanced general sensitivity along phonetic dimensions that the listeners' native language employs to distinguish between categories. Specifically, we show that the knowledge of a language with short and long vowel categories leads to enhanced discrimination of non-native consonant length contrasts. We argue that these results support a view of perceptual reorganization as the consequence of learners' hierarchical inductive inferences about the structure of the language's sound system: infants not only acquire the specific phonetic category inventory, but also draw higher-order generalizations over the set of those categories, such as the overall informativity of phonetic dimensions for sound categorization. Non-native sound perception is then also determined by sensitivities that emerge from these generalizations, rather than only by mappings of non-native sounds onto native-language phonetic categories.

  14. Relative Contributions of the Dorsal vs. Ventral Speech Streams to Speech Perception are Context Dependent: a lesion study

    Directory of Open Access Journals (Sweden)

    Corianne Rogalsky

    2014-04-01

    Full Text Available The neural basis of speech perception has been debated for over a century. While it is generally agreed that the superior temporal lobes are critical for the perceptual analysis of speech, a major current topic is whether the motor system contributes to speech perception, with several conflicting findings attested. In a dorsal-ventral speech stream framework (Hickok & Poeppel, 2007), this debate is essentially about the roles of the dorsal versus ventral speech processing streams. A major roadblock in characterizing the neuroanatomy of speech perception is task-specific effects. For example, much of the evidence for dorsal stream involvement comes from syllable discrimination-type tasks, which have been found to behaviorally doubly dissociate from auditory comprehension tasks (Baker et al., 1981). Discrimination task deficits could result from difficulty perceiving the sounds themselves, which is the typical assumption, or from failures in temporary maintenance of the sensory traces, or in the comparison and/or decision process. Similar complications arise in perceiving sentences: the extent of inferior frontal (i.e., dorsal stream) activation during listening to sentences increases as a function of increased task demands (Love et al., 2006). Another complication is the stimulus: much evidence for dorsal stream involvement uses speech samples lacking semantic context (CVs, non-words). The present study addresses these issues in a large-scale lesion-symptom mapping study. 158 patients with focal cerebral lesions from the Multi-site Aphasia Research Consortium underwent a structural MRI or CT scan, as well as an extensive psycholinguistic battery. Voxel-based lesion-symptom mapping was used to compare the neuroanatomy involved in the following speech perception tasks with varying phonological, semantic, and task loads: (i) two discrimination tasks of syllables (non-words and words, respectively), (ii) two auditory comprehension tasks
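
    For readers unfamiliar with voxel-based lesion-symptom mapping, the toy sketch below illustrates the core idea on synthetic data: at every voxel, patients with and without a lesion there are compared on a behavioural score. It is Python with NumPy/SciPy; the lesion maps, scores, and minimum-overlap threshold are invented, and real analyses add registration, covariates, and multiple-comparison correction.

    # Toy voxel-based lesion-symptom mapping on randomly generated data.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_patients, n_voxels = 158, 1000                       # sample size matches the abstract
    lesions = rng.random((n_patients, n_voxels)) < 0.15    # binary lesion maps
    scores = rng.normal(70, 10, n_patients)                # e.g. % correct on a discrimination task

    t_map = np.full(n_voxels, np.nan)
    for v in range(n_voxels):
        lesioned, spared = scores[lesions[:, v]], scores[~lesions[:, v]]
        if lesioned.size >= 5 and spared.size >= 5:        # require minimum lesion overlap
            t_map[v] = stats.ttest_ind(spared, lesioned, equal_var=False).statistic

    # Voxels with large positive t (spared > lesioned) would implicate that
    # tissue in the task.
    print("max t across voxels:", float(np.nanmax(t_map)))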

  15. The shadow of a doubt ? Evidence for perceptuo-motor linkage during auditory and audiovisual close shadowing

    Directory of Open Access Journals (Sweden)

    Lucie eScarbel

    2014-06-01

    Full Text Available One classical argument in favor of a functional role of the motor system in speech perception comes from the close shadowing task in which a subject has to identify and to repeat as quickly as possible an auditory speech stimulus. The fact that close shadowing can occur very rapidly and much faster than manual identification of the speech target is taken to suggest that perceptually-induced speech representations are already shaped in a motor-compatible format. Another argument is provided by audiovisual interactions often interpreted as referring to a multisensory-motor framework. In this study, we attempted to combine these two paradigms by testing whether the visual modality could speed motor response in a close-shadowing task. To this aim, both oral and manual responses were evaluated during the perception of auditory and audio-visual speech stimuli, clear or embedded in white noise. Overall, oral responses were faster than manual ones, but it also appeared that they were less accurate in noise, which suggests that motor representations evoked by the speech input could be rough at a first processing stage. In the presence of acoustic noise, the audiovisual modality led to both faster and more accurate responses than the auditory modality. No interaction was however observed between modality and response. Altogether, these results are interpreted within a two-stage sensory-motor framework, in which the auditory and visual streams are integrated together and with internally generated motor representations before a final decision may be available.

  16. Speech perception among school-aged skilled and less skilled readers.

    Science.gov (United States)

    Wayland, Ratree P; Eckhouse, Erin; Lombardino, Linda; Roberts, Rosalyn

    2010-12-01

    This study investigated the relationship between speech perception, phonological processing and reading skills among school-aged children classified as 'skilled' and 'less skilled' readers based on their ability to read words, decode non-words, and comprehend short passages. Three speech perception tasks involving categorization of speech continua differing in voicing, place and manner of articulation were administered and compared to phonological processing skills in phonological awareness, speeded naming and verbal short-term memory. The results obtained suggested that (a) speech categorization among skilled readers differed from that of less skilled readers, (b) speech perception skills were associated with both reading and phonological processing skills among both skilled and less skilled readers, however, (c) a strong association between speeded naming and both word and passage reading skills found among skilled readers was absent among less skilled readers. These results suggested that phonological representations and/or activation may not be as well developed in less skilled readers.

  17. N1 enhancement in synesthesia during visual and audio-visual perception in semantic cross-modal conflict situations: an ERP study

    Directory of Open Access Journals (Sweden)

    Christopher eSinke

    2014-01-01

    Full Text Available Synesthesia entails a special kind of sensory perception, where stimulation in one sensory modality leads to an internally generated perceptual experience of another, not stimulated sensory modality. This phenomenon can be viewed as an abnormal multisensory integration process as here the synesthetic percept is aberrantly fused with the stimulated modality. Indeed, recent synesthesia research has focused on multimodal processing even outside of the specific synesthesia-inducing context and has revealed changed multimodal integration, thus suggesting perceptual alterations at a global level. Here, we focused on audio-visual processing in synesthesia using a semantic classification task in combination with visually or auditory-visually presented animate and inanimate objects in an audio-visual congruent and incongruent manner. Fourteen subjects with auditory-visual and/or grapheme-color synesthesia and 14 control subjects participated in the experiment. During presentation of the stimuli, event-related potentials were recorded from 32 electrodes. The analysis of reaction times and error rates revealed no group differences with best performance for audio-visually congruent stimulation indicating the well-known multimodal facilitation effect. We found an enhanced amplitude of the N1 component over occipital electrode sites for synesthetes compared to controls. The differences occurred irrespective of the experimental condition and therefore suggest a global influence on early sensory processing in synesthetes.

  18. N1 enhancement in synesthesia during visual and audio-visual perception in semantic cross-modal conflict situations: an ERP study.

    Science.gov (United States)

    Sinke, Christopher; Neufeld, Janina; Wiswede, Daniel; Emrich, Hinderk M; Bleich, Stefan; Münte, Thomas F; Szycik, Gregor R

    2014-01-01

    Synesthesia entails a special kind of sensory perception, where stimulation in one sensory modality leads to an internally generated perceptual experience of another, not stimulated sensory modality. This phenomenon can be viewed as an abnormal multisensory integration process as here the synesthetic percept is aberrantly fused with the stimulated modality. Indeed, recent synesthesia research has focused on multimodal processing even outside of the specific synesthesia-inducing context and has revealed changed multimodal integration, thus suggesting perceptual alterations at a global level. Here, we focused on audio-visual processing in synesthesia using a semantic classification task in combination with visually or auditory-visually presented animate and inanimate objects in an audio-visual congruent and incongruent manner. Fourteen subjects with auditory-visual and/or grapheme-color synesthesia and 14 control subjects participated in the experiment. During presentation of the stimuli, event-related potentials were recorded from 32 electrodes. The analysis of reaction times and error rates revealed no group differences with best performance for audio-visually congruent stimulation indicating the well-known multimodal facilitation effect. We found an enhanced amplitude of the N1 component over occipital electrode sites for synesthetes compared to controls. The differences occurred irrespective of the experimental condition and therefore suggest a global influence on early sensory processing in synesthetes.

  19. The temporal window of audio-tactile integration in speech perception

    OpenAIRE

    Gick, Bryan; Ikegami, Yoko; Derrick, Donald

    2010-01-01

    Asynchronous cross-modal information is integrated asymmetrically in audio-visual perception. To test whether this asymmetry generalizes across modalities, auditory (aspirated “pa” and unaspirated “ba” stops) and tactile (slight, inaudible, cutaneous air puffs) signals were presented synchronously and asynchronously. Results were similar to previous AV studies: the temporal window of integration for the enhancement effect (but not the interference effect) was asymmetrical, allowing up to 200 ...

  20. Speech perception under adverse conditions: insights from behavioral, computational, and neuroscience research

    Science.gov (United States)

    Guediche, Sara; Blumstein, Sheila E.; Fiez, Julie A.; Holt, Lori L.

    2014-01-01

    Adult speech perception reflects the long-term regularities of the native language, but it is also flexible such that it accommodates and adapts to adverse listening conditions and short-term deviations from native-language norms. The purpose of this article is to examine how the broader neuroscience literature can inform and advance research efforts in understanding the neural basis of flexibility and adaptive plasticity in speech perception. Specifically, we highlight the potential role of learning algorithms that rely on prediction error signals and discuss specific neural structures that are likely to contribute to such learning. To this end, we review behavioral studies, computational accounts, and neuroimaging findings related to adaptive plasticity in speech perception. Already, a few studies have alluded to a potential role of these mechanisms in adaptive plasticity in speech perception. Furthermore, we consider research topics in neuroscience that offer insight into how perception can be adaptively tuned to short-term deviations while balancing the need to maintain stability in the perception of learned long-term regularities. Consideration of the application and limitations of these algorithms in characterizing flexible speech perception under adverse conditions promises to inform theoretical models of speech. PMID:24427119
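
    A minimal sketch of the kind of prediction-error-driven learning the authors highlight is given below: a simple delta rule nudges a listener's expected cue value toward a talker's shifted productions. It is plain Python; the cue, the values, and the learning rate are invented for illustration and do not come from the article.

    # Delta-rule sketch of error-driven adaptation to a shifted talker.
    expected = 1500.0                # expected cue value (e.g. a formant frequency in Hz)
    learning_rate = 0.2
    shifted_productions = [1620, 1650, 1600, 1640, 1630, 1610]   # "accented" talker

    for observed in shifted_productions:
        prediction_error = observed - expected     # mismatch between input and expectation
        expected += learning_rate * prediction_error
        print(f"heard {observed} Hz -> updated expectation {expected:.0f} Hz")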

  1. The categorisation of speech sounds by adults and children: a study of the categorical perception hypothesis and the developmental weighting of acoustic speech cues

    NARCIS (Netherlands)

    Gerrits, E.

    2001-01-01

    This thesis investigates the way adults and children perceive speech. With adult listeners, the question was whether speech is perceived categorically (categorical speech perception). With children, the question was whether there are age-related differences between the weights assigned to

  2. Prosody and Semantics Are Separate but Not Separable Channels in the Perception of Emotional Speech: Test for Rating of Emotions in Speech

    Science.gov (United States)

    Ben-David, Boaz M.; Multani, Namita; Shakuf, Vered; Rudzicz, Frank; van Lieshout, Pascal H. H. M.

    2016-01-01

    Purpose: Our aim is to explore the complex interplay of prosody (tone of speech) and semantics (verbal content) in the perception of discrete emotions in speech. Method: We implement a novel tool, the Test for Rating of Emotions in Speech. Eighty native English speakers were presented with spoken sentences made of different combinations of 5…

  3. Predictive top-down integration of prior knowledge during speech perception

    National Research Council Canada - National Science Library

    Sohoglu, Ediz; Peelle, Jonathan E; Carlyon, Robert P; Davis, Matthew H

    2012-01-01

    ... higher-level knowledge influences sensory processing through feedback connections. Here we used concurrent EEG and MEG recordings to determine how sensory information and prior knowledge are integrated in the brain during speech perception...

  4. Compensation for Coarticulation: Disentangling Auditory and Gestural Theories of Perception of Coarticulatory Effects in Speech

    Science.gov (United States)

    Viswanathan, Navin; Magnuson, James S.; Fowler, Carol A.

    2010-01-01

    According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different…

  5. Hearing Aid-Induced Plasticity in the Auditory System of Older Adults: Evidence from Speech Perception

    Science.gov (United States)

    Lavie, Limor; Banai, Karen; Karni, Avi; Attias, Joseph

    2015-01-01

    Purpose: We tested whether using hearing aids can improve unaided performance in speech perception tasks in older adults with hearing impairment. Method: Unaided performance was evaluated in dichotic listening and speech-in-noise tests in 47 older adults with hearing impairment; 36 participants in 3 study groups were tested before hearing aid…

  6. Effects of Real-Time Cochlear Implant Simulation on Speech Perception and Production

    Science.gov (United States)

    Casserly, Elizabeth D.

    2013-01-01

    Real-time use of spoken language is a fundamentally interactive process involving speech perception, speech production, linguistic competence, motor control, neurocognitive abilities such as working memory, attention, and executive function, environmental noise, conversational context, and--critically--the communicative interaction between…

  7. A Retrospective Multicenter Study Comparing Speech Perception Outcomes for Bilateral Implantation and Bimodal Rehabilitation

    NARCIS (Netherlands)

    Blamey, Peter J.; Maat, Bert; Başkent, Deniz; Mawman, Deborah; Burke, Elaine; Dillier, Norbert; Beynon, Andy; Kleine-Punte, Andrea; Govaerts, Paul J.; Skarzynski, Piotr H.; Huber, Alexander M.; Sterkers-Artieres, Francoise; Van de Heyning, Paul; O'Leary, Stephen; Fraysse, Bernard; Green, Kevin; Sterkers, Olivier; Venail, Frederic; Skarzynski, Henryk; Vincent, Christophe; Truy, Eric; Dowell, Richard; Bergeron, Francois; Lazard, Diane S.

    2015-01-01

    Objectives: To compare speech perception outcomes between bilateral implantation (cochlear implants [CIs]) and bimodal rehabilitation (one CI on one side plus one hearing aid [HA] on the other side) and to explore the clinical factors that may cause asymmetric performances in speech intelligibility

  8. The Downside of Greater Lexical Influences: Selectively Poorer Speech Perception in Noise

    Science.gov (United States)

    Lam, Boji P. W.; Xie, Zilong; Tessmer, Rachel; Chandrasekaran, Bharath

    2017-01-01

    Purpose: Although lexical information influences phoneme perception, the extent to which reliance on lexical information enhances speech processing in challenging listening environments is unclear. We examined the extent to which individual differences in lexical influences on phonemic processing impact speech processing in maskers containing…

  9. The Link between Speech Perception and Production Is Phonological and Abstract: Evidence from the Shadowing Task

    Science.gov (United States)

    Mitterer, Holger; Ernestus, Mirjam

    2008-01-01

    This study reports a shadowing experiment, in which one has to repeat a speech stimulus as fast as possible. We tested claims about a direct link between perception and production based on speech gestures, and obtained two types of counterevidence. First, shadowing is not slowed down by a gestural mismatch between stimulus and response. Second,…

  10. Categorical Speech Perception Deficits Distinguish Language and Reading Impairments in Children

    Science.gov (United States)

    Robertson, Erin K.; Joanisse, Marc F.; Desroches, Amy S.; Ng, Stella

    2009-01-01

    We examined categorical speech perception in school-age children with developmental dyslexia or Specific Language Impairment (SLI), compared to age-matched and younger controls. Stimuli consisted of synthetic speech tokens in which place of articulation varied from "b" to "d". Children were tested on categorization, categorization in noise, and…

  11. Hearing loss and speech perception in noise difficulties in Fanconi anemia.

    Science.gov (United States)

    Verheij, Emmy; Oomen, Karin P Q; Smetsers, Stephanie E; van Zanten, Gijsbert A; Speleman, Lucienne

    2017-10-01

    Fanconi anemia is a hereditary chromosomal instability disorder. Hearing loss and ear abnormalities are among the many manifestations reported in this disorder. In addition, Fanconi anemia patients often complain about hearing difficulties in situations with background noise (speech perception in noise difficulties). Our study aimed to describe the prevalence of hearing loss and speech perception in noise difficulties in Dutch Fanconi anemia patients. Retrospective chart review. A retrospective chart review was conducted at a Dutch tertiary care center. All patients with Fanconi anemia at clinical follow-up in our hospital were included. Medical files were reviewed to collect data on hearing loss and speech perception in noise difficulties. In total, 49 Fanconi anemia patients were included. Audiograms were available in 29 patients and showed hearing loss in 16 patients (55%). Conductive hearing loss was present in 24.1%, sensorineural in 20.7%, and mixed in 10.3%. A speech in noise test was performed in 17 patients; speech perception in noise was subnormal in nine patients (52.9%) and abnormal in two patients (11.7%). Hearing loss and speech perception in noise abnormalities are common in Fanconi anemia. Therefore, pure tone audiograms and speech in noise tests should be performed, preferably already at a young age, because hearing aids or assistive listening devices could be very valuable in developing language and communication skills. Level of Evidence: 4. Laryngoscope, 127:2358-2361, 2017. © 2016 The American Laryngological, Rhinological and Otological Society, Inc.

  12. Audiovisual segregation in cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Simon Landry

    Full Text Available It has traditionally been assumed that cochlear implant users de facto perform atypically in audiovisual tasks. However, a recent study that combined an auditory task with visual distractors suggests that only those cochlear implant users that are not proficient at recognizing speech sounds might show abnormal audiovisual interactions. The present study aims at reinforcing this notion by investigating the audiovisual segregation abilities of cochlear implant users in a visual task with auditory distractors. Speechreading was assessed in two groups of cochlear implant users (proficient and non-proficient at sound recognition), as well as in normal controls. A visual speech recognition task (i.e., speechreading) was administered either in silence or in combination with three types of auditory distractors: (i) noise, (ii) reverse speech sound, and (iii) non-altered speech sound. Cochlear implant users proficient at speech recognition performed like normal controls in all conditions, whereas non-proficient users showed significantly different audiovisual segregation patterns in both speech conditions. These results confirm that normal-like audiovisual segregation is possible in highly skilled cochlear implant users and, consequently, that proficient and non-proficient CI users cannot be lumped into a single group. This important feature must be taken into account in further studies of audiovisual interactions in cochlear implant users.

  13. Effects of congruent and incongruent visual cues on speech perception and brain activity in cochlear implant users.

    Science.gov (United States)

    Song, Jae-Jin; Lee, Hyo-Jeong; Kang, Hyejin; Lee, Dong Soo; Chang, Sun O; Oh, Seung Ha

    2015-03-01

    While deafness-induced plasticity has been investigated in the visual and auditory domains, not much is known about language processing in audiovisual multimodal environments for patients with restored hearing via cochlear implant (CI) devices. Here, we examined the effect of agreeing or conflicting visual inputs on auditory processing in deaf patients equipped with degraded artificial hearing. Ten post-lingually deafened CI users with good performance, along with matched control subjects, underwent H2(15)O positron emission tomography scans while carrying out a behavioral task requiring the extraction of speech information from unimodal auditory stimuli, bimodal audiovisual congruent stimuli, and incongruent stimuli. Regardless of congruency, the control subjects demonstrated activation of the auditory and visual sensory cortices, as well as the superior temporal sulcus, the classical multisensory integration area, indicating a bottom-up multisensory processing strategy. Compared to CI users, the control subjects exhibited activation of the right ventral premotor-supramarginal pathway. In contrast, CI users activated primarily the visual cortices more in the congruent audiovisual condition than in the null condition. In addition, compared to controls, CI users displayed an activation focus in the right amygdala for congruent audiovisual stimuli. The most notable difference between the two groups was an activation focus in the left inferior frontal gyrus in CI users confronted with incongruent audiovisual stimuli, suggesting top-down cognitive modulation for audiovisual conflict. Correlation analysis revealed that good speech performance was positively correlated with right amygdala activity for the congruent condition, but negatively correlated with bilateral visual cortices regardless of congruency. Taken together these results suggest that for multimodal inputs, cochlear implant users are more vision-reliant when processing congruent stimuli and are disturbed

  14. Reanalyzing neurocognitive data on the role of the motor system in speech perception within COSMO, a Bayesian perceptuo-motor model of speech communication.

    Science.gov (United States)

    Barnaud, Marie-Lou; Bessière, Pierre; Diard, Julien; Schwartz, Jean-Luc

    2017-12-11

    While neurocognitive data provide clear evidence for the involvement of the motor system in speech perception, its precise role and the way motor information is involved in perceptual decision remain unclear. In this paper, we discuss some recent experimental results in light of COSMO, a Bayesian perceptuo-motor model of speech communication. COSMO enables us to model both speech perception and speech production with probability distributions relating phonological units with sensory and motor variables. Speech perception is conceived as a sensory-motor architecture combining an auditory and a motor decoder thanks to a Bayesian fusion process. We propose the sketch of a neuroanatomical architecture for COSMO, and we capitalize on properties of the auditory vs. motor decoders to address three neurocognitive studies of the literature. Altogether, this computational study reinforces functional arguments supporting the role of a motor decoding branch in the speech perception process. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
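
    The sketch below is not the COSMO implementation; it only illustrates the kind of Bayesian fusion the abstract describes, in which likelihoods from an auditory decoder and a motor decoder over a small set of phonological units are multiplied with a prior and renormalized. It is plain Python, and all units and probability values are invented.

    # Toy Bayesian fusion of an auditory and a motor decoder over phonemes.
    phonemes = ["/b/", "/d/", "/g/"]
    p_auditory = {"/b/": 0.50, "/d/": 0.35, "/g/": 0.15}   # auditory decoder likelihoods
    p_motor = {"/b/": 0.30, "/d/": 0.55, "/g/": 0.15}      # motor decoder likelihoods
    prior = {"/b/": 1 / 3, "/d/": 1 / 3, "/g/": 1 / 3}     # flat prior over units

    unnormalized = {ph: prior[ph] * p_auditory[ph] * p_motor[ph] for ph in phonemes}
    total = sum(unnormalized.values())
    posterior = {ph: unnormalized[ph] / total for ph in phonemes}

    for ph, p in posterior.items():
        print(f"{ph}: {p:.2f}")    # the fused percept favours /d/ with these numbers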

  15. Listener-speaker perceived distance predicts the degree of motor contribution to speech perception.

    Science.gov (United States)

    Bartoli, Eleonora; D'Ausilio, Alessandro; Berry, Jeffrey; Badino, Leonardo; Bever, Thomas; Fadiga, Luciano

    2015-02-01

    Listening to speech sounds activates motor and premotor areas in addition to temporal and parietal brain regions. These activations are somatotopically localized according to the effectors recruited in the production of particular phonemes. Previous work demonstrated that transcranial magnetic stimulation (TMS) of speech motor centers somatotopically altered speech perception, suggesting a role for the motor system. However, these effects seemed to occur only under adverse listening conditions, suggesting that degraded speech may stimulate listeners to adopt unnatural neural strategies relying on motor centers. Here, we investigated whether naturally occurring interspeaker variability, which did not affect task difficulty, made a speech discrimination task sensitive to TMS interference. In this paradigm, TMS over tongue and lips motor representations somatotopically altered the discrimination time of speech. Furthermore, the TMS-induced effect correlated with listeners' similarity judgments between listeners' and speakers' speech productions. Thus, the degree of motor recruitment depends on the perceived distance between listener and speaker. This result supports the claim that discriminating others' speech patterns requires the contribution of the listener's own motor repertoire. We conclude that motor recruitment in speech perception can be a natural product of discriminating speech in a normally variable and unpredictable environment, not merely related to task difficulty. © The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  16. Bias perceptions of realism in audiovisual media: Why we may take fiction for real

    NARCIS (Netherlands)

    Konijn, E.A.; Walma van der Molen, J.H.; van Nes, S.

    2009-01-01

    This study investigated whether emotions induced in TV-viewers (either as an emotional state or co-occurring with emotional involvement) would increase viewers' perception of realism in a fake documentary and affect the information value that viewers would attribute to its content. To that end, two

  17. Emotions Bias Perceptions of Realism in Audiovisual Media: Why We May Take Fiction for Real

    Science.gov (United States)

    Konijn, Elly A.; Walma van der Molen, Juliette H.; van Nes, Sander

    2009-01-01

    This study investigated whether emotions induced in TV-viewers (either as an emotional state or co-occurring with emotional involvement) would increase viewers' perception of realism in a fake documentary and affect the information value that viewers would attribute to its content. To that end, two experiments were conducted that manipulated (a)…

  18. Factors contributing to speech perception scores in long-term pediatric cochlear implant users.

    Science.gov (United States)

    Davidson, Lisa S; Geers, Ann E; Blamey, Peter J; Tobey, Emily A; Brenner, Christine A

    2011-02-01

    The objectives of this report are to (1) describe the speech perception abilities of long-term pediatric cochlear implant (CI) recipients by comparing scores obtained at elementary school (CI-E, 8 to 9 yrs) with scores obtained at high school (CI-HS, 15 to 18 yrs); (2) evaluate speech perception abilities in demanding listening conditions (i.e., noise and lower intensity levels) at adolescence; and (3) examine the relation of speech perception scores to speech and language development over this longitudinal timeframe. All 112 teenagers were part of a previous nationwide study of 8- and 9-yr-olds (N = 181) who received a CI between 2 and 5 yrs of age. The test battery included (1) the Lexical Neighborhood Test (LNT; hard and easy word lists); (2) the Bamford Kowal Bench sentence test; (3) the Children's Auditory-Visual Enhancement Test; (4) the Test of Auditory Comprehension of Language at CI-E; (5) the Peabody Picture Vocabulary Test at CI-HS; and (6) the McGarr sentences (consonants correct) at CI-E and CI-HS. CI-HS speech perception was measured in both optimal and demanding listening conditions (i.e., background noise and low-intensity level). Speech perception scores were compared based on age at test, lexical difficulty of stimuli, listening environment (optimal and demanding), input mode (visual and auditory-visual), and language age. All group mean scores significantly increased with age across the two test sessions. Scores of adolescents significantly decreased in demanding listening conditions. The effect of lexical difficulty on the LNT scores, as evidenced by the difference in performance between easy versus hard lists, increased with age and decreased for adolescents in challenging listening conditions. Calculated curves for percent correct speech perception scores (LNT and Bamford Kowal Bench) and consonants correct on the McGarr sentences plotted against age-equivalent language scores on the Test of Auditory Comprehension of Language and Peabody

  19. Investigating speech perception in children with dyslexia: is there evidence of a consistent deficit in individuals?

    Science.gov (United States)

    Messaoud-Galusi, Souhila; Hazan, Valerie; Rosen, Stuart

    2011-12-01

    The claim that speech perception abilities are impaired in dyslexia was investigated in a group of 62 children with dyslexia and 51 average readers matched in age. To test whether there was robust evidence of speech perception deficits in children with dyslexia, speech perception in noise and quiet was measured using 8 different tasks involving the identification and discrimination of a complex and highly natural synthetic "bee"-"pea" contrast (copy synthesized from natural models) and the perception of naturally produced words. Children with dyslexia, on average, performed more poorly than did average readers in the synthetic syllables identification task in quiet and in across-category discrimination (but not when tested using an adaptive procedure). They did not differ from average readers on 2 tasks of word recognition in noise or identification of synthetic syllables in noise. For all tasks, a majority of individual children with dyslexia performed within norms. Finally, speech perception generally did not correlate with pseudoword reading or phonological processing--the core skills related to dyslexia. On the tasks and speech stimuli that the authors used, most children with dyslexia did not appear to show a consistent deficit in speech perception.

  20. Behavioral Measures of Temporal Processing and Speech Perception in Cochlear Implant Users.

    Science.gov (United States)

    Blankenship, Chelsea; Zhang, Fawen; Keith, Robert

    2016-10-01

    Although most cochlear implant (CI) users achieve improvements in speech perception, there is still a wide variability in speech perception outcomes. There is a growing body of literature that supports the relationship between individual differences in temporal processing and speech perception performance in CI users. Previous psychophysical studies have emphasized the importance of temporal acuity for overall speech perception performance. Measurement of gap detection thresholds (GDTs) is the most common measure currently used to assess temporal resolution. However, most GDT studies completed with CI participants used direct electrical stimulation not acoustic stimulation and they used psychoacoustic research paradigms that are not easy to administer clinically. Therefore, it is necessary to determine if the variance in GDTs assessed with clinical measures of temporal processing such as the Randomized Gap Detection Test (RGDT) can be used to explain the variability in speech perception performance. The primary goal of this study was to investigate the relationship between temporal processing and speech perception performance in CI users. A correlational study investigating the relationship between behavioral GDTs (assessed with the RGDT or the Expanded Randomized Gap Detection Test) and commonly used speech perception measures (assessed with the Speech Recognition Test [SRT], Central Institute for the Deaf W-22 Word Recognition Test [W-22], Consonant-Nucleus-Consonant Test [CNC], Arizona Biomedical Sentence Recognition Test [AzBio], Bamford-Kowal-Bench Speech-in-Noise Test [BKB-SIN]). Twelve postlingually deafened adult CI users (24-83 yr) and ten normal-hearing (NH; 22-30 yr) adults participated in the study. The data were collected in a sound-attenuated test booth. After measuring pure-tone thresholds, GDTs and speech perception performance were measured. The difference in performance between-participant groups on the aforementioned tests, as well as the
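
    As a trivial illustration of the correlational analysis described above, the snippet below computes rank and linear correlations between hypothetical gap-detection thresholds and hypothetical speech-in-noise scores. It is Python with SciPy; every value is invented and implies nothing about the study's actual results.

    # Hypothetical correlation between gap-detection thresholds and SNR loss.
    from scipy import stats

    gdt_ms = [2, 5, 8, 10, 15, 20, 25, 40, 60, 80, 100, 150]   # RGDT-style thresholds (ms)
    snr_loss_db = [3, 4, 5, 6, 7, 9, 10, 12, 14, 15, 18, 22]   # e.g. BKB-SIN SNR loss (dB)

    rho, p_rho = stats.spearmanr(gdt_ms, snr_loss_db)
    r, p_r = stats.pearsonr(gdt_ms, snr_loss_db)
    print(f"Spearman rho = {rho:.2f} (p = {p_rho:.3f}); Pearson r = {r:.2f} (p = {p_r:.3f})")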

  1. Prediction Errors but Not Sharpened Signals Simulate Multivoxel fMRI Patterns during Speech Perception.

    Science.gov (United States)

    Blank, Helen; Davis, Matthew H

    2016-11-01

    Successful perception depends on combining sensory input with prior knowledge. However, the underlying mechanism by which these two sources of information are combined is unknown. In speech perception, as in other domains, two functionally distinct coding schemes have been proposed for how expectations influence representation of sensory evidence. Traditional models suggest that expected features of the speech input are enhanced or sharpened via interactive activation (Sharpened Signals). Conversely, Predictive Coding suggests that expected features are suppressed so that unexpected features of the speech input (Prediction Errors) are processed further. The present work is aimed at distinguishing between these two accounts of how prior knowledge influences speech perception. By combining behavioural, univariate, and multivariate fMRI measures of how sensory detail and prior expectations influence speech perception with computational modelling, we provide evidence in favour of Prediction Error computations. Increased sensory detail and informative expectations have additive behavioural and univariate neural effects because they both improve the accuracy of word report and reduce the BOLD signal in lateral temporal lobe regions. However, sensory detail and informative expectations have interacting effects on speech representations shown by multivariate fMRI in the posterior superior temporal sulcus. When prior knowledge was absent, increased sensory detail enhanced the amount of speech information measured in superior temporal multivoxel patterns, but with informative expectations, increased sensory detail reduced the amount of measured information. Computational simulations of Sharpened Signals and Prediction Errors during speech perception could both explain these behavioural and univariate fMRI observations. However, the multivariate fMRI observations were uniquely simulated by a Prediction Error and not a Sharpened Signal model. The interaction between prior
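
    The toy sketch below contrasts the two coding schemes under discussion on a three-feature "speech input": sharpening weights the input by the expectation, whereas prediction error subtracts the expectation from the input. It is Python with NumPy; the numbers are invented and the two lines of arithmetic are far simpler than the simulations reported in the study.

    # Minimal contrast of Sharpened Signal vs. Prediction Error coding.
    import numpy as np

    speech_input = np.array([0.7, 0.2, 0.1])   # evidence for three candidate words
    expectation = np.array([0.8, 0.1, 0.1])    # informative prior that matches the input

    sharpened = speech_input * expectation
    sharpened /= sharpened.sum()               # expected features are enhanced

    prediction_error = speech_input - expectation   # expected features are suppressed

    print("sharpened representation:", sharpened.round(2))
    print("prediction error:        ", prediction_error.round(2))
    # With a matching prior the prediction-error representation carries little
    # residual signal, the kind of signature the multivoxel analysis exploited.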

  2. An algorithm of improving speech emotional perception for hearing aid

    Science.gov (United States)

    Xi, Ji; Liang, Ruiyu; Fei, Xianju

    2017-07-01

    In this paper, a speech emotion recognition (SER) algorithm was proposed to improve the emotional perception of hearing-impaired people. The algorithm uses multiple-kernel techniques to overcome a drawback of the SVM: slow training speed. First, to improve the adaptive performance of the Gaussian radial basis function (RBF) kernel, the parameter determining the nonlinear mapping was optimized on the basis of kernel target alignment. The resulting kernel was then used as the basis kernel of Multiple Kernel Learning (MKL) with a slack variable intended to reduce over-fitting. Because the slack variable also introduces error into the result, a soft-margin MKL was proposed to balance the margin against the error, and an iterative algorithm was used to solve for the combination coefficients and the hyperplane equation. Experimental results show that the proposed algorithm achieves an accuracy of 90% for five emotions: happiness, sadness, anger, fear, and neutral. Compared with KPCA+CCA and PIM-FSVM, the proposed algorithm has the highest accuracy.
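
    The authors' soft-margin MKL formulation is not reproduced here; the sketch below only shows the general flavour of a multiple-kernel SVM, combining several RBF kernels with different widths into one weighted kernel that is passed to an SVM with a precomputed kernel. It is Python with scikit-learn; the features are random stand-ins for acoustic emotion features, and the kernel weights are fixed rather than learned as in the paper.

    # Weighted combination of RBF kernels fed to a precomputed-kernel SVM.
    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)
    X_train = rng.normal(size=(100, 20))       # stand-in prosodic/spectral features
    y_train = rng.integers(0, 5, size=100)     # five emotion classes
    X_test = rng.normal(size=(10, 20))

    gammas = [0.01, 0.1, 1.0]                  # candidate RBF widths
    weights = [0.5, 0.3, 0.2]                  # would be optimized by MKL, fixed here

    K_train = sum(w * rbf_kernel(X_train, X_train, gamma=g) for w, g in zip(weights, gammas))
    K_test = sum(w * rbf_kernel(X_test, X_train, gamma=g) for w, g in zip(weights, gammas))

    clf = SVC(kernel="precomputed", C=1.0).fit(K_train, y_train)
    print(clf.predict(K_test))                 # predicted emotion labels for the test items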

  3. Production and perception of listener-oriented clear speech in child language.

    Science.gov (United States)

    Syrett, Kristen; Kawahara, Shigeto

    2014-11-01

    In this paper, we ask whether children are sensitive to the needs of their interlocutor, and, if so, whether they - like adults - modify acoustic characteristics of their speech as part of a communicative goal. In a production task, preschoolers participated in a word learning task that favored the use of clear speech. Children produced vowels that were longer, more intense, more dispersed in the vowel space, and had a more expanded F0 range than normal speech. Two perception studies with adults showed that these acoustic differences were perceptible and were used to distinguish normal and clear speech styles. We conclude that preschoolers are sensitive to aspects of the speaker-hearer relationship calling upon them to modify their speech in ways that benefit their listener.

  4. Brain electric activity during the preattentive perception of speech sounds in tonal languages

    Directory of Open Access Journals (Sweden)

    Naiphinich Kotchabhakdi

    2004-05-01

    Full Text Available The present study was intended to make electrophysiological investigations into the preattentive perception of native and non-native speech sounds. We recorded the mismatch negativity, elicited by single syllable change of both native and non-native speech-sound contrasts in tonal languages. EEGs were recorded and low-resolution brain electromagnetic tomography (LORETA) was utilized to explore the neural electrical activity. Our results suggested that the left hemisphere was predominant in the perception of native speech sounds, whereas the non-native speech sound was perceived predominantly by the right hemisphere, which may be explained by the specialization in processing the prosodic and emotional components of speech formed in this hemisphere.

  5. PRONUNCIATION LANGUAGE SUBSYSTEM AND EEG-CORRELATES OF FOREIGN SPEECH PERCEPTION (PSYCHOACOUSTIC AND PHYSIOLOGICAL ASPECTS

    Directory of Open Access Journals (Sweden)

    Larisa Evgenevna Deryagina

    2015-02-01

    Full Text Available This article is devoted to the identification of psychoacoustic differences between languages of the Romance, Germanic, and Slavic groups, as factors that hinder foreign-language learning, and to the EEG correlates of the perception and recognition of foreign speech as a process of communication. We used a theoretical and methodological analysis of psycholinguistic data together with psychoacoustic and physiological studies of our own. It was determined that the acoustic characteristics of foreign speech affect cerebration and the forms of its functioning through the auditory sensory system. The prosodic and articulatory system of the native language has a significant influence on the perception of foreign speech. Patterns of foreign-language speech perception are based on different functions of the cerebral hemispheres. Differences in the hemispheric organization of the brain can have a significant impact on the effectiveness of learning languages belonging to the Romance, Germanic, and Slavic groups, which differ in their acoustic and rhythmic-melodic features.

  6. Working memory training to improve speech perception in noise across languages.

    Science.gov (United States)

    Ingvalson, Erin M; Dhar, Sumitrajit; Wong, Patrick C M; Liu, Hanjun

    2015-06-01

    Working memory capacity has been linked to performance on many higher cognitive tasks, including the ability to perceive speech in noise. Current efforts to train working memory have demonstrated that working memory performance can be improved, suggesting that working memory training may lead to improved speech perception in noise. A further advantage of working memory training to improve speech perception in noise is that working memory training materials are often simple, such as letters or digits, making them easily translatable across languages. The current effort tested the hypothesis that working memory training would be associated with improved speech perception in noise and that materials would easily translate across languages. Native Mandarin Chinese and native English speakers completed ten days of reversed digit span training. Reading span and speech perception in noise both significantly improved following training, whereas untrained controls showed no gains. These data suggest that working memory training may be used to improve listeners' speech perception in noise and that the materials may be quickly adapted to a wide variety of listeners.

  7. Language perception activates the hand motor cortex: implications for motor theories of speech perception.

    Science.gov (United States)

    Flöel, Agnes; Ellger, Tanja; Breitenstein, Caterina; Knecht, Stefan

    2003-08-01

    The precise mechanisms of how speech may have developed are still unknown to a large extent. Gestures have proven a powerful concept for explaining how planning and analysing of motor acts could have evolved into verbal communication. According to this concept, development of an action-perception network allowed for coding and decoding of communicative gestures. These were manual or manual/articulatory in the beginning and then became increasingly elaborate in the articulatory mode. The theory predicts that listening to the 'gestures' that compose spoken language should activate an extended articulatory and manual action-perception network. To examine this hypothesis, we assessed the effects of language on cortical excitability of the hand muscle representation by transcranial magnetic stimulation. We found the hand motor system to be activated by linguistic tasks, most notably pure linguistic perception, but not by auditory or visuospatial processing. The amount of motor system activation was comparable in both hemispheres. Our data support the theory that language may have evolved within a general and bilateral action-perception network.

  8. Effects of language experience on pre-categorical perception: Distinguishing general from specialized processes in speech perception.

    Science.gov (United States)

    Iverson, Paul; Wagner, Anita; Rosen, Stuart

    2016-04-01

    Cross-language differences in speech perception have traditionally been linked to phonological categories, but it has become increasingly clear that language experience has effects beginning at early stages of perception, which blurs the accepted distinctions between general and speech-specific processing. The present experiments explored this distinction by playing stimuli to English and Japanese speakers that manipulated the acoustic form of English /r/ and /l/, in order to determine how acoustically natural and phonologically identifiable a stimulus must be for cross-language discrimination differences to emerge. Discrimination differences were found for stimuli that did not sound subjectively like speech or /r/ and /l/, but overall they were strongly linked to phonological categorization. The results thus support the view that phonological categories are an important source of cross-language differences, but also show that these differences can extend to stimuli that do not clearly sound like speech.

  9. Is There a Relationship between Speech Identification in Noise and Categorical Perception in Children with Dyslexia?

    Science.gov (United States)

    Calcus, Axelle; Lorenzi, Christian; Collet, Gregory; Colin, Cécile; Kolinsky, Régine

    2016-01-01

    Purpose: Children with dyslexia have been suggested to experience deficits in both categorical perception (CP) and speech identification in noise (SIN) perception. However, results regarding both abilities are inconsistent, and the relationship between them is still unclear. Therefore, this study aimed to investigate the relationship between CP…

  10. Speech Perception Abilities of Adults with Dyslexia: Is There Any Evidence for a True Deficit?

    Science.gov (United States)

    Hazan, Valerie; Messaoud-Galusi, Souhila; Rosen, Stuart; Nouwens, Suzan; Shakespeare, Bethanie

    2009-01-01

    Purpose: This study investigated whether adults with dyslexia show evidence of a consistent speech perception deficit by testing phoneme categorization and word perception in noise. Method: Seventeen adults with dyslexia and 20 average readers underwent a test battery including standardized reading, language and phonological awareness tests, and…

  11. Effects of Text on Speech in Noise Perception

    OpenAIRE

    IRINA GROSSMAN

    2017-01-01

    Text and speech are often present simultaneously. Thus, questions arise about how they share the language processing network, how the system prioritises and combines the information and whether the text and speech are ever confused. In this project, a novel retrospective recall task was developed and used to evaluate the interactions between text and speech in both silent and noisy conditions. The investigations considered how these interactions were affected by the relationship between the ...

  12. Getting the cocktail party started: masking effects in speech perception

    Science.gov (United States)

    Evans, S; McGettigan, C; Agnew, ZK; Rosen, S; Scott, SK

    2016-01-01

    Spoken conversations typically take place in noisy environments, and different kinds of masking sounds place differing demands on cognitive resources. Previous studies, examining the modulation of neural activity associated with the properties of competing sounds, have shown that additional speech streams engage the superior temporal gyrus. However, the absence of a condition in which target speech was heard without additional masking made it difficult to identify brain networks specific to masking and to ascertain the extent to which competing speech was processed equivalently to target speech. In this study, we scanned young healthy adults with continuous functional Magnetic Resonance Imaging (fMRI), whilst they listened to stories masked by sounds that differed in their similarity to speech. We show that auditory attention and control networks are activated during attentive listening to masked speech in the absence of an overt behavioural task. We demonstrate that competing speech is processed predominantly in the left hemisphere within the same pathway as target speech but is not treated equivalently within that stream, and that individuals who perform better in speech-in-noise tasks activate the left mid-posterior superior temporal gyrus more. Finally, we identify neural responses associated with the onset of sounds in the auditory environment: activity was found within right-lateralised frontal regions, consistent with a phasic alerting response. Taken together, these results provide a comprehensive account of the neural processes involved in listening in noise. PMID:26696297

  13. Hearing aid processing of loud speech and noise signals: Consequences for loudness perception and listening comfort

    DEFF Research Database (Denmark)

    Schmidt, Erik

    2007-01-01

    Hearing aid processing of loud speech and noise signals: Consequences for loudness perception and listening comfort. Sound processing in hearing aids is determined by the fitting rule, which describes how the hearing aid should amplify speech and sounds in the surroundings, such that ... research - for example investigations of loudness perception in hearing-impaired listeners. Most research has been focused on speech and sounds at medium input levels (e.g., 60-65 dB SPL). It is well documented that for speech at conversational levels, hearing-aid users prefer the signal to be amplified ... in regard to perceived level variation, loudness and overall acceptance. In the second experiment, two signals containing speech and noise at a 75 dB SPL RMS level were compressed with six compression ratios from 1:1 to 10:1 and three release times from 40 ms to 4000 ms. In this experiment, subjects rated ...
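
    The snippet below is not the hearing-aid processing used in the experiments; it is only a minimal illustration of the two parameters varied above, showing how a compression ratio and a release time shape the gain applied to a signal. It is Python with NumPy, and the input signal and level calibration are invented.

    # Minimal compressor: instantaneous attack, exponential release, static ratio.
    import numpy as np

    fs = 16000                                            # sample rate (Hz)
    threshold_db, ratio, release_ms = 65.0, 4.0, 400.0    # illustrative settings
    release_coeff = np.exp(-1.0 / (fs * release_ms / 1000.0))

    rng = np.random.default_rng(0)
    x = rng.standard_normal(fs) * 0.1                     # stand-in for 1 s of speech plus noise
    gain_db = np.zeros_like(x)
    level = -120.0                                        # running level estimate (dB)

    for n, sample in enumerate(x):
        inst = 20.0 * np.log10(abs(sample) + 1e-9) + 94.0     # crude dB-SPL-like level
        # fast attack, slow decay governed by the release time
        level = max(inst, release_coeff * level + (1.0 - release_coeff) * inst)
        overshoot = max(0.0, level - threshold_db)
        gain_db[n] = -overshoot * (1.0 - 1.0 / ratio)         # 4:1 compression above threshold

    y = x * 10.0 ** (gain_db / 20.0)                      # compressed output signal
    print("maximum gain reduction: %.1f dB" % -gain_db.min())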

  14. Visual speech perception in foveal and extrafoveal vision: further implications for divisions in hemispheric projections.

    Science.gov (United States)

    Jordan, Timothy R; Sheen, Mercedes; Abedipour, Lily; Paterson, Kevin B

    2014-01-01

    When observing a talking face, it has often been argued that visual speech to the left and right of fixation may produce differences in performance due to divided projections to the two cerebral hemispheres. However, while it seems likely that such a division in hemispheric projections exists for areas away from fixation, the nature and existence of a functional division in visual speech perception at the foveal midline remains to be determined. We investigated this issue by presenting visual speech in matched hemiface displays to the left and right of a central fixation point, either exactly abutting the foveal midline or else located away from the midline in extrafoveal vision. The location of displays relative to the foveal midline was controlled precisely using an automated, gaze-contingent eye-tracking procedure. Visual speech perception showed a clear right hemifield advantage when presented in extrafoveal locations but no hemifield advantage (left or right) when presented abutting the foveal midline. Thus, while visual speech observed in extrafoveal vision appears to benefit from unilateral projections to left-hemisphere processes, no evidence was obtained to indicate that a functional division exists when visual speech is observed around the point of fixation. Implications of these findings for understanding visual speech perception and the nature of functional divisions in hemispheric projection are discussed.

  15. Speech-perception-in-noise and bilateral spatial abilities in adults with delayed sequential cochlear implantation

    Directory of Open Access Journals (Sweden)

    Ilze Oosthuizen

    2012-12-01

    Full Text Available Objective: To determine speech-perception-in-noise (with speech and noise spatially distinct and coincident) and bilateral spatial benefits of head-shadow effect, summation, squelch and spatial release of masking in adults with delayed sequential cochlear implants. Study design: A cross-sectional one group post-test-only exploratory design was employed. Eleven adults (mean age 47 years; range 21 – 69 years) of the Pretoria Cochlear Implant Programme (PCIP) in South Africa with a bilateral severe-to-profound sensorineural hearing loss were recruited. Prerecorded Everyday Speech Sentences of The Central Institute for the Deaf (CID) were used to evaluate participants' speech-in-noise perception at sentence level. An adaptive procedure was used to determine the signal-to-noise ratio (SNR, in dB) at which the participant's speech reception threshold (SRT) was achieved. Specific calculations were used to estimate bilateral spatial benefit effects. Results: A minimal bilateral benefit for speech-in-noise perception was observed with noise directed to the first implant (CI 1) (1.69 dB) and in the speech and noise spatial listening condition (0.78 dB), but was not statistically significant. The head-shadow effect at 180° was the most robust bilateral spatial benefit. An improvement in speech perception in spatially distinct speech and noise indicates the contribution of the second implant (CI 2) is greater than that of the first implant (CI 1) for bilateral spatial benefit. Conclusion: Bilateral benefit for delayed sequentially implanted adults is less than previously reported for simultaneous and sequentially implanted adults. Delayed sequential implantation benefit seems to relate to the availability of the ear with the most favourable SNR.
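
    The abstract mentions "specific calculations" for the bilateral spatial benefits; the sketch below shows the conventional dB-difference definitions computed from speech reception thresholds measured in different ear and noise configurations. It is plain Python; the SRT values are invented, and the exact formulas used in the study may differ.

    # Conventional dB-difference estimates of bilateral benefits from SRTs
    # (dB SNR; lower is better). All values are hypothetical.
    srt = {
        ("CI1", "noise front"): 2.0,
        ("CI2", "noise front"): 2.4,
        ("both", "noise front"): 1.2,
        ("CI1", "noise at CI1"): 4.0,
        ("CI2", "noise at CI1"): 0.5,
        ("both", "noise at CI1"): 0.0,
    }

    head_shadow = srt[("CI1", "noise at CI1")] - srt[("CI2", "noise at CI1")]       # ear far from the noise helps
    summation = srt[("CI1", "noise front")] - srt[("both", "noise front")]          # two ears vs. one, co-located
    squelch = srt[("CI2", "noise at CI1")] - srt[("both", "noise at CI1")]          # adding the ear near the noise
    spatial_release = srt[("both", "noise front")] - srt[("both", "noise at CI1")]  # separating noise from speech

    for name, value in [("head shadow", head_shadow), ("summation", summation),
                        ("squelch", squelch), ("spatial release from masking", spatial_release)]:
        print(f"{name}: {value:+.1f} dB")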

  16. Congruent Visual Speech Enhances Cortical Entrainment to Continuous Auditory Speech in Noise-Free Conditions.

    Science.gov (United States)

    Crosse, Michael J; Butler, John S; Lalor, Edmund C

    2015-10-21

    Congruent audiovisual speech enhances our ability to comprehend a speaker, even in noise-free conditions. When incongruent auditory and visual information is presented concurrently, it can hinder a listener's perception and even cause him or her to perceive information that was not presented in either modality. Efforts to investigate the neural basis of these effects have often focused on the special case of discrete audiovisual syllables that are spatially and temporally congruent, with less work done on the case of natural, continuous speech. Recent electrophysiological studies have demonstrated that cortical response measures to continuous auditory speech can be easily obtained using multivariate analysis methods. Here, we apply such methods to the case of audiovisual speech and, importantly, present a novel framework for indexing multisensory integration in the context of continuous speech. Specifically, we examine how the temporal and contextual congruency of ongoing audiovisual speech affects the cortical encoding of the speech envelope in humans using electroencephalography. We demonstrate that the cortical representation of the speech envelope is enhanced by the presentation of congruent audiovisual speech in noise-free conditions. Furthermore, we show that this is likely attributable to the contribution of neural generators that are not particularly active during unimodal stimulation and that it is most prominent at the temporal scale corresponding to syllabic rate (2-6 Hz). Finally, our data suggest that neural entrainment to the speech envelope is inhibited when the auditory and visual streams are incongruent both temporally and contextually. Seeing a speaker's face as he or she talks can greatly help in understanding what the speaker is saying. This is because the speaker's facial movements relay information about what the speaker is saying, but also, importantly, when the speaker is saying it. Studying how the brain uses this timing relationship to…
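
    The multivariate approach referred to here generally works by relating the speech amplitude envelope to time-lagged, multichannel EEG with a regularized linear model, for example a decoder that reconstructs the envelope from the EEG and is scored by the correlation between actual and reconstructed envelopes. The sketch below shows that core algebra on synthetic data; the sampling rate, lag range, ridge parameter, and array shapes are illustrative assumptions, and a real analysis would use cross-validation rather than fitting and scoring on the same data.

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(0)
fs = 64                                     # Hz, assumed rate after downsampling
n_samples, n_channels = 60 * fs, 32         # one minute of data, 32 EEG channels

# Synthetic stand-ins: a broadband "speech envelope" and EEG that weakly tracks it.
envelope = np.abs(hilbert(rng.standard_normal(n_samples)))
eeg = 0.1 * envelope[:, None] + rng.standard_normal((n_samples, n_channels))

# Design matrix of time-lagged EEG (lags of roughly 0-250 ms).
lags = np.arange(int(0.25 * fs))
X = np.hstack([np.roll(eeg, -lag, axis=0) for lag in lags])
X, y = X[: n_samples - lags[-1]], envelope[: n_samples - lags[-1]]

# Ridge-regression decoder: w = (X'X + lambda I)^-1 X'y.
lam = 1e3
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Reconstruction accuracy (Pearson r) indexes how strongly the EEG tracks the envelope.
r = np.corrcoef(y, X @ w)[0, 1]
print(f"envelope reconstruction accuracy r = {r:.2f}")
```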

  17. Categorical speech perception during active discrimination of consonants and vowels.

    Science.gov (United States)

    Altmann, Christian F; Uesaki, Maiko; Ono, Kentaro; Matsuhashi, Masao; Mima, Tatsuya; Fukuyama, Hidenao

    2014-11-01

    Categorical perception of phonemes describes the phenomenon that, although phonemes physically follow a continuum along a feature dimension, they are often perceived as falling into distinct categories when classified. While consonants such as plosives have been proposed to be perceived categorically, the representation of vowels has been described to be more continuous. We aimed at testing this difference in representation at a behavioral and neurophysiological level using human magnetoencephalography (MEG). To this end, we designed stimuli based on natural speech by morphing along a phonological continuum entailing changes of the voiced stop-consonant or the steady-state vowel of a consonant-vowel (CV) syllable. Then, while recording MEG, we presented participants with consecutive pairs of either same or different CV syllables. The differences were such that either both CV syllables were from within the same category or belonged to different categories. During the MEG experiment, the participants actively discriminated the stimulus pairs. Behaviorally, we found that discrimination was easier for the between- compared to the within-category contrast for both consonants and vowels. However, this categorical effect was significantly stronger for the consonants compared to vowels, in line with a more continuous representation of vowels. At the neural level, we observed significant repetition suppression of MEG evoked fields, i.e. lower amplitudes for physically same compared to different stimulus pairs, at around 430 to 500 ms after the onset of the second stimulus. Source reconstruction revealed generating sources of this repetition suppression effect within left superior temporal sulcus and gyrus, posterior to Heschl's gyrus. A region-of-interest analysis within this region showed a clear categorical effect for consonants, but not for vowels, providing further evidence for the important role of left superior temporal areas in categorical representation…

  18. Mandarin speech perception in combined electric and acoustic stimulation.

    Directory of Open Access Journals (Sweden)

    Yongxin Li

    For deaf individuals with residual low-frequency acoustic hearing, combined use of a cochlear implant (CI) and hearing aid (HA) typically provides better speech understanding than with either device alone. Because of coarse spectral resolution, CIs do not provide fundamental frequency (F0) information that contributes to understanding of tonal languages such as Mandarin Chinese. The HA can provide good representation of F0 and, depending on the range of aided acoustic hearing, first and second formant (F1 and F2) information. In this study, Mandarin tone, vowel, and consonant recognition in quiet and noise was measured in 12 adult Mandarin-speaking bimodal listeners with the CI-only and with the CI+HA. Tone recognition was significantly better with the CI+HA in noise, but not in quiet. Vowel recognition was significantly better with the CI+HA in quiet, but not in noise. There was no significant difference in consonant recognition between the CI-only and the CI+HA in quiet or in noise. There was a wide range in bimodal benefit, with improvements often greater than 20 percentage points in some tests and conditions. The bimodal benefit was compared to CI subjects' HA-aided pure-tone average (PTA) thresholds between 250 and 2000 Hz; subjects were divided into two groups with "better" and "poorer" PTA, split at 50 dB HL. The bimodal benefit differed significantly between groups only for consonant recognition. The bimodal benefit for tone recognition in quiet was significantly correlated with CI experience, suggesting that bimodal CI users learn to better combine low-frequency spectro-temporal information from acoustic hearing with temporal envelope information from electric hearing. Given the small number of subjects in this study (n = 12), further research with Chinese bimodal listeners may provide more information regarding the contribution of acoustic and electric hearing to tonal language perception.
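
    The "better"/"poorer" grouping described above rests on the aided pure-tone average (PTA), which is simply the mean of the audiometric thresholds over the stated frequency range, and the bimodal benefit is the CI+HA score minus the CI-only score. A minimal sketch with hypothetical subjects and thresholds; the exact audiometric frequencies and the 50 dB HL split are assumptions based on the abstract.

```python
import numpy as np

# Hypothetical HA-aided thresholds (dB HL) at 250, 500, 1000 and 2000 Hz,
# and hypothetical recognition scores (%) with CI alone and with CI+HA.
thresholds = {"S01": [35, 40, 45, 55], "S02": [55, 60, 65, 75]}
scores = {"S01": {"CI": 55.0, "CI+HA": 78.0}, "S02": {"CI": 60.0, "CI+HA": 64.0}}

for subject, thr in thresholds.items():
    pta = np.mean(thr)                              # pure-tone average, 250-2000 Hz
    group = "better" if pta <= 50 else "poorer"     # assumed split at 50 dB HL
    benefit = scores[subject]["CI+HA"] - scores[subject]["CI"]   # bimodal benefit
    print(f"{subject}: PTA {pta:.1f} dB HL ({group} group), "
          f"bimodal benefit {benefit:+.1f} percentage points")
```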

  19. Subclinical alexithymia modulates early audio-visual perceptive and attentional event-related potentials

    Directory of Open Access Journals (Sweden)

    Dyna eDelle-Vigne

    2014-03-01

    Introduction: Previous studies have highlighted the advantage of audio–visual oddball tasks (instead of unimodal ones) for electrophysiologically indexing subclinical behavioral differences. Since alexithymia is highly prevalent in the general population, we investigated whether the use of various bimodal tasks could elicit emotional effects in low- versus high-alexithymic scorers. Methods: Fifty students (33 females) were split into groups based on low and high scores on the Toronto Alexithymia Scale. During event-related potential recordings, they were exposed to three kinds of audio–visual oddball tasks: neutral (geometrical forms and bips), animal (dog and cock with their respective shouts), or emotional (faces and voices) stimuli. In each condition, participants were asked to quickly detect deviant events occurring amongst a train of frequent matching stimuli (e.g., push a button when a sad face–voice pair appeared amongst a train of neutral face–voice pairs). P100, N100, and P300 components were analyzed: the P100 refers to visual perceptive processing, the N100 to auditory perceptive processing, and the P300 relates to response-related stages. Results: High-alexithymic scorers presented a particular pattern of results when processing the emotional stimuli, reflected in early ERP components by increased P100 and N100 amplitudes in the emotional oddball task. Conclusions: Our findings suggest that high-alexithymic scorers require heightened early attentional resources when confronted with emotional stimuli.
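
    ERP components such as the P100, N100, and P300 are obtained by cutting the continuous EEG into epochs around stimulus onsets, baseline-correcting each epoch, and averaging separately for frequent (standard) and deviant events. Below is a minimal single-channel sketch on synthetic data; the sampling rate, epoch window, deviant probability, and P300 measurement window are illustrative assumptions, not the parameters of this study.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 500                                     # Hz, assumed sampling rate
eeg = rng.standard_normal(120 * fs)          # two minutes of synthetic single-channel EEG
onsets = np.arange(fs, len(eeg) - fs, fs)    # one stimulus per second
is_deviant = rng.random(len(onsets)) < 0.2   # roughly 20% deviants, as in an oddball task

pre, post = int(0.1 * fs), int(0.6 * fs)     # epoch window: -100 ms to +600 ms

def average_erp(event_samples):
    """Baseline-corrected average over epochs centred on the given onsets."""
    epochs = np.stack([eeg[s - pre:s + post] for s in event_samples])
    baseline = epochs[:, :pre].mean(axis=1, keepdims=True)
    return (epochs - baseline).mean(axis=0)

erp_deviant = average_erp(onsets[is_deviant])
erp_standard = average_erp(onsets[~is_deviant])

# The P300 is often quantified as the mean of the deviant-minus-standard
# difference wave in a late window (here 250-500 ms post-onset).
t = np.arange(-pre, post) / fs
p300 = (erp_deviant - erp_standard)[(t >= 0.25) & (t <= 0.5)].mean()
print(f"P300 estimate on synthetic data: {p300:.3f} (arbitrary units)")
```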

  20. Putative mechanisms mediating tolerance for audiovisual stimulus onset asynchrony.

    Science.gov (United States)

    Bhat, Jyoti; Miller, Lee M; Pitt, Mark A; Shahin, Antoine J

    2015-03-01

    Audiovisual (AV) speech perception is robust to temporal asynchronies between visual and auditory stimuli. We investigated the neural mechanisms that facilitate tolerance for audiovisual stimulus onset asynchrony (AVOA) with EEG. Individuals were presented with AV words that were asynchronous in onsets of voice and mouth movement and judged whether they were synchronous or not. Behaviorally, individuals tolerated (perceived as synchronous) longer AVOAs when mouth movement preceded the speech (V-A) stimuli than when the speech preceded mouth movement (A-V). Neurophysiologically, the P1-N1-P2 auditory evoked potentials (AEPs), time-locked to sound onsets and known to arise in and surrounding the primary auditory cortex (PAC), were smaller for the in-sync than the out-of-sync percepts. Spectral power of oscillatory activity in the beta band (14-30 Hz) following the AEPs was larger during the in-sync than out-of-sync perception for both A-V and V-A conditions. However, alpha power (8-14 Hz), also following AEPs, was larger for the in-sync than out-of-sync percepts only in the V-A condition. These results demonstrate that AVOA tolerance is enhanced by inhibiting low-level auditory activity (e.g., AEPs representing generators in and surrounding PAC) that code for acoustic onsets. By reducing sensitivity to acoustic onsets, visual-to-auditory onset mapping is weakened, allowing for greater AVOA tolerance. In contrast, beta and alpha results suggest the involvement of higher-level neural processes that may code for language cues (phonetic, lexical), selective attention, and binding of AV percepts, allowing for wider neural windows of temporal integration, i.e., greater AVOA tolerance. Copyright © 2015 the American Physiological Society.
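
    Band-limited oscillatory power of the kind reported here (alpha, 8-14 Hz; beta, 14-30 Hz) is commonly estimated by band-pass filtering the EEG and taking the squared magnitude of the analytic signal obtained with the Hilbert transform. A minimal sketch on synthetic data; the sampling rate and filter order are assumptions, and only the band edges follow the ranges quoted in the abstract.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 250                                    # Hz, assumed EEG sampling rate
rng = np.random.default_rng(2)
eeg = rng.standard_normal(10 * fs)          # 10 s of synthetic single-channel EEG

def band_power(signal, low_hz, high_hz):
    """Mean power of the analytic signal after zero-phase band-pass filtering."""
    b, a = butter(4, [low_hz / (fs / 2), high_hz / (fs / 2)], btype="bandpass")
    analytic = hilbert(filtfilt(b, a, signal))
    return np.mean(np.abs(analytic) ** 2)

alpha_power = band_power(eeg, 8, 14)        # alpha band, 8-14 Hz
beta_power = band_power(eeg, 14, 30)        # beta band, 14-30 Hz
print(f"alpha power {alpha_power:.3f}, beta power {beta_power:.3f} (arbitrary units)")
```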

  1. Speech Perception in Complex Acoustic Environments: Developmental Effects

    Science.gov (United States)

    Leibold, Lori J.

    2017-01-01

    Purpose: The ability to hear and understand speech in complex acoustic environments follows a prolonged time course of development. The purpose of this article is to provide a general overview of the literature describing age effects in susceptibility to auditory masking in the context of speech recognition, including a summary of findings related…

  2. Influence of Telecommunication Modality, Internet Transmission Quality, and Accessories on Speech Perception in Cochlear Implant Users

    Science.gov (United States)

    Koller, Roger; Guignard, Jérémie; Caversaccio, Marco; Kompis, Martin; Senn, Pascal

    2017-01-01

    Background: Telecommunication is limited or even impossible for more than one-third of all cochlear implant (CI) users. Objective: We therefore sought to study the impact of voice quality on speech perception with voice over Internet protocol (VoIP) under real and adverse network conditions. Methods: Telephone speech perception was assessed in 19 CI users (15-69 years, average 42 years), using the German HSM (Hochmair-Schulz-Moser) sentence test comparing Skype and conventional telephone (public switched telephone network, PSTN) transmission using a personal computer (PC) and a digital enhanced cordless telecommunications (DECT) telephone dual device. Five different Internet transmission quality modes and four accessories (PC speakers, headphones, 3.5 mm jack audio cable, and induction loop) were compared. As a secondary outcome, the subjectively perceived voice quality was assessed using the mean opinion score (MOS). Results: Telephone speech perception was significantly better with Skype (median 91.6%) than with conventional telephony, whereas Skype transmissions over strongly degraded connections (>15% data loss) were not superior to conventional telephony. In addition, there were no significant differences between the tested accessories (P>.05) using a PC. Coupling a Skype DECT phone device with an audio cable to the CI, however, resulted in higher speech perception (median 65%) and subjective MOS scores (3.2) than using PSTN (median 7.5%, P<.001). Conclusions: Skype calls significantly improve speech perception for CI users compared with conventional telephony under real network conditions. Listening accessories do not further improve the listening experience. Current Skype DECT telephone devices do not fully offer technical advantages in voice quality. PMID:28438727

  3. Incorporating ceiling effects during analysis of speech perception data from a paediatric cochlear implant cohort.

    Science.gov (United States)

    Bruijnzeel, Hanneke; Cattani, Guido; Stegeman, Inge; Topsakal, Vedat; Grolman, Wilko

    2017-08-01

    To compare speech perception between children with different ages at cochlear implantation. We evaluated speech perception by comparing consonant-vowel-consonant (auditory) (CVC(A)) scores at five-year follow-up of children implanted between 1997 and 2010. The proportion of children from each age-at-implantation group reaching the 95%CI of CVC(A) ceiling scores (>95%) was calculated to identify speech perception differences masked by ceiling effects. The sample comprised 54 children implanted between 8 and 36 months of age. Although ceiling effects occurred, a CVC(A) score difference between age-at-implantation groups was confirmed (H(4) = 30.36), even though a proportion of children reached ceiling scores. Logistic regression confirmed that age at implantation predicted whether a child reached a ceiling score. Ceiling effects can mask thorough delineation of speech perception. However, this study showed long-term speech perception outperformance of early implanted children when incorporating ceiling effects during analysis. Development of long-term assessment tools not affected by ceiling effects is essential to maintain adequate assessment of young implanted infants.
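
    The logistic-regression step mentioned above models the probability that a child reaches a ceiling CVC(A) score (>95%) as a function of age at implantation. A minimal sketch with entirely hypothetical data, intended only to show the form of such a model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: age at implantation (months) and whether the child reached
# a ceiling CVC(A) score (>95%) at five-year follow-up.
age_months = np.array([[8], [10], [12], [14], [18], [22], [26], [30], [34], [36]])
reached_ceiling = np.array([1, 1, 1, 1, 1, 1, 0, 1, 0, 0])

model = LogisticRegression().fit(age_months, reached_ceiling)
for age in (12, 24, 36):
    p = model.predict_proba([[age]])[0, 1]
    print(f"estimated P(ceiling score | implanted at {age} months) = {p:.2f}")
```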

  4. Mapping the Speech Code: Cortical Responses Linking the Perception and Production of Vowels.

    Science.gov (United States)

    Schuerman, William L; Meyer, Antje S; McQueen, James M

    2017-01-01

    The acoustic realization of speech is constrained by the physical mechanisms by which it is produced. Yet for speech perception, the degree to which listeners utilize experience derived from speech production has long been debated. In the present study, we examined how sensorimotor adaptation during production may affect perception, and how this relationship may be reflected in early vs. late electrophysiological responses. Participants first performed a baseline speech production task, followed by a vowel categorization task during which EEG responses were recorded. In a subsequent speech production task, half the participants received shifted auditory feedback, leading most to alter their articulations. This was followed by a second, post-training vowel categorization task. We compared changes in vowel production to both behavioral and electrophysiological changes in vowel perception. No differences in phonetic categorization were observed between groups receiving altered or unaltered feedback. However, exploratory analyses revealed correlations between vocal motor behavior and phonetic categorization. EEG analyses revealed correlations between vocal motor behavior and cortical responses in both early and late time windows. These results suggest that participants' recent production behavior influenced subsequent vowel perception. We suggest that the change in perception can be best characterized as a mapping of acoustics onto articulation.

  5. Cortical-evoked potentials reflect speech-in-noise perception in children.

    Science.gov (United States)

    Anderson, Samira; Chandrasekaran, Bharath; Yi, Han-Gyol; Kraus, Nina

    2010-10-01

    Children are known to be particularly vulnerable to the effects of noise on speech perception, and it is commonly acknowledged that failure of central auditory processes can lead to these difficulties with speech-in-noise (SIN) perception. However, little is known about the mechanistic relationship between central processes and the perception of SIN. Our aims were twofold: to examine the effects of noise on the central encoding of speech through measurement of cortical event-related potentials and to examine the relationship between cortical processing and behavioral indices of SIN perception. We recorded cortical responses to the speech syllable [da] in quiet and multi-talker babble noise in 32 children with a broad range of SIN perception. Outcomes suggest inordinate effects of noise on auditory function in the bottom SIN perceivers compared with the top perceivers. The cortical amplitudes in the top SIN group remained stable between conditions, whereas amplitudes increased significantly in the bottom SIN group, suggesting a developmental central processing impairment in the bottom perceivers that may contribute to difficulties in encoding and perceiving speech in challenging listening environments. © 2010 The Authors. European Journal of Neuroscience © 2010 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.

  6. The effect of short-term musical training on speech perception in noise

    Directory of Open Access Journals (Sweden)

    Chandni Jain

    2015-03-01

    The aim of the study was to assess the effect of short-term musical training on speech perception in noise. In the present study, speech perception in noise was measured pre- and post-short-term musical training. The musical training involved auditory perceptual training for raga identification of two Carnatic ragas. The training was given for eight sessions. A total of 18 normal hearing adults in the age range of 18-25 years participated in the study, wherein group 1 consisted of ten individuals who underwent musical training and group 2 consisted of eight individuals who did not undergo any training. Results revealed that post training, speech perception in noise improved significantly in group 1, whereas group 2 did not show any changes in speech perception scores. Thus, short-term musical training shows an enhancement of speech perception in the presence of noise. However, generalization and long-term maintenance of these benefits need to be evaluated.

  7. Speech perception with the Vienna extra-cochlear single-channel implant: a comparison of two approaches to speech coding.

    Science.gov (United States)

    Rosen, S; Ball, V

    1986-02-01

    Although it is generally accepted that single-channel electrical stimulation can significantly improve a deafened patient's speech perceptual ability, there is still much controversy surrounding the choice of speech processing schemes. We have compared, in the same patients, two different approaches: (1) The speech pattern extraction technique of the EPI group, London (Fourcin et al., British Journal of Audiology, 1979,13,85-107) in which voice fundamental frequency is extracted and presented in an appropriate way, and (2) The analogue 'whole speech' approach of Hochmair and Hochmair-Desoyer (Annals of the New York Academy of Sciences, 1983, 405, 268-279) of Vienna, in which the microphone-sensed acoustic signal is frequency-equalized and amplitude-compressed before being presented to the electrode. With the 'whole-speech' coding scheme (which they used daily), all three patients showed an improvement in lipreading when they used the device. No patient was able to understand speech without lipreading. Reasonable ability to distinguish voicing contrasts and voice pitch contours was displayed. One patient was able to detect and make appropriate use of the presence of voiceless frication in certain situations. Little sensitivity to spectral features in natural speech was noted, although two patients could detect changes in the frequency of the first formant of synthesised vowels. Presentation of the fundamental frequency only generally led to improved perception of features associated with it (voicing and intonation). Only one patient consistently showed any advantage (and that not in all tests) of coding more than the fundamental alone.

  8. Speech misperception: speaking and seeing interfere differently with hearing.

    Directory of Open Access Journals (Sweden)

    Takemi Mochida

    Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] were degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But they were not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.

  9. Speech misperception: speaking and seeing interfere differently with hearing.

    Science.gov (United States)

    Mochida, Takemi; Kimura, Toshitaka; Hiroya, Sadao; Kitagawa, Norimichi; Gomi, Hiroaki; Kondo, Tadahisa

    2013-01-01

    Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] were degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But they were not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.

  10. Masked speech perception across the adult lifespan: Impact of age and hearing impairment.

    Science.gov (United States)

    Goossens, Tine; Vercammen, Charlotte; Wouters, Jan; van Wieringen, Astrid

    2017-02-01

    As people grow older, speech perception difficulties become highly prevalent, especially in noisy listening situations. Moreover, it is assumed that speech intelligibility is more affected in the event of background noises that induce a higher cognitive load, i.e., noises that result in informational versus energetic masking. There is ample evidence showing that speech perception problems in aging persons are partly due to hearing impairment and partly due to age-related declines in cognition and suprathreshold auditory processing. In order to develop effective rehabilitation strategies, it is indispensable to know how these different degrading factors act upon speech perception. This implies disentangling effects of hearing impairment versus age and examining the interplay between both factors in different background noises of everyday settings. To that end, we investigated open-set sentence identification in six participant groups: a young (20-30 years), middle-aged (50-60 years), and older cohort (70-80 years), each including persons who had normal audiometric thresholds up to at least 4 kHz, on the one hand, and persons who were diagnosed with elevated audiometric thresholds, on the other hand. All participants were screened for (mild) cognitive impairment. We applied stationary and amplitude modulated speech-weighted noise, which are two types of energetic maskers, and unintelligible speech, which causes informational masking in addition to energetic masking. By means of these different background noises, we could look into speech perception performance in listening situations with a low and high cognitive load, respectively. Our results indicate that, even when audiometric thresholds are within normal limits up to 4 kHz, irrespective of threshold elevations at higher frequencies, and there is no indication of even mild cognitive impairment, masked speech perception declines by middle age and decreases further on to older age. The impact of hearing…
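
    The two energetic maskers described here differ only in their temporal envelope: stationary speech-weighted noise has a flat envelope, while the amplitude-modulated version has slow level fluctuations that allow "glimpsing" of the target. The rough sketch below shows how such maskers can be approximated; the spectral shaping filter, modulation rate, and modulation depth are simplifying assumptions, not the study's actual masker parameters.

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 16000
rng = np.random.default_rng(3)
white = rng.standard_normal(3 * fs)                 # 3 s of white noise

# Crude speech-weighted shaping: a gentle low-pass tilt mimicking the long-term
# speech spectrum (a simplifying assumption, not a calibrated speech spectrum).
b, a = butter(1, 1000 / (fs / 2), btype="lowpass")
stationary = lfilter(b, a, white)

# Amplitude-modulated version: impose a slow sinusoidal envelope (8 Hz, full depth).
t = np.arange(len(stationary)) / fs
modulated = stationary * (1 + np.sin(2 * np.pi * 8 * t)) / 2

# Match overall RMS so the two maskers differ only in envelope fluctuations.
modulated *= np.sqrt(np.mean(stationary ** 2) / np.mean(modulated ** 2))
```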

  11. Early Postimplant Speech Perception and Language Skills Predict Long-Term Language and Neurocognitive Outcomes Following Pediatric Cochlear Implantation.

    Science.gov (United States)

    Hunter, Cynthia R; Kronenberger, William G; Castellanos, Irina; Pisoni, David B

    2017-08-16

    We sought to determine whether speech perception and language skills measured early after cochlear implantation in children who are deaf, and early postimplant growth in speech perception and language skills, predict long-term speech perception, language, and neurocognitive outcomes. Thirty-six long-term users of cochlear implants, implanted at an average age of 3.4 years, completed measures of speech perception, language, and executive functioning an average of 14.4 years postimplantation. Speech perception and language skills measured in the 1st and 2nd years postimplantation and open-set word recognition measured in the 3rd and 4th years postimplantation were obtained from a research database in order to assess predictive relations with long-term outcomes. Speech perception and language skills at 6 and 18 months postimplantation were correlated with long-term outcomes for language, verbal working memory, and parent-reported executive functioning. Open-set word recognition was correlated with early speech perception and language skills and long-term speech perception and language outcomes. Hierarchical regressions showed that early speech perception and language skills at 6 months postimplantation and growth in these skills from 6 to 18 months both accounted for substantial variance in long-term outcomes for language and verbal working memory that was not explained by conventional demographic and hearing factors. Speech perception and language skills measured very early postimplantation, and early postimplant growth in speech perception and language, may be clinically relevant markers of long-term language and neurocognitive outcomes in users of cochlear implants. https://doi.org/10.23641/asha.5216200.

  12. [Interhemispheric brain asymmetry in speech perception by persons of different age groups and the "effect of directional attention"].

    Science.gov (United States)

    Morozov, V P; Dmitrieva, E S; Zaĭtseva, K A; Karmanova, V Iu; Sukhanova, N V

    1982-01-01

    Using dichotic tests, the degree of functional brain asymmetry in the perception of verbal information was studied in subjects aged 3-50 years. Significant age-related variation was found in the amount of speech information perceived. Interhemispheric differences in speech perception, which manifest as better perception by the right ear than by the left, are most evident in children under 10 years. The degree of asymmetry in dichotic speech perception was significantly influenced by the "effect of voluntary and involuntary attention" of the subjects, and this influence attenuated the intrinsic functional asymmetry of the brain.

  13. Speech perception and talker segregation: Effects of level, pitch, and tactile support with multiple simultaneous talkers

    Science.gov (United States)

    Drullman, Rob; Bronkhorst, Adelbert W.

    2004-11-01

    Speech intelligibility was investigated by varying the number of interfering talkers, level, and mean pitch differences between target and interfering speech, and the presence of tactile support. In a first experiment the speech-reception threshold (SRT) for sentences was measured for a male talker against a background of one to eight interfering male talkers or speech noise. Speech was presented diotically and vibro-tactile support was given by presenting the low-pass-filtered signal (0-200 Hz) to the index finger. The benefit in the SRT resulting from tactile support ranged from 0 to 2.4 dB and was largest for one or two interfering talkers. A second experiment focused on masking effects of one interfering talker. The interference was the target talker's own voice with its mean pitch increased by 2, 4, 8, or 12 semitones. Level differences between target and interfering speech ranged from -16 to +4 dB. Results from measurements of correctly perceived words in sentences show an intelligibility increase of up to 27% due to tactile support. Performance gradually improves with increasing pitch difference. Louder target speech generally helps perception, but results for level differences are considerably dependent on pitch differences. Differences in performance between noise and speech maskers and between speech maskers with various mean pitches are explained by the effect of informational masking.
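
    The vibrotactile support signal described above is simply the speech waveform restricted to 0-200 Hz, which conveys mainly the voice fundamental frequency and the low-frequency amplitude envelope. A minimal sketch of that filtering step; the synthetic input signal and filter order are assumptions, with only the 200 Hz cutoff taken from the abstract.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 16000
t = np.arange(0, 2.0, 1 / fs)
# Synthetic stand-in for speech: a 120 Hz "voice" component plus higher-frequency energy.
speech = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 1800 * t)

# Keep only 0-200 Hz: the portion of the signal routed to the vibrotactile transducer.
b, a = butter(4, 200 / (fs / 2), btype="lowpass")
tactile_signal = filtfilt(b, a, speech)
```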

  14. Perception of degraded speech sounds differs in chinchilla and human listeners.

    Science.gov (United States)

    Shofner, William P

    2014-04-01

    The behavioral responses of chinchillas to noise-vocoded versions of naturally spoken speech sounds were measured using stimulus generalization and operant conditioning. Behavioral performance for speech generalization by chinchillas is compared to recognition by a group of human listeners for the identical speech sounds. The ability of chinchillas to generalize the vocoded versions as tokens of the natural speech sounds is far less than recognition by human listeners. In many cases, responses of chinchillas to noise-vocoded speech sounds were more similar to responses to band limited noise than to the responses to natural speech sounds. Chinchillas were also tested with a middle C musical note as played on a piano. Comparison of the responses of chinchillas for the middle C condition to the responses obtained for the speech conditions suggest that chinchillas may be more influenced by fundamental frequency than by formant structure. The differences between vocoded speech perception in chinchillas and human listeners may reflect differences in their abilities to resolve the formants along the cochlea. It is argued that lengthening of the cochlea during human evolution may have provided one of the auditory mechanisms that influenced the evolution of speech-specific mechanisms.
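
    Noise vocoding, as used here, divides the speech signal into frequency bands, extracts each band's amplitude envelope, and uses those envelopes to modulate band-limited noise carriers, preserving temporal envelope cues while discarding spectral fine structure. A minimal sketch of the technique; the number of channels, band edges, and envelope cutoff are illustrative choices, and the input is a synthetic stand-in for recorded speech.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def noise_vocode(signal, fs, band_edges, env_cutoff=30.0):
    """Replace each band's fine structure with envelope-modulated band-limited noise."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(signal)
    b_env, a_env = butter(2, env_cutoff / (fs / 2), btype="lowpass")
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        b, a = butter(3, [lo / (fs / 2), hi / (fs / 2)], btype="bandpass")
        band = filtfilt(b, a, signal)
        envelope = filtfilt(b_env, a_env, np.abs(hilbert(band)))     # smoothed envelope
        carrier = filtfilt(b, a, rng.standard_normal(len(signal)))   # band-limited noise
        out += envelope * carrier
    return out

# Example: 4-channel vocoding of a synthetic, speech-like test signal.
fs = 16000
t = np.arange(0, 1.0, 1 / fs)
test_signal = np.sin(2 * np.pi * 150 * t) * (1 + np.sin(2 * np.pi * 4 * t))
vocoded = noise_vocode(test_signal, fs, band_edges=[100, 500, 1000, 2000, 4000])
```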

  15. Sources of Variability in Consonant Perception and Implications for Speech Perception Modeling

    DEFF Research Database (Denmark)

    Zaar, Johannes; Dau, Torsten

    2016-01-01

    The present study investigated the influence of various sources of response variability in consonant perception. A distinction was made between source-induced variability and receiver-related variability. The former refers to perceptual differences induced by differences in the speech … to the considered sources of variability using a measure of the perceptual distance between responses. The largest effect was found across different CVs. For stimuli of the same phonetic identity, the speech-induced variability across and within talkers and the across-listener variability were substantial…

  16. Speech across species: on the mechanistic fundamentals of vocal production and perception

    OpenAIRE

    Ohms, Verena Regina

    2011-01-01

    Birdsong and human speech are both complex behaviours which show striking similarities mainly thought to be present in the area of development and learning. The most important parameters in human speech are vocal tract resonances, called formants. Different formant patterns characterize different vowels and are produced by moving articulators such as tongue and lips. However, not much is known about the production and perception of vocal tract resonances by birds. In this thesis I show that b...

  17. Contributions of cerebellar event-based temporal processing and preparatory function to speech perception.

    Science.gov (United States)

    Schwartze, Michael; Kotz, Sonja A

    2016-10-01

    The role of the cerebellum in the anatomical and functional architecture of the brain is a matter of ongoing debate. We propose that cerebellar temporal processing contributes to speech perception on a number of accounts: temporally precise cerebellar encoding and rapid transmission of an event-based representation of the temporal structure of the speech signal serves to prepare areas in the cerebral cortex for the subsequent perceptual integration of sensory information. As speech dynamically evolves in time this fundamental preparatory function may extend its scope to the predictive allocation of attention in time and supports the fine-tuning of temporally specific models of the environment. In this framework, an oscillatory account considering a range of frequencies may best serve the linking of the temporal and speech processing systems. Lastly, the concerted action of these processes may not only advance predictive adaptation to basic auditory dynamics but optimize the perceptual integration of speech. Copyright © 2015 Elsevier Inc. All rights reserved.

  18. Tactile perception by the profoundly deaf. Speech and environmental sounds.

    Science.gov (United States)

    Plant, G L

    1982-11-01

    Four subjects fitted with single-channel vibrotactile aids and provided with training in their use took part in a testing programme aimed at assessing their aided and unaided lipreading performance, their ability to detect segmental and suprasegmental features of speech, and the discrimination of common environmental sounds. The results showed that the vibrotactile aid provided very useful information as to speech and non-speech stimuli with the subjects performing best on those tasks where time/intensity cues provided sufficient information to enable identification. The implications of the study are discussed and a comparison made with those results reported for subjects using cochlear implants.

  19. The cascading influence of multisensory processing on speech perception in autism.

    Science.gov (United States)

    Stevenson, Ryan A; Segers, Magali; Ncube, Busisiwe L; Black, Karen R; Bebko, James M; Ferber, Susanne; Barense, Morgan D

    2017-05-01

    It has been recently theorized that atypical sensory processing in autism relates to difficulties in social communication. Through a series of tasks concurrently assessing multisensory temporal processes, multisensory integration and speech perception in 76 children with and without autism, we provide the first behavioral evidence of such a link. Temporal processing abilities in children with autism contributed to impairments in speech perception. This relationship was significantly mediated by their abilities to integrate social information across auditory and visual modalities. These data describe the cascading impact of sensory abilities in autism, whereby temporal processing impacts multisensory integration of social information, which, in turn, contributes to deficits in speech perception. These relationships were found to be specific to autism, specific to multisensory but not unisensory integration, and specific to the processing of social information.

  20. Audiovisual Review

    Science.gov (United States)

    Physiology Teacher, 1976

    1976-01-01

    Lists and reviews recent audiovisual materials in areas of medical, dental, nursing and allied health, and veterinary medicine; undergraduate, and high school studies. Each is classified as to level, type of instruction, usefulness, and source of availability. Topics include respiration, renal physiology, muscle mechanics, anatomy, evolution,…

  1. Reading fluency and speech perception speed of beginning readers with persistent reading problems: the perception of initial stop consonants and consonant clusters

    NARCIS (Netherlands)

    Snellings, P.; van der Leij, A.; Blok, H.; de Jong, P.F.

    2010-01-01

    This study investigated the role of speech perception accuracy and speed in fluent word decoding of reading disabled (RD) children. A same-different phoneme discrimination task with natural speech tested the perception of single consonants and consonant clusters by young but persistent RD children.

  2. Reading Fluency and Speech Perception Speed of Beginning Readers with Persistent Reading Problems: The Perception of Initial Stop Consonants and Consonant Clusters

    Science.gov (United States)

    Snellings, Patrick; van der Leij, Aryan; Blok, Henk; de Jong, Peter F.

    2010-01-01

    This study investigated the role of speech perception accuracy and speed in fluent word decoding of reading disabled (RD) children. A same-different phoneme discrimination task with natural speech tested the perception of single consonants and consonant clusters by young but persistent RD children. RD children were slower than chronological age…

  3. Relative performance of single-channel and multichannel tactile aids for speech perception.

    Science.gov (United States)

    Weisenberger, J M; Broadstone, S M; Kozma-Spytek, L

    1991-01-01

    Although the results from a number of studies of the performance of multichannel tactile aids for speech perception have suggested that such devices might provide more benefit to hearing-impaired persons than single-channel tactile aids (3,4), recent studies involving direct comparisons of multichannel and single-channel vibrotactile aids (5,6) indicated otherwise. In fact, for some types of speech information, such as rhythm and stress perception, single-channel aids were shown to be superior. The present study attempted to address this apparent discrepancy by comparing the performance of two single-channel devices with two multichannel devices in a variety of speech perception tasks including both single-item and connected speech stimuli. Results indicated that the two classes of tactile device performed similarly in rhythm and stress perception, but that the multichannel aids in many cases showed better performance for tasks in which the identification of fine-structure phoneme information was required (both single-item and connected speech). Results are discussed in terms of the possibility that the performance of a specific multichannel tactile aid cannot be considered indicative of all devices of the same class.

  4. The influence of non-native language proficiency on speech perception performance.

    Science.gov (United States)

    Kilman, Lisa; Zekveld, Adriana; Hällgren, Mathias; Rönnberg, Jerker

    2014-01-01

    The present study examined to what extent proficiency in a non-native language influences speech perception in noise. We explored how English proficiency affected native (Swedish) and non-native (English) speech perception in four speech reception threshold (SRT) conditions, including two energetic (stationary, fluctuating noise) and two informational (two-talker babble Swedish, two-talker babble English) maskers. Twenty-three normal-hearing native Swedish listeners, aged between 28 and 64 years, participated. The participants also performed standardized tests of English proficiency, non-verbal reasoning and working memory capacity. Our approach, with its focus on proficiency and the assessment of external as well as internal, listener-related factors, allowed us to examine which variables explained intra- and interindividual differences in native and non-native speech perception performance. The main result was that for the non-native target, the level of English proficiency is a decisive factor for speech intelligibility in noise. High English proficiency improved performance in all four conditions when the target language was English. The informational maskers interfered more with perception than the energetic maskers, specifically for the non-native target. The study also confirmed that SRTs were better when the target language was native compared to non-native.

  5. Result on speech perception after conversion from Spectra® to Freedom®.

    Science.gov (United States)

    Magalhães, Ana Tereza de Matos; Goffi-Gomez, Maria Valéria Schmidt; Hoshino, Ana Cristina; Tsuji, Robinson Koji; Bento, Ricardo Ferreira; Brito, Rubens

    2012-04-01

    New technology in the Freedom® speech processor for cochlear implants was developed to improve how incoming acoustic sound is processed; this applies not only to new users, but also to previous generations of cochlear implants. To identify the contribution of this technology, in users of the Nucleus 22®, to speech perception tests in silence and in noise, and to audiometric thresholds. A cross-sectional cohort study was undertaken. Seventeen patients were selected. The last map based on the Spectra® was revised and optimized before starting the tests. Troubleshooting was used to identify malfunction. To identify the contribution of the Freedom® technology for the Nucleus 22®, auditory thresholds and speech perception tests were performed in free field in sound-proof booths. Recorded monosyllables and sentences in silence and in noise (SNR = 0 dB) were presented at 60 dB SPL. The nonparametric Wilcoxon test for paired data was used to compare groups. The Freedom® technology applied to the Nucleus 22® showed a statistically significant difference in all speech perception tests and audiometric thresholds. The Freedom® technology improved the speech perception performance and audiometric thresholds of patients with the Nucleus 22®.
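
    Presenting sentences "in noise (SNR = 0 dB)" means the speech and the masker are scaled to equal root-mean-square levels before mixing. A minimal sketch of that scaling; the signals are synthetic placeholders for the recorded materials.

```python
import numpy as np

rng = np.random.default_rng(4)
fs = 16000
speech = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)   # placeholder "speech" signal
noise = rng.standard_normal(fs)                         # placeholder masker

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that the speech-to-noise ratio equals snr_db, then mix."""
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    scaled_noise = noise * rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return speech + scaled_noise

mixture = mix_at_snr(speech, noise, snr_db=0.0)   # SNR = 0 dB: equal RMS levels
```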

  6. Speech perception using combinations of auditory, visual, and tactile information

    National Research Council Canada - National Science Library

    Blamey, P J; Cowan, R S; Alcantara, J I; Whitford, L A; Clark, G M

    1989-01-01

    Four normally-hearing subjects were trained and tested with all combinations of a highly-degraded auditory input, a visual input via lipreading, and a tactile input using a multichannel electrotactile speech processor...

  7. Predicting individual variation in language from infant speech perception measures

    NARCIS (Netherlands)

    Christia, A.; Seidl, A.; Junge, C.; Soderstrom, M.; Hagoort, P.

    2014-01-01

    There are increasing reports that individual variation in behavioral and neurophysiological measures of infant speech processing predicts later language outcomes, and specifically concurrent or subsequent vocabulary size. If such findings are held up under scrutiny, they could both illuminate

  8. Predicting Individual Variation in Language From Infant Speech Perception Measures

    NARCIS (Netherlands)

    Cristia, Alejandrina; Seidl, Amanda; Junge, Caroline; Soderstrom, Melanie; Hagoort, Peter

    2014-01-01

    There are increasing reports that individual variation in behavioral and neurophysiological measures of infant speech processing predicts later language outcomes, and specifically concurrent or subsequent vocabulary size. If such findings are held up under scrutiny, they could both illuminate

  9. Auditory processing and speech perception in children with specific language impairment: relations with oral language and literacy skills.

    Science.gov (United States)

    Vandewalle, Ellen; Boets, Bart; Ghesquière, Pol; Zink, Inge

    2012-01-01

    This longitudinal study investigated temporal auditory processing (frequency modulation and between-channel gap detection) and speech perception (speech-in-noise and categorical perception) in three groups of children aged 6 years 3 months to 6 years 8 months attending grade 1: (1) children with specific language impairment (SLI) and literacy delay (n = 8), (2) children with SLI and normal literacy (n = 10) and (3) typically developing children (n = 14). Moreover, the relations between these auditory processing and speech perception skills and oral language and literacy skills in grade 1 and grade 3 were analyzed. The SLI group with literacy delay scored significantly lower than both other groups on speech perception, but not on temporal auditory processing. The two normal reading groups did not differ in terms of speech perception or auditory processing. Speech perception was significantly related to reading and spelling in grades 1 and 3 and had a unique predictive contribution to reading growth in grade 3, even after controlling for reading level, phonological ability, auditory processing and oral language skills in grade 1. These findings indicated that speech perception also had a unique direct impact upon reading development and not only through its relation with phonological awareness. Moreover, speech perception seemed to be more associated with the development of literacy skills and less with oral language ability. Copyright © 2011 Elsevier Ltd. All rights reserved.

  10. Speech-in-Noise Perception Deficit in Adults with Dyslexia: Effects of Background Type and Listening Configuration

    Science.gov (United States)

    Dole, Marjorie; Hoen, Michel; Meunier, Fanny

    2012-01-01

    Developmental dyslexia is associated with impaired speech-in-noise perception. The goal of the present research was to further characterize this deficit in dyslexic adults. In order to specify the mechanisms and processing strategies used by adults with dyslexia during speech-in-noise perception, we explored the influence of background type,…

  11. Auditory Processing and Speech Perception in Children with Specific Language Impairment: Relations with Oral Language and Literacy Skills

    Science.gov (United States)

    Vandewalle, Ellen; Boets, Bart; Ghesquiere, Pol; Zink, Inge

    2012-01-01

    This longitudinal study investigated temporal auditory processing (frequency modulation and between-channel gap detection) and speech perception (speech-in-noise and categorical perception) in three groups of 6 years 3 months to 6 years 8 months-old children attending grade 1: (1) children with specific language impairment (SLI) and literacy delay…

  12. Associations and Dissociations between Psychoacoustic Abilities and Speech Perception in Adolescents with Severe-to-Profound Hearing Loss

    Science.gov (United States)

    Kishon-Rabin, Liat; Segal, Osnat; Algom, Daniel

    2009-01-01

    Purpose: To clarify the relationship between psychoacoustic capabilities and speech perception in adolescents with severe-to-profound hearing loss (SPHL). Method: Twenty-four adolescents with SPHL and young adults with normal hearing were assessed with psychoacoustic and speech perception tests. The psychoacoustic tests included gap detection…

  13. Effects of English Cued Speech on Speech Perception, Phonological Awareness and Literacy: A Case Study of a 9-Year-Old Deaf Boy Using a Cochlear Implant

    Science.gov (United States)

    Rees, Rachel; Bladel, Judith

    2013-01-01

    Many studies have shown that French Cued Speech (CS) can enhance lipreading and the development of phonological awareness and literacy in deaf children but, as yet, there is little evidence that these findings can be generalized to English CS. This study investigated the possible effects of English CS on the speech perception, phonological…

  14. Effects of Musicality on the Perception of Rhythmic Structure in Speech

    Directory of Open Access Journals (Sweden)

    Natalie Boll-Avetisyan

    2017-04-01

    Language and music share many rhythmic properties, such as variations in intensity and duration leading to repeating patterns. Perception of rhythmic properties may rely on cognitive networks that are shared between the two domains. If so, then variability in speech rhythm perception may relate to individual differences in musicality. To examine this possibility, the present study focuses on rhythmic grouping, which is assumed to be guided by a domain-general principle, the Iambic/Trochaic law, stating that sounds alternating in intensity are grouped as strong-weak, and sounds alternating in duration are grouped as weak-strong. German listeners completed a grouping task: They heard streams of syllables alternating in intensity, duration, or neither, and had to indicate whether they perceived a strong-weak or weak-strong pattern. Moreover, their music perception abilities were measured, and they filled out a questionnaire reporting their productive musical experience. Results showed that better musical rhythm perception ability was associated with more consistent rhythmic grouping of speech, while melody perception ability and productive musical experience were not. This suggests shared cognitive procedures in the perception of rhythm in music and speech. Also, the results highlight the relevance of considering individual differences in musicality when aiming to explain variability in prosody perception.

  15. Accounting for rate-dependent category boundary shifts in speech perception.

    Science.gov (United States)

    Bosker, Hans Rutger

    2017-01-01

    The perception of temporal contrasts in speech is known to be influenced by the speech rate in the surrounding context. This rate-dependent perception is suggested to involve general auditory processes because it is also elicited by nonspeech contexts, such as pure tone sequences. Two general auditory mechanisms have been proposed to underlie rate-dependent perception: durational contrast and neural entrainment. This study compares the predictions of these two accounts of rate-dependent speech perception by means of four experiments, in which participants heard tone sequences followed by Dutch target words ambiguous between /ɑs/ "ash" and /a:s/ "bait". Tone sequences varied in the duration of tones (short vs. long) and in the presentation rate of the tones (fast vs. slow). Results show that the duration of preceding tones did not influence target perception in any of the experiments, thus challenging durational contrast as explanatory mechanism behind rate-dependent perception. Instead, the presentation rate consistently elicited a category boundary shift, with faster presentation rates inducing more /a:s/ responses, but only if the tone sequence was isochronous. Therefore, this study proposes an alternative, neurobiologically plausible account of rate-dependent perception involving neural entrainment of endogenous oscillations to the rate of a rhythmic stimulus.
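
    The context stimuli in this design are sequences of pure tones in which tone duration and presentation rate are manipulated independently; an isochronous sequence is one with equal onset-to-onset intervals. A minimal sketch of how such sequences can be generated; the tone frequency, durations, and rates are illustrative values, not the study's stimulus parameters.

```python
import numpy as np

def tone_sequence(fs, n_tones, tone_dur, onset_interval, freq=440.0):
    """Isochronous sequence: tones of fixed duration at a fixed onset-to-onset interval."""
    sequence = np.zeros(int(n_tones * onset_interval * fs))
    t = np.arange(int(tone_dur * fs)) / fs
    tone = np.sin(2 * np.pi * freq * t) * np.hanning(len(t))   # ramped pure tone
    for i in range(n_tones):
        start = int(i * onset_interval * fs)
        sequence[start:start + len(tone)] += tone
    return sequence

fs = 44100
fast_short = tone_sequence(fs, 8, tone_dur=0.05, onset_interval=0.25)   # fast rate, short tones
slow_long = tone_sequence(fs, 8, tone_dur=0.15, onset_interval=0.50)    # slow rate, long tones
```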

  16. Optimizing the perception of soft speech and speech in noise with the Advanced Bionics cochlear implant system.

    Science.gov (United States)

    Holden, Laura K; Reeder, Ruth M; Firszt, Jill B; Finley, Charles C

    2011-04-01

    This study aimed to provide guidelines to optimize perception of soft speech and speech in noise for Advanced Bionics cochlear implant (CI) users. Three programs differing in T-levels were created for ten subjects. Using the T-level setting that provided the lowest FM-tone, sound-field threshold levels for each subject, three additional programs were created with input dynamic range (IDR) settings of 50, 65 and 80 dB. Subjects were postlinguistically deaf adults implanted with either the Clarion CII or 90K CI devices. Sound-field threshold levels were lowest with T-levels set higher than 10% of M-levels and with the two widest IDRs. Group data revealed significantly higher scores for CNC words presented at a soft level with an IDR of 80 dB and 65 dB compared to 50 dB. Although no significant group differences were seen between the three IDRs for sentences in noise, significant individual differences were present. Setting Ts higher than the manufacturer's recommendation of 10% of M-levels and providing IDR options can improve overall speech perception; however, for some users, higher Ts and wider IDRs may not be appropriate. Based on the results of the study, clinical programming recommendations are provided.

  17. Neurophysiological evidence that musical training influences the recruitment of right hemispheric homologues for speech perception.

    Science.gov (United States)

    Jantzen, McNeel G; Howe, Bradley M; Jantzen, Kelly J

    2014-01-01

    Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Musacchia et al., 2007; Parbery-Clark et al., 2009a; Zendel and Alain, 2009; Kraus and Chandrasekaran, 2010). Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset timing and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that for musicians, right hemisphere areas traditionally associated with music are also engaged for the processing of speech sounds. In contrast, we predicted that in non-musicians processing of speech sounds would be localized to traditional left hemisphere language areas. Speech stimuli differing in voice onset time were presented using a dichotic listening paradigm. Subjects either indicated aural location for a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and organization of acoustic features were reflected by activity in source generators of the P50. This included greater activation of right middle temporal gyrus and superior temporal gyrus in musicians. The findings demonstrate recruitment of right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right hemisphere homologues of established speech processing regions of the brain.

  18. Neurophysiological Evidence That Musical Training Influences the Recruitment of Right Hemispheric Homologues for Speech Perception

    Directory of Open Access Journals (Sweden)

    McNeel Gordon Jantzen

    2014-03-01

    Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Kraus & Chandrasekaran, 2010; Parbery-Clark, Skoe, & Kraus, 2009; Zendel & Alain, 2008; Musacchia, Sams, Skoe, & Kraus, 2007). Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset timing and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that for musicians, right hemisphere areas traditionally associated with music are also engaged for the processing of speech sounds. In contrast, we predicted that in non-musicians processing of speech sounds would be localized to traditional left hemisphere language areas. Speech stimuli differing in voice onset time were presented using a dichotic listening paradigm. Subjects either indicated aural location for a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and organization of acoustic features were reflected by activity in source generators of the P50. This included greater activation of right middle temporal gyrus (MTG) and superior temporal gyrus (STG) in musicians. The findings demonstrate recruitment of right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right hemisphere homologues of established speech processing regions of the brain.

  19. Speech perception in noise with a harmonic complex excited vocoder.

    Science.gov (United States)

    Churchill, Tyler H; Kan, Alan; Goupell, Matthew J; Ihlefeld, Antje; Litovsky, Ruth Y

    2014-04-01

    A cochlear implant (CI) presents band-pass-filtered acoustic envelope information by modulating current pulse train levels. Similarly, a vocoder presents envelope information by modulating an acoustic carrier. By studying how normal hearing (NH) listeners are able to understand degraded speech signals with a vocoder, the parameters that best simulate electric hearing and factors that might contribute to the NH-CI performance difference may be better understood. A vocoder with harmonic complex carriers (fundamental frequency, f0 = 100 Hz) was used to study the effect of carrier phase dispersion on speech envelopes and intelligibility. The starting phases of the harmonic components were randomly dispersed to varying degrees prior to carrier filtering and modulation. NH listeners were tested on recognition of a closed set of vocoded words in background noise. Two sets of synthesis filters simulated different amounts of current spread in CIs. Results showed that the speech vocoded with carriers whose starting phases were maximally dispersed was the most intelligible. Superior speech understanding may have been a result of the flattening of the dispersed-phase carrier's intrinsic temporal envelopes produced by the large number of interacting components in the high-frequency channels. Cross-correlogram analyses of auditory nerve model simulations confirmed that randomly dispersing the carrier's component starting phases resulted in better neural envelope representation. However, neural metrics extracted from these analyses were not found to accurately predict speech recognition scores for all vocoded speech conditions. It is possible that central speech understanding mechanisms are insensitive to the envelope-fine structure dichotomy exploited by vocoders.
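
    The carrier manipulation described above can be illustrated with a short sketch (not the authors' code; channel count, corner frequencies, sample rate, and function names are assumptions): a 100 Hz harmonic complex is built with randomly dispersed starting phases, band-pass filtered into channels, and each filtered carrier is modulated by the corresponding speech-band envelope.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

def phase_dispersed_vocoder(speech, fs, f0=100.0, n_channels=8, dispersion=1.0):
    """Harmonic-complex-excited vocoder sketch. `speech` is a 1-D float array;
    `dispersion` scales how far component starting phases are randomized
    (0 = sine phase, 1 = fully random). Assumes fs >= 16 kHz; all parameter
    values are illustrative, not those of the cited study."""
    t = np.arange(len(speech)) / fs
    rng = np.random.default_rng(0)

    # Harmonic complex carrier at f0 with dispersed starting phases.
    n_harm = int((fs / 2 - f0) // f0)
    phases = dispersion * rng.uniform(0.0, 2.0 * np.pi, n_harm)
    carrier = sum(np.cos(2.0 * np.pi * f0 * (k + 1) * t + phases[k])
                  for k in range(n_harm))

    # Log-spaced analysis/synthesis bands (hypothetical corner frequencies).
    edges = np.geomspace(100.0, 7000.0, n_channels + 1)
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, speech)))  # speech-band envelope
        out += env * sosfiltfilt(sos, carrier)           # modulate filtered carrier
    return out / np.max(np.abs(out))
```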

  20. Hemispheric asymmetries in speech perception: sense, nonsense and modulations.

    Directory of Open Access Journals (Sweden)

    Stuart Rosen

    Full Text Available The well-established left hemisphere specialisation for language processing has long been claimed to be based on a low-level auditory specialization for specific acoustic features in speech, particularly regarding 'rapid temporal processing'. A novel analysis/synthesis technique was used to construct a variety of sounds based on simple sentences which could be manipulated in spectro-temporal complexity, and in whether they were intelligible or not. All sounds consisted of two noise-excited spectral prominences (based on the lower two formants in the original speech) which could be static or varying in frequency and/or amplitude independently. Dynamically varying both acoustic features based on the same sentence led to intelligible speech, but when either or both acoustic features were static, the stimuli were not intelligible. Using the frequency dynamics from one sentence with the amplitude dynamics of another led to unintelligible sounds of comparable spectro-temporal complexity to the intelligible ones. Positron emission tomography (PET) was used to compare which brain regions were active when participants listened to the different sounds. Neural activity to spectral and amplitude modulations sufficient to support speech intelligibility (without actually being intelligible) was seen bilaterally, with a right temporal lobe dominance. A left-dominant response was seen only to intelligible sounds. It thus appears that the left hemisphere specialisation for speech is based on the linguistic properties of utterances, not on particular acoustic features.
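
    As a rough illustration of this kind of analysis/synthesis (a sketch under assumed parameters, not the stimuli used in the study), each spectral prominence can be rebuilt frame by frame as band-pass-filtered noise whose centre frequency and amplitude follow a time-varying track, its static mean, or a track borrowed from a different sentence:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def noise_prominence(freq_track, amp_track, fs, frame_s=0.01, bw_hz=200.0):
    """One noise-excited spectral prominence, synthesized frame by frame from
    centre-frequency (Hz) and linear-amplitude tracks. Purely illustrative:
    frames are filtered independently, so frame-boundary artefacts remain, and
    the tracks are assumed to stay inside (bw_hz/2, fs/2 - bw_hz/2)."""
    hop = int(frame_s * fs)
    rng = np.random.default_rng(0)
    out = np.zeros(len(freq_track) * hop)
    for i, (fc, amp) in enumerate(zip(freq_track, amp_track)):
        sos = butter(2, [fc - bw_hz / 2, fc + bw_hz / 2],
                     btype="bandpass", fs=fs, output="sos")
        out[i * hop:(i + 1) * hop] = amp * sosfilt(sos, rng.standard_normal(hop))
    return out

# "Static" conditions replace a track with its mean; the mismatched condition
# pairs the frequency tracks of one sentence (f1_a, f2_a) with the amplitude
# tracks of another (a1_b, a2_b) (all hypothetical variable names):
# stimulus = noise_prominence(f1_a, a1_b, fs) + noise_prominence(f2_a, a2_b, fs)
```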

  1. How the demographic make-up of our community influences speech perception

    OpenAIRE

    Lev-Ari, S.; Peperkamp, S.

    2016-01-01

    Speech perception is known to be influenced by listeners’ expectations of the speaker. This paper tests whether the demographic makeup of individuals’ communities can influence their perception of foreign sounds by influencing their expectations of the language. Using online experiments with participants from all across the U.S. and matched census data on the proportion of Spanish and other foreign language speakers in participants’ communities, this paper shows that the demographic makeup ...

  2. The neural basis of non-native speech perception in bilingual children.

    Science.gov (United States)

    Archila-Suerte, Pilar; Zevin, Jason; Ramos, Aurora Isabel; Hernandez, Arturo E

    2013-02-15

    The goal of the present study is to reveal how the neural mechanisms underlying non-native speech perception change throughout childhood. In a pre-attentive listening fMRI task, English monolingual and Spanish-English bilingual children - divided into groups of younger (6-8yrs) and older children (9-10yrs) - were asked to watch a silent movie while several English syllable combinations played through a pair of headphones. Two additional groups of monolingual and bilingual adults were included in the analyses. Our results show that the neural mechanisms supporting speech perception throughout development differ in monolinguals and bilinguals. While monolinguals recruit perceptual areas (i.e., superior temporal gyrus) in early and late childhood to process native speech, bilinguals recruit perceptual areas (i.e., superior temporal gyrus) in early childhood and higher-order executive areas in late childhood (i.e., bilateral middle frontal gyrus and bilateral inferior parietal lobule, among others) to process non-native speech. The findings support the Perceptual Assimilation Model and the Speech Learning Model and suggest that the neural system processes phonological information differently depending on the stage of L2 speech learning. Published by Elsevier Inc.

  3. Emotional and analytic music perception in cochlear implant users after optimizing the speech processor.

    Science.gov (United States)

    Rosslau, Ken; Spreckelmeyer, Katja N; Saalfeld, Hilke; Westhofen, Martin

    2012-01-01

    Cochlear implant (CI) users are able to detect harmonic differences and the emotionally exciting effect of music (arousal) even when using a speech-adapted program. Raising the power of the lower frequencies in the CI speech processor for a dedicated music program further improved this ability and enhanced the subjectively perceived pleasure of listening to music. This pilot study compares aspects of analytical and emotional music perception before and after optimizing the speech processor, with normal-hearing listeners as a reference. Six adult post-lingually deafened CI users and six normal-hearing subjects were tested on different aspects of analytical and emotional music perception. After the speech processors were optimized for a music program, the CI users were tested again after a period of one week. The CI users were able to detect different levels of emotional arousal conveyed by music, and switching to the music program resulted in an even better distinction between levels of musical arousal. With both the speech and music programs, CI users gave overall higher ratings for arousal and valence of the heard music when asked to estimate how listeners with normal hearing perceived it than when asked about their own perception.

  4. Tuning in and tuning out: Speech perception in native- and foreign-talker babble

    Science.gov (United States)

    van Heukelem, Kristin; Bradlow, Ann R.

    2005-09-01

    Studies on speech perception in multitalker babble have revealed asymmetries in the effects of noise on native versus foreign-accented speech intelligibility for native listeners [Rogers et al., Lang Speech 47(2), 139-154 (2004)] and on sentence-in-noise perception by native versus non-native listeners [Mayo et al., J. Speech Lang. Hear. Res., 40, 686-693 (1997)], suggesting that the linguistic backgrounds of talkers and listeners contribute to the effects of noise on speech perception. However, little attention has been paid to the language of the babble. This study tested whether the language of the noise also has asymmetrical effects on listeners. Replicating previous findings [e.g., Bronkhorst and Plomp, J. Acoust. Soc. Am., 92, 3132-3139 (1992)], the results showed poorer English sentence recognition by native English listeners in six-talker babble than in two-talker babble regardless of the language of the babble, demonstrating the effect of increased psychoacoustic/energetic masking. In addition, the results showed that in the two-talker babble condition, native English listeners were more adversely affected by English than Chinese babble. These findings demonstrate informational/cognitive masking on sentence-in-noise recognition in the form of linguistic competition. Whether this competition is at the lexical or sublexical level and whether it is modulated by the phonetic similarity between the target and noise languages remains to be determined.
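
    The noise conditions in studies like this one come down to summing N babble talkers and scaling the mixture to a target signal-to-noise ratio. A minimal sketch of that step (hypothetical function, not tied to the cited experiments):

```python
import numpy as np

def mix_at_snr(target, babble_talkers, snr_db):
    """Mix a target sentence with N-talker babble at a specified SNR (dB).
    Illustrative sketch; all signals are 1-D float arrays at the same rate."""
    n = len(target)
    # Sum the individual talkers, truncated or zero-padded to the target length.
    babble = np.zeros(n)
    for talker in babble_talkers:
        padded = np.zeros(n)
        padded[:min(n, len(talker))] = talker[:n]
        babble += padded
    # Scale the babble so that 10*log10(P_target / P_babble) equals snr_db.
    p_target = np.mean(target ** 2)
    p_babble = np.mean(babble ** 2)
    gain = np.sqrt(p_target / (p_babble * 10.0 ** (snr_db / 10.0)))
    return target + gain * babble
```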

  5. The effects of input-output configuration in syllabic compression on speech perception

    NARCIS (Netherlands)

    Maré, M. J.; Dreschler, W. A.; Verschuure, H.

    1992-01-01

    Speech perception was tested through a broad-band syllabic compressor with four different static input-output configurations. All other parameters of the compressor were held constant. The compressor was implemented digitally and incorporated a delay to reduce overshoot. We studied four different
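
    A broadband syllabic compressor of this kind can be sketched as an envelope follower driving a static input-output curve; changing the static configuration amounts to changing the threshold and compression ratio below. All parameter values and the function name are assumptions for illustration, not those of the study, and the overshoot-reducing delay mentioned above is omitted.

```python
import numpy as np

def syllabic_compress(x, fs, threshold_db=-40.0, ratio=2.0,
                      attack_ms=5.0, release_ms=50.0):
    """Broadband syllabic compressor sketch: an envelope follower drives a
    static input-output curve (threshold + compression ratio)."""
    # One-pole envelope follower with separate attack and release constants.
    a_att = np.exp(-1.0 / (attack_ms * 1e-3 * fs))
    a_rel = np.exp(-1.0 / (release_ms * 1e-3 * fs))
    env = np.zeros_like(x)
    level = 0.0
    for i, s in enumerate(np.abs(x)):
        coef = a_att if s > level else a_rel
        level = coef * level + (1.0 - coef) * s
        env[i] = level
    env_db = 20.0 * np.log10(np.maximum(env, 1e-8))
    # Static I/O curve: unity gain below threshold, 1:ratio compression above.
    gain_db = np.where(env_db > threshold_db,
                       (threshold_db - env_db) * (1.0 - 1.0 / ratio), 0.0)
    return x * 10.0 ** (gain_db / 20.0)
```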

  6. A Longitudinal Evaluation of the Speech Perception Capabilities of Children Using Multichannel Tactile Vocoders.

    Science.gov (United States)

    Eilers, Rebecca E.; And Others

    1996-01-01

    Thirty children with profound hearing impairments were followed over a three-year period with a semiannual battery of speech perception tests. Testing utilized multichannel tactile vocoders in variations of tactile and/or auditory/visual conditions. Performance in the tactile plus auditory condition generally exceeded that in other conditions,…

  7. Tactile-auditory speech perception by unimodally and bimodally trained normal-hearing subjects.

    Science.gov (United States)

    Alcántara, J I; Blamey, P J; Clark, G M

    1993-03-01

    This study compared the effectiveness of unimodal and bimodal training strategies at improving the perception of speech information under a variety of conditions. Normal-hearing subjects were trained in the perception of vowel and consonant stimuli. Speech information was provided via a multiple-channel electrotactile speech processing aid (the Tickle Talker), a 200-Hz low-pass filtered auditory signal, or both. Two subjects were trained only in the combined tactile-plus-auditory (TA) condition; the remaining two were trained in both the tactile-alone (T) and auditory-alone (A) conditions, with only one condition used at any single time. All subjects were evaluated in the TA, T, and A conditions, both at the beginning of the study, prior to training, and at the completion of training, on closed-set vowel and consonant confusion tests and on an open-set word test. Results indicated that, whilst statistically significant improvements occurred from one evaluation period to the next in both groups of subjects, the improvements per condition did not depend on the type of training received. The results provide a preliminary indication that unimodal training does not impair the perception of speech information under bimodal perception conditions.

  8. Longitudinal Study of Speech Perception by Children with Cochlear Implants and Tactile Aids: Progress Report.

    Science.gov (United States)

    Robbins, Amy McConkey; And Others

    1988-01-01

    The paper describes a current longitudinal study to examine the speech perception abilities of profoundly hearing-impaired children who use either a cochlear implant or tactile aid. Initial findings indicate large individual differences among subjects with implants and uniformly poor performance by the three subjects with tactile aids. (Author/DB)

  9. Children with Speech, Language and Communication Needs: Their Perceptions of Their Quality of Life

    Science.gov (United States)

    Markham, Chris; van Laar, Darren; Gibbard, Deborah; Dean, Taraneh

    2009-01-01

    Background: This study is part of a programme of research aiming to develop a quantitative measure of quality of life for children with communication needs. It builds on the preliminary findings of Markham and Dean (2006), which described some of the perceptions that parents and carers of children with speech, language and communication needs had…

  10. The effect of speech recognition on working postures, productivity and the perception of user friendliness

    NARCIS (Netherlands)

    Korte, E.M. de; Lingen, P. van

    2006-01-01

    A comparative, experimental study with repeated measures has been conducted to evaluate the effect of the use of speech recognition on working postures, productivity and the perception of user friendliness. Fifteen subjects performed a standardised task, first with keyboard and mouse and, after a

  11. Speech Perception Results for Children Using Cochlear Implants Who Have Additional Special Needs

    Science.gov (United States)

    Dettman, Shani J.; Fiket, Hayley; Dowell, Richard C.; Charlton, Margaret; Williams, Sarah S.; Tomov, Alexandra M.; Barker, Elizabeth J.

    2004-01-01

    Speech perception outcomes in young children with cochlear implants are affected by a number of variables including the age of implantation, duration of implantation, mode of communication, and the presence of a developmental delay or additional disability. The aim of this study is to examine the association between degree of developmental delay…

  12. Decoding Speech Perception by Native and Non-Native Speakers Using Single-Trial Electrophysiological Data

    NARCIS (Netherlands)

    Brandmeyer, A.; Farquhar, J.D.R.; McQueen, J.M.; Desain, P.W.M.

    2013-01-01

    Brain-computer interfaces (BCIs) are systems that use real-time analysis of neuroimaging data to determine the mental state of their user for purposes such as providing neurofeedback. Here, we investigate the feasibility of a BCI based on speech perception. Multivariate pattern classifica
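
    A minimal single-trial decoding pipeline of the kind such BCI work relies on might look as follows (synthetic data and a scikit-learn classifier chosen for illustration; this is not the authors' pipeline, and decoding accuracy on the random data below should sit near the 0.5 chance level):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: single-trial EEG epochs (trials x channels x samples)
# with binary labels for which of two speech sounds was heard.
rng = np.random.default_rng(0)
epochs = rng.standard_normal((200, 32, 64))
labels = rng.integers(0, 2, 200)

# Flatten each epoch into one feature vector and evaluate a shrinkage-regularized
# linear discriminant with 5-fold cross-validation.
X = epochs.reshape(len(epochs), -1)
clf = make_pipeline(StandardScaler(),
                    LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto"))
scores = cross_val_score(clf, X, labels, cv=5)
print(f"mean decoding accuracy: {scores.mean():.2f}")
```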