WorldWideScience

Sample records for audiovisual speech perception

  1. Ordinal models of audiovisual speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias

    2011-01-01

    Audiovisual information is integrated in speech perception. One manifestation of this is the McGurk illusion in which watching the articulating face alters the auditory phonetic percept. Understanding this phenomenon fully requires a computational model with predictive power. Here, we describe...

  2. Audiovisual integration in speech perception: a multi-stage process

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2011-01-01

    ...investigate whether the integration of auditory and visual speech observed in these two audiovisual integration effects is a specific trait of speech perception. We further ask whether audiovisual integration is undertaken in a single processing stage or in multiple processing stages...

  3. Lip movements affect infants' audiovisual speech perception.

    Science.gov (United States)

    Yeung, H Henny; Werker, Janet F

    2013-05-01

    Speech is robustly audiovisual from early in infancy. Here we show that audiovisual speech perception in 4.5-month-old infants is influenced by sensorimotor information related to the lip movements they make while chewing or sucking. Experiment 1 consisted of a classic audiovisual matching procedure, in which two simultaneously displayed talking faces (visual [i] and [u]) were presented with a synchronous vowel sound (audio /i/ or /u/). Infants' looking patterns were selectively biased away from the audiovisual matching face when the infants were producing lip movements similar to those needed to produce the heard vowel. Infants' looking patterns returned to those of a baseline condition (no lip movements, looking longer at the audiovisual matching face) when they were producing lip movements that did not match the heard vowel. Experiment 2 confirmed that these sensorimotor effects interacted with the heard vowel, as looking patterns differed when infants produced these same lip movements while seeing and hearing a talking face producing an unrelated vowel (audio /a/). These findings suggest that the development of speech perception and speech production may be mutually informative.

  4. Prediction and constraint in audiovisual speech perception.

    Science.gov (United States)

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech, listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners by increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration.

  5. Perception of Intersensory Synchrony in Audiovisual Speech: Not that Special

    Science.gov (United States)

    Vroomen, Jean; Stekelenburg, Jeroen J.

    2011-01-01

    Perception of intersensory temporal order is particularly difficult for (continuous) audiovisual speech, as perceivers may find it difficult to notice substantial timing differences between speech sounds and lip movements. Here we tested whether this occurs because audiovisual speech is strongly paired ("unity assumption"). Participants made…

  6. Talker Variability in Audiovisual Speech Perception

    Directory of Open Access Journals (Sweden)

    Shannon Heald

    2014-07-01

    A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories, and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker-variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker’s face, speech recognition is improved under adverse listening conditions (e.g., noise or distortion) that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target-word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition in a multiple-talker context was slower in the audio-visual condition than in the audio-only condition. These results suggest that seeing a talker’s face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred.

  7. Electrophysiological assessment of audiovisual integration in speech perception

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Dau, Torsten

    Speech perception integrates signals from ear and eye. This is witnessed by a wide range of audiovisual integration effects, such as ventriloquism and the McGurk illusion. Some behavioral evidence suggests that audiovisual integration of specific aspects is special for speech perception. However, our knowledge of such bimodal integration would be strengthened if the phenomena could be investigated by objective, neurally based methods. One key question of the present work is whether perceptual processing of audiovisual speech can be gauged with a specific signature of neurophysiological activity ... on the auditory speech percept? In two experiments, which both combine behavioral and neurophysiological measures, we attempt to uncover the relation between perception of faces and audiovisual integration. Behavioral findings suggest a strong effect of face perception, whereas the MMN results are less...

  8. Specialization in audiovisual speech perception: a replication study

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual, as evidenced by bimodal integration in the McGurk effect. This integration effect may be specific to speech or be applied to all stimuli in general. To investigate this, Tuomainen et al. (2005) used sine-wave speech, which naïve observers may perceive as non-speech, but hear as speech once informed of the linguistic origin of the signal. Combinations of sine-wave speech and incongruent video of the talker elicited a McGurk effect only for informed observers. This indicates that the audiovisual integration effect is specific to speech perception. However, observers ... that observers did look near the mouth. We conclude that eye movements did not influence the results of Tuomainen et al. and that their results thus can be taken as evidence of a speech-specific mode of audiovisual integration underlying the McGurk illusion....

  9. The role of visual spatial attention in audiovisual speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias; Tiippana, K.; Laarni, J.;

    2009-01-01

    Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre-attentive, but recent reports have challenged this view. Here we study the effect of visual spatial attention on the McGurk effect. By presenting a movie of two faces symmetrically displaced to each side of a central fixation point and dubbed with a single auditory speech track, we were able to discern the influences ... integration did not change. Visual spatial attention was also able to select between the faces when lip reading. This suggests that visual spatial attention acts at the level of visual speech perception prior to audiovisual integration and that the effect propagates through audiovisual integration...

  10. Audiovisual Temporal Recalibration for Speech in Synchrony Perception and Speech Identification

    Science.gov (United States)

    Asakawa, Kaori; Tanaka, Akihiro; Imai, Hisato

    We investigated whether audiovisual synchrony perception for speech could change after observation of an audiovisual temporal mismatch. Previous studies have revealed that audiovisual synchrony perception is recalibrated after exposure to a constant timing difference between auditory and visual signals in non-speech. In the present study, we examined whether this audiovisual temporal recalibration occurs at the perceptual level even for speech (monosyllables). In Experiment 1, after exposure to speech stimuli with a constant audiovisual lag, participants performed an audiovisual simultaneity judgment task on the speech signal (i.e., a direct measure of audiovisual synchrony perception). The results showed that the “simultaneous” responses (i.e., the proportion of responses for which participants judged the auditory and visual stimuli to be synchronous) at least partly depended on exposure lag. In Experiment 2, using stimuli identical to those of Experiment 1, we adopted the McGurk identification task (i.e., an indirect measure of audiovisual synchrony perception) to exclude the possibility that this modulation of synchrony perception was solely attributable to response strategy. The characteristics of the McGurk effect reported by participants depended on exposure lag. Thus, audiovisual synchrony perception for speech could be modulated following exposure to a constant lag in both direct and indirect measurements. Our results suggest that temporal recalibration occurs not only in non-speech signals but also in monosyllabic speech at the perceptual level.

  11. The contribution of dynamic visual cues to audiovisual speech perception.

    Science.gov (United States)

    Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador

    2015-08-01

    Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues, two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this end, we measured word identification performance in noise using unimodal auditory stimuli and audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point light displays achieved via motion capture of the original talker. Point light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech.

  12. Crossmodal and incremental perception of audiovisual cues to emotional speech

    NARCIS (Netherlands)

    Barkhuysen, Pashiera; Krahmer, E.J.; Swerts, M.G.J.

    2010-01-01

    In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both...

  13. Crossmodal and Incremental Perception of Audiovisual Cues to Emotional Speech

    Science.gov (United States)

    Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc

    2010-01-01

    In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests…

  14. Speech-specific audiovisual perception affects identification but not detection of speech

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual, as evidenced by the McGurk effect, in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate ... of audiovisual integration specific to speech perception. However, the results of Tuomainen et al. might have been influenced by another effect. When observers were naïve, they had little motivation to look at the face. When informed, they knew that the face was relevant for the task, and this could increase ... visual detection task. In our first experiment, observers presented with congruent and incongruent audiovisual sine-wave speech stimuli showed a McGurk effect only when informed of the speech nature of the stimulus. Performance on the secondary visual task was very good, thus supporting the finding...

  15. Talker variability in audio-visual speech perception.

    Science.gov (United States)

    Heald, Shannon L M; Nusbaum, Howard C

    2014-01-01

    A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories, and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening conditions (e.g., noise or distortion) that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition in a multiple-talker context was slower in the audio-visual condition than in the audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred. PMID:25076919

  16. Audiovisual Speech Perception and Eye Gaze Behavior of Adults with Asperger Syndrome

    Science.gov (United States)

    Saalasti, Satu; Katsyri, Jari; Tiippana, Kaisa; Laine-Hernandez, Mari; von Wendt, Lennart; Sams, Mikko

    2012-01-01

    Audiovisual speech perception was studied in adults with Asperger syndrome (AS) by utilizing the McGurk effect, in which conflicting visual articulation alters the perception of heard speech. The AS group perceived the audiovisual stimuli differently from age-, sex- and IQ-matched controls. When a voice saying /p/ was presented with a face…

  17. Gaze-direction-based MEG averaging during audiovisual speech perception

    Directory of Open Access Journals (Sweden)

    Lotta Hirvenkari

    2010-03-01

    To take a step towards real-life-like experimental setups, we simultaneously recorded magnetoencephalographic (MEG) signals and subjects’ gaze direction during audiovisual speech perception. The stimuli were utterances of /apa/ dubbed onto two side-by-side female faces articulating /apa/ (congruent) and /aka/ (incongruent) in synchrony, repeated once every 3 s. Subjects (N = 10) were free to decide which face they viewed, and responses were averaged into two categories according to gaze direction. The right-hemisphere 100-ms response to the onset of the second vowel (N100m’) was a fifth smaller to incongruent than to congruent stimuli. The results demonstrate the feasibility of realistic viewing conditions with gaze-based averaging of MEG signals.

  18. Audiovisual Speech Perception in Children with Developmental Language Disorder in Degraded Listening Conditions

    Science.gov (United States)

    Meronen, Auli; Tiippana, Kaisa; Westerholm, Jari; Ahonen, Timo

    2013-01-01

    Purpose: The effect of the signal-to-noise ratio (SNR) on the perception of audiovisual speech in children with and without developmental language disorder (DLD) was investigated by varying the noise level and the sound intensity of acoustic speech. The main hypotheses were that the McGurk effect (in which incongruent visual speech alters the…

  19. The influence of task on gaze during audiovisual speech perception

    Science.gov (United States)

    Buchan, Julie; Paré, Martin; Yurick, Micheal; Munhall, Kevin

    2001-05-01

    In natural conversation, visual and auditory information about speech not only provide linguistic information but also provide information about the identity and the emotional state of the speaker. Thus, listeners must process a wide range of information in parallel to understand the full meaning of a message. In this series of studies, we examined how different types of visual information conveyed by a speaker's face are processed by measuring the gaze patterns exhibited by subjects watching audiovisual recordings of spoken sentences. In three experiments, subjects were asked to judge the emotion and the identity of the speaker, and to report the words that they heard under different auditory conditions. As in previous studies, eye and mouth regions dominated the distribution of the gaze fixations. It was hypothesized that the eyes would attract more fixations in social judgment tasks than in tasks that rely more on verbal comprehension. Our results support this hypothesis. In addition, the location of gaze on the face did not influence the accuracy of the perception of speech in noise.

  20. Effect of attentional load on audiovisual speech perception: evidence from ERPs.

    Science.gov (United States)

    Alsius, Agnès; Möttönen, Riikka; Sams, Mikko E; Soto-Faraco, Salvador; Tiippana, Kaisa

    2014-01-01

    Seeing articulatory movements influences perception of auditory speech. This is often reflected in a shortened latency of auditory event-related potentials (ERPs) generated in the auditory cortex. The present study addressed whether this early neural correlate of audiovisual interaction is modulated by attention. We recorded ERPs in 15 subjects while they were presented with auditory, visual, and audiovisual spoken syllables. Audiovisual stimuli consisted of incongruent auditory and visual components known to elicit a McGurk effect, i.e., a visually driven alteration in the auditory speech percept. In a Dual task condition, participants were asked to identify spoken syllables whilst monitoring a rapid visual stream of pictures for targets, i.e., they had to divide their attention. In a Single task condition, participants identified the syllables without any other tasks, i.e., they were asked to ignore the pictures and focus their attention fully on the spoken syllables. The McGurk effect was weaker in the Dual task than in the Single task condition, indicating an effect of attentional load on audiovisual speech perception. Early auditory ERP components, N1 and P2, peaked earlier to audiovisual stimuli than to auditory stimuli when attention was fully focused on syllables, indicating neurophysiological audiovisual interaction. This latency decrement was reduced when attention was loaded, suggesting that attention influences early neural processing of audiovisual speech. We conclude that reduced attention weakens the interaction between vision and audition in speech.

  1. Audio-Visual Speech Perception: A Developmental ERP Investigation

    Science.gov (United States)

    Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.

    2014-01-01

    Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…

  2. Language/Culture Modulates Brain and Gaze Processes in Audiovisual Speech Perception

    Science.gov (United States)

    Hisanaga, Satoko; Sekiyama, Kaoru; Igasaki, Tomohiko; Murayama, Nobuki

    2016-01-01

    Several behavioural studies have shown that the interplay between voice and face information in audiovisual speech perception is not universal. Native English speakers (ESs) are influenced by visual mouth movement to a greater degree than native Japanese speakers (JSs) when listening to speech. However, the biological basis of these group differences is unknown. Here, we demonstrate the time-varying processes of group differences in terms of event-related brain potentials (ERP) and eye gaze for audiovisual and audio-only speech perception. On a behavioural level, while congruent mouth movement shortened the ESs’ response time for speech perception, the opposite effect was observed in JSs. Eye-tracking data revealed a gaze bias to the mouth for the ESs but not the JSs, especially before the audio onset. Additionally, the ERP P2 amplitude indicated that ESs processed multisensory speech more efficiently than auditory-only speech; however, the JSs exhibited the opposite pattern. Taken together, the ESs’ early visual attention to the mouth was likely to promote phonetic anticipation, which was not the case for the JSs. These results clearly indicate the impact of language and/or culture on multisensory speech processing, suggesting that linguistic/cultural experiences lead to the development of unique neural systems for audiovisual speech perception. PMID:27734953

  3. Audio-visual speech perception: a developmental ERP investigation

    OpenAIRE

    Knowland, V.; Mercure, E.; Karmiloff-Smith, A; Dick, F; Thomas, M.

    2014-01-01

    Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language learning. We therefore explored this at the neural level. The event-related potential (ERP) technique has been used to assess the mechanisms of audio-vi...

  4. Audiovisual Speech Perception in Infancy: The Influence of Vowel Identity and Infants' Productive Abilities on Sensitivity to (Mis)Matches between Auditory and Visual Speech Cues

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias

    2016-01-01

    Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…

  5. Timing in audiovisual speech perception: A mini review and new psychophysical data.

    Science.gov (United States)

    Venezia, Jonathan H; Thurman, Steven M; Matchin, William; George, Sahara E; Hickok, Gregory

    2016-02-01

    Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others, randomly across trials. Variability in participants' responses (~35 % identification of /apa/ compared to ~5 % in the absence of the masker) served as the basis for classification analysis. The outcome was a high-resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content.
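    The classification procedure above is a form of reverse correlation. The following is a rough sketch, not the authors' analysis pipeline. It simplifies the spatiotemporal mask to one visible/obscured flag per video frame. For each frame, it then takes the proportion of trials on which the frame was visible when the observer reported /apa/ and subtracts the proportion when they did not. The simulated observer, its "critical" frames, the trial count and the response probabilities are all hypothetical.

```python
import random

def classification_image(masks, responses):
    """Per-frame classification image: mean mask on 'yes' trials minus mean mask on 'no' trials.

    masks     -- one list per trial of 0/1 values (1 = frame visible, 0 = obscured)
    responses -- one bool per trial (True = observer reported /apa/)
    """
    yes = [m for m, r in zip(masks, responses) if r]
    no = [m for m, r in zip(masks, responses) if not r]
    if not yes or not no:
        raise ValueError("need at least one trial of each response type")
    n_frames = len(masks[0])
    return [
        sum(m[f] for m in yes) / len(yes) - sum(m[f] for m in no) / len(no)
        for f in range(n_frames)
    ]

def simulate_observer(n_trials=4000, n_frames=10, critical=(3, 4, 5), seed=0):
    """Hypothetical observer: only the 'critical' frames carry visual /aka/ information.

    The more critical frames are visible, the stronger the McGurk fusion and
    the less often the auditory /apa/ is reported.
    """
    rng = random.Random(seed)
    masks, responses = [], []
    for _ in range(n_trials):
        mask = [rng.randint(0, 1) for _ in range(n_frames)]  # random mask per trial
        visible = sum(mask[f] for f in critical)
        p_apa = 0.6 - 0.18 * visible  # 0.60 with no critical frame visible, 0.06 with all three
        masks.append(mask)
        responses.append(rng.random() < p_apa)
    return masks, responses

masks, responses = simulate_observer()
ci = classification_image(masks, responses)
print([round(w, 2) for w in ci])  # one weight per video frame
```

    Obscuring a critical frame in this simulation makes /apa/ reports more likely. Those frames should therefore receive clearly negative weights, and the uninformative frames should stay near zero.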

  6. Development of an audiovisual speech perception app for children with autism spectrum disorders.

    Science.gov (United States)

    Irwin, Julia; Preston, Jonathan; Brancazio, Lawrence; D'angelo, Michael; Turcios, Jacqueline

    2015-01-01

    Perception of spoken language requires attention to acoustic as well as visible phonetic information. This article reviews the known differences in audiovisual speech perception in children with autism spectrum disorders (ASD) and specifies the need for interventions that address this construct. Elements of an audiovisual training program are described. This researcher-developed program, delivered via an iPad app, presents natural speech in the context of increasing noise, but supported with a speaking face. Children are cued to attend to visible articulatory information to assist in perception of the spoken words. Data from four children with ASD aged 8-10 are presented, showing that the children improved their performance on an untrained auditory speech-in-noise task.

  7. High visual resolution matters in audiovisual speech perception, but only for some.

    Science.gov (United States)

    Alsius, Agnès; Wayne, Rachel V; Paré, Martin; Munhall, Kevin G

    2016-07-01

    The basis for individual differences in the degree to which visual speech input enhances comprehension of acoustically degraded speech is largely unknown. Previous research indicates that fine facial detail is not critical for visual enhancement when auditory information is available; however, these studies did not examine individual differences in ability to make use of fine facial detail in relation to audiovisual speech perception ability. Here, we compare participants based on their ability to benefit from visual speech information in the presence of an auditory signal degraded with noise, modulating the resolution of the visual signal through low-pass spatial frequency filtering and monitoring gaze behavior. Participants who benefited most from the addition of visual information (high visual gain) were more adversely affected by the removal of high spatial frequency information, compared to participants with low visual gain, for materials with both poor and rich contextual cues (i.e., words and sentences, respectively). Differences as a function of gaze behavior between participants with the highest and lowest visual gains were observed only for words, with participants with the highest visual gain fixating longer on the mouth region. Our results indicate that the individual variance in audiovisual speech-in-noise performance can be accounted for, in part, by better use of fine facial detail information extracted from the visual signal and increased fixation on mouth regions for short stimuli. Thus, for some, audiovisual speech perception may suffer when the visual input (in addition to the auditory signal) is less than perfect.

  8. Self-organizing maps for measuring similarity of audiovisual speech percepts

    DEFF Research Database (Denmark)

    Bothe, Hans-Heinrich

    The goal of this work is to find a way to measure similarity of audiovisual speech percepts. Phoneme-related self-organizing maps (SOM) with a rectangular basis are trained with data material from a (labeled) video film. For the training, a combination of auditory speech features and corresponding ... Dependent on the training data, these other units may also be contextually immediate neighboring units. The poster demonstrates the idea with text material spoken by one individual subject using a set of simple audio-visual features. The data material for the training process consists of 44 labeled sentences in German with a balanced phoneme repertoire. As a result, it can be stated that (i) the SOM can be trained to map auditory and visual features in a topology-preserving way and (ii) the maps show strain due to the influence of other audio-visual units. The SOM can be used to measure similarity amongst...
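    The abstract does not give the training details. The sketch below is therefore a generic Kohonen SOM on a rectangular grid, not the author's implementation. It assumes toy 4-D input vectors built from two "auditory" and two "visual" features. Similarity between two percepts is taken as the grid distance between their best-matching units. The feature values, grid size and learning-rate and neighbourhood schedules are all hypothetical.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid coordinates (row, col) of the unit whose prototype is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wk - xk) ** 2 for wk, xk in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(data, rows=6, cols=6, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Kohonen training on a rectangular grid with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)] for _ in range(rows)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = 0.01 + (lr0 - 0.01) * (1 - frac)      # learning rate decays linearly to 0.01
            sigma = 0.5 + (sigma0 - 0.5) * (1 - frac)  # neighbourhood radius shrinks to 0.5
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])  # pull prototype towards the sample
            step += 1
    return weights

def percept_distance(weights, x, y):
    """Similarity measure: Euclidean distance between the two BMUs on the grid."""
    r1, c1 = best_matching_unit(weights, x)
    r2, c2 = best_matching_unit(weights, y)
    return math.hypot(r1 - r2, c1 - c2)

# Hypothetical 4-D features: (audio_1, audio_2, visual_1, visual_2).
P = (0.1, 0.1, 0.9, 0.9)  # toy /p/
B = (0.2, 0.2, 0.9, 0.9)  # toy /b/: same lip closure, slightly different audio
K = (0.9, 0.9, 0.1, 0.1)  # toy /k/: different audio and visual features
noise = random.Random(1)
data = [tuple(v + noise.gauss(0, 0.03) for v in centre) for centre in (P, B, K) for _ in range(30)]
weights = train_som(data)
print(percept_distance(weights, P, B), percept_distance(weights, P, K))
```

    Because a trained SOM tends to preserve topology, percepts that are close in the joint audio-visual feature space should land on nearby units. The toy /p/ and /b/ are expected to be closer on the grid than /p/ and /k/.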

  9. Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

    Science.gov (United States)

    Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

    2015-05-01

    The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although a brain network is thought to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual speech perception or during unimodal speech perception with irrelevant noise in the counterpart modality. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework based on hierarchical clustering with single linkage distance to analyze the connected components of the functional network during speech perception, measured with functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationships, similar connected components were observed in bimodal and unimodal speech conditions during filtration. However, during speech perception with congruent audiovisual stimuli, tighter couplings of the left anterior temporal gyrus-anterior insula component and of right premotor-visual components were observed than in the auditory or visual speech cue conditions, respectively. Interestingly, visual speech under white noise is perceived with tight negative coupling in the left inferior frontal region-right anterior cingulate, left anterior insula, and bilateral visual regions, including the right middle temporal gyrus and right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, which can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.
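
    The single-linkage filtration described above tracks how many connected components a network has as the distance threshold grows. A minimal sketch of that 0-dimensional computation, assuming a symmetric region-by-region distance matrix (for example 1 minus correlation, which is an assumption here, as are the toy values), uses Kruskal-style union-find: each successful merge is a single-linkage dendrogram height, and the component count at any threshold follows from those heights. This is not the authors' pipeline.

```python
def merge_heights(dist):
    """Single-linkage merge heights (0-dimensional persistence death times).

    dist: symmetric n x n matrix of pairwise distances between regions.
    Returns the n - 1 distances at which components merge, in increasing order.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    heights = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two previously separate components
            parent[ri] = rj
            heights.append(d)
    return heights


def components_at(heights, n, eps):
    """Number of connected components when all edges with distance <= eps are kept."""
    return n - sum(1 for h in heights if h <= eps)


# Hypothetical distances between four regions (two tight pairs, far apart).
dist = [[0.0, 0.1, 0.8, 0.9],
        [0.1, 0.0, 0.7, 0.8],
        [0.8, 0.7, 0.0, 0.2],
        [0.9, 0.8, 0.2, 0.0]]
h = merge_heights(dist)
for eps in (0.0, 0.15, 0.5, 1.0):
    print(eps, components_at(h, len(dist), eps))  # 4, 3, 2, 1 components
```

    Sweeping every threshold this way, instead of fixing one cutoff, is what lets the filtration compare network topology across conditions without choosing an arbitrary edge threshold.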

  10. Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?

    Directory of Open Access Journals (Sweden)

    Magnus Alm

    2015-07-01

    Gender and age have been found to affect adults' audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood of cognitive and sensory decline, which may confound positive effects of age-related AV experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently, both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20-30 years) and middle-aged adults (50-60 years), with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. In contrast, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females' general AV perceptual strategy. Although young females' speech-reading proficiency may not readily contribute to greater visual influence, between young and middle adulthood, recurrent confirmation of the contribution of visual cues induced by speech-reading proficiency may gradually shift females' AV perceptual strategy towards more visually dominated responses.

  11. The early maximum likelihood estimation model of audiovisual integration in speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias

    2015-01-01

    Speech perception is facilitated by seeing the articulatory mouth movements of the talker. This is due to perceptual audiovisual integration, which also causes the McGurk-MacDonald illusion, and for which a comprehensive computational account is still lacking. Decades of research have largely...... focused on the fuzzy logical model of perception (FLMP), which provides excellent fits to experimental observations but also has been criticized for being too flexible, post hoc and difficult to interpret. The current study introduces the early maximum likelihood estimation (MLE) model of audiovisual......-validation can evaluate models of audiovisual integration based on typical data sets taking both goodness-of-fit and model flexibility into account. All models were tested on a published data set previously used for testing the FLMP. Cross-validation favored the early MLE while more conventional error measures...
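
    The abstract is truncated before it specifies the early MLE model, so the sketch below shows only the textbook maximum likelihood rule for combining two independent Gaussian cues, not Andersen's model. Each cue is weighted by its reliability (inverse variance): ŝ = w_A·s_A + w_V·s_V with w_A = (1/σ_A²)/(1/σ_A² + 1/σ_V²), and the fused variance is 1/(1/σ_A² + 1/σ_V²). The shared internal axis and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    estimate: float  # unimodal estimate on a shared internal axis (hypothetical)
    variance: float  # sensory noise variance of that estimate


def mle_combine(audio: Cue, visual: Cue):
    """Reliability-weighted fusion of two independent Gaussian cues."""
    rel_a = 1.0 / audio.variance   # reliability = inverse variance
    rel_v = 1.0 / visual.variance
    w_a = rel_a / (rel_a + rel_v)
    w_v = 1.0 - w_a
    fused = w_a * audio.estimate + w_v * visual.estimate
    fused_var = 1.0 / (rel_a + rel_v)  # never larger than either unimodal variance
    return fused, fused_var, w_a


fused, var, w_a = mle_combine(Cue(0.0, 1.0), Cue(3.0, 2.0))
print(fused, var, w_a)  # fused ≈ 1.0, var ≈ 0.667, audio weight ≈ 0.667
```

    Because the weights come from the cue variances rather than from free parameters, this rule has less flexibility than a model like the FLMP, which is the trade-off the cross-validation comparison in the abstract is meant to capture.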

  12. Visual and Auditory Components in the Perception of Asynchronous Audiovisual Speech.

    Science.gov (United States)

    García-Pérez, Miguel A; Alcalá-Quintana, Rocío

    2015-12-01

    Research on asynchronous audiovisual speech perception manipulates experimental conditions to observe their effects on synchrony judgments. Probabilistic models establish a link between the sensory and decisional processes underlying such judgments and the observed data, via interpretable parameters that allow testing hypotheses and making inferences about how experimental manipulations affect such processes. Two models of this type have recently been proposed, one based on independent channels and the other using a Bayesian approach. Both models are fitted here to a common data set, with a subsequent analysis of the interpretation they provide about how experimental manipulations affected the processes underlying perceived synchrony. The data consist of synchrony judgments as a function of audiovisual offset in a speech stimulus, under four within-subjects manipulations of the quality of the visual component. The Bayesian model could not accommodate asymmetric data, was rejected by goodness-of-fit statistics for 8/16 observers, and was found to be nonidentifiable, which renders its parameter estimates uninterpretable. The independent-channels model captured asymmetric data, was rejected for only 1/16 observers, and identified how sensory and decisional processes mediating asynchronous audiovisual speech perception are affected by manipulations that only alter the quality of the visual component of the speech signal.

  13. Perception of the multisensory coherence of fluent audiovisual speech in infancy: its emergence and the role of experience.

    Science.gov (United States)

    Lewkowicz, David J; Minar, Nicholas J; Tift, Amy H; Brandon, Melissa

    2015-02-01

    To investigate the developmental emergence of the perception of the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8- to 10-, and 12- to 14-month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor 8- to 10-month-old infants exhibited audiovisual matching in that they did not look longer at the matching monologue. In contrast, the 12- to 14-month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, perceived the multisensory coherence of native-language monologues earlier in the test trials than that of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12- to 14-month-olds did not depend on audiovisual synchrony, whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audiovisual synchrony cues are more important in the perception of the multisensory coherence of non-native speech than that of native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing.

  14. The effect of combined sensory and semantic components on audio-visual speech perception in older adults

    Directory of Open Access Journals (Sweden)

    Corrina Maguinness

    2011-12-01

    Previous studies have found that perception in older people benefits from multisensory over uni-sensory information. As normal speech recognition is affected by both the auditory input and the visual lip movements of the speaker, we investigated the efficiency of audio and visual integration in an older population by manipulating the relative reliability of the auditory and visual information in speech. We also investigated the role of the semantic context of the sentence to assess whether audio-visual integration is affected by top-down semantic processing. We presented participants with audio-visual sentences in which the visual component was either blurred or not blurred. We found that there was a greater cost in recall performance for semantically meaningless speech in the audio-visual blur condition compared to the audio-visual no-blur condition, and this effect was specific to the older group. Our findings have implications for understanding how aging affects efficient multisensory integration for the perception of speech and suggest that multisensory inputs may benefit speech perception in older adults when the semantic content of the speech is unpredictable.

  15. Large Scale Functional Brain Networks Underlying Temporal Integration of Audio-Visual Speech Perception: An EEG Study

    Science.gov (United States)

    Kumar, G. Vinodh; Halder, Tamesh; Jaiswal, Amit K.; Mukherjee, Abhishek; Roy, Dipanjan; Banerjee, Arpan

    2016-01-01

    Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, whether and how the network across the whole brain participates in multisensory perception processing remains an open question. We posit that large-scale functional connectivity among neural populations situated in distributed brain sites may provide valuable insights into the processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent AV speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs were computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha- and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception around a temporal window of 300–600 ms following stimulus onset. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times, along with pre-stimulus decreases of lower-frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus…

  16. Audiovisual speech perception at various presentation levels in Mandarin-speaking adults with cochlear implants.

    Directory of Open Access Journals (Sweden)

    Shu-Yu Liu

    (1) To evaluate the recognition of words, phonemes and lexical tones in audiovisual (AV) and auditory-only (AO) modes in Mandarin-speaking adults with cochlear implants (CIs); (2) to understand the effect of presentation levels on AV speech perception; (3) to learn the effect of hearing experience on AV speech perception. Thirteen deaf adults (age = 29.1±13.5 years; 8 male, 5 female) who had used CIs for >6 months and 10 normal-hearing (NH) adults participated in this study. Seven of the CI users were prelingually deaf, and 6 were postlingually deaf. The Mandarin Monosyllabic Word Recognition Test was used to assess recognition of words, phonemes and lexical tones in AV and AO conditions at 3 presentation levels: speech detection threshold (SDT), speech recognition threshold (SRT) and 10 dB SL (re: SRT). The prelingual group had better phoneme recognition in the AV mode than in the AO mode at SDT and SRT (both p = 0.016), and so did the NH group at SDT (p = 0.004). No mode difference was noted in the postlingual group. None of the groups had significantly different tone recognition in the 2 modes. The prelingual and postlingual groups had significantly better phoneme and tone recognition than the NH group at SDT in the AO mode (p = 0.016 and p = 0.002 for phonemes; p = 0.001 and p<0.001 for tones) but were outperformed by the NH group at 10 dB SL (re: SRT) in both modes (both p<0.001 for phonemes; p<0.001 and p = 0.002 for tones). The recognition scores had a significant correlation with group with age and sex controlled (p<0.001). Visual input may help prelingually deaf implantees to recognize phonemes but may not augment Mandarin tone recognition. The effect of presentation level seems minimal on CI users' AV perception. This indicates special considerations in developing audiological assessment protocols and rehabilitation strategies for implantees who speak tonal languages.

  17. Bimodal bilingualism as multisensory training?: Evidence for improved audiovisual speech perception after sign language exposure.

    Science.gov (United States)

    Williams, Joshua T; Darcy, Isabelle; Newman, Sharlene D

    2016-02-15

    The aim of the present study was to characterize effects of learning a sign language on the processing of a spoken language. Specifically, audiovisual phoneme comprehension was assessed before and after 13 weeks of sign language exposure. L2 American Sign Language (ASL) learners performed this task in the fMRI scanner. Results indicated that L2 ASL learners' behavioral classification of the speech sounds improved with time compared to hearing nonsigners. Results also indicated increased activation in the supramarginal gyrus (SMG) after sign language exposure, which suggests concomitant increased phonological processing of speech. A multiple regression analysis indicated that learners' ratings of co-sign speech use and lipreading ability were correlated with SMG activation. This pattern of results indicates that the increased use of mouthing and possibly lipreading during sign language acquisition may concurrently improve audiovisual speech processing in budding hearing bimodal bilinguals. PMID:26740404

  18. Brief Report: Arrested Development of Audiovisual Speech Perception in Autism Spectrum Disorders

    Science.gov (United States)

    Stevenson, Ryan A.; Siemann, Justin K.; Woynaroski, Tiffany G.; Schneider, Brittany C.; Eberly, Haley E.; Camarata, Stephen M.; Wallace, Mark T.

    2014-01-01

    Atypical communicative abilities are a core marker of Autism Spectrum Disorders (ASD). A number of studies have shown that, in addition to auditory comprehension differences, individuals with autism frequently show atypical responses to audiovisual speech, suggesting a multisensory contribution to these communicative differences from their…

  19. Speech-specificity of two audiovisual integration effects

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2010-01-01

    Seeing the talker’s articulatory mouth movements can influence the auditory speech percept both in speech identification and detection tasks. Here we show that these audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers...... often fail to perceive as speech. While audiovisual integration in the identification task only occurred when observers were informed of the speech-like nature of SWS, integration occurred in the detection task both for informed and naïve observers. This shows that both speech-specific and general...... mechanisms underlie audiovisual integration of speech....

  20. Audiovisual Matching in Speech and Nonspeech Sounds: A Neurodynamical Model

    Science.gov (United States)

    Loh, Marco; Schmid, Gabriele; Deco, Gustavo; Ziegler, Wolfram

    2010-01-01

    Audiovisual speech perception provides an opportunity to investigate the mechanisms underlying multimodal processing. By using nonspeech stimuli, it is possible to investigate the degree to which audiovisual processing is specific to the speech domain. It has been shown in a match-to-sample design that matching across modalities is more difficult…

  1. Multistage audiovisual integration of speech: dissociating identification and detection

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2011-01-01

    Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion where seeing the talking face influences the auditory phonetic percept and by the audiovisual detection advantage where seeing the talking face influences the detectability of the acoustic speech signal. Here we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers......

  2. Audio-visual perception of compressed speech by profoundly hearing-impaired subjects.

    Science.gov (United States)

    Drullman, R; Smoorenburg, G F

    1997-01-01

    For many people with profound hearing loss, conventional hearing aids provide only limited support for speechreading. This study aims at optimizing the presentation of speech signals in the severely reduced dynamic range of the profoundly hearing impaired by means of multichannel compression and multichannel amplification. The speech signal in each of six 1-octave channels (125-4000 Hz) was compressed instantaneously, using compression ratios of 1, 2, 3, or 5, and a compression threshold of 35 dB below peak level. A total of eight conditions were composed in which the compression ratio varied per channel. Sentences were presented audio-visually to 16 profoundly hearing-impaired subjects and syllable intelligibility was measured. Results show that all auditory signals are valuable supplements to speechreading. No clear overall preference is found for any of the compression conditions, but relatively high compression ratios (> 3-5) have a significantly detrimental effect. Inspection of the individual results reveals that compression may be beneficial for one subject.
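
    The compression parameters in this abstract (six one-octave channels from 125 to 4000 Hz, ratios of 1, 2, 3 or 5, threshold 35 dB below peak) can be illustrated with a standard static input/output rule: levels at or below threshold pass unchanged, and above it each dB of input yields 1/ratio dB of output. This is a hedged level-domain sketch only; the study compressed the signal instantaneously, and the filtering, gain staging, and the example condition below are assumptions rather than the authors' processing.

```python
CHANNEL_CENTERS_HZ = [125, 250, 500, 1000, 2000, 4000]  # six 1-octave bands
THRESHOLD_DB = -35.0  # compression threshold, dB re peak level


def compress_level(level_db, ratio, threshold_db=THRESHOLD_DB):
    """Static compression curve: unity slope below threshold, 1/ratio above it."""
    if ratio < 1:
        raise ValueError("compression ratio must be >= 1")
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def compress_channels(levels_db, ratios):
    """Apply a (possibly different) compression ratio in each octave channel."""
    if not (len(levels_db) == len(ratios) == len(CHANNEL_CENTERS_HZ)):
        raise ValueError("need one level and one ratio per channel")
    return [compress_level(lvl, r) for lvl, r in zip(levels_db, ratios)]


# Hypothetical condition: no compression in low bands, stronger in high bands.
out = compress_channels([-10.0] * 6, [1, 1, 2, 2, 3, 5])
print(dict(zip(CHANNEL_CENTERS_HZ, out)))
```

    A ratio of 1 leaves a channel untouched, while a ratio of 5 squeezes the 35 dB range above threshold into 7 dB, which suggests why the higher ratios in the study could flatten useful level cues.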

  3. The development of sensorimotor influences in the audiovisual speech domain: some critical questions.

    Science.gov (United States)

    Guellaï, Bahia; Streri, Arlette; Yeung, H Henny

    2014-01-01

    Speech researchers have long been interested in how auditory and visual speech signals are integrated, and recent work has revived interest in the role of speech production with respect to this process. Here, we discuss these issues from a developmental perspective. Because speech perception abilities typically outstrip speech production abilities in infancy and childhood, it is unclear how speech-like movements could influence audiovisual speech perception in development. While work on this question is still in its preliminary stages, there is nevertheless increasing evidence that sensorimotor processes (defined here as any motor or proprioceptive process related to orofacial movements) affect developmental audiovisual speech processing. We suggest three areas on which to focus in future research: (i) the relation between audiovisual speech perception and sensorimotor processes at birth, (ii) the pathways through which sensorimotor processes interact with audiovisual speech processing in infancy, and (iii) developmental change in sensorimotor pathways as speech production emerges in childhood.

  4. The development of sensorimotor influences in the audiovisual speech domain: Some critical questions

    Directory of Open Access Journals (Sweden)

    Bahia Guellaï

    2014-08-01

    Speech researchers have long been interested in how auditory and visual speech signals are integrated, and recent work has revived interest in the role of speech production with respect to this process. Here we discuss these issues from a developmental perspective. Because speech perception abilities typically outstrip speech production abilities in infancy and childhood, it is unclear how speech-like movements could influence audiovisual speech perception in development. While work on this question is still in its preliminary stages, there is nevertheless increasing evidence that sensorimotor processes (defined here as any motor or proprioceptive process related to orofacial movements) affect developmental audiovisual speech processing. We suggest three areas on which to focus in future research: (i) the relation between audiovisual speech perception and sensorimotor processes at birth, (ii) the pathways through which sensorimotor processes interact with audiovisual speech processing in infancy, and (iii) developmental change in sensorimotor pathways as speech production emerges in childhood.

  5. Audiovisual integration of speech in a patient with Broca's Aphasia.

    Science.gov (United States)

    Andersen, Tobias S; Starrfelt, Randi

    2015-01-01

    Lesions to Broca's area cause aphasia characterized by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca's area is also involved in speech perception. While these studies have focused on auditory speech perception, other studies have shown that Broca's area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca's aphasia did not experience the McGurk illusion, suggesting that an intact Broca's area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca's area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke's aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing that are not specific to Broca's aphasia.

  6. Contributions of Oral and Extraoral Facial Movement to Visual and Audiovisual Speech Perception

    Science.gov (United States)

    Thomas, Sharon M.; Jordan, Timothy R.

    2005-01-01

    Seeing a talker's face influences auditory speech recognition, but the visible input essential for this influence has yet to be established. Using a new seamless editing technique, the authors examined effects of restricting visible movement to oral or extraoral areas of a talking face. In Experiment 1, visual speech identification and visual…

  7. Audiovisual Asynchrony Detection in Human Speech

    Science.gov (United States)

    Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

    2011-01-01

    Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

  8. Audiovisual Perception of Noise Vocoded Speech in Dyslexic and Non-Dyslexic Adults: The Role of Low-Frequency Visual Modulations

    Science.gov (United States)

    Megnin-Viggars, Odette; Goswami, Usha

    2013-01-01

    Visual speech inputs can enhance auditory speech information, particularly in noisy or degraded conditions. The natural statistics of audiovisual speech highlight the temporal correspondence between visual and auditory prosody, with lip, jaw, cheek and head movements conveying information about the speech envelope. Low-frequency spatial and…

  9. Neural correlates of audiovisual speech processing in a second language.

    Science.gov (United States)

    Barrós-Loscertales, Alfonso; Ventura-Campos, Noelia; Visser, Maya; Alsius, Agnès; Pallier, Christophe; Avila Rivera, César; Soto-Faraco, Salvador

    2013-09-01

    Neuroimaging studies of audiovisual speech processing have exclusively addressed listeners' native language (L1). Yet, several behavioural studies now show that AV processing plays an important role in non-native (L2) speech perception. The current fMRI study measured brain activity during auditory, visual, audiovisual congruent and audiovisual incongruent utterances in L1 and L2. BOLD responses to congruent AV speech in the pSTS were stronger than in either unimodal condition in both L1 and L2. Yet no differences in AV processing were expressed according to the language background in this area. Instead, the regions in the bilateral occipital lobe had a stronger congruency effect on the BOLD response (congruent higher than incongruent) in L2 as compared to L1. According to these results, language background differences are predominantly expressed in these unimodal regions, whereas the pSTS is similarly involved in AV integration regardless of language dominance.

  10. Infants' preference for native audiovisual speech dissociated from congruency preference.

    Directory of Open Access Journals (Sweden)

    Kathleen Shaw

    Although infant speech perception is often studied in isolated modalities, infants' experience with speech is largely multimodal (i.e., the speech sounds they hear are accompanied by articulating faces). Across two experiments, we tested infants' sensitivity to the relationship between the auditory and visual components of audiovisual speech in their native (English) and non-native (Spanish) language. In Experiment 1, infants' looking times were measured during a preferential looking task in which they saw two simultaneous visual speech streams articulating a story, one in English and the other in Spanish, while they heard either the English or the Spanish version of the story. In Experiment 2, looking times from another group of infants were measured as they watched single displays of congruent and incongruent combinations of English and Spanish audio and visual speech streams. Findings demonstrated an age-related increase in looking towards the native relative to non-native visual speech stream when accompanied by the corresponding (native) auditory speech. This increase in native language preference did not appear to be driven by a difference in preference for native vs. non-native audiovisual congruence, as we observed no difference in looking times at the audiovisual streams in Experiment 2.

  11. Basic to Applied Research: The Benefits of Audio-Visual Speech Perception Research in Teaching Foreign Languages

    Science.gov (United States)

    Erdener, Dogu

    2016-01-01

    Traditionally, second language (L2) instruction has emphasised auditory-based instruction methods. However, this approach is restrictive in the sense that speech perception by humans is not just an auditory phenomenon but a multimodal one, and specifically, a visual one as well. In the past decade, experimental studies have shown that the…

  12. Open your eyes and listen carefully. Auditory and audiovisual speech perception and the McGurk effect in aphasia

    NARCIS (Netherlands)

    Klitsch, Julia Ulrike

    2008-01-01

    This dissertation investigates speech perception in three different groups of native adult speakers of Dutch: an aphasic group and two control groups differing in age. By means of two different experiments, it is examined whether the availability of visual articulatory information is beneficial to the auditory speec…

  13. Audio-visual speech perception in infants and toddlers with Down syndrome, fragile X syndrome, and Williams syndrome.

    Science.gov (United States)

    D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette

    2016-08-01

    Typically developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on chronological and mental age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development. PMID:27498221

  14. Audiovisual integration for speech during mid-childhood: electrophysiological evidence.

    Science.gov (United States)

    Kaganovich, Natalya; Schumaker, Jennifer

    2014-12-01

    Previous studies have demonstrated that the presence of visual speech cues reduces the amplitude and latency of the N1 and P2 event-related potential (ERP) components elicited by speech stimuli. However, the developmental trajectory of this effect is not yet fully mapped. We examined ERP responses to auditory, visual, and audiovisual speech in two groups of school-age children (7-8-year-olds and 10-11-year-olds) and in adults. Audiovisual speech led to the attenuation of the N1 and P2 components in all groups of participants, suggesting that the neural mechanisms underlying these effects are functional by early school years. Additionally, while the reduction in N1 was largest over the right scalp, the P2 attenuation was largest over the left and midline scalp. The difference in the hemispheric distribution of the N1 and P2 attenuation supports the idea that these components index at least somewhat disparate neural processes within the context of audiovisual speech perception.

  15. Rapid, generalized adaptation to asynchronous audiovisual speech.

    Science.gov (United States)

    Van der Burg, Erik; Goodbourn, Patrick T

    2015-04-01

    The brain is adaptive. The speed of propagation through air, and of low-level sensory processing, differs markedly between auditory and visual stimuli; yet the brain can adapt to compensate for the resulting cross-modal delays. Studies investigating temporal recalibration to audiovisual speech have used prolonged adaptation procedures, suggesting that adaptation is sluggish. Here, we show that adaptation to asynchronous audiovisual speech occurs rapidly. Participants viewed a brief clip of an actor pronouncing a single syllable. The voice was either advanced or delayed relative to the corresponding lip movements, and participants were asked to make a synchrony judgement. Although we did not use an explicit adaptation procedure, we demonstrate rapid recalibration based on a single audiovisual event. We find that the point of subjective simultaneity on each trial is highly contingent upon the modality order of the preceding trial. We find compelling evidence that rapid recalibration generalizes across different stimuli and different actors. Finally, we demonstrate that rapid recalibration occurs even when auditory and visual events clearly belong to different actors. These results suggest that rapid temporal recalibration to audiovisual speech is primarily mediated by basic temporal factors, rather than higher-order factors such as perceived simultaneity and source identity.
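
    The trial-to-trial dependency described above can be illustrated with a minimal sketch. It assumes a hypothetical data layout (one SOA in ms and one binary synchrony judgement per trial, in presentation order, with negative SOAs meaning the voice leads) and estimates the point of subjective simultaneity (PSS) crudely as the mean SOA of trials judged synchronous. The abstract does not say how the PSS was estimated; fitting a psychometric function is a common alternative, and the function name below is illustrative only.

```python
import numpy as np


def pss_by_previous_order(soa_ms, judged_sync):
    """Crude PSS estimate, split by the modality order of the preceding trial.

    Assumed (hypothetical) convention: negative SOA = audio leads,
    positive SOA = video leads. The PSS is approximated as the mean SOA
    of the trials judged synchronous (a centroid estimate).
    """
    soa = np.asarray(soa_ms, dtype=float)
    sync = np.asarray(judged_sync, dtype=bool)
    # Pair every trial (from the second onwards) with its predecessor.
    prev, cur, cur_sync = soa[:-1], soa[1:], sync[1:]
    groups = {
        "after_audio_lead": prev < 0,
        "after_visual_lead": prev > 0,  # trials following a 0-ms trial are ignored
    }
    result = {}
    for label, mask in groups.items():
        selected = cur[mask & cur_sync]
        result[label] = float(selected.mean()) if selected.size else float("nan")
    return result


soa = [-200, 0, 100, 200, -100, 50]
sync = [False, True, True, False, True, True]
print(pss_by_previous_order(soa, sync))
# prints {'after_audio_lead': 25.0, 'after_visual_lead': -100.0}
```

    With real data, rapid recalibration would appear as a systematic difference between the two estimates. The centroid is only meaningful if SOAs are sampled symmetrically and in balanced numbers.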

  16. Audiovisual Speech Integration and Lipreading in Autism

    Science.gov (United States)

    Smith, Elizabeth G.; Bennetto, Loisa

    2007-01-01

    Background: During speech perception, the ability to integrate auditory and visual information causes speech to sound louder and be more intelligible, and leads to quicker processing. This integration is important in early language development, and also continues to affect speech comprehension throughout the lifespan. Previous research shows that…

  17. Impact of language on functional connectivity for audiovisual speech integration

    Science.gov (United States)

    Shinozaki, Jun; Hiroe, Nobuo; Sato, Masa-aki; Nagamine, Takashi; Sekiyama, Kaoru

    2016-01-01

    Visual information about lip and facial movements plays a role in audiovisual (AV) speech perception. Although this has been widely confirmed, previous behavioural studies have shown interlanguage differences: native Japanese speakers do not integrate auditory and visual speech as closely as native English speakers. To elucidate the neural basis of such interlanguage differences, 22 native English speakers and 24 native Japanese speakers were examined in behavioural or functional magnetic resonance imaging (fMRI) experiments while mono-syllabic speech was presented under AV, auditory-only, or visual-only conditions for speech identification. Behavioural results indicated that the English speakers identified visual speech more quickly than the Japanese speakers, and that the temporal facilitation effect of congruent visual speech was significant in the English speakers but not in the Japanese speakers. Using fMRI data, we examined the functional connectivity among brain regions important for auditory-visual interplay. The results indicated that the English speakers had significantly stronger connectivity between the visual motion area MT and Heschl’s gyrus compared with the Japanese speakers, which may subserve lower-level visual influences on speech perception in English speakers in a multisensory environment. These results suggested that linguistic experience strongly affects neural connectivity involved in AV speech integration. PMID:27510407

  18. Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

    Directory of Open Access Journals (Sweden)

    Tobias Søren Andersen

    2015-04-01

    Lesions to Broca’s area cause aphasia characterised by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca’s area is also involved in speech perception. While these studies have focused on auditory speech perception, other studies have shown that Broca’s area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca’s aphasia did not experience the McGurk illusion, suggesting that an intact Broca’s area is necessary for audiovisual integration of speech. Here we describe a patient with Broca’s aphasia who experienced the McGurk illusion. This indicates that an intact Broca’s area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca’s area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke’s aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing that are not specific to Broca’s aphasia.

  19. An audiovisual database of English speech sounds

    Science.gov (United States)

    Frisch, Stefan A.; Nikjeh, Dee Adams

    2003-10-01

    A preliminary audiovisual database of English speech sounds has been developed for teaching purposes. This database contains all Standard English speech sounds produced in isolated words in word-initial, word-medial, and word-final position, unless not allowed by English phonotactics. There is one example of each word spoken by a male and a female talker. The database consists of an audio recording, video of the face from a 45° angle off center, and ultrasound video of the tongue in the midsagittal plane. The files contained in the database are suitable for examination by the Wavesurfer freeware program in audio or video modes [Sjolander and Beskow, KTH Stockholm]. This database is intended as a multimedia reference for students in phonetics or speech science. A demonstration and plans for further development will be presented.

  20. Neural dynamics of audiovisual speech integration under variable listening conditions: an individual participant analysis.

    Science.gov (United States)

    Altieri, Nicholas; Wenger, Michael J

    2013-01-01

    Speech perception engages both auditory and visual modalities. Limitations of traditional accuracy-only approaches in the investigation of audiovisual speech perception have motivated the use of new methodologies. In an audiovisual speech identification task, we utilized capacity (Townsend and Nozawa, 1995), a dynamic measure of efficiency, to quantify audiovisual integration. Capacity was used to compare RT distributions from audiovisual trials to RT distributions from auditory-only and visual-only trials across three listening conditions: clear auditory signal, S/N ratio of -12 dB, and S/N ratio of -18 dB. The purpose was to obtain EEG recordings in conjunction with capacity to investigate how a late ERP co-varies with integration efficiency. Results showed efficient audiovisual integration for low auditory S/N ratios, but inefficient audiovisual integration when the auditory signal was clear. The ERP analyses showed evidence for greater audiovisual amplitude compared to the unisensory signals for lower auditory S/N ratios (higher capacity/efficiency) compared to the high S/N ratio (low capacity/inefficient integration). The data are consistent with an interactive framework of integration, where auditory recognition is influenced by speech-reading as a function of signal clarity.
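
    The capacity measure mentioned above compares the integrated hazard of audiovisual RTs with the sum of the unisensory integrated hazards. A value of 1 matches an unlimited-capacity, independent parallel race; values above 1 indicate more efficient integration and values below 1 less efficient integration. The minimal sketch below computes the standard OR form, C(t) = H_AV(t) / [H_A(t) + H_V(t)] with H(t) = -ln S(t), from raw empirical survivor functions. It assumes correct-response RTs in ms and uses made-up example data. Published analyses often use other estimators (for example Nelson-Aalen), and this is not necessarily the exact pipeline used in the study.

```python
import numpy as np


def survivor(rts, t):
    """Empirical survivor function S(t) = P(RT > t), evaluated at each time in t."""
    rts = np.sort(np.asarray(rts, dtype=float))
    n_at_or_below = np.searchsorted(rts, t, side="right")  # count of RTs <= t
    return 1.0 - n_at_or_below / rts.size


def integrated_hazard(rts, t):
    """Integrated hazard H(t) = -ln S(t); infinite once every RT has occurred."""
    with np.errstate(divide="ignore"):
        return -np.log(survivor(rts, t))


def capacity_or(rt_av, rt_a, rt_v, t):
    """OR capacity coefficient C(t) = H_AV(t) / (H_A(t) + H_V(t)).

    C(t) = 1: unlimited-capacity independent parallel race (baseline);
    C(t) > 1: super capacity; C(t) < 1: limited capacity.
    Returns NaN where the denominator is zero (no unisensory responses yet).
    """
    t = np.asarray(t, dtype=float)
    h_av = integrated_hazard(rt_av, t)
    denom = integrated_hazard(rt_a, t) + integrated_hazard(rt_v, t)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, h_av / denom, np.nan)


rt_a = rt_v = [300, 400, 500, 600]  # hypothetical unisensory RTs (ms)
rt_av = [250, 350, 450, 550]        # hypothetical audiovisual RTs (ms)
print(capacity_or(rt_av, rt_a, rt_v, [350.0, 450.0]))
```

    In practice C(t) is only evaluated over times where all three survivor functions are still positive, because the raw estimate becomes infinite once every audiovisual response has occurred.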

  1. Preference for Audiovisual Speech Congruency in Superior Temporal Cortex.

    Science.gov (United States)

    Lüttke, Claudia S; Ekman, Matthias; van Gerven, Marcel A J; de Lange, Floris P

    2016-01-01

    Auditory speech perception can be altered by concurrent visual information. The superior temporal cortex is an important combining site for this integration process. This area was previously found to be sensitive to audiovisual congruency. However, the direction of this congruency effect (i.e., stronger or weaker activity for congruent compared to incongruent stimulation) has been more equivocal. Here, we used fMRI to look at the neural responses of human participants during the McGurk illusion (in which auditory /aba/ and visual /aga/ inputs are fused into a perceived /ada/) in a large, homogeneous sample of participants who consistently experienced this illusion. This enabled us to compare the neuronal responses during congruent audiovisual stimulation with incongruent audiovisual stimulation leading to the McGurk illusion, while avoiding the possible confounding factor of sensory surprise that can occur when McGurk stimuli are only occasionally perceived. We found larger activity for congruent audiovisual stimuli than for incongruent (McGurk) stimuli in bilateral superior temporal cortex, extending into the primary auditory cortex. This finding suggests that superior temporal cortex prefers when auditory and visual input support the same representation.

  2. The temporal binding window for audiovisual speech: Children are like little adults.

    Science.gov (United States)

    Hillock-Dunn, Andrea; Grantham, D Wesley; Wallace, Mark T

    2016-07-29

    During a typical communication exchange, both auditory and visual cues contribute to speech comprehension. The influence of vision on speech perception can be measured behaviorally using a task where incongruent auditory and visual speech stimuli are paired to induce perception of a novel token reflective of multisensory integration (i.e., the McGurk effect). This effect is temporally constrained in adults, with illusion perception decreasing as the temporal offset between the auditory and visual stimuli increases. Here, we used the McGurk effect to investigate the development of the temporal characteristics of audiovisual speech binding in 7- to 24-year-olds. Surprisingly, results indicated that although older participants perceived the McGurk illusion more frequently, no age-dependent change in the temporal boundaries of audiovisual speech binding was observed. PMID:26920938

  3. Physical and perceptual factors shape the neural mechanisms that integrate audiovisual signals in speech comprehension.

    Science.gov (United States)

    Lee, HweeLing; Noppeney, Uta

    2011-08-01

    Face-to-face communication challenges the human brain to integrate information from auditory and visual senses with linguistic representations. Yet the role of bottom-up physical (spectrotemporal structure) input and top-down linguistic constraints in shaping the neural mechanisms specialized for integrating audiovisual speech signals is currently unknown. Participants were presented with speech and sinewave speech analogs in visual, auditory, and audiovisual modalities. Before the fMRI study, they were trained to perceive physically identical sinewave speech analogs as speech (SWS-S) or nonspeech (SWS-N). Comparing audiovisual integration (interactions) of speech, SWS-S, and SWS-N revealed a posterior-anterior processing gradient within the left superior temporal sulcus/gyrus (STS/STG): bilateral posterior STS/STG integrated audiovisual inputs regardless of spectrotemporal structure or speech percept; in left mid-STS, the integration profile was primarily determined by the spectrotemporal structure of the signals; more anterior STS regions discarded spectrotemporal structure and integrated audiovisual signals constrained by stimulus intelligibility and the availability of linguistic representations. In addition to this "ventral" processing stream, a "dorsal" circuitry encompassing posterior STS/STG and left inferior frontal gyrus differentially integrated audiovisual speech and SWS signals. Indeed, dynamic causal modeling and Bayesian model comparison provided strong evidence for a parallel processing structure encompassing a ventral and a dorsal stream, with speech intelligibility training enhancing the connectivity between posterior and anterior STS/STG. In conclusion, audiovisual speech comprehension emerges in an interactive process, with the integration of auditory and visual signals being progressively constrained by stimulus intelligibility along the STS and by spectrotemporal structure in a dorsal fronto-temporal circuitry.

  4. Electrophysiological evidence for speech-specific audiovisual integration.

    Science.gov (United States)

    Baart, Martijn; Stekelenburg, Jeroen J; Vroomen, Jean

    2014-01-01

    Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency. In order to disentangle speech-specific (phonetic) integration from non-speech integration, we used Sine-Wave Speech (SWS) that was perceived as speech by half of the participants (they were in speech-mode), while the other half was in non-speech mode. Results showed that the N1 obtained with audiovisual stimuli peaked earlier than the N1 evoked by auditory-only stimuli. This lip-read induced speeding up of the N1 occurred for listeners in speech and non-speech mode. In contrast, if listeners were in speech-mode, lip-read speech also modulated the auditory P2, but not if listeners were in non-speech mode, thus revealing speech-specific audiovisual binding. Comparing ERPs for phonetically congruent audiovisual stimuli with ERPs for incongruent stimuli revealed an effect of phonetic stimulus congruency that started at ~200 ms after (in)congruence became apparent. Critically, akin to the P2 suppression, congruency effects were only observed if listeners were in speech mode, and not if they were in non-speech mode. Using identical stimuli, we thus confirm that audiovisual binding involves (partially) different neural mechanisms for sound processing in speech and non-speech mode.

  5. Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech, and music

    OpenAIRE

    Lee, Hweeling; Noppeney, Uta

    2014-01-01

    This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech, or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogs of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. ...

  6. Cross-Modal Interactions during Perception of Audiovisual Speech and Nonspeech Signals: An fMRI Study

    Science.gov (United States)

    Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

    2011-01-01

    During speech communication, visual information may interact with the auditory system at various processing stages. Most noteworthy, recent magnetoencephalography (MEG) data provided first evidence for early and preattentive phonetic/phonological encoding of the visual data stream--prior to its fusion with auditory phonological features [Hertrich,…

  7. Multisensory Speech Perception in Children with Autism Spectrum Disorders

    Science.gov (United States)

    Woynaroski, Tiffany G.; Kwakye, Leslie D.; Foss-Feig, Jennifer H.; Stevenson, Ryan A.; Stone, Wendy L.; Wallace, Mark T.

    2013-01-01

    This study examined unisensory and multisensory speech perception in 8-17 year old children with autism spectrum disorders (ASD) and typically developing controls matched on chronological age, sex, and IQ. Consonant-vowel syllables were presented in visual only, auditory only, matched audiovisual, and mismatched audiovisual ("McGurk")…

  8. Atypical audiovisual speech integration in infants at risk for autism.

    Directory of Open Access Journals (Sweden)

    Jeanne A Guiraud

    The language difficulties often seen in individuals with autism might stem from an inability to integrate audiovisual information, a skill important for language development. We investigated whether 9-month-old siblings of older children with autism, who are at an increased risk of developing autism, are able to integrate audiovisual speech cues. We used an eye-tracker to record where infants looked when shown a screen displaying two faces of the same model, where one face is articulating /ba/ and the other /ga/, with one face congruent with the syllable sound being presented simultaneously and the other face incongruent. This method was successful in showing that infants at low risk can integrate audiovisual speech: they looked for the same amount of time at the mouths in both the fusible visual /ga/-audio /ba/ and the congruent visual /ba/-audio /ba/ displays, indicating that the auditory and visual streams fuse into a McGurk-type syllabic percept in the incongruent condition. It also showed that low-risk infants could perceive a mismatch between auditory and visual cues: they looked longer at the mouth in the mismatched, non-fusible visual /ba/-audio /ga/ display compared with the congruent visual /ga/-audio /ga/ display, demonstrating that they perceive an uncommon, and therefore interesting, speech-like percept when looking at the incongruent mouth (repeated ANOVA: displays x fusion/mismatch conditions interaction: F(1,16) = 17.153, p = 0.001). The looking behaviour of high-risk infants did not differ according to the type of display, suggesting difficulties in matching auditory and visual information (repeated ANOVA, displays x conditions interaction: F(1,25) = 0.09, p = 0.767), in contrast to low-risk infants (repeated ANOVA: displays x conditions x low/high-risk groups interaction: F(1,41) = 4.466, p = 0.041). In some cases this reduced ability might lead to the poor communication skills characteristic of autism.

  9. Neural Development of Networks for Audiovisual Speech Comprehension

    Science.gov (United States)

    Dick, Anthony Steven; Solodkin, Ana; Small, Steven L.

    2010-01-01

    Everyday conversation is both an auditory and a visual phenomenon. While visual speech information enhances comprehension for the listener, evidence suggests that the ability to benefit from this information improves with development. A number of brain regions have been implicated in audiovisual speech comprehension, but the extent to which the…

  10. Early and late beta-band power reflect audiovisual perception in the McGurk illusion.

    Science.gov (United States)

    Roa Romero, Yadira; Senkowski, Daniel; Keil, Julian

    2015-04-01

    The McGurk illusion is a prominent example of audiovisual speech perception and the influence that visual stimuli can have on auditory perception. In this illusion, a visual speech stimulus influences the perception of an incongruent auditory stimulus, resulting in a fused novel percept. In this high-density electroencephalography (EEG) study, we were interested in the neural signatures of the subjective percept of the McGurk illusion as a phenomenon of speech-specific multisensory integration. Therefore, we examined the role of cortical oscillations and event-related responses in the perception of congruent and incongruent audiovisual speech. We compared the cortical activity elicited by objectively congruent syllables with incongruent audiovisual stimuli. Importantly, the latter elicited a subjectively congruent percept: the McGurk illusion. We found that early event-related responses (N1) to audiovisual stimuli were reduced during the perception of the McGurk illusion compared with congruent stimuli. Most interestingly, our study showed a stronger poststimulus suppression of beta-band power (13-30 Hz) at short (0-500 ms) and long (500-800 ms) latencies during the perception of the McGurk illusion compared with congruent stimuli. Our study demonstrates that auditory perception is influenced by visual context and that the subsequent formation of a McGurk illusion requires stronger audiovisual integration even at early processing stages. Our results provide evidence that beta-band suppression at early stages reflects stronger stimulus processing in the McGurk illusion. Moreover, stronger late beta-band suppression in McGurk illusion indicates the resolution of incongruent physical audiovisual input and the formation of a coherent, illusory multisensory percept.

  11. Speech perception as categorization

    OpenAIRE

    Holt, Lori L.; Lotto, Andrew J.

    2010-01-01

    Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words. This is an example of categorization, in that potentially discriminable speech sounds are assigned to functionally equivalent classes. In this tutorial, we present some of the main challenges to our understanding of the categorization of speech sounds and the conceptualization of SP that has...

  12. A measure for assessing the effects of audiovisual speech integration.

    Science.gov (United States)

    Altieri, Nicholas; Townsend, James T; Wenger, Michael J

    2014-06-01

    We propose a measure of audiovisual speech integration that takes into account accuracy and response times. This measure should prove beneficial for researchers investigating multisensory speech recognition, since it relates to normal-hearing and aging populations. As an example, age-related sensory decline influences both the rate at which one processes information and the ability to utilize cues from different sensory modalities. Our function assesses integration when both auditory and visual information are available, by comparing performance on these audiovisual trials with theoretical predictions for performance under the assumptions of parallel, independent self-terminating processing of single-modality inputs. We provide example data from an audiovisual identification experiment and discuss applications for measuring audiovisual integration skills across the life span.

  13. On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech

    Directory of Open Access Journals (Sweden)

    Wesley Mattheyses

    2009-01-01

    Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either natural or synthesized speech. However, the perception of mismatches between these two information streams requires experimental exploration since it could degrade the quality of the output. In order to increase the intermodal coherence in synthetic 2D photorealistic speech, we extended the well-known unit selection audio synthesis technique to work with multimodal segments containing original combinations of audio and video. Subjective experiments confirm that the audiovisual signals created by our multimodal synthesis strategy are indeed perceived as being more synchronous than those of systems in which both modes are not intrinsically coherent. Furthermore, it is shown that the degree of coherence between the auditory mode and the visual mode has an influence on the perceived quality of the synthetic visual speech fragment. In addition, the audio quality was found to have only a minor influence on the perceived visual signal's quality.

  14. Hearing impairment and audiovisual speech integration ability: a case study report.

    Science.gov (United States)

    Altieri, Nicholas; Hudock, Daniel

    2014-01-01

    Research in audiovisual speech perception has demonstrated that sensory factors such as auditory and visual acuity are associated with a listener's ability to extract and combine auditory and visual speech cues. This case study report examined audiovisual integration using a newly developed measure of capacity in a sample of hearing-impaired listeners. Capacity assessments are unique because they examine the contribution of reaction-time (RT) as well as accuracy to determine the extent to which a listener efficiently combines auditory and visual speech cues relative to independent race model predictions. Multisensory speech integration ability was examined in two experiments: an open-set sentence recognition and a closed set speeded-word recognition study that measured capacity. Most germane to our approach, capacity illustrated speed-accuracy tradeoffs that may be predicted by audiometric configuration. Results revealed that some listeners benefit from increased accuracy, but fail to benefit in terms of speed on audiovisual relative to unisensory trials. Conversely, other listeners may not benefit in the accuracy domain but instead show an audiovisual processing time benefit.

  15. Artimate: an articulatory animation framework for audiovisual speech synthesis

    OpenAIRE

    Steiner, Ingmar; Ouni, Slim

    2012-01-01

    We present a modular framework for articulatory animation synthesis using speech motion capture data obtained with electromagnetic articulography (EMA). Adapting a skeletal animation approach, the articulatory motion data is applied to a three-dimensional (3D) model of the vocal tract, creating a portable resource that can be integrated in an audiovisual (AV) speech synthesis platform to provide realistic animation of the tongue and teeth for a virtual character. The...

  16. Processing of Audiovisually Congruent and Incongruent Speech in School-Age Children with a History of Specific Language Impairment: A Behavioral and Event-Related Potentials Study

    Science.gov (United States)

    Kaganovich, Natalya; Schumaker, Jennifer; Macias, Danielle; Gustafson, Dana

    2015-01-01

    Previous studies indicate that at least some aspects of audiovisual speech perception are impaired in children with specific language impairment (SLI). However, whether audiovisual processing difficulties are also present in older children with a history of this disorder is unknown. By combining electrophysiological and behavioral measures, we…

  17. No, there is no 150 ms lead of visual speech on auditory speech, but a range of audiovisual asynchronies varying from small audio lead to large audio lag.

    Directory of Open Access Journals (Sweden)

    Jean-Luc Schwartz

    2014-07-01

    An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech would be typically 150 ms ahead of auditory speech. It happens that the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases, for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call "preparatory gestures". However, when syllables are chained in sequences, as they are typically in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call "comodulatory gestures" providing auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction.

  18. No, there is no 150 ms lead of visual speech on auditory speech, but a range of audiovisual asynchronies varying from small audio lead to large audio lag.

    Science.gov (United States)

    Schwartz, Jean-Luc; Savariaux, Christophe

    2014-07-01

    An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech would be typically 150 ms ahead of auditory speech. It happens that the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases, for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call "preparatory gestures". However, when syllables are chained in sequences, as they are typically in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call "comodulatory gestures" providing auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction.

  19. On the Role of Crossmodal Prediction in Audiovisual Emotion Perception

    Directory of Open Access Journals (Sweden)

    Sarah Jessen

    2013-07-01

    Humans rely on multiple sensory modalities to determine the emotional state of others. In fact, such multisensory perception may be one of the mechanisms explaining the ease and efficiency by which others’ emotions are recognized. But how and when exactly do the different modalities interact? One aspect in multisensory perception that has received increasing interest in recent years is the concept of crossmodal prediction. In emotion perception, as in most other settings, visual information precedes the auditory one. Thus, leading visual information can facilitate subsequent auditory processing. While this mechanism has often been described in audiovisual speech perception, it has not been addressed so far in audiovisual emotion perception. Based on the current state of the art in (a) crossmodal prediction and (b) multisensory emotion perception research, we propose that it is essential to consider the former in order to fully understand the latter. Focusing on electroencephalographic (EEG) and magnetoencephalographic (MEG) studies, we provide a brief overview of the current research in both fields. In discussing these findings, we suggest that emotional visual information may allow for a more reliable prediction of auditory information compared to non-emotional visual information. In support of this hypothesis, we present a re-analysis of a previous data set that shows an inverse correlation between the N1 response in the EEG and the duration of visual emotional but not non-emotional information. If the assumption that emotional content allows for more reliable predictions can be corroborated in future studies, crossmodal prediction is a crucial factor in our understanding of multisensory emotion perception.

  20. Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech, and music.

    Science.gov (United States)

    Lee, Hweeling; Noppeney, Uta

    2014-01-01

    This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech, or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogs of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past 3 years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and to a marginally significant degree to natural speech.
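
    The abstract does not say how the width of the temporal integration window was estimated. As a hedged illustration only, the sketch below builds the 13 SOAs listed above. Given hypothetical proportions of "synchronous" responses at each SOA, it measures the width of the SOA range where that proportion is at least 50%, interpolating linearly between tested SOAs. The 50% criterion and the function name are assumptions. Fitting a psychometric function to each side of the synchrony curve is a common, more robust alternative.

```python
import numpy as np

# The 13 SOAs listed in the abstract (ms): 0, ±60, ..., ±360 in 60-ms steps.
SOAS_MS = np.arange(-360, 361, 60)


def window_width(soas, p_sync, criterion=0.5):
    """Width (ms) of the SOA range where P('synchronous') >= criterion.

    Walks outwards from the peak and linearly interpolates each criterion
    crossing. If a side never drops below the criterion, the outermost
    tested SOA is used, so the width is then only a lower bound.
    """
    soas = np.asarray(soas, dtype=float)
    p = np.asarray(p_sync, dtype=float)
    peak = int(np.argmax(p))
    if p[peak] < criterion:
        return 0.0  # no SOA reaches the criterion

    left = soas[0]
    for i in range(peak, 0, -1):
        if p[i - 1] < criterion <= p[i]:
            left = np.interp(criterion, [p[i - 1], p[i]], [soas[i - 1], soas[i]])
            break

    right = soas[-1]
    for i in range(peak, len(p) - 1):
        if p[i + 1] < criterion <= p[i]:
            # np.interp needs increasing xp, so the pair is ordered (outer, inner).
            right = np.interp(criterion, [p[i + 1], p[i]], [soas[i + 1], soas[i]])
            break

    return float(right - left)


# Hypothetical response profile: peaks at 0 ms and falls linearly to 0 at ±300 ms.
p_example = np.clip(1 - np.abs(SOAS_MS) / 300, 0, 1)
print(len(SOAS_MS), window_width(SOAS_MS, p_example))
```

    Under this operationalisation, the narrower windows reported for musicians would show up as a smaller width computed from their synchrony judgements.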

  1. Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech and music

    Directory of Open Access Journals (Sweden)

    Hwee Ling Lee

    2014-08-01

    This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogues of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past three years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practice fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practice was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and, to a marginally significant degree, to natural speech.

  2. Semantic Framing of Speech : Emotional and Topical Cues in Perception of Poorly Specified Speech

    OpenAIRE

    Lidestam, Björn

    2003-01-01

    The general aim of this thesis was to test the effects of paralinguistic (emotional) and prior contextual (topical) cues on perception of poorly specified visual, auditory, and audiovisual speech. The specific purposes were (1) to examine if facially displayed emotions can facilitate speechreading performance; (2) to study the mechanism for such facilitation; (3) to map information-processing factors that are involved in processing of poorly specified speech; and (4) to present a comprehensiv...

  3. Audio-visual speech cue combination.

    Directory of Open Access Journals (Sweden)

    Derek H Arnold

    BACKGROUND: Different sources of sensory information can interact, often shaping what we think we have seen or heard. This can enhance the precision of perceptual decisions relative to those made on the basis of a single source of information. From a computational perspective, there are multiple reasons why this might happen, and each predicts a different degree of enhanced precision. Relatively slight improvements can arise when perceptual decisions are made on the basis of multiple independent sensory estimates, as opposed to just one. These improvements can arise as a consequence of probability summation. Greater improvements can occur if two initially independent estimates are summated to form a single integrated code, especially if the summation is weighted in accordance with the variance associated with each independent estimate. This form of combination is often described as a Bayesian maximum likelihood estimate. Still greater improvements are possible if the two sources of information are encoded via a common physiological process. PRINCIPAL FINDINGS: Here we show that the provision of simultaneous audio and visual speech cues can result in substantial sensitivity improvements, relative to single sensory modality based decisions. The magnitude of the improvements is greater than can be predicted on the basis of either a Bayesian maximum likelihood estimate or a probability summation. CONCLUSION: Our data suggest that primary estimates of speech content are determined by a physiological process that takes input from both visual and auditory processing, resulting in greater sensitivity than would be possible if initially independent audio and visual estimates were formed and then subsequently combined.
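
The maximum likelihood benchmark described in this abstract can be made concrete. The following is a minimal Python sketch, assuming Gaussian sensory noise and that both cues signal the same underlying stimulus difference (assumptions added here, not stated in the abstract). Each cue is weighted by its reliability (inverse variance), and the predicted audiovisual sensitivity is the quadratic sum of the unimodal d' values. The function names are hypothetical and the numbers are illustrative, not data from the study. An observed audiovisual d' above this prediction is the kind of result the authors report.

```python
import math
from typing import Tuple


def mle_weights(sigma_a: float, sigma_v: float) -> Tuple[float, float]:
    """Inverse-variance (reliability) weights for the auditory and visual estimates."""
    rel_a, rel_v = 1.0 / sigma_a**2, 1.0 / sigma_v**2
    total = rel_a + rel_v
    return rel_a / total, rel_v / total


def mle_combined_sigma(sigma_a: float, sigma_v: float) -> float:
    """Predicted SD of the weighted audiovisual estimate (never larger than either input SD)."""
    return math.sqrt((sigma_a**2 * sigma_v**2) / (sigma_a**2 + sigma_v**2))


def mle_predicted_dprime(d_a: float, d_v: float) -> float:
    """MLE-predicted audiovisual sensitivity: d'_AV = sqrt(d'_A**2 + d'_V**2).

    Follows from d' = delta / sigma when both cues signal the same difference delta.
    """
    return math.hypot(d_a, d_v)


def exceeds_mle(observed_dprime_av: float, d_a: float, d_v: float) -> bool:
    """True if an observed audiovisual d' lies above the MLE prediction."""
    return observed_dprime_av > mle_predicted_dprime(d_a, d_v)


if __name__ == "__main__":
    # Hypothetical unimodal sensitivities, not values from the study.
    d_a, d_v = 1.0, 1.0
    print(f"MLE prediction: {mle_predicted_dprime(d_a, d_v):.3f}")  # 1.414
    print(exceeds_mle(2.0, d_a, d_v))  # True
```

Under these assumptions, two equally reliable cues improve sensitivity by a factor of about 1.41 at most; a larger gain is what would point toward a common physiological process rather than optimal weighting of independent estimates.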

  4. Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

    DEFF Research Database (Denmark)

    Andersen, Tobias; Starrfelt, Randi

    2015-01-01

    's area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical......, which could be due to Broca's area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke's aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing...

  5. Artimate: an articulatory animation framework for audiovisual speech synthesis

    CERN Document Server

    Steiner, Ingmar

    2012-01-01

    We present a modular framework for articulatory animation synthesis using speech motion capture data obtained with electromagnetic articulography (EMA). Adapting a skeletal animation approach, the articulatory motion data is applied to a three-dimensional (3D) model of the vocal tract, creating a portable resource that can be integrated in an audiovisual (AV) speech synthesis platform to provide realistic animation of the tongue and teeth for a virtual character. The framework also provides an interface to articulatory animation synthesis, as well as an example application to illustrate its use with a 3D game engine. We rely on cross-platform, open-source software and open standards to provide a lightweight, accessible, and portable workflow.

  6. Audio-visual speech timing sensitivity is enhanced in cluttered conditions.

    Directory of Open Access Journals (Sweden)

    Warrick Roseboom

    Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.

  7. Intermodal timing relations and audio-visual speech recognition by normal-hearing adults.

    Science.gov (United States)

    McGrath, M; Summerfield, Q

    1985-02-01

    Audio-visual identification of sentences was measured as a function of audio delay in untrained observers with normal hearing; the soundtrack was replaced by rectangular pulses originally synchronized to the closing of the talker's vocal folds and then subjected to delay. When the soundtrack was delayed by 160 ms, identification scores were no better than when no acoustical information at all was provided. Delays of up to 80 ms had little effect on group-mean performance, but a separate analysis of a subgroup of better lipreaders showed a significant trend of reduced scores with increased delay in the range from 0 to 80 ms. A second experiment tested the interpretation that, although the main disruptive effect of the delay occurred on a syllabic time scale, better lipreaders might be attempting to use intermodal timing cues at a phonemic level. Normal-hearing observers determined whether a 120-Hz complex tone started before or after the opening of a pair of liplike Lissajous figures. Group-mean difference limens (70.7% correct DLs) were -79 ms (sound leading) and +138 ms (sound lagging), with no significant correlation between DLs and sentence lipreading scores. It was concluded that most observers, whether good lipreaders or not, possess insufficient sensitivity to intermodal timing cues in audio-visual speech for them to be used analogously to voice onset time in auditory speech perception. The results of both experiments imply that delays of up to about 40 ms introduced by signal-processing algorithms in aids to lipreading should not materially affect audio-visual speech understanding.

  8. An audio-visual corpus for multimodal speech recognition in Dutch language

    NARCIS (Netherlands)

    Wojdel, J.; Wiggers, P.; Rothkrantz, L.J.M.

    2002-01-01

    This paper describes the gathering and availability of an audio-visual speech corpus for the Dutch language. The corpus was prepared with multimodal speech recognition in mind, and it is currently used in our research on lip-reading and bimodal speech recognition. It contains the prompts used also i

  9. Brain responses to audiovisual speech mismatch in infants are associated with individual differences in looking behaviour.

    Science.gov (United States)

    Kushnerenko, Elena; Tomalski, Przemyslaw; Ballieux, Haiko; Ribeiro, Helena; Potton, Anita; Axelsson, Emma L; Murphy, Elizabeth; Moore, Derek G

    2013-11-01

    Research on audiovisual speech integration has reported high levels of individual variability, especially among young infants. In the present study we tested the hypothesis that this variability results from individual differences in the maturation of audiovisual speech processing during infancy. A developmental shift in selective attention to audiovisual speech has been demonstrated between 6 and 9 months, with an increase in the time spent looking to articulating mouths as compared to eyes (Lewkowicz & Hansen-Tift (2012) Proc. Natl Acad. Sci. USA, 109, 1431-1436; Tomalski et al. (2012) Eur. J. Dev. Psychol., 1-14). In the present study we tested whether these changes in behavioural maturational level are associated with differences in brain responses to audiovisual speech across this age range. We measured high-density event-related potentials (ERPs) in response to videos of audiovisually matching and mismatched syllables /ba/ and /ga/, and subsequently examined visual scanning of the same stimuli with eye-tracking. There were no clear age-specific changes in ERPs, but the amplitude of the audiovisual mismatch response (AVMMR) to the combination of visual /ba/ and auditory /ga/ was strongly negatively associated with looking time to the mouth in the same condition. These results have significant implications for our understanding of individual differences in neural signatures of audiovisual speech processing in infants, suggesting that they are not strictly related to chronological age but instead associated with the maturation of looking behaviour, and develop at individual rates in the second half of the first year of life.

  10. APPLICATION OF PARTIAL LEAST SQUARES REGRESSION FOR AUDIO-VISUAL SPEECH PROCESSING AND MODELING

    Directory of Open Access Journals (Sweden)

    A. L. Oleinik

    2015-09-01

    Subject of Research. The paper deals with the problem of reconstructing lip region images from the speech signal by means of Partial Least Squares regression. Such problems arise in the development of audio-visual speech processing methods. Audio-visual speech consists of acoustic and visual components (called modalities). Applications of audio-visual speech processing methods include joint modeling of voice and lip movement dynamics, synchronization of audio and video streams, emotion recognition, and liveness detection. Method. Partial Least Squares regression was applied to solve the posed problem. This method extracts components of the initial data with high covariance, and these components are used to build the regression model. The advantage of this approach is that it achieves two goals at once: identification of latent interrelations between initial data components (e.g., speech signal and lip region image) and approximation of one initial data component as a function of another. Main Results. Experiments on reconstruction of lip region images from the speech signal were carried out on the VidTIMIT audio-visual speech database. The results showed that Partial Least Squares regression is capable of solving the reconstruction problem. Practical Significance. The findings suggest that Partial Least Squares regression can be successfully applied to a wide variety of audio-visual speech processing problems, from synchronization of audio and video streams to liveness detection.
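
As a rough illustration of the Method step above, the following Python sketch fits a single-component PLS1 model (one output variable) on mean-centered data: the weight vector is the direction in predictor space with maximal covariance with the response, and the response is then regressed on the resulting latent scores. This is a simplification assumed for illustration only. The lip-image task in the abstract has many outputs (PLS2) and typically several components, and the function names and toy data here are hypothetical, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# (x_means, y_mean, unit weight vector w, regression coefficient q)
Model = Tuple[List[float], float, List[float], float]


def fit_pls1_one_component(X: Sequence[Sequence[float]], y: Sequence[float]) -> Model:
    """Fit a one-component PLS1 model on the rows of X and the response y."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Mean-center predictors and response.
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # w = Xc^T yc: the direction in predictor space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(c * c for c in w))
    if norm == 0.0:
        raise ValueError("predictors have zero covariance with the response")
    w = [c / norm for c in w]
    # Latent scores t = Xc w, then least-squares regression of yc on t.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_means, y_mean, w, q


def predict(model: Model, x: Sequence[float]) -> float:
    """Project a new sample onto the latent component and map the score back to y."""
    x_means, y_mean, w, q = model
    score = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_means, w))
    return y_mean + q * score


if __name__ == "__main__":
    # Toy data with two perfectly correlated predictors (hypothetical).
    X = [[1, 2], [2, 4], [3, 6], [4, 8]]
    y = [1, 2, 3, 4]
    model = fit_pls1_one_component(X, y)
    print(round(predict(model, [5, 10]), 6))  # 5.0
```

The toy predictors are perfectly collinear, a case where ordinary least squares has no unique solution; PLS handles it by regressing on a single latent score, which is one reason the method suits highly correlated inputs such as spectral features or image pixels.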

  11. Developmental Trajectory of Audiovisual Speech Integration in Early Infancy. A Review of Studies Using the McGurk Paradigm

    Directory of Open Access Journals (Sweden)

    Tomalski Przemysław

    2015-10-01

    Apart from their remarkable phonological skills, young infants prior to their first birthday show an ability to match the mouth articulation they see with the speech sounds they hear. They are able to detect the audiovisual conflict of speech and to selectively attend to the articulating mouth depending on audiovisual congruency. Early audiovisual speech processing is an important aspect of language development, related not only to phonological knowledge, but also to language production during subsequent years. This article reviews recent experimental work delineating the complex developmental trajectory of audiovisual mismatch detection. The central issue is the role of age-related changes in visual scanning of audiovisual speech and the corresponding changes in neural signatures of audiovisual speech processing in the second half of the first year of life. This phenomenon is discussed in the context of recent theories of perceptual development and existing data on the neural organisation of the infant ‘social brain’.

  12. Effects of Audio-Visual Information on the Intelligibility of Alaryngeal Speech

    Science.gov (United States)

    Evitts, Paul M.; Portugal, Lindsay; Van Dine, Ami; Holler, Aline

    2010-01-01

    Background: There is minimal research on the contribution of visual information to speech intelligibility for individuals with a laryngectomy (IWL). Aims: The purpose of this project was to determine the effects of mode of presentation (audio-only, audio-visual) on alaryngeal speech intelligibility. Method: Twenty-three naive listeners were…

  13. Audio-visual integration of speech with time-varying sine wave speech replicas

    Science.gov (United States)

    Tuomainen, Jyrki; Andersen, Tobias; Tiippana, Kaisa; Sams, Mikko

    2002-11-01

    We tested whether listeners' knowledge about the nature of the auditory stimuli had an effect on audio-visual (AV) integration of speech. First, subjects were taught to categorize two sine-wave (sw) replicas of the real speech tokens /omso/ and /onso/ into two arbitrary nonspeech categories without knowledge of the speech-like nature of the sounds. A test with congruent and incongruent AV-stimulus conditions (together with auditory-only presentations of the sw stimuli) demonstrated no AV integration, but instead close to perfect categorization of stimuli in the two arbitrary categories according to the auditory presentation channel. Then, the same subjects (most of whom were still under the impression that the sw stimuli were nonspeech sounds) were taught to categorize the sw stimuli as /omso/ and /onso/, and again tested with the same AV stimuli as used in the nonspeech sw condition. This time, subjects showed highly reliable AV integration similar to integration obtained with real speech stimuli in a separate test. We suggest that AV integration only occurs when subjects are in a so-called "speech mode."

  14. Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation.

    Science.gov (United States)

    Banks, Briony; Gowen, Emma; Munro, Kevin J; Adank, Patti

    2015-01-01

    Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants' eye gaze was recorded to verify that they looked at the speaker's face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation.

  15. How can audiovisual pathways enhance the temporal resolution of time-compressed speech in blind subjects?

    Science.gov (United States)

    Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann

    2013-01-01

    In blind people, the visual channel cannot assist face-to-face communication via lipreading or visual prosody. Nevertheless, the visual system may enhance the evaluation of auditory information due to its cross-links to (1) the auditory system, (2) supramodal representations, and (3) frontal action-related areas. Apart from feedback or top-down support of, for example, the processing of spatial or phonological representations, experimental data have shown that the visual system can impact auditory perception at more basic computational stages such as temporal signal resolution. For example, blind as compared to sighted subjects are more resistant to backward masking, and this ability appears to be associated with activity in visual cortex. Regarding the comprehension of continuous speech, blind subjects can learn to use accelerated text-to-speech systems for "reading" texts at ultra-fast speaking rates (>16 syllables/s), exceeding by far the normal range of 6 syllables/s. A functional magnetic resonance imaging study has shown that this ability, among other brain regions, significantly covaries with BOLD responses in bilateral pulvinar, right visual cortex, and left supplementary motor area. Furthermore, magnetoencephalographic measurements revealed a particular component in right occipital cortex phase-locked to the syllable onsets of accelerated speech. In sighted people, the "bottleneck" for understanding time-compressed speech seems related to higher demands for buffering phonological material and is, presumably, linked to frontal brain structures. On the other hand, the neurophysiological correlates of functions overcoming this bottleneck seem to depend upon early visual cortex activity. The present Hypothesis and Theory paper outlines a model that aims at binding these data together, based on early cross-modal pathways that are already known from various audiovisual experiments on cross-modal adjustments during space, time, and object recognition. 

  16. How can audiovisual pathways enhance the temporal resolution of time-compressed speech in blind subjects?

    Directory of Open Access Journals (Sweden)

    Ingo Hertrich

    2013-08-01

    In blind people, the visual channel cannot assist face-to-face communication via lipreading or visual prosody. Nevertheless, the visual system may enhance the evaluation of auditory information due to its cross-links to (1) the auditory system, (2) supramodal representations, and (3) frontal action-related areas. Apart from feedback or top-down support of, for example, the processing of spatial or phonological representations, experimental data have shown that the visual system can impact auditory perception at more basic computational stages such as temporal resolution. For example, blind as compared to sighted subjects are more resistant to backward masking, and this ability appears to be associated with activity in visual cortex. Regarding the comprehension of continuous speech, blind subjects can learn to use accelerated text-to-speech systems for "reading" texts at ultra-fast speaking rates (>16 syllables/s), exceeding by far the normal range of 6 syllables/s. An fMRI study has shown that this ability, among other brain regions, significantly covaries with BOLD responses in bilateral pulvinar, right visual cortex, and left supplementary motor area. Furthermore, magnetoencephalographic (MEG) measurements revealed a particular component in right occipital cortex phase-locked to the syllable onsets of accelerated speech. In sighted people, the "bottleneck" for understanding time-compressed speech seems related to a demand for buffering phonological material and is, presumably, linked to frontal brain structures. On the other hand, the neurophysiological correlates of functions overcoming this bottleneck seem to depend upon early visual cortex activity. The present Hypothesis and Theory paper outlines a model that aims at binding these data together, based on early cross-modal pathways that are already known from various audiovisual experiments considering cross-modal adjustments in space, time, and object recognition.

  17. Twice upon a time: multiple concurrent temporal recalibrations of audiovisual speech.

    Science.gov (United States)

    Roseboom, Warrick; Arnold, Derek H

    2011-07-01

    Audiovisual timing perception can recalibrate following prolonged exposure to asynchronous auditory and visual inputs. It has been suggested that this might contribute to achieving perceptual synchrony for auditory and visual signals despite differences in physical and neural signal times for sight and sound. However, given that people can be concurrently exposed to multiple audiovisual stimuli with variable neural signal times, a mechanism that recalibrates all audiovisual timing percepts to a single timing relationship could be dysfunctional. In the experiments reported here, we showed that audiovisual temporal recalibration can be specific for particular audiovisual pairings. Participants were shown alternating movies of male and female actors containing positive and negative temporal asynchronies between the auditory and visual streams. We found that audiovisual synchrony estimates for each actor were shifted toward the preceding audiovisual timing relationship for that actor and that such temporal recalibrations occurred in positive and negative directions concurrently. Our results show that humans can form multiple concurrent estimates of appropriate timing for audiovisual synchrony.

  18. Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

    DEFF Research Database (Denmark)

    Andersen, Tobias; Starrfelt, Randi

    2015-01-01

    Lesions to Broca's area cause aphasia characterized by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca's area is also involved in speech perception. While these studies have focused on auditory speech perception, other studies have shown that Broca's area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca's aphasia did not experience the McGurk illusion, suggesting that an intact Broca...

  19. Design and realisation of an audiovisual speech activity detector

    NARCIS (Netherlands)

    Van Bree, K.C.

    2006-01-01

    For many speech telecommunication technologies, a robust speech activity detector is important. An audio-only speech detector will give false positives when the interfering signal is speech or has speech characteristics. The video modality is suitable for solving this problem. In this report the approach

  20. Neural bases of accented speech perception

    Directory of Open Access Journals (Sweden)

    Patti Adank

    2015-10-01

    The recognition of unfamiliar regional and foreign accents represents a challenging task for the speech perception system (Adank, Evans, Stuart-Smith, & Scott, 2009; Floccia, Goslin, Girard, & Konopczynski, 2006). Despite the frequency with which we encounter such accents, the neural mechanisms supporting successful perception of accented speech are poorly understood. Nonetheless, candidate neural substrates involved in processing speech in challenging listening conditions, including accented speech, are beginning to be identified. This review will outline neural bases associated with perception of accented speech in the light of current models of speech perception, and compare these data to brain areas associated with processing other speech distortions. We will subsequently evaluate competing models of speech processing with regard to neural processing of accented speech. See Cristia et al. (2012) for an in-depth overview of behavioural aspects of accent processing.

  1. Face configuration affects speech perception: Evidence from a McGurk mismatch negativity study

    DEFF Research Database (Denmark)

    Eskelund, Kasper; MacDonald, Ewen; Andersen, Tobias

    2015-01-01

    as demonstrated by the Thatcher illusion in which the orientation of the eyes and mouth with respect to the face is inverted (Thatcherization). This gives the face a grotesque appearance but this is only seen when the face is upright. Thatcherization can likewise disrupt visual speech perception but only when...... the face is upright indicating that facial configuration can be important for visual speech perception. This effect can propagate to auditory speech perception through audiovisual integration so that Thatcherization disrupts the McGurk illusion in which visual speech perception alters perception...... of an incongruent acoustic phoneme. This is known as the McThatcher effect. Here we show that the McThatcher effect is reflected in the McGurk mismatch negativity (MMN). The MMN is an event-related potential elicited by a change in auditory perception. The McGurk-MMN can be elicited by a change in auditory...

  2. Infant Perception of Atypical Speech Signals

    Science.gov (United States)

    Vouloumanos, Athena; Gelfand, Hanna M.

    2013-01-01

    The ability to decode atypical and degraded speech signals as intelligible is a hallmark of speech perception. Human adults can perceive sounds as speech even when they are generated by a variety of nonhuman sources including computers and parrots. We examined how infants perceive the speech-like vocalizations of a parrot. Further, we examined how…

  3. Audiovisual Perception of Congruent and Incongruent Dutch Front Vowels

    Science.gov (United States)

    Valkenier, Bea; Duyne, Jurriaan Y.; Andringa, Tjeerd C.; Baskent, Deniz

    2012-01-01

    Purpose: Auditory perception of vowels in background noise is enhanced when combined with visually perceived speech features. The objective of this study was to investigate whether the influence of visual cues on vowel perception extends to incongruent vowels, in a manner similar to the McGurk effect observed with consonants. Method:…

  4. The level of audiovisual print-speech integration deficits in dyslexia.

    Science.gov (United States)

    Kronschnabel, Jens; Brem, Silvia; Maurer, Urs; Brandeis, Daniel

    2014-09-01

    The classical phonological deficit account of dyslexia is increasingly linked to impairments in grapho-phonological conversion, and to dysfunctions in superior temporal regions associated with audiovisual integration. The present study investigates mechanisms of audiovisual integration in typical and impaired readers at the critical developmental stage of adolescence. Congruent and incongruent audiovisual as well as unimodal (visual only and auditory only) material was presented. Audiovisual presentations were single letters and three-letter (consonant-vowel-consonant) stimuli accompanied by matching or mismatching speech sounds. Three-letter stimuli exhibited fast phonetic transitions as in real-life language processing and reading. Congruency effects, i.e., different brain responses to congruent and incongruent stimuli, were taken as an indicator of audiovisual integration at a phonetic level (grapho-phonological conversion). Comparisons of unimodal and audiovisual stimuli revealed basic, more sensory aspects of audiovisual integration. By means of these two criteria of audiovisual integration, the generalizability of audiovisual deficits in dyslexia was tested. Moreover, it was expected that the more naturalistic three-letter stimuli would be superior to single letters in revealing group differences. Electrophysiological and hemodynamic (EEG and fMRI) data were acquired simultaneously in a simple target detection task. Applying the same statistical models to event-related EEG potentials and fMRI responses allowed comparing the effects detected by the two techniques at a descriptive level. Group differences in congruency effects (congruent against incongruent) were observed in regions involved in grapho-phonological processing, including the left inferior frontal and angular gyri and the inferotemporal cortex. Importantly, such differences also emerged in superior temporal key regions. Three-letter stimuli revealed stronger group differences than single letters. No

  5. Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals: Effects of Adding Visual Cues to Auditory Speech Stimuli.

    Science.gov (United States)

    Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

    2016-06-17

    The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for the EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context.
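
The isolation point (IP) defined in this abstract can be computed from one participant's responses across successive gates. The following is a minimal Python sketch, assuming the common gating convention that the IP is the shortest gate from which identification is correct and stays correct at all longer gates (the abstract gives only the shorter definition). The gate durations and responses below are hypothetical.

```python
from typing import Optional, Sequence


def isolation_point(gate_ms: Sequence[float], correct: Sequence[bool]) -> Optional[float]:
    """Return the shortest gate duration after which every response is correct.

    gate_ms must be in increasing order; returns None if the response at the
    final gate is incorrect (the stimulus was never isolated).
    """
    if len(gate_ms) != len(correct):
        raise ValueError("gate_ms and correct must have the same length")
    ip: Optional[float] = None
    # Walk backwards over the trailing run of correct responses.
    for duration, ok in zip(reversed(gate_ms), reversed(correct)):
        if not ok:
            break
        ip = duration
    return ip


if __name__ == "__main__":
    gates = [33, 66, 99, 132, 165]  # hypothetical gate durations in ms
    responses = [False, True, False, True, True]
    print(isolation_point(gates, responses))  # 132
```

Under this convention, the early correct guess at 66 ms does not count because it is abandoned at the next gate, so the IP is 132 ms. Comparing such IPs for auditory-only and audiovisual presentation of the same stimulus is one way to express the visual benefit the study reports.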

  6. Sensorimotor influences on speech perception in infancy.

    Science.gov (United States)

    Bruderer, Alison G; Danielson, D Kyle; Kandhadai, Padmapriya; Werker, Janet F

    2015-11-01

    The influence of speech production on speech perception is well established in adults. However, because adults have a long history of both perceiving and producing speech, the extent to which the perception-production linkage is due to experience is unknown. We addressed this issue by asking whether articulatory configurations can influence infants' speech perception performance. To eliminate influences from specific linguistic experience, we studied preverbal, 6-mo-old infants and tested the discrimination of a nonnative, and hence never-before-experienced, speech sound distinction. In three experimental studies, we used teething toys to control the position and movement of the tongue tip while the infants listened to the speech sounds. Using ultrasound imaging technology, we verified that the teething toys consistently and effectively constrained the movement and positioning of infants' tongues. With a looking-time procedure, we found that temporarily restraining infants' articulators impeded their discrimination of a nonnative consonant contrast but only when the relevant articulator was selectively restrained to prevent the movements associated with producing those sounds. Our results provide striking evidence that even before infants speak their first words and without specific listening experience, sensorimotor information from the articulators influences speech perception. These results transform theories of speech perception by suggesting that even at the initial stages of development, oral-motor movements influence speech sound discrimination. Moreover, an experimentally induced "impairment" in articulator movement can compromise speech perception performance, raising the question of whether long-term oral-motor impairments may impact perceptual development.

  7. STUDY ON PHASE PERCEPTION IN SPEECH

    Institute of Scientific and Technical Information of China (English)

    Tong Ming; Bian Zhengzhong; Li Xiaohui; Dai Qijun; Chen Yanpu

    2003-01-01

    The perceptual effect of phase information in speech was studied with subjective listening tests. When the phase spectrum of speech is changed while the amplitude spectrum is left unchanged, the tests show that: (1) if the envelope of the reconstructed speech signal is unchanged, the original and reconstructed speech are perceptually indistinguishable; (2) the perceptual effect of the reconstructed speech depends mainly on the amplitude of the derivative of the additive phase; (3) with t_d defined as the maximum relative time shift between different frequency components of the reconstructed speech signal, speech quality is excellent for t_d < 10 ms, good for 10 ms < t_d < 20 ms, fair for 20 ms < t_d < 35 ms, and poor for t_d > 35 ms.
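    Findings (2) and (3) can be connected through group delay. The Python sketch below assumes that the "derivative of the additive phase" means its derivative with respect to angular frequency (the group delay, tau = -dphi/domega) and that t_d is the spread of that group delay across the analysed frequencies. The function names are hypothetical. Because the abstract gives only strict inequalities, the sketch assigns the boundary values 10, 20 and 35 ms to the lower-quality band, and it labels the band reported as "common" as "fair".

```python
import numpy as np


def max_relative_delay_ms(additive_phase_rad, freqs_hz):
    """t_d in ms: spread of the group delay tau(w) = -dphi/dw of an additive phase.

    Assumes the phase (radians) is unwrapped and sampled on the grid `freqs_hz`.
    """
    omega = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)
    phase = np.asarray(additive_phase_rad, dtype=float)
    # Second-order differences are exact for linear and quadratic phases.
    tau_s = -np.gradient(phase, omega, edge_order=2)
    return 1000.0 * float(tau_s.max() - tau_s.min())


def quality_band(td_ms):
    """Map t_d onto the four reported quality bands (boundary handling assumed)."""
    if td_ms < 10.0:
        return "excellent"
    if td_ms < 20.0:
        return "good"
    if td_ms < 35.0:
        return "fair"
    return "poor"


if __name__ == "__main__":
    freqs = np.linspace(0.0, 4000.0, 401)              # analysis grid in Hz
    linear_phase = -2 * np.pi * freqs * 0.005          # same 5 ms delay for all components
    quad_phase = -2 * np.pi * 3.125e-6 * freqs ** 2    # delay grows with frequency
    for phi in (linear_phase, quad_phase):
        td = max_relative_delay_ms(phi, freqs)
        print(f"t_d = {td:.1f} ms -> {quality_band(td)}")
```

    A purely linear additive phase delays all components equally, so t_d is 0 and the predicted quality is excellent. A quadratic phase delays higher frequencies more, and its t_d grows with the bandwidth considered.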

  8. Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Georgios Mantokoudis

    Full Text Available OBJECTIVE: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. METHODS: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for a live Skype™ video connection and live face-to-face communication were assessed. RESULTS: Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There was a significant median gain of +8.5 percentage points (p = 0.009) in speech perception for all 21 CI users when visual cues were additionally shown. CI users with poor open-set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8 percentage points, p = 0.032). CONCLUSION: Webcameras have the potential to improve telecommunication for hearing-impaired individuals.

  9. The development of multisensory speech perception continues into the late childhood years.

    Science.gov (United States)

    Ross, Lars A; Molholm, Sophie; Blanco, Daniella; Gomez-Ramirez, Manuel; Saint-Amour, Dave; Foxe, John J

    2011-06-01

    Observing a speaker's articulations substantially improves the intelligibility of spoken speech, especially under noisy listening conditions. This multisensory integration of speech inputs is crucial to effective communication. Appropriate development of this ability has major implications for children in classroom and social settings, and deficits in it have been linked to a number of neurodevelopmental disorders, especially autism. It is clear from structural imaging studies that there is a prolonged maturational course within regions of the perisylvian cortex that persists into late childhood, and these regions have been firmly established as being crucial to speech and language functions. Given this protracted maturational timeframe, we reasoned that multisensory speech processing might well show a similarly protracted developmental course. Previous work in adults has shown that audiovisual enhancement in word recognition is most apparent within a restricted range of signal-to-noise ratios (SNRs). Here, we investigated when these properties emerge during childhood by testing multisensory speech recognition abilities in typically developing children aged between 5 and 14 years, and comparing them with those of adults. By parametrically varying SNRs, we found that children benefited significantly less from observing visual articulations, displaying considerably less audiovisual enhancement. The findings suggest that improvement in the ability to recognize speech-in-noise and in audiovisual integration during speech perception continues quite late into the childhood years. The implication is that a considerable amount of multisensory learning remains to be achieved during the later schooling years, and that explicit efforts to accommodate this learning may well be warranted. PMID:21615556

  10. Audiovisual benefit for recognition of speech presented with single-talker noise in older listeners

    OpenAIRE

    Jesse, A.; Janse, E.

    2012-01-01

    Older listeners are more affected than younger listeners in their recognition of speech in adverse conditions, such as when they also hear a single-competing speaker. In the present study, we investigated with a speeded response task whether older listeners with various degrees of hearing loss benefit under such conditions from also seeing the speaker they intend to listen to. We also tested, at the same time, whether older adults need postperceptual processing to obtain an audiovisual benefi...

  11. Exploring Student Perceptions of Audiovisual Feedback via Screencasting in Online Courses

    Science.gov (United States)

    Mathieson, Kathleen

    2012-01-01

    Using Moore's (1993) theory of transactional distance as a framework, this action research study explored students' perceptions of audiovisual feedback provided via screencasting as a supplement to text-only feedback. A crossover design was employed to ensure that all students experienced both text-only and text-plus-audiovisual feedback and to…

  12. ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION

    Directory of Open Access Journals (Sweden)

    D.V. Ivanko

    2016-05-01

    Full Text Available This paper presents an analytical review of the latest achievements in the field of audio-visual (AV) fusion (integration of multimodal information). We discuss the main challenges and report on approaches to address them. One of the most important tasks in AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review we set out the basic principles of AV speech recognition and classify the audio and visual features of speech. Special attention is paid to systematizing the existing techniques and AV data fusion methods. In the second part we provide a consolidated list of tasks and applications that use AV fusion, based on our analysis of the research area, and indicate the methods, techniques, and audio and video features used. We propose a classification of AV integration approaches and discuss the advantages and disadvantages of each. We draw conclusions and offer our assessment of the future of AV fusion. In further research we plan to implement a system for audio-visual Russian continuous speech recognition using advanced methods of multimodal fusion.
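    One fusion strategy that such reviews commonly classify is feature-level (early) fusion. The Python sketch below does not come from the paper. It assumes audio features at 100 frames/s and visual (lip) features at 30 frames/s, with both streams starting at time 0. The visual stream is linearly interpolated onto the audio frame times, and the two feature vectors are then concatenated frame by frame. The function name and frame rates are illustrative assumptions.

```python
import numpy as np


def early_fusion(audio_feats, visual_feats, audio_rate=100.0, video_rate=30.0):
    """Feature-level AV fusion (sketch).

    audio_feats: (n_audio_frames, d_audio); visual_feats: (n_video_frames, d_visual).
    Returns (n_audio_frames, d_audio + d_visual).
    """
    audio_feats = np.asarray(audio_feats, dtype=float)
    visual_feats = np.asarray(visual_feats, dtype=float)
    t_audio = np.arange(audio_feats.shape[0]) / audio_rate   # frame times in seconds
    t_video = np.arange(visual_feats.shape[0]) / video_rate
    # Resample each visual dimension onto the audio clock; np.interp holds the
    # edge value for audio frames that fall after the last video frame.
    upsampled = np.column_stack([
        np.interp(t_audio, t_video, visual_feats[:, d])
        for d in range(visual_feats.shape[1])
    ])
    return np.hstack([audio_feats, upsampled])
```

    Linear interpolation is only one choice. Repeating the most recent video frame (sample-and-hold) is an equally simple alternative. Decision-level schemes, by contrast, keep the streams separate and combine their scores.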

  13. Speech perception of noise with binary gains

    DEFF Research Database (Denmark)

    Wang, DeLiang; Kjems, Ulrik; Pedersen, Michael Syskind;

    2008-01-01

    For a given mixture of speech and noise, an ideal binary time-frequency mask is constructed by comparing speech energy and noise energy within local time-frequency units. It is observed that listeners achieve nearly perfect speech recognition from gated noise with binary gains prescribed by the ideal binary mask. Only 16 filter channels and a frame rate of 100 Hz are sufficient for high intelligibility. The results show that, despite a dramatic reduction of speech information, a pattern of binary gains provides an adequate basis for speech perception.
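    The mask construction described above can be sketched in Python as follows. This illustrates the idea under stated assumptions and is not the study's implementation. Equal-width FFT bands stand in for an auditory filterbank, and frames are taken at 100 Hz with 20 ms Hann windows. A unit is kept when its local speech-to-noise ratio exceeds a 0 dB local criterion, given by `lc_db`, a hypothetical parameter name. The toy "speech" signal is a 500 Hz tone.

```python
import numpy as np


def band_energies(x, fs, n_bands=16, frame_rate=100):
    """Energy per time-frequency unit, shape (n_frames, n_bands).

    FFT bins are pooled into equal-width bands (a crude stand-in for a filterbank).
    """
    x = np.asarray(x, dtype=float)
    hop = fs // frame_rate            # 100 Hz frame rate -> 10 ms hop
    win = 2 * hop                     # 20 ms Hann window, 50 % overlap
    window = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop:i * hop + win] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    edges = np.linspace(0, power.shape[1], n_bands + 1).astype(int)
    return np.stack([power[:, lo:hi].sum(axis=1)
                     for lo, hi in zip(edges[:-1], edges[1:])], axis=1)


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """True where the local SNR of a unit exceeds the local criterion lc_db."""
    eps = 1e-12                       # avoids division by zero and log of zero
    local_snr_db = 10.0 * np.log10((speech_energy + eps) / (noise_energy + eps))
    return local_snr_db > lc_db


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs                                  # 1 s of signal
    speech = np.sin(2 * np.pi * 500 * t)                    # toy "speech"
    noise = 0.01 * np.random.default_rng(0).standard_normal(fs)
    S, N = band_energies(speech, fs), band_energies(noise, fs)
    mask = ideal_binary_mask(S, N)
    gated_noise = N * mask            # noise kept only in speech-dominated units
    print(mask.shape, mask.mean())
```

    In the gated-noise condition the speech waveform itself is absent. Only the binary on/off pattern applied to the noise carries information about the utterance.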

  14. Audibility and visual biasing in speech perception

    Science.gov (United States)

    Clement, Bart Richard

    Although speech perception has been considered a predominantly auditory phenomenon, large benefits from vision in degraded acoustic conditions suggest integration of audition and vision. More direct evidence of this comes from studies of audiovisual disparity that demonstrate that vision can bias and even dominate perception (McGurk & MacDonald, 1976). It has been observed that hearing-impaired listeners demonstrate more visual biasing than normally hearing listeners (Walden et al., 1990). It is argued here that stimulus audibility must be equated across groups before true differences can be established. In the present investigation, effects of visual biasing on perception were examined as audibility was degraded for 12 young normally hearing listeners. Biasing was determined by quantifying the degree to which listener identification functions for a single synthetic auditory /ba-da-ga/ continuum changed across two conditions: (1) an auditory-only listening condition; and (2) an auditory-visual condition in which every item of the continuum was synchronized with visual articulations of the consonant-vowel (CV) tokens /ba/ and /ga/, as spoken by each of two talkers. Audibility was altered by presenting the conditions in quiet and in noise at each of three signal-to-noise (S/N) ratios. For the visual-/ba/ context, large effects of audibility were found. As audibility decreased, visual biasing increased. A large talker effect also was found, with one talker eliciting more biasing than the other. An independent lipreading measure demonstrated that this talker was more visually intelligible than the other. For the visual-/ga/ context, audibility and talker effects were less robust, possibly obscured by strong listener effects, which were characterized by marked differences in perceptual processing patterns among participants. Some demonstrated substantial biasing whereas others demonstrated little, indicating a strong reliance on audition even in severely degraded acoustic

  15. Neural pathways for visual speech perception

    Directory of Open Access Journals (Sweden)

    Lynne E Bernstein

    2014-12-01

    Full Text Available This paper examines the questions: what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread, diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.

  16. Audiovisual Integration in Children Listening to Spectrally Degraded Speech

    Science.gov (United States)

    Maidment, David W.; Kang, Hi Jee; Stewart, Hannah J.; Amitay, Sygal

    2015-01-01

    Purpose: The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Method: Children (n = 69) and adults (n = 15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in…

  17. Speech perception as an active cognitive process

    Directory of Open Access Journals (Sweden)

    Shannon eHeald

    2014-03-01

    Full Text Available One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process, which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided, but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing, such as masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine the cognitive resources recruited during perception, including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or

  18. The Beginnings of Danish Speech Perception

    DEFF Research Database (Denmark)

    Østerbye, Torkil

    reductions of speech sounds evident in the pronunciation of the language. This book (originally a PhD thesis) consists of three studies based on the results of two experiments. The experiments were designed to provide knowledge of the perception of Danish speech sounds by Danish adults and infants ..., in the light of the rich and complex Danish sound system. The first two studies report on native adults’ perception of Danish speech sounds in quiet and noise. The third study examined the development of language-specific perception in native Danish infants at 6, 9 and 12 months of age. The book points ... to interesting differences in speech perception and acquisition of Danish adults and infants when compared to English. The book is useful for professionals as well as students of linguistics, psycholinguistics and phonetics/phonology, or anyone else who may be interested in language.

  19. The Neural Substrates of Infant Speech Perception

    Science.gov (United States)

    Homae, Fumitaka; Watanabe, Hama; Taga, Gentaro

    2014-01-01

    Infants often pay special attention to speech sounds, and they appear to detect key features of these sounds. To investigate the neural foundation of speech perception in infants, we measured cortical activation using near-infrared spectroscopy. We presented the following three types of auditory stimuli while 3-month-old infants watched a silent…

  20. Reflections on mirror neurons and speech perception

    OpenAIRE

    Lotto, Andrew J.; Hickok, Gregory S.; Holt, Lori L.

    2009-01-01

    The discovery of mirror neurons, a class of neurons that respond when a monkey performs an action and also when the monkey observes others producing the same action, has promoted a renaissance for the Motor Theory (MT) of speech perception. This is because mirror neurons seem to accomplish the same kind of one to one mapping between perception and action that MT theorizes to be the basis of human speech communication. However, this seeming correspondence is superficial, and there are theoreti...

  1. Visemic Processing in Audiovisual Discrimination of Natural Speech: A Simultaneous fMRI-EEG Study

    Science.gov (United States)

    Dubois, Cyril; Otzenberger, Helene; Gounot, Daniel; Sock, Rudolph; Metz-Lutz, Marie-Noelle

    2012-01-01

    In a noisy environment, visual perception of articulatory movements improves natural speech intelligibility. Parallel to phonemic processing based on auditory signal, visemic processing constitutes a counterpart based on "visemes", the distinctive visual units of speech. Aiming at investigating the neural substrates of visemic processing in a…

  2. Differential Gaze Patterns on Eyes and Mouth During Audiovisual Speech Segmentation.

    Science.gov (United States)

    Lusk, Laina G; Mitchel, Aaron D

    2016-01-01

    Speech is inextricably multisensory: both auditory and visual components provide critical information for all aspects of speech processing, including speech segmentation, the visual components of which have been the target of a growing number of studies. In particular, a recent study (Mitchel and Weiss, 2014) established that adults can utilize facial cues (i.e., visual prosody) to identify word boundaries in fluent speech. The current study expanded upon these results, using an eye tracker to identify highly attended facial features of the audiovisual display used in Mitchel and Weiss (2014). Subjects spent the most time watching the eyes and mouth. A significant trend in gaze durations was found with the longest gaze duration on the mouth, followed by the eyes and then the nose. In addition, eye-gaze patterns changed across familiarization as subjects learned the word boundaries, showing decreased attention to the mouth in later blocks while attention on other facial features remained consistent. These findings highlight the importance of the visual component of speech processing and suggest that the mouth may play a critical role in visual speech segmentation.

  3. Speech perception in children with speech output disorders.

    NARCIS (Netherlands)

    Nijland, L.

    2009-01-01

    Research in the field of speech production pathology is dominated by describing deficits in output. However, perceptual problems might underlie, precede, or interact with production disorders. The present study hypothesizes that the level of the production disorders is linked to the level of perception

  4. Audio-Visual Speech Intelligibility Benefits with Bilateral Cochlear Implants when Talker Location Varies

    OpenAIRE

    van Hoesel, Richard J. M.

    2015-01-01

    One of the key benefits of using cochlear implants (CIs) in both ears rather than just one is improved localization. It is likely that in complex listening scenes, improved localization allows bilateral CI users to orient toward talkers to improve signal-to-noise ratios and gain access to visual cues, but to date, that conjecture has not been tested. To obtain an objective measure of that benefit, seven bilateral CI users were assessed for both auditory-only and audio-visual speech intelligib...

  5. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Koji Iwano

    2007-03-01

    Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.
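    In multistream HMMs, the audio and visual emission log-likelihoods are commonly combined with stream weights, log b(o) = lambda_A log b_A(o_A) + lambda_V log b_V(o_V), with lambda_A + lambda_V = 1. The Python sketch below shows that weighting. It assumes a fixed alignment, so that the weights can be applied to whole-utterance scores rather than frame by frame. The digit labels and log-likelihood values are made-up illustrations, not results or weight settings from the paper.

```python
import numpy as np


def multistream_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Stream-weighted score: lambda_A * log b_A + (1 - lambda_A) * log b_V."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight        # weights constrained to sum to 1
    return (audio_weight * np.asarray(log_b_audio, dtype=float)
            + visual_weight * np.asarray(log_b_visual, dtype=float))


def recognize(labels, log_b_audio, log_b_visual, audio_weight):
    """Return the label with the highest combined score (fixed alignment assumed)."""
    scores = multistream_log_likelihood(log_b_audio, log_b_visual, audio_weight)
    return labels[int(np.argmax(scores))]


if __name__ == "__main__":
    labels = ["ichi", "ni", "san"]            # hypothetical digit models
    log_a = [-10.0, -9.5, -12.0]              # noisy audio: weak preference for "ni"
    log_v = [-4.0, -8.0, -9.0]                # lip features clearly favour "ichi"
    print(recognize(labels, log_a, log_v, 0.9))   # prints ni
    print(recognize(labels, log_a, log_v, 0.5))   # prints ichi
```

    Lowering the audio weight, as one might do at low SNR, lets the visual stream overturn a weak acoustic preference.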

  6. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Iwano Koji

    2007-01-01

    Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.

  7. Phonological abstraction without phonemes in speech perception

    OpenAIRE

    Mitterer, H.; Scharenborg, O.; McQueen, J

    2013-01-01

    Recent evidence shows that listeners use abstract prelexical units in speech perception. Using the phenomenon of lexical retuning in speech processing, we ask whether those units are necessarily phonemic. Dutch listeners were exposed to a Dutch speaker producing ambiguous phones between the Dutch syllable-final allophones approximant [r] and dark [l]. These ambiguous phones replaced either final /r/ or final /l/ in words in a lexical-decision task. This differential exposure affected percepti...

  8. Speech perception as complex auditory categorization

    Science.gov (United States)

    Holt, Lori L.

    2002-05-01

    Despite a long and rich history of categorization research in cognitive psychology, very little work has addressed the issue of complex auditory category formation. This is especially unfortunate because the general underlying cognitive and perceptual mechanisms that guide auditory category formation are of great importance to understanding speech perception. I will discuss a new methodological approach to examining complex auditory category formation that specifically addresses issues relevant to speech perception. This approach utilizes novel nonspeech sound stimuli to gain full experimental control over listeners' history of experience. As such, the course of learning is readily measurable. Results from this methodology indicate that the structure and formation of auditory categories are a function of the statistical input distributions of sound that listeners hear, aspects of the operating characteristics of the auditory system, and characteristics of the perceptual categorization system. These results have important implications for phonetic acquisition and speech perception.

  9. Reflections on mirror neurons and speech perception.

    Science.gov (United States)

    Lotto, Andrew J; Hickok, Gregory S; Holt, Lori L

    2009-03-01

    The discovery of mirror neurons, a class of neurons that respond when a monkey performs an action and also when the monkey observes others producing the same action, has promoted a renaissance for the Motor Theory (MT) of speech perception. This is because mirror neurons seem to accomplish the same kind of one to one mapping between perception and action that MT theorizes to be the basis of human speech communication. However, this seeming correspondence is superficial, and there are theoretical and empirical reasons to temper enthusiasm about the explanatory role mirror neurons might have for speech perception. In fact, rather than providing support for MT, mirror neurons are actually inconsistent with the central tenets of MT.

  10. Perception and the temporal properties of speech

    Science.gov (United States)

    Gordon, Peter C.

    1991-11-01

    Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal to noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.

  11. Can you hear me yet? An intracranial investigation of speech and non-speech audiovisual interactions in human cortex

    Science.gov (United States)

    Rhone, Ariane E.; Nourski, Kirill V.; Oya, Hiroyuki; Kawasaki, Hiroto; Howard, Matthew A.; McMurray, Bob

    2016-01-01

    In everyday conversation, viewing a talker's face can provide information about the timing and content of an upcoming speech signal, resulting in improved intelligibility. Using electrocorticography, we tested whether human auditory cortex in Heschl's gyrus (HG) and on superior temporal gyrus (STG) and motor cortex on precentral gyrus (PreC) were responsive to visual/gestural information prior to the onset of sound and whether early stages of auditory processing were sensitive to the visual content (speech syllable versus non-speech motion). Event-related band power (ERBP) in the high gamma band was content-specific prior to acoustic onset on STG and PreC, and ERBP in the beta band differed in all three areas. Following sound onset, we found no evidence for content-specificity in HG, evidence for visual specificity in PreC, and specificity for both modalities in STG. These results support models of audio-visual processing in which sensory information is integrated in non-primary cortical areas. PMID:27182530

  12. Neural entrainment to rhythmically-presented auditory, visual and audio-visual speech in children

    Directory of Open Access Journals (Sweden)

    Alan James Power

    2012-07-01

    Full Text Available Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal ‘samples’ of information from the speech stream at different rates, phase-resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (‘phase locking’). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase-locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically-developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable ba, presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a talking head). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the ba stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a ba in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling

  13. Social Expectation Improves Speech Perception in Noise.

    Science.gov (United States)

    McGowan, Kevin B

    2015-12-01

    Listeners' use of social information during speech perception was investigated by measuring transcription accuracy of Chinese-accented speech in noise while listeners were presented with a congruent Chinese face, an incongruent Caucasian face, or an uninformative silhouette. When listeners were presented with a Chinese face they transcribed more accurately than when presented with the Caucasian face. This difference existed both for listeners with a relatively high level of experience and for listeners with a relatively low level of experience with Chinese-accented English. Overall, these results are inconsistent with a model of social speech perception in which listener bias reduces attendance to the acoustic signal. These results are generally consistent with exemplar models of socially indexed speech perception predicting that activation of a social category will raise base activation levels of socially appropriate episodic traces, but the similar performance of more and less experienced listeners suggests the need for a more nuanced view with a role for both detailed experience and listener stereotypes. PMID:27483742

  14. Crossmodal integration enhances neural representation of task-relevant features in audiovisual face perception.

    Science.gov (United States)

    Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Liu, Yongjian; Liang, Changhong; Sun, Pei

    2015-02-01

    Previous studies have shown that audiovisual integration improves identification performance and enhances neural activity in heteromodal brain areas, for example, the posterior superior temporal sulcus/middle temporal gyrus (pSTS/MTG). Furthermore, it has also been demonstrated that attention plays an important role in crossmodal integration. In this study, we considered crossmodal integration in audiovisual facial perception and explored its effect on the neural representation of features. The audiovisual stimuli in the experiment consisted of facial movie clips that could be classified into 2 gender categories (male vs. female) or 2 emotion categories (crying vs. laughing). The visual/auditory-only stimuli were created from these movie clips by removing the auditory/visual contents. The subjects needed to make a judgment about the gender/emotion category for each movie clip in the audiovisual, visual-only, or auditory-only stimulus condition while functional magnetic resonance imaging (fMRI) signals were recorded. The neural representation of the gender/emotion feature was assessed using the decoding accuracy and the brain pattern-related reproducibility indices, obtained by a multivariate pattern analysis method from the fMRI data. In comparison to the visual-only and auditory-only stimulus conditions, we found that audiovisual integration enhanced the neural representation of task-relevant features and that feature-selective attention might play a modulatory role in the audiovisual integration.

  15. Dissociating Cortical Activity during Processing of Native and Non-Native Audiovisual Speech from Early to Late Infancy.

    Science.gov (United States)

    Fava, Eswen; Hull, Rachel; Bortfeld, Heather

    2014-01-01

    Initially, infants are capable of discriminating phonetic contrasts across the world's languages. Starting between seven and ten months of age, they gradually lose this ability through a process of perceptual narrowing. Although traditionally investigated with isolated speech sounds, such narrowing occurs in a variety of perceptual domains (e.g., faces, visual speech). Thus far, tracking the developmental trajectory of this tuning process has been focused primarily on auditory speech alone, and generally using isolated sounds. But infants learn from speech produced by people talking to them, meaning they learn from a complex audiovisual signal. Here, we use near-infrared spectroscopy to measure blood concentration changes in the bilateral temporal cortices of infants in three different age groups: 3-to-6 months, 7-to-10 months, and 11-to-14 months. Critically, all three groups of infants were tested with continuous audiovisual speech in both their native and another, unfamiliar language. We found that at each age range, infants showed different patterns of cortical activity in response to the native and non-native stimuli. Infants in the youngest group showed bilateral cortical activity that was greater overall in response to non-native relative to native speech; the oldest group showed left lateralized activity in response to native relative to non-native speech. These results highlight perceptual tuning as a dynamic process that happens across modalities and at different levels of stimulus complexity. PMID:25116572

  16. Dissociating Cortical Activity during Processing of Native and Non-Native Audiovisual Speech from Early to Late Infancy

    Directory of Open Access Journals (Sweden)

    Eswen Fava

    2014-08-01

    Initially, infants are capable of discriminating phonetic contrasts across the world’s languages. Starting between seven and ten months of age, they gradually lose this ability through a process of perceptual narrowing. Although traditionally investigated with isolated speech sounds, such narrowing occurs in a variety of perceptual domains (e.g., faces, visual speech). Thus far, tracking the developmental trajectory of this tuning process has been focused primarily on auditory speech alone, and generally using isolated sounds. But infants learn from speech produced by people talking to them, meaning they learn from a complex audiovisual signal. Here, we use near-infrared spectroscopy to measure blood concentration changes in the bilateral temporal cortices of infants in three different age groups: 3-to-6 months, 7-to-10 months, and 11-to-14 months. Critically, all three groups of infants were tested with continuous audiovisual speech in both their native and another, unfamiliar language. We found that at each age range, infants showed different patterns of cortical activity in response to the native and non-native stimuli. Infants in the youngest group showed bilateral cortical activity that was greater overall in response to non-native relative to native speech; the oldest group showed left lateralized activity in response to native relative to non-native speech. These results highlight perceptual tuning as a dynamic process that happens across modalities and at different levels of stimulus complexity.

  17. Phonological abstraction without phonemes in speech perception.

    Science.gov (United States)

    Mitterer, Holger; Scharenborg, Odette; McQueen, James M

    2013-11-01

    Recent evidence shows that listeners use abstract prelexical units in speech perception. Using the phenomenon of lexical retuning in speech processing, we ask whether those units are necessarily phonemic. Dutch listeners were exposed to a Dutch speaker producing ambiguous phones between the Dutch syllable-final allophones approximant [r] and dark [l]. These ambiguous phones replaced either final /r/ or final /l/ in words in a lexical-decision task. This differential exposure affected perception of ambiguous stimuli on the same allophone continuum in a subsequent phonetic-categorization test: Listeners exposed to ambiguous phones in /r/-final words were more likely to perceive test stimuli as /r/ than listeners with exposure in /l/-final words. This effect was not found for test stimuli on continua using other allophones of /r/ and /l/. These results confirm that listeners use phonological abstraction in speech perception. They also show that context-sensitive allophones can play a role in this process, and hence that context-insensitive phonemes are not necessary. We suggest there may be no one unit of perception. PMID:23973464

  18. Perception of Dynamic and Static Audiovisual Sequences in 3- and 4-Month-Old Infants

    Science.gov (United States)

    Lewkowicz, David J.

    2008-01-01

    This study investigated perception of audiovisual sequences in 3- and 4-month-old infants. Infants were habituated to sequences consisting of moving/sounding or looming/sounding objects and then tested for their ability to detect changes in the order of the objects, sounds, or both. Results showed that 3-month-olds perceived the order of 3-element…

  19. The sound of your lips: electrophysiological cross-modal interactions during hand-to-face and face-to-face speech perception

    Directory of Open Access Journals (Sweden)

    Avril Treille

    2014-05-01

    Recent magneto-encephalographic and electro-encephalographic studies provide evidence for cross-modal integration during audio-visual and audio-haptic speech perception, with speech gestures viewed or felt from manual tactile contact with the speaker’s face. Given the temporal precedence of the haptic and visual signals over the acoustic signal in these studies, the observed modulation of N1/P2 auditory evoked responses during bimodal compared to unimodal speech perception suggests that relevant and predictive visual and haptic cues may facilitate auditory speech processing. To further investigate this hypothesis, auditory evoked potentials were compared here during auditory-only, audio-visual and audio-haptic speech perception in live dyadic interactions between a listener and a speaker. In line with previous studies, auditory evoked potentials were attenuated and speeded up during both audio-haptic and audio-visual compared to auditory-only speech perception. Importantly, the observed latency and amplitude reduction did not significantly depend on the degree of visual and haptic recognition of the speech targets. Altogether, these results further demonstrate cross-modal interactions between the auditory, visual and haptic speech signals. Although they do not contradict the hypothesis that visual and haptic sensory inputs convey predictive information with respect to the incoming auditory speech input, these results suggest that, at least in live conversational interactions, systematic conclusions on sensory predictability in bimodal speech integration should be drawn with caution, with the extraction of predictive cues likely depending on the variability of the speech stimuli.

  20. Dynamic visual speech perception in a patient with visual form agnosia.

    Science.gov (United States)

    Munhall, K G; Servos, P; Santi, A; Goodale, M A

    2002-10-01

    To examine the role of dynamic cues in visual speech perception, a patient with visual form agnosia (DF) was tested with a set of static and dynamic visual displays of three vowels. Five conditions were tested: (1) auditory only, which provided only vocal pitch information, (2) dynamic visual only, (3) dynamic audiovisual with vocal pitch information, (4) dynamic audiovisual with full voice information and (5) static visual-only images of postures during vowel production. DF showed normal performance in all conditions except the static visual-only condition, in which she scored at chance. Control subjects scored close to ceiling in this condition. The results suggest that spatiotemporal signatures for objects and events are processed separately from static form cues.

  1. Developing an Audiovisual Notebook as a Self-Learning Tool in Histology: Perceptions of Teachers and Students

    Science.gov (United States)

    Campos-Sánchez, Antonio; López-Núñez, Juan-Antonio; Scionti, Giuseppe; Garzón, Ingrid; González-Andrades, Miguel; Alaminos, Miguel; Sola, Tomás

    2014-01-01

    Videos can be used as didactic tools for self-learning under several circumstances, including those cases in which students are responsible for the development of this resource as an audiovisual notebook. We compared students' and teachers' perceptions regarding the main features that an audiovisual notebook should include. Four…

  2. Neurophysiological Influence of Musical Training on Speech Perception

    OpenAIRE

    Shahin, Antoine J.

    2011-01-01

    Does musical training affect our perception of speech? For example, does learning to play a musical instrument modify the neural circuitry for auditory processing in a way that improves one’s ability to perceive speech more clearly in noisy environments? If so, can speech perception in individuals with hearing loss, who struggle in noisy situations, benefit from musical training? While music and speech exhibit some specialization in neural processing, there is evidence suggesting that skill...

  3. Speech Perception Within an Auditory Cognitive Science Framework

    OpenAIRE

    Holt, Lori L.; Lotto, Andrew J.

    2008-01-01

    The complexities of the acoustic speech signal pose many significant challenges for listeners. Although perceiving speech begins with auditory processing, investigation of speech perception has progressed mostly independently of study of the auditory system. Nevertheless, a growing body of evidence demonstrates that cross-fertilization between the two areas of research can be productive. We briefly describe research bridging the study of general auditory processing and speech perception, show...

  4. Neural correlates of quality during perception of audiovisual stimuli

    CERN Document Server

    Arndt, Sebastian

    2016-01-01

    This book presents a new approach to examining the perceived quality of audiovisual sequences. It uses electroencephalography to understand how exactly user quality judgments are formed within a test participant, and what the physiologically based implications of exposure to lower-quality media might be. The book redefines experimental paradigms for using EEG in quality assessment so that they better suit the requirements of standard subjective quality testing. Accordingly, experimental protocols and stimuli are adjusted.

  5. The development of the perception of audiovisual simultaneity.

    Science.gov (United States)

    Chen, Yi-Chuan; Shore, David I; Lewis, Terri L; Maurer, Daphne

    2016-06-01

    We measured the typical developmental trajectory of the window of audiovisual simultaneity by testing four age groups of children (5, 7, 9, and 11 years) and adults. We presented a visual flash and an auditory noise burst at various stimulus onset asynchronies (SOAs) and asked participants to report whether the two stimuli were presented at the same time. Compared with adults, children aged 5 and 7 years made more simultaneous responses when the SOAs were beyond ± 200 ms but made fewer simultaneous responses at the 0 ms SOA. The point of subjective simultaneity was located at the visual-leading side, as in adults, by 5 years of age, the youngest age tested. However, the window of audiovisual simultaneity became narrower and response errors decreased with age, reaching adult levels by 9 years of age. Experiment 2 ruled out the possibility that the adult-like performance of 9-year-old children was caused by the testing of a wide range of SOAs. Together, the results demonstrate that the adult-like precision of perceiving audiovisual simultaneity is developed by 9 years of age, the youngest age that has been reported to date. PMID:26897264
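
The two quantities reported here, the point of subjective simultaneity (PSS) and the width of the simultaneity window, can be summarized from simultaneity-judgment data in several ways, and the abstract does not say which fitting procedure was used. The sketch below is a simple moment-based stand-in for a Gaussian fit: it weights each SOA by the proportion of "same time" responses, takes the weighted mean SOA as the PSS and the weighted standard deviation as the window width. The sign convention (positive SOA = visual leads) and the function name `simultaneity_window` are assumptions.

```python
import math


def simultaneity_window(soas_ms, p_simultaneous):
    """Moment-based summary of a simultaneity-judgment curve.

    soas_ms: stimulus onset asynchronies; by the convention assumed here,
             negative = auditory leads, positive = visual leads.
    p_simultaneous: proportion of "same time" responses at each SOA.
    Returns (pss, width): the response-weighted mean SOA, an estimate of
    the point of subjective simultaneity, and the response-weighted
    standard deviation, a proxy for the width of the simultaneity window.
    """
    total = sum(p_simultaneous)
    if total <= 0:
        raise ValueError("need at least one 'simultaneous' response")
    pss = sum(s * p for s, p in zip(soas_ms, p_simultaneous)) / total
    var = sum(p * (s - pss) ** 2 for s, p in zip(soas_ms, p_simultaneous)) / total
    return pss, math.sqrt(var)


# Hypothetical response proportions, skewed toward visual-leading SOAs.
soas = [-300, -200, -100, 0, 100, 200, 300]
props = [0.05, 0.2, 0.7, 0.9, 0.9, 0.5, 0.1]
print(simultaneity_window(soas, props))
```

Under this convention a positive PSS corresponds to the visual-leading side reported for both children and adults, and a narrower window appears as a smaller width.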

  7. Emotion Recognition from Speech Signals and Perception of Music

    OpenAIRE

    Fernandez Pradier, Melanie

    2011-01-01

    This thesis deals with emotion recognition from speech signals. The feature extraction step shall be improved by looking at the perception of music. In music theory, different pitch intervals (consonant, dissonant) and chords are believed to invoke different feelings in listeners. The question is whether there is a similar mechanism between perception of music and perception of emotional speech. Our research will follow three stages. First, the relationship between speech and music at segment...

  8. Sound frequency affects speech emotion perception: results from congenital amusia

    OpenAIRE

    Lolli, Sydney L.; Lewenstein, Ari D.; Basurto, Julian; Winnik, Sean; Loui, Psyche

    2015-01-01

    Congenital amusics, or “tone-deaf” individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions fro...

  9. Sound frequency affects speech emotion perception: Results from congenital amusia

    OpenAIRE

    Lolli, Sydney; Lewenstein, Ari D.; Basurto, Julian; Winnik, Sean; Loui, Psyche

    2015-01-01

    Congenital amusics, or tone-deaf individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying band-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from...

  10. Perception of speech in noise: neural correlates.

    Science.gov (United States)

    Song, Judy H; Skoe, Erika; Banai, Karen; Kraus, Nina

    2011-09-01

    The presence of irrelevant auditory information (other talkers, environmental noises) presents a major challenge to listening to speech. The fundamental frequency (F0) of the target speaker is thought to provide an important cue for the extraction of the speaker's voice from background noise, but little is known about the relationship between speech-in-noise (SIN) perceptual ability and neural encoding of the F0. Motivated by recent findings that music and language experience enhance brainstem representation of sound, we examined the hypothesis that brainstem encoding of the F0 is diminished to a greater degree by background noise in people with poorer perceptual abilities in noise. To this end, we measured speech-evoked auditory brainstem responses to /da/ in quiet and two multitalker babble conditions (two-talker and six-talker) in native English-speaking young adults who ranged in their ability to perceive and recall SIN. Listeners who were poorer performers on a standardized SIN measure demonstrated greater susceptibility to the degradative effects of noise on the neural encoding of the F0. Particularly diminished was their phase-locked activity to the fundamental frequency in the portion of the syllable known to be most vulnerable to perceptual disruption (i.e., the formant transition period). Our findings suggest that the subcortical representation of the F0 in noise contributes to the perception of speech in noisy conditions.
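
Phase-locked encoding of the F0 is often quantified as the spectral amplitude of the brainstem response at the stimulus F0. The abstract does not give the exact analysis, so the following is only a sketch under that assumption. It evaluates a single discrete Fourier transform bin with the standard library (an FFT routine such as `numpy.fft.rfft` would normally be used instead), and the 100 Hz F0 and 10 kHz sampling rate below are illustrative values, not the study's parameters; `amplitude_at` is a hypothetical helper name.

```python
import math


def amplitude_at(signal, fs_hz, freq_hz):
    """Single-sided spectral amplitude of `signal` at `freq_hz`.

    Computes one discrete Fourier transform bin directly. When the signal
    spans a whole number of cycles at `freq_hz`, a sinusoid of amplitude A
    at that frequency yields A.
    """
    n = len(signal)
    step = 2.0 * math.pi * freq_hz / fs_hz  # phase advance per sample
    re = sum(x * math.cos(step * i) for i, x in enumerate(signal))
    im = -sum(x * math.sin(step * i) for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


# Illustrative response: a 100 Hz component sampled at 10 kHz for 100 ms.
fs, f0 = 10_000, 100
response = [0.5 * math.sin(2 * math.pi * f0 * i / fs) for i in range(1000)]
print(round(amplitude_at(response, fs, f0), 3))
```

Comparing this amplitude for responses recorded in quiet and in babble would give one measure of how much the noise degrades F0 encoding.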

  11. Developing an audiovisual notebook as a self-learning tool in histology: Perceptions of teachers and students

    OpenAIRE

    Campos Sanchez, Antonio; López Núñez, Juan-Antonio; Scionti, Giuseppe; Garzón, Ingrid; González Andrades, Miguel; Alaminos, Miguel; Sola, Tomás

    2013-01-01

    Videos can be used as didactic tools for self-learning under several circumstances, including those cases in which students are responsible for the development of this resource as an audiovisual notebook. We compared students' and teachers' perceptions regarding the main features that an audiovisual notebook should include. Four questionnaires with items about information, images, text and music, and filmmaking were used to investigate students' (n = 115) and teachers' perceptions (n = 28) re...

  12. Audiovisual associations alter the perception of low-level visual motion.

    Science.gov (United States)

    Kafaligonul, Hulusi; Oluk, Can

    2015-01-01

    Motion perception is a pervasive aspect of vision and is affected both by the immediate pattern of sensory inputs and by prior experiences acquired through associations. Recently, several studies reported that an association can be established quickly between directions of visual motion and static sounds of distinct frequencies. After the association is formed, sounds are able to change the perceived direction of visual motion. To determine whether such rapidly acquired audiovisual associations and their subsequent influences on visual motion perception depend on the involvement of higher-order attentive tracking mechanisms, we designed psychophysical experiments using regular and reverse-phi random dot motions isolating low-level pre-attentive motion processing. Our results show that an association between the directions of low-level visual motion and static sounds can be formed and that this audiovisual association alters the subsequent perception of low-level visual motion. These findings support the view that audiovisual associations are not restricted to the high-level, attention-based motion system and that early-level visual motion processing may also play a role.

  13. Audiovisual associations alter the perception of low-level visual motion

    Directory of Open Access Journals (Sweden)

    Hulusi Kafaligonul

    2015-03-01

    Motion perception is a pervasive aspect of vision and is affected both by the immediate pattern of sensory inputs and by prior experiences acquired through associations. Recently, several studies reported that an association can be established quickly between directions of visual motion and static sounds of distinct frequencies. After the association is formed, sounds are able to change the perceived direction of visual motion. To determine whether such rapidly acquired audiovisual associations and their subsequent influences on visual motion perception depend on the involvement of higher-order attentive tracking mechanisms, we designed psychophysical experiments using regular and reverse-phi random dot motions isolating low-level pre-attentive motion processing. Our results show that an association between the directions of low-level visual motion and static sounds can be formed and that this audiovisual association alters the subsequent perception of low-level visual motion. These findings support the view that audiovisual associations are not restricted to the high-level, attention-based motion system and that early-level visual motion processing may also play a role.

  14. Musical expertise and foreign speech perception

    Directory of Open Access Journals (Sweden)

    Eduardo Martínez-Montes

    2013-11-01

    The aim of this experiment was to investigate the influence of musical expertise on the automatic perception of foreign syllables and harmonic sounds. Participants were Cuban students with a high level of expertise in music or in visual arts and with the same level of general education and socio-economic background. We used a multi-feature Mismatch Negativity (MMN) design with sequences of either syllables in Mandarin Chinese or harmonic sounds, both comprising deviants in pitch contour, duration and Voice Onset Time (VOT) or equivalent that were either far from (Large deviants) or close to (Small deviants) the standard. For both Mandarin syllables and harmonic sounds, results were clear-cut in showing larger MMNs to pitch contour deviants in musicians than in visual artists. Results were less clear for duration and VOT deviants, possibly because of the specific characteristics of the stimuli. Results are interpreted as reflecting similar processing of pitch contour in speech and non-speech sounds. The implications of these results for understanding the influence of intense musical training from childhood to adulthood and of genetic predispositions for music on foreign language perception are discussed.
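
The mismatch negativity is conventionally obtained as a difference wave: the average response to deviants minus the average response to standards, with the MMN read off as the most negative deflection in a post-stimulus window. The sketch below shows that computation on plain Python lists. The 100 to 250 ms search window and the function names `erp` and `mismatch_negativity` are assumptions for illustration; the abstract does not specify the study's windows or electrodes.

```python
def erp(epochs):
    """Average equal-length epochs sample by sample (the event-related potential)."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def mismatch_negativity(standard_epochs, deviant_epochs, times_ms, window=(100, 250)):
    """Return the deviant-minus-standard difference wave and its MMN peak.

    The peak is the most negative value of the difference wave whose time
    point falls inside `window` (inclusive); the window is a hypothetical default.
    """
    diff = [d - s for d, s in zip(erp(deviant_epochs), erp(standard_epochs))]
    in_window = [(v, t) for v, t in zip(diff, times_ms) if window[0] <= t <= window[1]]
    if not in_window:
        raise ValueError("no samples inside the search window")
    peak_amp, peak_t = min(in_window)  # most negative amplitude wins
    return diff, peak_amp, peak_t
```

Comparing `peak_amp` between groups (for instance, musicians and visual artists) for one deviant type is the kind of contrast the abstract reports.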

  15. [Speech perception in the first two years].

    Science.gov (United States)

    Bertoncini, J; Cabrera, L

    2014-10-01

    The development of speech perception relies upon early auditory capacities (i.e. discrimination, segmentation and representation). Infants are able to discriminate most of the phonetic contrasts occurring in natural languages, and at the end of the first year, this universal ability starts to narrow down to the contrasts used in the environmental language. During the second year, this specialization is characterized by the development of comprehension, lexical organization and word production. This process now appears to result from multiple interactions between perceptual, cognitive and social developing abilities. Distinct factors like word acquisition, sensitivity to the statistical properties of the input, or even the nature of the social interactions, might play a role at one time or another during the acquisition of phonological patterns. Experience with the native language is necessary for phonetic segments to be functional units of perception and for speech sound representations (words, syllables) to be more specified and phonetically organized. This evolution goes on beyond 24 months of age in a learning context characterized from the early stages by the interaction with other developing (linguistic and non-linguistic) capacities. PMID:25218761

  16. Audio-visual perception system for a humanoid robotic head.

    Science.gov (United States)

    Viciana-Abad, Raquel; Marfil, Rebeca; Perez-Lorenzo, Jose M; Bandera, Juan P; Romero-Garces, Adrian; Reche-Lopez, Pedro

    2014-01-01

    One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus they may run into difficulties when restricted to the sensors with which a robot can be equipped. Moreover, for interactive autonomous robots, there has been little evaluation in real scenarios of the benefits of audio-visual attention mechanisms compared with audio-only or visual-only approaches. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with Bayesian inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. The performance of this system is also evaluated and compared with unimodal systems, taking their technical limitations into account. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework. PMID:24878593
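
The abstract says only that audio and visual data are fused with Bayes inference. As an illustration of the general idea, and not the authors' system, the sketch below assumes each modality yields an independent Gaussian estimate of the speaker's azimuth and combines them under a flat prior; the posterior is then Gaussian, its precision is the sum of the cue precisions, and its mean is the precision-weighted average of the two cues. The class `Estimate`, the function `fuse`, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Gaussian estimate of a speaker's azimuth: mean (degrees) and variance."""
    mean: float
    var: float


def fuse(audio: Estimate, visual: Estimate) -> Estimate:
    """Fuse two independent Gaussian cues under a flat prior.

    The posterior is proportional to the product of the two likelihoods,
    which is again Gaussian: the precisions (inverse variances) add, and
    the mean is the precision-weighted average of the cue means.
    """
    w_a = 1.0 / audio.var   # precision of the auditory cue
    w_v = 1.0 / visual.var  # precision of the visual cue
    var = 1.0 / (w_a + w_v)
    mean = var * (w_a * audio.mean + w_v * visual.mean)
    return Estimate(mean=mean, var=var)


# Hypothetical readings: a broad microphone-array estimate, a sharper camera estimate.
fused = fuse(Estimate(mean=30.0, var=100.0), Estimate(mean=20.0, var=25.0))
print(fused)
```

The fused variance is always smaller than either input variance, which is one way to express the benefit of fusion over unimodal localization; a real robot would also have to handle cues that disagree because they come from different people.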

  17. Audio-Visual Perception System for a Humanoid Robotic Head

    Directory of Open Access Journals (Sweden)

    Raquel Viciana-Abad

    2014-05-01

    One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus they may run into difficulties when restricted to the sensors with which a robot can be equipped. Moreover, for interactive autonomous robots, there has been little evaluation in real scenarios of the benefits of audio-visual attention mechanisms compared with audio-only or visual-only approaches. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with Bayesian inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. The performance of this system is also evaluated and compared with unimodal systems, taking their technical limitations into account. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.

  19. Voice and Speech Quality Perception Assessment and Evaluation

    CERN Document Server

    Jekosch, Ute

    2005-01-01

    Foundations of Voice and Speech Quality Perception starts out with the fundamental question of: "How do listeners perceive voice and speech quality and how can these processes be modeled?" Any quantitative answers require measurements. This is natural for physical quantities but harder to imagine for perceptual measurands. This book approaches the problem by actually identifying major perceptual dimensions of voice and speech quality perception, defining units wherever possible and offering paradigms to position these dimensions into a structural skeleton of perceptual speech and voice quality. The emphasis is placed on voice and speech quality assessment of systems in artificial scenarios. Many scientific fields are involved. This book bridges the gap between two quite diverse fields, engineering and humanities, and establishes the new research area of Voice and Speech Quality Perception.

  20. Neurophysiological influence of musical training on speech perception.

    Science.gov (United States)

    Shahin, Antoine J

    2011-01-01

    Does musical training affect our perception of speech? For example, does learning to play a musical instrument modify the neural circuitry for auditory processing in a way that improves one's ability to perceive speech more clearly in noisy environments? If so, can speech perception in individuals with hearing loss (HL), who struggle in noisy situations, benefit from musical training? While music and speech exhibit some specialization in neural processing, there is evidence suggesting that skills acquired through musical training for specific acoustical processes may transfer to, and thereby improve, speech perception. The neurophysiological mechanisms underlying the influence of musical training on speech processing and the extent of this influence remain a rich area to be explored. A prerequisite for such transfer is the facilitation of greater neurophysiological overlap between speech and music processing following musical training. This review first establishes a neurophysiological link between musical training and speech perception, and subsequently provides further hypotheses on the neurophysiological implications of musical training on speech perception in adverse acoustical environments and in individuals with HL. PMID:21716639

  1. Neurophysiological influence of musical training on speech perception

    Directory of Open Access Journals (Sweden)

    Antoine J Shahin

    2011-06-01

    Does musical training affect our perception of speech? For example, does learning to play a musical instrument modify the neural circuitry for auditory processing in a way that improves one’s ability to perceive speech more clearly in noisy environments? If so, can speech perception in individuals with hearing loss, who struggle in noisy situations, benefit from musical training? While music and speech exhibit some specialization in neural processing, there is evidence suggesting that skills acquired through musical training for specific acoustical processes may transfer to, and thereby improve, speech perception. The neurophysiological mechanisms underlying the influence of musical training on speech processing and the extent of this influence remain a rich area to be explored. A prerequisite for such transfer is the facilitation of greater neurophysiological overlap between speech and music processing following musical training. This review first establishes a neurophysiological link between musical training and speech perception, and subsequently provides further hypotheses on the neurophysiological implications of musical training on speech perception in adverse acoustical environments and in individuals with hearing loss.

  2. Perception drives production across sensory modalities: A network for sensorimotor integration of visual speech.

    Science.gov (United States)

    Venezia, Jonathan H; Fillmore, Paul; Matchin, William; Isenberg, A Lisette; Hickok, Gregory; Fridriksson, Julius

    2016-02-01

    Sensory information is critical for movement control, both for defining the targets of actions and providing feedback during planning or ongoing movements. This holds for speech motor control as well, where both auditory and somatosensory information have been shown to play a key role. Recent clinical research demonstrates that individuals with severe speech production deficits can show a dramatic improvement in fluency during online mimicking of an audiovisual speech signal suggesting the existence of a visuomotor pathway for speech motor control. Here we used fMRI in healthy individuals to identify this new visuomotor circuit for speech production. Participants were asked to perceive and covertly rehearse nonsense syllable sequences presented auditorily, visually, or audiovisually. The motor act of rehearsal, which is prima facie the same whether or not it is cued with a visible talker, produced different patterns of sensorimotor activation when cued by visual or audiovisual speech (relative to auditory speech). In particular, a network of brain regions including the left posterior middle temporal gyrus and several frontoparietal sensorimotor areas activated more strongly during rehearsal cued by a visible talker versus rehearsal cued by auditory speech alone. Some of these brain regions responded exclusively to rehearsal cued by visual or audiovisual speech. This result has significant implications for models of speech motor control, for the treatment of speech output disorders, and for models of the role of speech gesture imitation in development.

  3. Sound frequency affects speech emotion perception: Results from congenital amusia

    Directory of Open Access Journals (Sweden)

    Sydney Lolli

    2015-09-01

    Congenital amusics, or tone-deaf individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying band-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody (MBEP) were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under band-pass and unfiltered speech conditions. Results showed a significant correlation between pitch discrimination threshold and emotion identification accuracy for band-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold > 16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between band-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation.

  4. Electrophysiological correlates of individual differences in perception of audiovisual temporal asynchrony.

    Science.gov (United States)

    Kaganovich, Natalya; Schumaker, Jennifer

    2016-06-01

    Sensitivity to the temporal relationship between auditory and visual stimuli is key to efficient audiovisual integration. However, even adults vary greatly in their ability to detect audiovisual temporal asynchrony. What underlies this variability is currently unknown. We recorded event-related potentials (ERPs) while participants performed a simultaneity judgment task on a range of audiovisual (AV) and visual-auditory (VA) stimulus onset asynchronies (SOAs) and compared ERP responses in good and poor performers to the 200 ms SOA, which showed the largest individual variability in the number of synchronous perceptions. Analysis of ERPs to the VA200 stimulus yielded no significant results. However, those individuals who were more sensitive to the AV200 SOA had significantly more positive voltage between 210 and 270 ms following the sound onset. In a follow-up analysis, we showed that the mean voltage within this window predicted approximately 36% of variability in sensitivity to AV temporal asynchrony in a larger group of participants. The relationship between the ERP measure in the 210-270 ms window and accuracy on the simultaneity judgment task also held for two other AV SOAs with significant individual variability: 100 and 300 ms. Because the identified window was time-locked to the onset of sound in the AV stimulus, we conclude that sensitivity to AV temporal asynchrony is shaped to a large extent by the efficiency of the neural encoding of sound onsets. PMID:27094850

  5. Neural dynamics of audiovisual synchrony and asynchrony perception in 6-month-old infants

    Directory of Open Access Journals (Sweden)

    Franziska Kopp

    2013-01-01

    Young infants are sensitive to multisensory temporal synchrony relations, but the neural dynamics of temporal interactions between vision and audition in infancy are not well understood. We investigated audiovisual synchrony and asynchrony perception in 6-month-old infants using event-related potentials (ERPs). In a prior behavioral experiment (n = 45), infants were habituated to an audiovisual synchronous stimulus and tested for recovery of interest by presenting an asynchronous test stimulus in which the visual stream was delayed with respect to the auditory stream by 400 ms. Infants who behaviorally discriminated the change in temporal alignment were included in further analyses. In the EEG experiment (final sample: n = 15), synchronous and asynchronous stimuli (visual delay of 400 ms) were presented in random order. Results show latency shifts in the auditory ERP components N1 and P2 as well as the infant ERP component Nc. Latencies in the asynchronous condition were significantly longer than in the synchronous condition. After video onset but preceding the auditory onset, amplitude modulations propagating from posterior to anterior sites and related to the Pb component of infants' ERP were observed. Results suggest temporal interactions between the two modalities. Specifically, they point to the significance of anticipatory visual motion for auditory processing, and indicate young infants’ predictive capacities for audiovisual temporal synchrony relations.

  7. Sound frequency affects speech emotion perception: results from congenital amusia.

    Science.gov (United States)

    Lolli, Sydney L; Lewenstein, Ari D; Basurto, Julian; Winnik, Sean; Loui, Psyche

    2015-01-01

    Congenital amusics, or "tone-deaf" individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech. PMID:26441718

  9. Cognitive Control Factors in Speech Perception at 11 Months

    Science.gov (United States)

    Conboy, Barbara T.; Sommerville, Jessica A.; Kuhl, Patricia K.

    2008-01-01

    The development of speech perception during the 1st year reflects increasing attunement to native language features, but the mechanisms underlying this development are not completely understood. One previous study linked reductions in nonnative speech discrimination to performance on nonlinguistic tasks, whereas other studies have shown…

  10. Brain responses and looking behavior during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life

    Science.gov (United States)

    Kushnerenko, Elena; Tomalski, Przemyslaw; Ballieux, Haiko; Potton, Anita; Birtles, Deidre; Frostick, Caroline; Moore, Derek G.

    2013-01-01

    The use of visual cues during the processing of audiovisual (AV) speech is known to be less efficient in children and adults with language difficulties and difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6–9 months to 14–16 months of age. We used eye-tracking to examine whether individual differences in visual attention during AV processing of speech in 6–9 month old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6–9 month old infants also participated in an event-related potential (ERP) AV task within the same experimental session. Language development was then followed-up at the age of 14–16 months, using two measures of language development, the Preschool Language Scale and the Oxford Communicative Development Inventory. The results show that those infants who were less efficient in auditory speech processing at the age of 6–9 months had lower receptive language scores at 14–16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audiovisually incongruent stimuli at 6–9 months were both significantly associated with language development at 14–16 months. These findings add to the understanding of individual differences in neural signatures of AV processing and associated looking behavior in infants. PMID:23882240

  11. A Novel Algorithm for Acoustic and Visual Classifiers Decision Fusion in Audio-Visual Speech Recognition System

    Directory of Open Access Journals (Sweden)

    P.S. Sathidevi

    2010-03-01

    Audio-visual speech recognition (AVSR) using acoustic and visual signals of speech has received attention recently because of its robustness in noisy environments. Perceptual studies also support this approach by emphasizing the importance of visual information for speech recognition in humans. An important issue in a decision-fusion-based AVSR system is how to obtain an appropriate integration weight for the speech modalities, so that the combined AVSR system performs better than the audio-only and visual-only systems under various noise conditions. To solve this issue, we present a genetic algorithm (GA)-based optimization scheme to obtain the appropriate integration weight from the relative reliability of each modality. The performance of the proposed GA-optimized, reliability-ratio-based weight estimation scheme is demonstrated via single-speaker, mobile-functions isolated-word recognition experiments. The results show that the proposed scheme improves robust recognition accuracy over the conventional unimodal systems and the baseline reliability-ratio-based AVSR system under various signal-to-noise ratio conditions.
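    The baseline reliability-ratio decision fusion mentioned above can be sketched as follows. This is an illustrative assumption, not the paper's implementation: reliability is measured here as the mean gap between the best per-word log-likelihood and the others (a peaked score vector counts as reliable), the audio weight is lambda = R_audio / (R_audio + R_visual), and per-word scores are combined linearly. The GA optimization step that the paper adds on top is not reproduced. Function names and scores are hypothetical.

```python
def reliability(log_likelihoods):
    """Dispersion-style reliability: mean gap between the best score and the rest.

    A peaked score vector (one clear winner) counts as more reliable than a
    flat one. This particular measure is an illustrative assumption.
    Requires at least two word classes.
    """
    best = max(log_likelihoods)
    n = len(log_likelihoods)
    return sum(best - s for s in log_likelihoods) / (n - 1)


def fuse_decisions(audio_ll, visual_ll):
    """Weighted log-likelihood fusion with a reliability-ratio weight.

    lam = R_audio / (R_audio + R_visual); the fused score for word class i is
    lam * audio_ll[i] + (1 - lam) * visual_ll[i].
    Returns (index of the winning class, lam).
    """
    r_a = reliability(audio_ll)
    r_v = reliability(visual_ll)
    lam = 0.5 if r_a + r_v == 0 else r_a / (r_a + r_v)
    fused = [lam * a + (1 - lam) * v for a, v in zip(audio_ll, visual_ll)]
    winner = max(range(len(fused)), key=fused.__getitem__)
    return winner, lam


# Hypothetical scores for three word classes: noisy (flat) audio, clear video.
audio_scores = [-1.0, -1.2, -1.1]
visual_scores = [-5.0, -0.5, -6.0]
decision, weight = fuse_decisions(audio_scores, visual_scores)
```

    In this made-up noisy-audio case the audio weight drops to about 0.03 and the decision follows the visual stream (class 1); the GA-based scheme described in the abstract would instead learn how reliabilities map to the weight.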

  12. Perception of Speech by Individuals with Parkinson's Disease: A Review

    Directory of Open Access Journals (Sweden)

    Lorinda C. Kwan

    2011-01-01

    A few clinical reports and empirical studies have suggested a possible deficit in the perception of speech in individuals with Parkinson's disease. In this paper, these studies are reviewed in an attempt to support clinical anecdotal observations by relevant empirical research findings. The combined evidence suggests a possible deficit in patients' perception of their own speech loudness. Other research studies on the perception of speech in this population were reviewed, in a broader scope of the perception of emotional prosody. These studies confirm that Parkinson's disease specifically impairs patients' perception of verbal emotions. However, explanations of the nature and causes of this perceptual deficit are still limited. Future research directions are suggested.

  13. Experimental study on phase perception in speech

    Institute of Scientific and Technical Information of China (English)

    BU Fanliang; CHEN Yanpu

    2003-01-01

    Because the human ear is relatively insensitive to phase in speech, little attention has been paid to phase information in speech coding. In fact, the perceptual quality of speech may be degraded if the phase distortion is very large. The perceptual effect of the short-time Fourier transform (STFT) phase spectrum is studied by subjective listening tests. Three main conclusions are drawn: (1) if the phase information is neglected completely, the subjective quality of the reconstructed speech may be very poor; (2) whether the neglected phase lies in the low-frequency band or the high-frequency band, the difference from the original speech can be perceived by ear; (3) it is very difficult for the human ear to perceive a difference in quality between the original speech and the reconstructed speech when the phase quantization step size is smaller than π/7.
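    The two manipulations compared in this study, discarding STFT phase and quantizing it with a step such as π/7, can be sketched on the complex coefficients of a single frame. This is a minimal illustration under stated assumptions: the coefficients are hypothetical, magnitudes are kept unchanged, phase is rounded to the nearest multiple of the step, and the full STFT analysis and overlap-add resynthesis used for listening tests is not shown.

```python
import cmath
import math


def quantize_phase(spectrum, step):
    """Keep each coefficient's magnitude; round its phase to a multiple of step."""
    out = []
    for z in spectrum:
        q = step * round(cmath.phase(z) / step)
        out.append(cmath.rect(abs(z), q))
    return out


def discard_phase(spectrum):
    """Magnitude-only reconstruction: every phase is set to zero."""
    return [complex(abs(z), 0.0) for z in spectrum]


def max_phase_error(original, modified):
    """Largest wrapped phase difference (radians) between matching coefficients."""
    return max(abs(cmath.phase(m / o)) for o, m in zip(original, modified) if o != 0)


# Hypothetical coefficients from one STFT frame.
frame = [cmath.rect(1.0, 0.3), cmath.rect(2.0, -2.5), cmath.rect(0.5, 3.0)]
step = math.pi / 7
quantized = quantize_phase(frame, step)
err_q = max_phase_error(frame, quantized)
err_0 = max_phase_error(frame, discard_phase(frame))
```

    Rounding bounds the per-coefficient phase error by half the step (about 0.22 rad for π/7), whereas discarding phase can shift a coefficient by up to π; in this toy frame the worst case is 3.0 rad.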

  15. Exploring the Role of Brain Oscillations in Speech Perception in Noise: Intelligibility of Isochronously Retimed Speech.

    Science.gov (United States)

    Aubanel, Vincent; Davis, Chris; Kim, Jeesun

    2016-01-01

    A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximize processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioral experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise. PMID:27630552
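    The timing side of isochronous retiming can be sketched as a piecewise-linear time map. The abstract does not give the retiming algorithm, so this is an assumption-laden illustration: anchor times (for example, stressed-vowel onsets approximating P-centers) are mapped onto evenly spaced targets covering the same span, and times in between are interpolated linearly. Only the time map is computed; the signal-level time-scale modification it would drive is not shown. Function names and anchor values are hypothetical.

```python
import bisect


def isochronous_targets(anchors):
    """Evenly spaced target times spanning the same interval as the anchors."""
    n = len(anchors) - 1
    period = (anchors[-1] - anchors[0]) / n
    return [anchors[0] + i * period for i in range(n + 1)]


def warp_time(t, anchors, targets):
    """Piecewise-linear map from original time t to retimed time.

    Anchors must be strictly increasing; times outside the anchor range are
    shifted by the offset of the nearest end anchor.
    """
    if t <= anchors[0]:
        return targets[0] + (t - anchors[0])
    if t >= anchors[-1]:
        return targets[-1] + (t - anchors[-1])
    i = bisect.bisect_right(anchors, t) - 1
    frac = (t - anchors[i]) / (anchors[i + 1] - anchors[i])
    return targets[i] + frac * (targets[i + 1] - targets[i])


# Hypothetical anchor times (s), e.g. stressed-vowel onsets, about 4 Hz apart.
anchors = [0.0, 0.22, 0.53, 0.74, 1.0]
targets = isochronous_targets(anchors)
rate_hz = (len(anchors) - 1) / (anchors[-1] - anchors[0])
```

    Here the anchors are moved to 0, 0.25, 0.5, 0.75 and 1.0 s (a 4 Hz grid, matching the fast time scale in the abstract), and a point halfway between two anchors stays halfway between their targets.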

  16. Production and perception of clear speech in Croatian and English

    Science.gov (United States)

    Smiljanić, Rajka; Bradlow, Ann R.

    2005-09-01

    Previous research has established that naturally produced English clear speech is more intelligible than English conversational speech. The major goal of this paper was to establish the presence of the clear speech effect in production and perception of a language other than English, namely Croatian. A systematic investigation of the conversational-to-clear speech transformations across languages with different phonological properties (e.g., large versus small vowel inventory) can provide a window into the interaction of general auditory-perceptual and phonological, structural factors that contribute to the high intelligibility of clear speech. The results of this study showed that naturally produced clear speech is a distinct, listener-oriented, intelligibility-enhancing mode of speech production in both languages. Furthermore, the acoustic-phonetic features of the conversational-to-clear speech transformation revealed cross-language similarities in clear speech production strategies. In both languages, talkers exhibited a decrease in speaking rate and an increase in pitch range, as well as an expansion of the vowel space. Notably, the findings of this study showed equivalent vowel space expansion in English and Croatian clear speech, despite the difference in vowel inventory size across the two languages, suggesting that the extent of vowel contrast enhancement in hyperarticulated clear speech is independent of vowel inventory size.

  17. Neural correlates of quality perception for complex speech signals

    CERN Document Server

    Antons, Jan-Niklas

    2015-01-01

    This book interconnects two essential disciplines to study the perception of speech: Neuroscience and Quality of Experience, which to date have rarely been used together for the purposes of research on speech quality perception. In five key experiments, the book demonstrates the application of standard clinical methods in neurophysiology on the one hand, and of methods used in fields of research concerned with speech quality perception on the other. Using this combination, the book shows that speech stimuli with different lengths and different quality impairments are accompanied by physiological reactions related to quality variations, e.g., a positive peak in an event-related potential. Furthermore, it demonstrates that – in most cases – quality impairment intensity has an impact on the intensity of physiological reactions.

  18. Children's perception of their synthetically corrected speech production.

    Science.gov (United States)

    Strömbergsson, Sofia; Wengelin, Åsa; House, David

    2014-06-01

    We explore children's perception of their own speech: in its online form, in its recorded form, and in synthetically modified forms. Children with phonological disorder (PD) and children with typical speech and language development (TD) performed tasks of evaluating accuracy of the different types of speech stimuli, either immediately after having produced the utterance or after a delay. In addition, they performed a task designed to assess their ability to detect synthetic modification. Both groups showed high performance in tasks involving evaluation of other children's speech, whereas in tasks of evaluating one's own speech, the children with PD were less accurate than their TD peers. The children with PD were less sensitive to misproductions in immediate conjunction with their production of an utterance, and more accurate after a delay. Within-category modification often passed undetected, indicating a satisfactory quality of the generated speech. Potential clinical benefits of using corrective re-synthesis are discussed.

  19. Sequential literacy (Alfasecuencialización): the teaching of cinema in the age of the audiovisual

    Directory of Open Access Journals (Sweden)

    José Antonio Palao Errando

    2007-10-01

    In the so-called «information society», film studies have become diluted in pragmatic and technological approaches to audiovisual discourse, and the very enjoyment of cinema has been caught in the net of DVD and hypertext. Cinema itself reacts to this through complex narrative structures that take it away from standard audiovisual discourse. The function of film studies and of their university teaching should be to reintroduce, through the interpretation of the film text, the subject rejected by informational knowledge.

  20. Cognitive Factors and Cochlear Implants: Some Thoughts on Perception, Learning, and Memory in Speech Perception

    OpenAIRE

    Pisoni, David B.

    2000-01-01

    Over the past few years, there has been increased interest in studying some of the cognitive factors that affect speech perception performance of cochlear implant patients. In this paper, I provide a brief theoretical overview of the fundamental assumptions of the information-processing approach to cognition and discuss the role of perception, learning, and memory in speech perception and spoken language processing. The information-processing framework provides researchers and clinicians with...

  1. Comparing Infants' Preference for Correlated Audiovisual Speech with Signal-Level Computational Models

    Science.gov (United States)

    Hollich, George; Prince, Christopher G.

    2009-01-01

    How much of infant behaviour can be accounted for by signal-level analyses of stimuli? The current paper directly compares the moment-by-moment behaviour of 8-month-old infants in an audiovisual preferential looking task with that of several computational models that use the same video stimuli as presented to the infants. One type of model…

  2. Psychophysics of the McGurk and Other Audiovisual Speech Integration Effects

    Science.gov (United States)

    Jiang, Jintao; Bernstein, Lynne E.

    2011-01-01

    When the auditory and visual components of spoken audiovisual nonsense syllables are mismatched, perceivers produce four different types of perceptual responses: auditory correct, visual correct, fusion (the so-called "McGurk effect"), and combination (i.e., two consonants are reported). Here, quantitative measures were developed to account for…

  3. Model-based assessment of factors influencing categorical audiovisual perception

    OpenAIRE

    Andersen, Tobias S.

    2005-01-01

    Information processing in the sensory modalities is not segregated but interacts strongly. The exact nature of this interaction is not known and might differ for different multisensory phenomena. Here, we investigate two cases of categorical audiovisual perception: speech perception and the perception of rapid flashes and beeps. It is known that multisensory interactions in general depend on physical factors, such as information reliability and modality appropriateness, but it is not know...

  4. Perception of words and pitch patterns in song and speech

    Directory of Open Access Journals (Sweden)

    Julia Merrill

    2012-03-01

    This fMRI study examines shared and distinct cortical areas involved in the auditory perception of song and speech at the level of their underlying constituents: words, pitch and rhythm. Univariate and multivariate analyses were performed on the brain activity patterns of six conditions, arranged in a subtractive hierarchy: sung sentences including words, pitch and rhythm; hummed speech prosody and song melody containing only pitch patterns and rhythm; as well as the pure musical or speech rhythm. Systematic contrasts between these balanced conditions following their hierarchical organization showed a great overlap between song and speech at all levels in the bilateral temporal lobe, but suggested a differential role of the inferior frontal gyrus (IFG) and intraparietal sulcus (IPS) in processing song and speech. The left IFG was involved in word- and pitch-related processing in speech, and the right IFG in processing pitch in song. Furthermore, the IPS showed sensitivity to discrete pitch relations in song as opposed to the gliding pitch in speech. Finally, the superior temporal gyrus and premotor cortex coded for general differences between words and pitch patterns, irrespective of whether they were sung or spoken. Thus, song and speech share many features which are reflected in a fundamental similarity of brain areas involved in their perception. However, fine-grained acoustic differences on word and pitch level are reflected in the activity of IFG and IPS.

  5. Speech perception at the interface of neurobiology and linguistics.

    Science.gov (United States)

    Poeppel, David; Idsardi, William J; van Wassenhove, Virginie

    2008-03-12

    Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects that are recognized by the speech perception system enter into subsequent linguistic computation, the format that is used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue and provide neurobiological and psychophysical evidence for the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20-80 ms, approx. 150-300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an 'analysis-by-synthesis' approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon in terms of sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.

  6. The contribution of visual information to the perception of speech in noise with and without informative temporal fine structure.

    Science.gov (United States)

    Stacey, Paula C; Kitterick, Pádraig T; Morris, Saffron D; Sumner, Christian J

    2016-06-01

    Understanding what is said in demanding listening situations is assisted greatly by looking at the face of a talker. Previous studies have observed that normal-hearing listeners can benefit from this visual information when a talker's voice is presented in background noise. These benefits have also been observed in quiet listening conditions in cochlear-implant users, whose device does not convey the informative temporal fine structure cues in speech, and when normal-hearing individuals listen to speech processed to remove these informative temporal fine structure cues. The current study (1) characterised the benefits of visual information when listening in background noise; and (2) used sine-wave vocoding to compare the size of the visual benefit when speech is presented with or without informative temporal fine structure. The accuracy with which normal-hearing individuals reported words in spoken sentences was assessed across three experiments. The availability of visual information and informative temporal fine structure cues was varied within and across the experiments. The results showed that visual benefit was observed using open- and closed-set tests of speech perception. The size of the benefit increased when informative temporal fine structure cues were removed. This finding suggests that visual information may play an important role in the ability of cochlear-implant users to understand speech in many everyday situations. Models of audio-visual integration were able to account for the additional benefit of visual information when speech was degraded and suggested that auditory and visual information was being integrated in a similar way in all conditions. The modelling results were consistent with the notion that audio-visual benefit is derived from the optimal combination of auditory and visual sensory cues. PMID:27085797

  7. Categorical perception of speech by children with specific language impairments.

    Science.gov (United States)

    Coady, Jeffry A; Kluender, Keith R; Evans, Julia L

    2005-08-01

    Previous research has suggested that children with specific language impairments (SLI) have deficits in basic speech perception abilities, and this may be an underlying source of their linguistic deficits. These findings have come from studies in which perception of synthetic versions of meaningless syllables was typically examined in tasks with high memory demands. In this study, 20 children with SLI (mean age = 9 years, 3 months) and 20 age-matched peers participated in a categorical perception task. Children identified and discriminated digitally edited versions of naturally spoken real words in tasks designed to minimize memory requirements. Both groups exhibited all hallmarks of categorical perception: a sharp labeling function, discontinuous discrimination performance, and discrimination predicted from identification. There were no group differences for identification data, but children with SLI showed lower peak discrimination values. Children with SLI still discriminated phonemically contrastive pairs at levels significantly better than chance, with discrimination of same-label pairs at chance. These data suggest that children with SLI perceive natural speech tokens comparably to age-matched controls when listening to words under conditions that minimize memory load. Further, poor performance on speech perception tasks may not be due to a speech perception deficit, but rather a consequence of task demands. PMID:16378484

  8. The effects of noise vocoding on speech quality perception.

    Science.gov (United States)

    Anderson, Melinda C; Arehart, Kathryn H; Kates, James M

    2014-03-01

    Speech perception depends on access to spectral and temporal acoustic cues. Temporal cues include slowly varying amplitude changes (i.e. temporal envelope, TE) and quickly varying amplitude changes associated with the center frequency of the auditory filter (i.e. temporal fine structure, TFS). This study quantifies the effects of TFS randomization through noise vocoding on the perception of speech quality by parametrically varying the amount of original TFS available above 1500 Hz. The two research aims were: 1) to establish the role of TFS in quality perception, and 2) to determine if the role of TFS in quality perception differs between subjects with normal hearing and subjects with sensorineural hearing loss. Ratings were obtained from 20 subjects (10 with normal hearing and 10 with hearing loss) using an 11-point quality scale. Stimuli were processed in three different ways: 1) a 32-channel noise-excited vocoder with random envelope fluctuations in the noise carrier, 2) a 32-channel noise-excited vocoder with the noise-carrier envelope smoothed, and 3) removal of high-frequency bands. Stimuli were presented in quiet and in babble noise at 18 dB and 12 dB signal-to-noise ratios. TFS randomization had a measurable detrimental effect on quality ratings for speech in quiet and a smaller effect for speech in background babble. Subjects with normal hearing and subjects with sensorineural hearing loss provided similar quality ratings for noise-vocoded speech. PMID:24333929

  9. Investigating Speech Perception in Children with Dyslexia: Is There Evidence of a Consistent Deficit in Individuals?

    Science.gov (United States)

    Messaoud-Galusi, Souhila; Hazan, Valerie; Rosen, Stuart

    2011-01-01

    Purpose: The claim that speech perception abilities are impaired in dyslexia was investigated in a group of 62 children with dyslexia and 51 average readers matched in age. Method: To test whether there was robust evidence of speech perception deficits in children with dyslexia, speech perception in noise and quiet was measured using 8 different…

  10. Speech perception in a sparse domain

    OpenAIRE

    Li, Guoping

    2008-01-01

    Environmental statistics are known to be important factors shaping our perceptual system. The visual and auditory systems have evolved to be efficient for processing natural images or speech. A common characteristic of natural images and speech is that both are highly structured and therefore highly redundant. Our perceptual system may use redundancy reduction and sparse coding strategies to deal with complex stimuli every day. Both redundancy reduction ...

  11. Oscillation encoding of individual differences in speech perception

    OpenAIRE

    Jin, Yu; Díaz, Begoña; Colomer, Marc; Sebastián Gallés, Núria

    2014-01-01

    Individual differences in second language (L2) phoneme perception (within the normal population) have been related to speech perception abilities, also observed in the native language, in studies assessing the electrophysiological response mismatch negativity (MMN). Here, we investigate the brain oscillatory dynamics in the theta band, the spectral correlate of the MMN, that underpin success in phoneme learning. Using previous data obtained in an MMN paradigm, the dynamics of cort...

  12. The Role of the Listener's State in Speech Perception

    Science.gov (United States)

    Viswanathan, Navin

    2009-01-01

    Accounts of speech perception disagree on whether listeners perceive the acoustic signal (Diehl, Lotto, & Holt, 2004) or the vocal tract gestures that produce the signal (e.g., Fowler, 1986). In this dissertation, I outline a research program using a phenomenon called "perceptual compensation for coarticulation" (Mann, 1980) to examine this…

  13. Auditory Sensitivity, Speech Perception, and Reading Development and Impairment

    Science.gov (United States)

    Zhang, Juan; McBride-Chang, Catherine

    2010-01-01

    While the importance of phonological sensitivity for understanding reading acquisition and impairment across orthographies is well documented, what underlies deficits in phonological sensitivity is not well understood. Some researchers have argued that speech perception underlies variability in phonological representations. Others have…

  14. Computational validation of the motor contribution to speech perception.

    Science.gov (United States)

    Badino, Leonardo; D'Ausilio, Alessandro; Fadiga, Luciano; Metta, Giorgio

    2014-07-01

    Action perception and recognition are core abilities fundamental for human social interaction. A parieto-frontal network (the mirror neuron system) matches visually presented biological motion information onto observers' motor representations. This process of matching the actions of others onto our own sensorimotor repertoire is thought to be important for action recognition, providing a non-mediated "motor perception" based on a bidirectional flow of information along the mirror parieto-frontal circuits. State-of-the-art machine learning strategies for hand action identification have shown better performance when sensorimotor data, as opposed to visual information only, are available during learning. As speech is a particular type of action (with acoustic targets), it is expected to activate a mirror neuron mechanism. Indeed, in speech perception, motor centers have been shown to be causally involved in the discrimination of speech sounds. In this paper, we review recent neurophysiological and machine learning-based studies showing (a) the specific contribution of the motor system to speech perception and (b) that automatic phone recognition is significantly improved when motor data are used during training of classifiers (as opposed to learning from purely auditory data). PMID:24935820

  15. Speech Perception Ability in Individuals with Friedreich Ataxia

    Science.gov (United States)

    Rance, Gary; Fava, Rosanne; Baldock, Heath; Chong, April; Barker, Elizabeth; Corben, Louise; Delatycki

    2008-01-01

    The aim of this study was to investigate auditory pathway function and speech perception ability in individuals with Friedreich ataxia (FRDA). Ten subjects confirmed by genetic testing as being homozygous for a GAA expansion in intron 1 of the FXN gene were included. While each of the subjects demonstrated normal, or near normal sound detection, 3…

  16. Audiovisual bimodal mutual compensation of Chinese

    Institute of Scientific and Technical Information of China (English)

    ZHOU Zhi

    2001-01-01

    [1] Richard, P., Schumeyer, Kenneth E. B., The effect of visual information on word initial consonant perception of dysarthric speech, in Proc. ICSLP'96, October 3-6, 1996, Philadelphia, Pennsylvania, USA.
    [2] Goff, B. L., Marigny, T. G., Benoit, C., Read my lips... and my jaw! How intelligible are the components of a speaker's face? Eurospeech'95, 4th European Conference on Speech Communication and Technology, Madrid, September 1995.
    [3] McGurk, H., MacDonald, J., Hearing lips and seeing voices, Nature, 1976, 264: 746.
    [4] Duran, A. F., McGurk effect in Spanish and German listeners: Influences of visual cues in the perception of Spanish and German conflicting audio-visual stimuli, Eurospeech'95, 4th European Conference on Speech Communication and Technology, Madrid, September 1995.
    [5] Luettin, J., Visual speech and speaker recognition, Ph.D. thesis, University of Sheffield, 1997.
    [6] Xu Yanjun, Du Limin, Chinese audiovisual bimodal speech database CAVSR1.0, Chinese Journal of Acoustics, to appear.
    [7] Zhang Jialu, Speech corpora and language input/output methods' evaluation, Chinese Applied Acoustics, 1994, 13(3): 5.

  17. "Perception of the speech code" revisited: Speech is alphabetic after all.

    Science.gov (United States)

    Fowler, Carol A; Shankweiler, Donald; Studdert-Kennedy, Michael

    2016-03-01

    We revisit an article, "Perception of the Speech Code" (PSC), published in this journal 50 years ago (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967) and address one of its legacies concerning the status of phonetic segments, which persists in theories of speech today. In the perspective of PSC, segments both exist (in language as known) and do not exist (in articulation or the acoustic speech signal). Findings interpreted as showing that speech is not a sound alphabet, but, rather, phonemes are encoded in the signal, coupled with findings that listeners perceive articulation, led to the motor theory of speech perception, a highly controversial legacy of PSC. However, a second legacy, the paradoxical perspective on segments has been mostly unquestioned. We remove the paradox by offering an alternative supported by converging evidence that segments exist in language both as known and as used. We support the existence of segments in both language knowledge and in production by showing that phonetic segments are articulatory and dynamic and that coarticulation does not eliminate them. We show that segments leave an acoustic signature that listeners can track. This suggests that speech is well-adapted to public communication in facilitating, not creating a barrier to, exchange of language forms. PMID:26301536

  18. Cortical Mechanisms of Speech Perception in Noise

    Science.gov (United States)

    Wong, Patrick C. M.; Uppunda, Ajith K.; Parrish, Todd B.; Dhar, Sumitrajit

    2008-01-01

    Purpose: The present study examines the brain basis of listening to spoken words in noise, which is a ubiquitous characteristic of communication, with the focus on the dorsal auditory pathway. Method: English-speaking young adults identified single words in 3 listening conditions while their hemodynamic response was measured using fMRI: speech in…

  19. An Analysis of Speech Structure and Perception Processes and Its Effects on Oral English Teaching Centering around Lexical Chunks

    Institute of Scientific and Technical Information of China (English)

    ZHOU Li; NIE Yong-Wei

    2015-01-01

    The paper tries to analyze speech perception in terms of its structure, process, levels and models. Some problems concerning speech perception have been touched upon. The paper aims at providing some reference for oral English teaching and learning in the light of speech perception. It is intended to arouse readers’ reflection upon the effect of speech perception on oral English teaching.

  20. Bilingualism affects audiovisual phoneme identification.

    Science.gov (United States)

    Burfin, Sabine; Pascalis, Olivier; Ruiz Tada, Elisa; Costa, Albert; Savariaux, Christophe; Kandel, Sonia

    2014-01-01

    We all go through a process of perceptual narrowing for phoneme identification. As we become experts in the languages we hear in our environment, we lose the ability to identify phonemes that do not exist in our native phonological inventory. This research examined how linguistic experience (i.e., exposure to a double phonological code during childhood) affects the visual processes involved in non-native phoneme identification in audiovisual speech perception. We conducted a phoneme identification experiment with bilingual and monolingual adult participants. It was an ABX task involving a Bengali dental-retroflex contrast that does not exist in any of the participants' languages. The phonemes were presented in audiovisual (AV) and audio-only (A) conditions. The results revealed that in the audio-only condition monolinguals and bilinguals had difficulties in discriminating the retroflex non-native phoneme. They were phonologically "deaf" and assimilated it to the dental phoneme that exists in their native languages. In the audiovisual presentation, by contrast, both groups could overcome the phonological deafness for the retroflex non-native phoneme and identify both Bengali phonemes. However, monolinguals were more accurate and responded more quickly than bilinguals. This suggests that bilinguals do not use the same processes as monolinguals to decode visual speech.

  1. [Speech perception test in Italian language for profoundly deaf children].

    Science.gov (United States)

    Genovese, E; Orzan, E; Turrini, M; Babighian, G; Arslan, E

    1995-10-01

    Speech perception tests are an important part of procedures for diagnosing pre-verbal hearing loss. Merely establishing a child's hearing threshold with and without a hearing aid is not sufficient to ensure an adequate evaluation with a view to selecting cases suitable for cochlear implants, because it does not reliably indicate the real benefit obtained from a conventional hearing aid. Speech perception tests have proved useful not only for patient selection, but also for subsequent evaluation of the efficacy of new hearing aids, such as tactile devices and cochlear implants. In clinical practice, the tests most commonly adopted with small children are: the Auditory Comprehension Test (ACT), Discrimination after Training (DAT), the Monosyllable, Trochee, Spondee tests (MTS), the Glendonald Auditory Screening Procedure (GASP), and the Early Speech Perception Test (ESP). Rather than considering specific results achieved in individual cases, reference is generally made to the four speech perception classes proposed by Moog and Geers of the CID of St. Louis. The purpose of this classification, based on results obtained with tests suitably differentiated according to the child's age and language ability, is to detect differences in perception of a spoken message in ideal listening conditions. To date, no Italian-language speech perception test has been designed to assess the speech perception level of children with profound hearing impairment. We therefore attempted to adapt the existing English tests to Italian, taking into consideration the differences between the two languages. Our attention focused on the ESP test since it can be applied to even very small children (2 years old). The ESP is proposed in a standard version for hearing-impaired children over the age of 6 years and in a simplified version for younger children. The rationale we used for selecting Italian words reflects the rationale established for the original version, but the

  2. Auditory Speech Perception Tests in Relation to the Coding Strategy in Cochlear Implant

    OpenAIRE

    Bazon, Aline Cristine; Mantello, Erika Barioni; Gonçales, Alina Sanches; Isaac, Myriam de Lima; Hyppolito, Miguel Angelo; Reis, Ana Cláudia Mirândola Barbosa

    2015-01-01

    Introduction: The objective of the evaluation of auditory perception of cochlear implant users is to determine how the acoustic signal is processed, leading to the recognition and understanding of sound. Objective: To investigate the differences in the process of auditory speech perception in individuals with postlingual hearing loss wearing a cochlear implant, using two different speech coding strategies, and to analyze speech perception and handicap perception in relation to the strategy us...

  3. Effect of preceding speech on nonspeech sound perception

    Science.gov (United States)

    Stephens, Joseph D.; Holt, Lori L.

    2002-05-01

    Data from Japanese quail suggest that the effect of preceding liquids (/l/ or /r/) on response to subsequent stops (/g/ or /d/) arises from general auditory processes sensitive to the spectral structure of sound [A. J. Lotto, K. R. Kluender, and L. L. Holt, J. Acoust. Soc. Am. 102, 1134-1140 (1997)]. If spectral content is key, appropriate nonspeech sounds should influence perception of speech sounds and vice versa. The former effect has been demonstrated [A. J. Lotto and K. R. Kluender, Percept. Psychophys. 60, 602-619 (1998)]. The current experiment investigated the influence of speech on the perception of nonspeech sounds. Nonspeech stimuli were 80-ms chirps modeled after the F2 and F3 transitions in /ga/ and /da/. F3 onset was increased in equal steps from 1800 Hz (/ga/ analog) to 2700 Hz (/da/ analog) to create a ten-member series. During AX discrimination trials, listeners heard chirps that were three steps apart on the series. Each chirp was preceded by a synthesized /al/ or /ar/. Results showed context effects predicted from differences in spectral content between the syllables and chirps. These results are consistent with the hypothesis that spectral contrast influences context effects in speech perception. [Work supported by ONR, NOHR, and CNBC.]

  4. A New Development in Audiovisual Translation Studies: Focus on Target Audience Perception

    Directory of Open Access Journals (Sweden)

    John Denton

    2013-03-01

    Audiovisual translation is now a well-established sub-discipline of Translation Studies (TS), a position that it has reached over the last twenty years or so. Italian scholars and professionals in the field have made a substantial contribution to this successful development, a brief overview of which will be given in the first part of this article, inevitably concentrating on dubbing in the Italian context. Special attention will be devoted to the question of target audience perception, an area where researchers in the University of Bologna at Forlì have excelled. The second part of the article applies the methodology followed by the above-mentioned researchers in a case study of how Italian end users perceive the dubbed version of the British film The History Boys (2006), which contains a plethora of culture-specific verbal and visual references to the English education system. The aim of the study was to ascertain (a) whether translation/adaptation allows the transmission in this admittedly constrained medium of all the intended culture-bound issues, only too well known to the source audience, and, if so, to what extent, and (b) whether the target audience respondents to the e-questionnaire used were aware that they were missing information. The linked, albeit controversial, issue of quality assessment will also be addressed.

  5. Music training and speech perception: a gene-environment interaction.

    Science.gov (United States)

    Schellenberg, E Glenn

    2015-03-01

    Claims of beneficial side effects of music training are made for many different abilities, including verbal and visuospatial abilities, executive functions, working memory, IQ, and speech perception in particular. Such claims assume that music training causes the associations even though children who take music lessons are likely to differ from other children in music aptitude, which is associated with many aspects of speech perception. Music training in childhood is also associated with cognitive, personality, and demographic variables, and it is well established that IQ and personality are determined largely by genetics. Recent evidence also indicates that the role of genetics in music aptitude and music achievement is much larger than previously thought. In short, music training is an ideal model for the study of gene-environment interactions but far less appropriate as a model for the study of plasticity. Children seek out environments, including those with music lessons, that are consistent with their predispositions; such environments exaggerate preexisting individual differences. PMID:25773632

  7. How the demographic makeup of our community influences speech perception.

    Science.gov (United States)

    Lev-Ari, Shiri; Peperkamp, Sharon

    2016-06-01

    Speech perception is known to be influenced by listeners' expectations of the speaker. This paper tests whether the demographic makeup of individuals' communities can influence their perception of foreign sounds by influencing their expectations of the language. Using online experiments with participants from all across the U.S. and matched census data on the proportion of Spanish and other foreign language speakers in participants' communities, this paper shows that the demographic makeup of individuals' communities influences their expectations of foreign languages to have an alveolar trill versus a tap (Experiment 1), as well as their consequent perception of these sounds (Experiment 2). Thus, the paper shows that while individuals' expectations of foreign languages to have a trill occasionally lead them to misperceive a tap in a foreign language as a trill, a higher proportion of non-trill language speakers in one's community decreases this likelihood. These results show that individuals' environments can influence their perception by shaping their linguistic expectations.

  8. Cross-Cultural Variation of Politeness Orientation & Speech Act Perception

    Directory of Open Access Journals (Sweden)

    Nisreen Naji Al-Khawaldeh

    2013-05-01

    This paper presents the findings of an empirical study which compares Jordanian and English native speakers’ perceptions about the speech act of thanking. The forty interviews conducted revealed some similarities but also remarkable cross-cultural differences relating to the significance of thanking, the variables affecting it, and the appropriate linguistic and paralinguistic choices, as well as their impact on the interpretation of thanking behaviour. The most important theoretical finding is that the data, while consistent with many views found in the existing literature, do not support Brown and Levinson’s (1987) claim that thanking is a speech act which intrinsically threatens the speaker’s negative face because it involves overt acceptance of an imposition on the speaker. Rather, thanking should be viewed as a means of establishing and sustaining social relationships. The study findings suggest that cultural variation in thanking is due to the high degree of sensitivity of this speech act to the complex interplay of a range of social and contextual variables, and point to some promising directions for further research.
    Keywords: Linguistic Variation, Cross-Cultural Pragmatics, Speech Act of Thanking, Perceptions of Politeness

  9. Cross-Cultural Variation of Politeness Orientation & Speech Act Perception

    OpenAIRE

    Nisreen Naji Al-Khawaldeh; Vladimir Žegarac

    2013-01-01

    This paper presents the findings of an empirical study which compares Jordanian and English native speakers’ perceptions about the speech act of thanking. The forty interviews conducted revealed some similarities but also remarkable cross-cultural differences relating to the significance of thanking, the variables affecting it, and the appropriate linguistic and paralinguistic choices, as well as their impact on the interpretation of thanking behaviour. The most important theoretical findi...

  10. The relationship of speech intelligibility with hearing sensitivity, cognition, and perceived hearing difficulties varies for different speech perception tests

    Directory of Open Access Journals (Sweden)

    Antje Heinrich

    2015-06-01

    Listeners vary in their ability to understand speech in noisy environments. Hearing sensitivity, as measured by pure-tone audiometry, can only partly explain these results, and cognition has emerged as another key concept. Although cognition relates to speech perception, the exact nature of the relationship remains to be fully understood. This study investigates how different aspects of cognition, particularly working memory and attention, relate to speech intelligibility for various tests. Perceptual accuracy of speech perception represents just one aspect of functioning in a listening environment. Activity and participation limits imposed by hearing loss, in addition to the demands of a listening environment, are also important and may be better captured by self-report questionnaires. Understanding how speech perception relates to self-reported aspects of listening forms the second focus of the study. Forty-four listeners aged between 50 and 74 years with mild sensorineural hearing loss (SNHL) were tested on speech perception tests differing in complexity from low (phoneme discrimination in quiet), to medium (digit triplet perception in speech-shaped noise) to high (sentence perception in modulated noise); cognitive tests of attention, memory, and nonverbal IQ; and self-report questionnaires of general health-related and hearing-specific quality of life. Hearing sensitivity and cognition related to intelligibility differently depending on the speech test: neither was important for phoneme discrimination, hearing sensitivity alone was important for digit triplet perception, and hearing and cognition together played a role in sentence perception. Self-reported aspects of auditory functioning were correlated with speech intelligibility to different degrees, with digit triplets in noise showing the richest pattern. The results suggest that intelligibility tests can vary in their auditory and cognitive demands and their sensitivity to the challenges that auditory environments pose on

  11. The relationship of speech intelligibility with hearing sensitivity, cognition, and perceived hearing difficulties varies for different speech perception tests.

    Science.gov (United States)

    Heinrich, Antje; Henshaw, Helen; Ferguson, Melanie A

    2015-01-01

    Listeners vary in their ability to understand speech in noisy environments. Hearing sensitivity, as measured by pure-tone audiometry, can only partly explain these results, and cognition has emerged as another key concept. Although cognition relates to speech perception, the exact nature of the relationship remains to be fully understood. This study investigates how different aspects of cognition, particularly working memory and attention, relate to speech intelligibility for various tests. Perceptual accuracy of speech perception represents just one aspect of functioning in a listening environment. Activity and participation limits imposed by hearing loss, in addition to the demands of a listening environment, are also important and may be better captured by self-report questionnaires. Understanding how speech perception relates to self-reported aspects of listening forms the second focus of the study. Forty-four listeners aged between 50 and 74 years with mild sensorineural hearing loss were tested on speech perception tests ranging in complexity from low (phoneme discrimination in quiet) to medium (digit triplet perception in speech-shaped noise) to high (sentence perception in modulated noise); cognitive tests of attention, memory, and non-verbal intelligence quotient; and self-report questionnaires of general health-related and hearing-specific quality of life. Hearing sensitivity and cognition related to intelligibility differently depending on the speech test: neither was important for phoneme discrimination, hearing sensitivity alone was important for digit triplet perception, and hearing and cognition together played a role in sentence perception. Self-reported aspects of auditory functioning were correlated with speech intelligibility to different degrees, with digit triplets in noise showing the richest pattern. The results suggest that intelligibility tests can vary in their auditory and cognitive demands and their sensitivity to the challenges that…

  12. Predicting individual variation in language from infant speech perception measures.

    Science.gov (United States)

    Cristia, Alejandrina; Seidl, Amanda; Junge, Caroline; Soderstrom, Melanie; Hagoort, Peter

    2014-01-01

    There are increasing reports that individual variation in behavioral and neurophysiological measures of infant speech processing predicts later language outcomes, and specifically concurrent or subsequent vocabulary size. If such findings hold up under scrutiny, they could both illuminate theoretical models of language development and contribute to the prediction of communicative disorders. A qualitative, systematic review of this emergent literature illustrated the variety of approaches that have been used and highlighted some conceptual problems regarding the measurements. A quantitative analysis of the same data established that the bivariate relation was significant, with correlations of similar strength to those found for well-established nonlinguistic predictors of language. Further exploration of infant speech perception predictors, particularly from a methodological perspective, is recommended. PMID:24320112

  13. Only Behavioral But Not Self-Report Measures of Speech Perception Correlate with Cognitive Abilities

    Science.gov (United States)

    Heinrich, Antje; Henshaw, Helen; Ferguson, Melanie A.

    2016-01-01

    Good speech perception and communication skills in everyday life are crucial for participation and well-being, and are therefore an overarching aim of auditory rehabilitation. Both behavioral and self-report measures can be used to assess these skills. However, correlations between behavioral and self-report speech perception measures are often low. One possible explanation is that there is a mismatch between the specific situations used in the assessment of these skills in each method, and a more careful matching across situations might improve consistency of results. The role that cognition plays in specific speech situations may also be important for understanding communication, as speech perception tests vary in their cognitive demands. In this study, the role of executive function, working memory (WM) and attention in behavioral and self-report measures of speech perception was investigated. Thirty existing hearing aid users with mild-to-moderate hearing loss aged between 50 and 74 years completed a behavioral test battery with speech perception tests ranging from phoneme discrimination in modulated noise (easy) to words in multi-talker babble (medium) and keyword perception in a carrier sentence against a distractor voice (difficult). In addition, a self-report measure of aided communication, residual disability from the Glasgow Hearing Aid Benefit Profile, was obtained. Correlations between speech perception tests and self-report measures were higher when specific speech situations across both were matched. Cognition correlated with behavioral speech perception test results but not with self-report. Only the most difficult speech perception test, keyword perception in a carrier sentence with a competing distractor voice, engaged executive functions in addition to WM. In conclusion, any relationship between behavioral and self-report speech perception is not mediated by a shared correlation with cognition. PMID:27242564

  14. Cortical dynamics of acoustic and phonological processing in speech perception.

    Directory of Open Access Journals (Sweden)

    Linjun Zhang

    In speech perception, a functional hierarchy has been proposed by recent functional neuroimaging studies: core auditory areas on the dorsal plane of the superior temporal gyrus (STG) are sensitive to basic acoustic characteristics, whereas downstream regions, specifically the left superior temporal sulcus (STS) and middle temporal gyrus (MTG) ventral to Heschl's gyrus (HG), are responsive to abstract phonological features. What is unclear so far is the relationship between the dorsal and ventral processes, especially with regard to whether low-level acoustic processing is modulated by high-level phonological processing. To address the issue, we assessed the sensitivity of core auditory and downstream regions to acoustic and phonological variations by using within- and across-category lexical tonal continua with equal physical intervals. We found that, relative to within-category variation, across-category variation elicited stronger activation in the left middle MTG (mMTG), apparently reflecting the abstract phonological representations. At the same time, activation in the core auditory region decreased, resulting from the top-down influences of phonological processing. These results support a hierarchical organization of the ventral acoustic-phonological processing stream, which originates in the right HG/STG and projects to the left mMTG. Furthermore, our study provides direct evidence that low-level acoustic analysis is modulated by high-level phonological representations, revealing the cortical dynamics of acoustic and phonological processing in speech perception. Our findings confirm the existence of reciprocal progression projections in the auditory pathways and the roles of both feed-forward and feedback mechanisms in speech perception.

  15. Early Language Development of Children at Familial Risk of Dyslexia: Speech Perception and Production

    Science.gov (United States)

    Gerrits, Ellen; de Bree, Elise

    2009-01-01

    Speech perception and speech production were examined in 3-year-old Dutch children at familial risk of developing dyslexia. Their performance in speech sound categorisation and their production of words was compared to that of age-matched children with specific language impairment (SLI) and typically developing controls. We found that speech…

  16. Electrophysiological correlates of predictive coding of auditory location in the perception of natural audiovisual events

    Directory of Open Access Journals (Sweden)

    Jeroen Stekelenburg

    2012-05-01

    In many natural audiovisual events (e.g., a clap of the two hands), the visual signal precedes the sound and thus allows observers to predict when, where, and which sound will occur. Previous studies have already reported that there are distinct neural correlates of temporal (when) versus phonetic/semantic (which) content on audiovisual integration. Here we examined the effect of visual prediction of auditory location (where) in audiovisual biological motion stimuli by varying the spatial congruency between the auditory and visual parts of the audiovisual stimulus. Visual stimuli were presented centrally, whereas auditory stimuli were presented either centrally or at 90° azimuth. Typical subadditive amplitude reductions (AV – V < A) were found for the auditory N1 and P2 for spatially congruent and incongruent conditions. The new finding is that the N1 suppression was larger for spatially congruent stimuli. A very early audiovisual interaction was also found at 30–50 ms in the spatially congruent condition, while no effect of congruency was found on the suppression of the P2. This indicates that visual prediction of auditory location can be coded very early in auditory processing.

  17. Role of contextual cues on the perception of spectrally reduced interrupted speech.

    Science.gov (United States)

    Patro, Chhayakanta; Mendel, Lisa Lucks

    2016-08-01

    Understanding speech within an auditory scene is constantly challenged by interfering noise in suboptimal listening environments when noise hinders the continuity of the speech stream. In such instances, a typical auditory-cognitive system perceptually integrates available speech information and "fills in" missing information in the light of semantic context. However, individuals with cochlear implants (CIs) find it difficult and effortful to understand interrupted speech compared to their normal hearing counterparts. This inefficiency in perceptual integration of speech could be attributed to further degradations in the spectral-temporal domain imposed by CIs making it difficult to utilize the contextual evidence effectively. To address these issues, 20 normal hearing adults listened to speech that was spectrally reduced and spectrally reduced interrupted in a manner similar to CI processing. The Revised Speech Perception in Noise test, which includes contextually rich and contextually poor sentences, was used to evaluate the influence of semantic context on speech perception. Results indicated that listeners benefited more from semantic context when they listened to spectrally reduced speech alone. For the spectrally reduced interrupted speech, contextual information was not as helpful under significant spectral reductions, but became beneficial as the spectral resolution improved. These results suggest top-down processing facilitates speech perception up to a point, and it fails to facilitate speech understanding when the speech signals are significantly degraded. PMID:27586760

  18. The Role of Broca's Area in Speech Perception: Evidence from Aphasia Revisited

    Science.gov (United States)

    Hickok, Gregory; Costanzo, Maddalena; Capasso, Rita; Miceli, Gabriele

    2011-01-01

    Motor theories of speech perception have been revitalized as a consequence of the discovery of mirror neurons. Some authors have even promoted a strong version of the motor theory, arguing that the motor speech system is critical for perception. Part of the evidence that is cited in favor of this claim is the observation from the early 1980s that…

  19. Noise on, Voicing off: Speech Perception Deficits in Children with Specific Language Impairment

    Science.gov (United States)

    Ziegler, Johannes C.; Pech-Georgel, Catherine; George, Florence; Lorenzi, Christian

    2011-01-01

    Speech perception of four phonetic categories (voicing, place, manner, and nasality) was investigated in children with specific language impairment (SLI) (n=20) and age-matched controls (n=19) in quiet and various noise conditions using an AXB two-alternative forced-choice paradigm. Children with SLI exhibited robust speech perception deficits in…

  20. The Development of the Mealings, Demuth, Dillon, and Buchholz Classroom Speech Perception Test

    Science.gov (United States)

    Mealings, Kiri T.; Demuth, Katherine; Buchholz, Jörg; Dillon, Harvey

    2015-01-01

    Purpose: Open-plan classroom styles are increasingly being adopted in Australia despite evidence that their high intrusive noise levels adversely affect learning. The aim of this study was to develop a new Australian speech perception task (the Mealings, Demuth, Dillon, and Buchholz Classroom Speech Perception Test) and use it in an open-plan…

  1. The Perception of Telephone-Processed Speech by Combined Electric and Acoustic Stimulation

    OpenAIRE

    Hu, Yi; Tahmina, Qudsia; Runge, Christina; Friedland, David R.

    2013-01-01

    This study assesses the effects of adding low- or high-frequency information to the band-limited telephone-processed speech on bimodal listeners’ telephone speech perception in quiet environments. In the proposed experiments, bimodal users were presented under quiet listening conditions with wideband speech (WB), bandpass-filtered telephone speech (300–3,400 Hz, BP), high-pass filtered speech (f > 300 Hz, HP, i.e., distorted frequency components above 3,400 Hz in telephone speech were restore...

  2. Perception of foreign-accented clear speech by younger and older English listeners

    OpenAIRE

    Li, Chi-Nin

    2009-01-01

    Naturally produced English clear speech has been shown to be more intelligible than English conversational speech. However, little is known about the extent of the clear speech effects in the production of nonnative English, and perception of foreign-accented English by younger and older listeners. The present study examined whether Cantonese speakers would employ the same strategies as those used by native English speakers in producing clear speech in their second language. Also, the clear s...

  3. Comparison of Speech Perception in Background Noise with Acceptance of Background Noise in Aided and Unaided Conditions.

    Science.gov (United States)

    Nabelek, Anna K.; Tampas, Joanna W.; Burchfield, Samuel B.

    2004-01-01

    Background noise is a significant factor influencing hearing-aid satisfaction and is a major reason for rejection of hearing aids. Attempts have been made by previous researchers to relate the use of hearing aids to speech perception in noise (SPIN), with an expectation of improved speech perception followed by an…

  4. How the demographic makeup of our community influences speech perception.

    Science.gov (United States)

    Lev-Ari, Shiri; Peperkamp, Sharon

    2016-06-01

    Speech perception is known to be influenced by listeners' expectations of the speaker. This paper tests whether the demographic makeup of individuals' communities can influence their perception of foreign sounds by influencing their expectations of the language. Using online experiments with participants from all across the U.S. and matched census data on the proportion of Spanish and other foreign language speakers in participants' communities, this paper shows that the demographic makeup of individuals' communities influences their expectations of foreign languages to have an alveolar trill versus a tap (Experiment 1), as well as their consequent perception of these sounds (Experiment 2). Thus, the paper shows that while individuals' expectations of foreign languages to have a trill occasionally lead them to misperceive a tap in a foreign language as a trill, a higher proportion of non-trill language speakers in one's community decreases this likelihood. These results show that individuals' environment can influence their perception by shaping their linguistic expectations. PMID:27369129

  5. On the perception/production interface in speech processing

    Science.gov (United States)

    Hemphill, Rachel Marie

    1999-10-01

    In a series of five experiments, the author tests the hypothesis that speech processing in the human mind demands two separate phonological representations: one for perception and one for production (Menn 1980, 1983; Straight 1980; Menn & Matthei 1992). The experiments probe the structure of these mental categories and how they change in the process of acquisition. Three groups of native English-speaking subjects were taught to categorically perceive a three-way Thai voicing contrast in synthetic bilabial stop consonants, which varied only in VOT (after Pisoni, Aslin, Perey, and Hennessy 1982). Perception and production tests were administered following training. Subjects showed the ability, which improved with training, to categorically identify the three-way voicing contrast. Subsequent acoustic and perceptual analyses showed that they were unable to produce the contrast correctly, producing no difference or manipulating acoustic variables other than VOT (vowel duration, vowel quality, nasalization, etc.). When subjects' productions were compared to their pronunciations of English labial stops, it was found that subjects construct a new production category for the Thai prevoiced stop category. In contrast, subjects split their existing English perceptual /b/ category, indicating that perceptual and production phonological categories do not change in parallel. In a subsequent experiment, subjects were re-tested on perception of the synthetic stimuli, productions of two native Thai speakers, and their own productions from the previous experiments. An analysis of the perceptual data shows that subjects performed equally well on the four tasks, indicating that they are no better at identifying their own productions than those of novel talkers or synthetic talkers. This finding contradicts the hypothetical direct link between perception and production phonologies. These results are explained in terms of separate expressive and receptive representations and the…

  6. Audiovisual Resources.

    Science.gov (United States)

    Beasley, Augie E.; And Others

    1986-01-01

    Six articles on the use of audiovisual materials in the school library media center cover how to develop an audiovisual production center; audiovisual forms; a checklist for effective video/16mm use in the classroom; slides in learning; hazards of videotaping in the library; and putting audiovisuals on the shelf. (EJS)

  7. Audiovisual bimodal mutual compensation of Chinese

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    The perception of human languages is inherently a multimodal process, in which visual information can compensate for audio information to improve recognition performance. This phenomenon has been studied in English, German, Spanish, and other languages, but it has not yet been reported for Chinese. In our experiment, 14 syllables (/ba, bi, bian, biao, bin, de, di, dian, duo, dong, gai, gan, gen, gu/), extracted from the Chinese audiovisual bimodal speech database CAVSR-1.0, were pronounced by 10 subjects. The audio-only, audiovisual, and visual-only stimuli were recognized by 20 observers. The audio-only and audiovisual stimuli were both presented under 5 conditions: no noise, and SNRs of 0 dB, -8 dB, -12 dB, and -16 dB. Analysis of the results supports the following conclusions for Chinese speech. Human beings can recognize visual-only stimuli rather well. The place of articulation determines the visual distinction. In noisy environments, visual information can substantially compensate for degraded audio information, and as a result recognition performance is greatly improved.

  8. Aided and unaided speech perception by older hearing impaired listeners.

    Directory of Open Access Journals (Sweden)

    David L Woods

    The most common complaint of older hearing impaired (OHI) listeners is difficulty understanding speech in the presence of noise. However, tests of consonant identification and sentence reception threshold (SeRT) provide different perspectives on the magnitude of impairment. Here we quantified speech perception difficulties in 24 OHI listeners in unaided and aided conditions by analyzing (1) consonant-identification thresholds and consonant confusions for 20 onset and 20 coda consonants in consonant-vowel-consonant (CVC) syllables presented at consonant-specific signal-to-noise ratio (SNR) levels, and (2) SeRTs obtained with the Quick Speech in Noise Test (QSIN) and the Hearing in Noise Test (HINT). Compared to older normal hearing (ONH) listeners, nearly all unaided OHI listeners showed abnormal consonant-identification thresholds, abnormal consonant confusions, and reduced psychometric function slopes. Average elevations in consonant-identification thresholds exceeded 35 dB, correlated strongly with impairments in mid-frequency hearing, and were greater for hard-to-identify consonants. Advanced digital hearing aids (HAs) improved average consonant-identification thresholds by more than 17 dB, with significant HA benefit seen in 83% of OHI listeners. HAs partially normalized consonant-identification thresholds, reduced abnormal consonant confusions, and increased the slope of psychometric functions. Unaided OHI listeners showed much smaller elevations in SeRTs (mean 6.9 dB) than in consonant-identification thresholds, and SeRTs in unaided listening conditions correlated strongly (r = 0.91) with identification thresholds of easily identified consonants. HAs produced minimal SeRT benefit (2.0 dB), with only 38% of OHI listeners showing significant improvement. HA benefit on SeRTs was accurately predicted (r = 0.86) by HA benefit on easily identified consonants. Consonant-identification tests can accurately predict sentence processing deficits and HA benefit in OHI…

  9. The relationship of phonological ability, speech perception and auditory perception in adults with dyslexia.

    Directory of Open Access Journals (Sweden)

    Jeremy Law

    2014-07-01

    This study investigated whether auditory, speech perception, and phonological skills are tightly interrelated or independently contributing to reading. We assessed each of these three skills in 36 adults with a past diagnosis of dyslexia and 54 matched normal reading adults. Phonological skills were tested by the typical threefold tasks, i.e., rapid automatic naming, verbal short-term memory and phonological awareness. Dynamic auditory processing skills were assessed by means of a frequency modulation (FM) and an amplitude rise time (RT); an intensity discrimination task (ID) was included as a non-dynamic control task. Speech perception was assessed by means of sentences and words-in-noise tasks. Group analysis revealed significant group differences in auditory tasks (i.e., RT and ID) and in phonological processing measures, yet no differences were found for speech perception. In addition, performance on RT discrimination correlated with reading, but this relation was mediated by phonological processing and not by speech-in-noise. Finally, inspection of the individual scores revealed that the dyslexic readers showed an increased proportion of deviant subjects on the slow-dynamic auditory and phonological tasks, yet each individual dyslexic reader does not display a clear pattern of deficiencies across the levels of processing skills. Although our results support phonological and slow-rate dynamic auditory deficits which relate to literacy, they suggest that at the individual level, problems in reading and writing cannot be explained by the cascading auditory theory. Instead, dyslexic adults seem to vary considerably in the extent to which each of the auditory and phonological factors are expressed and interact with environmental and higher-order cognitive influences.

  10. The relationship of phonological ability, speech perception, and auditory perception in adults with dyslexia.

    Science.gov (United States)

    Law, Jeremy M; Vandermosten, Maaike; Ghesquiere, Pol; Wouters, Jan

    2014-01-01

    This study investigated whether auditory, speech perception, and phonological skills are tightly interrelated or independently contributing to reading. We assessed each of these three skills in 36 adults with a past diagnosis of dyslexia and 54 matched normal reading adults. Phonological skills were tested by the typical threefold tasks, i.e., rapid automatic naming, verbal short-term memory and phonological awareness. Dynamic auditory processing skills were assessed by means of a frequency modulation (FM) and an amplitude rise time (RT); an intensity discrimination task (ID) was included as a non-dynamic control task. Speech perception was assessed by means of sentences and words-in-noise tasks. Group analyses revealed significant group differences in auditory tasks (i.e., RT and ID) and in phonological processing measures, yet no differences were found for speech perception. In addition, performance on RT discrimination correlated with reading but this relation was mediated by phonological processing and not by speech-in-noise. Finally, inspection of the individual scores revealed that the dyslexic readers showed an increased proportion of deviant subjects on the slow-dynamic auditory and phonological tasks, yet each individual dyslexic reader does not display a clear pattern of deficiencies across the processing skills. Although our results support phonological and slow-rate dynamic auditory deficits which relate to literacy, they suggest that at the individual level, problems in reading and writing cannot be explained by the cascading auditory theory. Instead, dyslexic adults seem to vary considerably in the extent to which each of the auditory and phonological factors are expressed and interact with environmental and higher-order cognitive influences.

  11. Modeling Interactions between Speech Production and Perception: Speech Error Detection at Semantic and Phonological Levels and the Inner Speech Loop

    Directory of Open Access Journals (Sweden)

    Bernd Kröger

    2016-05-01

    Production and comprehension of speech are closely interwoven. For example, the ability to detect an error in one's own speech, halt speech production, and finally correct the error can be explained by assuming an inner speech loop which continuously compares the word representations induced by production to those induced by perception at various cognitive levels (e.g., conceptual, word, or phonological levels). Because spontaneous speech errors are relatively rare, a picture naming and halt paradigm can be used to evoke them. In this paradigm, picture presentation (target word initiation) is followed by an auditory stop signal (distractor word) for halting speech production. The current study seeks to understand the neural mechanisms governing self-detection of speech errors by developing a biologically inspired neural model of the inner speech loop. The neural model is based on the Neural Engineering Framework (NEF) and consists of a network of about 500,000 spiking neurons. In the first experiment we induce simulated speech errors semantically and phonologically. In the second experiment, we simulate a picture naming and halt task. Target-distractor word pairs were balanced with respect to variation of phonological and semantic similarity. The results of the first experiment show that speech errors are successfully detected by a monitoring component in the inner speech loop. The results of the second experiment show that the model correctly reproduces human behavioral data on the picture naming and halt task. In particular, the halting rate in the production of target words was lower for phonologically similar words than for semantically similar or fully dissimilar distractor words. We thus conclude that the neural architecture proposed here to model the inner speech loop reflects important interactions in production and perception at phonological and semantic levels.

  12. Functional correlates of the speech-in-noise perception impairment in dyslexia: an MRI study.

    Science.gov (United States)

    Dole, Marjorie; Meunier, Fanny; Hoen, Michel

    2014-07-01

    Dyslexia is a language-based neurodevelopmental disorder. It is characterized as a persistent deficit in reading and spelling. These difficulties have been shown to result from an underlying impairment of the phonological component of language, possibly also affecting speech perception. Although there is little evidence for such a deficit under optimal, quiet listening conditions, speech perception difficulties in adults with dyslexia are often reported under more challenging conditions, such as when speech is masked by noise. Previous studies have shown that these difficulties are more pronounced when the background noise is speech and when little spatial information is available to facilitate differentiation between target and background sound sources. In this study, we investigated the neuroimaging correlates of speech-in-speech perception in typical readers and participants with dyslexia, focusing on the effects of different listening configurations. Fourteen adults with dyslexia and 14 matched typical readers performed a subjective intelligibility rating test with single words presented against concurrent speech during functional magnetic resonance imaging (fMRI) scanning. Target words were always presented with a four-talker background in one of three listening configurations: Dichotic, Binaural or Monaural. The results showed that in the Monaural configuration, in which no spatial information was available and energetic masking was maximal, intelligibility was severely decreased in all participants, and this effect was particularly strong in participants with dyslexia. Functional imaging revealed that in this configuration, participants partially compensate for their poorer listening abilities by recruiting several areas in the cerebral networks engaged in speech perception. In the Binaural configuration, participants with dyslexia achieved the same performance level as typical readers, suggesting that they were able to use spatial information when available…

  13. Audio-Visual and Meaningful Semantic Context Enhancements in Older and Younger Adults.

    Science.gov (United States)

    Smayda, Kirsten E; Van Engen, Kristin J; Maddox, W Todd; Chandrasekaran, Bharath

    2016-01-01

    Speech perception is critical to everyday life. Noise often degrades the speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18-35) and thirty-three older adults (ages 60-90) to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audio-visual conditions. We hypothesized that young adults would outperform older adults across SNRs, modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audio-visual relative to audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts. However, we predicted that older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audiovisual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and meaningful benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audio-visual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger adults when both…

  14. Compensation for Coarticulation: Disentangling Auditory and Gestural Theories of Perception of Coarticulatory Effects in Speech

    Science.gov (United States)

    Viswanathan, Navin; Magnuson, James S.; Fowler, Carol A.

    2010-01-01

    According to one approach to speech perception, listeners perceive speech by applying general pattern matching mechanisms to the acoustic signal (e.g., Diehl, Lotto, & Holt, 2004). An alternative is that listeners perceive the phonetic gestures that structured the acoustic signal (e.g., Fowler, 1986). The two accounts have offered different…

  15. Effects of Speech Style and Sex of Speaker on Person Perception.

    Science.gov (United States)

    Newcombe, Nora; Arnkoff, Diane B.

    1979-01-01

    Two experiments examined Lakoff's suggestion that men and women use different speech styles (women's speech being more polite and less assertive than men's). The effects of undergraduate students' use of three linguistic variables (tag questions, qualifiers, and compound requests) on person perception were tested.

  16. The Link between Speech Perception and Production Is Phonological and Abstract: Evidence from the Shadowing Task

    Science.gov (United States)

    Mitterer, Holger; Ernestus, Mirjam

    2008-01-01

    This study reports a shadowing experiment, in which one has to repeat a speech stimulus as fast as possible. We tested claims about a direct link between perception and production based on speech gestures, and obtained two types of counterevidence. First, shadowing is not slowed down by a gestural mismatch between stimulus and response. Second,…

  17. A Retrospective Multicenter Study Comparing Speech Perception Outcomes for Bilateral Implantation and Bimodal Rehabilitation

    NARCIS (Netherlands)

    Blamey, Peter J.; Maat, Bert; Başkent, Deniz; Mawman, Deborah; Burke, Elaine; Dillier, Norbert; Beynon, Andy; Kleine-Punte, Andrea; Govaerts, Paul J.; Skarzynski, Piotr H.; Huber, Alexander M.; Sterkers-Artieres, Francoise; Van de Heyning, Paul; O'Leary, Stephen; Fraysse, Bernard; Green, Kevin; Sterkers, Olivier; Venail, Frederic; Skarzynski, Henryk; Vincent, Christophe; Truy, Eric; Dowell, Richard; Bergeron, Francois; Lazard, Diane S.

    2015-01-01

    Objectives: To compare speech perception outcomes between bilateral implantation (cochlear implants [CIs]) and bimodal rehabilitation (one CI on one side plus one hearing aid [HA] on the other side) and to explore the clinical factors that may cause asymmetric performances in speech intelligibility

  18. Prosody and Semantics Are Separate but Not Separable Channels in the Perception of Emotional Speech: Test for Rating of Emotions in Speech

    Science.gov (United States)

    Ben-David, Boaz M.; Multani, Namita; Shakuf, Vered; Rudzicz, Frank; van Lieshout, Pascal H. H. M.

    2016-01-01

    Purpose: Our aim is to explore the complex interplay of prosody (tone of speech) and semantics (verbal content) in the perception of discrete emotions in speech. Method: We implement a novel tool, the Test for Rating of Emotions in Speech. Eighty native English speakers were presented with spoken sentences made of different combinations of 5…

  19. Speech perception under adverse conditions: insights from behavioral, computational, and neuroscience research

    OpenAIRE

    Guediche, Sara; Blumstein, Sheila E.; Fiez, Julie A.; Holt, Lori L.

    2014-01-01

    Adult speech perception reflects the long-term regularities of the native language, but it is also flexible such that it accommodates and adapts to adverse listening conditions and short-term deviations from native-language norms. The purpose of this article is to examine how the broader neuroscience literature can inform and advance research efforts in understanding the neural basis of flexibility and adaptive plasticity in speech perception. Specifically, we highlight the potential role of ...

  20. Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

    Directory of Open Access Journals (Sweden)

    Yue Zhao

    2012-12-01

    Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi-stream Dynamic Bayesian Networks and coupled HMMs are widely used for audio-visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) to perform unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and to build an accurate audio-visual speech recognition model without a frame-independence assumption. Experimental results on Tibetan speech data from several real-world environments showed that the proposed DDBN outperforms state-of-the-art methods in word recognition accuracy.

  1. Effect of signal to noise ratio on the speech perception ability of older adults

    Science.gov (United States)

    Shojaei, Elahe; Ashayeri, Hassan; Jafari, Zahra; Zarrin Dast, Mohammad Reza; Kamali, Koorosh

    2016-01-01

    Background: Speech perception ability depends on auditory and extra-auditory elements. The signal-to-noise ratio (SNR) is an extra-auditory element that affects the ability to follow speech normally and maintain a conversation. Difficulty perceiving speech in noise is a common complaint of the elderly. In this study, the importance of SNR magnitude as an extra-auditory factor in speech perception in noise was examined in the elderly. Methods: The speech perception in noise test (SPIN) was administered at three SNRs, in the presence of ipsilateral white noise, to 25 elderly participants with bilateral normal hearing thresholds at low–mid frequencies. These participants were selected by available sampling. Cognitive screening was done using the Persian Mini Mental State Examination (MMSE). Results: Independent t-tests, ANOVA, and Pearson correlation were used for statistical analysis. There was a significant difference in word discrimination scores in silence and at the three SNRs in both ears (p≤0.047). Moreover, there was a significant difference in word discrimination scores for paired SNRs (0 and +5, 0 and +10, and +5 and +10 dB; p≤0.04). No significant correlation was found between age and word recognition scores in silence and at the three SNRs in both ears (p≥0.386). Conclusion: Our results revealed that decreasing the signal level and increasing the competing noise considerably reduced speech perception ability in elderly listeners with normal low–mid frequency hearing thresholds. These results support the critical role of SNR in speech perception ability in the elderly. Furthermore, our results revealed that normal-hearing elderly participants required compensatory strategies to maintain normal speech perception in challenging acoustic situations. PMID:27390712

  2. Objective Neural Indices of Speech-in-Noise Perception

    OpenAIRE

    Anderson, Samira; Kraus, Nina

    2010-01-01

    Numerous factors contribute to understanding speech in noisy listening environments. There is a clinical need for objective biological assessment of auditory factors that contribute to the ability to hear speech in noise, factors that are free from the demands of attention and memory. Subcortical processing of complex sounds such as speech (auditory brainstem responses to speech and other complex stimuli [cABRs]) reflects the integrity of auditory function. Because cABRs physically resemble t...

  3. Audiovisual Generation of Social Attitudes from Neutral Stimuli

    OpenAIRE

    Barbulescu, Adela; Bailly, Gérard; Ronfard, Rémi; Pouget, Maël

    2015-01-01

    The focus of this study is the generation of expressive audiovisual speech from neutral utterances for 3D virtual actors. Taking into account the segmental and suprasegmental aspects of audiovisual speech, we propose and compare several computational frameworks for the generation of expressive speech and face animation. We notably evaluate a standard frame-based conversion approach with two other methods that postulate the existence of global prosodic audiovisual patterns that are characteris...

  4. Hearing aid processing of loud speech and noise signals: Consequences for loudness perception and listening comfort

    DEFF Research Database (Denmark)

    Schmidt, Erik

    2007-01-01

    Sound processing in hearing aids is determined by the fitting rule. The fitting rule describes how the hearing aid should amplify speech and sounds in the surroundings … such that they become audible again for the hearing-impaired person. The general goal is to place all sounds within the hearing aid users' audible range, such that speech intelligibility and listening comfort become as good as possible. Amplification strategies in hearing aids are in many cases based on empirical … research, for example, investigations of loudness perception in hearing-impaired listeners. Most research has focused on speech and sounds at medium input levels (e.g., 60-65 dB SPL). It is well documented that for speech at conversational levels, hearing aid users prefer the signal to be amplified...

  5. Production and perception of listener-oriented clear speech in child language.

    Science.gov (United States)

    Syrett, Kristen; Kawahara, Shigeto

    2014-11-01

    In this paper, we ask whether children are sensitive to the needs of their interlocutor and, if so, whether they, like adults, modify acoustic characteristics of their speech as part of a communicative goal. In a production task, preschoolers participated in a word learning task that favored the use of clear speech. Children produced vowels that were longer, more intense, more dispersed in the vowel space, and had a more expanded F0 range than in normal speech. Two perception studies with adults showed that these acoustic differences were perceptible and were used to distinguish normal and clear speech styles. We conclude that preschoolers are sensitive to aspects of the speaker-hearer relationship that call upon them to modify their speech in ways that benefit their listener.

  6. Brain electric activity during the preattentive perception of speech sounds in tonal languages

    Directory of Open Access Journals (Sweden)

    Naiphinich Kotchabhakdi

    2004-05-01

    The present study was intended to make electrophysiological investigations into the preattentive perception of native and non-native speech sounds. We recorded the mismatch negativity, elicited by single-syllable changes in both native and non-native speech-sound contrasts in tonal languages. EEGs were recorded and low-resolution brain electromagnetic tomography (LORETA) was utilized to explore the neural electrical activity. Our results suggested that the left hemisphere was predominant in the perception of native speech sounds, whereas the non-native speech sound was perceived predominantly by the right hemisphere, which may be explained by the specialization in processing the prosodic and emotional components of speech formed in this hemisphere.

  7. Contributions of electric and acoustic hearing to bimodal speech and music perception.

    Science.gov (United States)

    Crew, Joseph D; Galvin, John J; Landsberger, David M; Fu, Qian-Jie

    2015-01-01

    Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data was collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception. PMID:25790349

  8. Contributions of electric and acoustic hearing to bimodal speech and music perception.

    Directory of Open Access Journals (Sweden)

    Joseph D Crew

    Cochlear implant (CI) users have difficulty understanding speech in noisy listening conditions and perceiving music. Aided residual acoustic hearing in the contralateral ear can mitigate these limitations. The present study examined contributions of electric and acoustic hearing to speech understanding in noise and melodic pitch perception. Data were collected with the CI only, the hearing aid (HA) only, and both devices together (CI+HA). Speech reception thresholds (SRTs) were adaptively measured for simple sentences in speech babble. Melodic contour identification (MCI) was measured with and without a masker instrument; the fundamental frequency of the masker was varied to be overlapping or non-overlapping with the target contour. Results showed that the CI contributes primarily to bimodal speech perception and that the HA contributes primarily to bimodal melodic pitch perception. In general, CI+HA performance was slightly improved relative to the better ear alone (CI-only) for SRTs but not for MCI, with some subjects experiencing a decrease in bimodal MCI performance relative to the better ear alone (HA-only). Individual performance was highly variable, and the contribution of either device to bimodal perception was both subject- and task-dependent. The results suggest that individualized mapping of CIs and HAs may further improve bimodal speech and music perception.

  9. Perceived synchrony for realistic and dynamic audiovisual events.

    Science.gov (United States)

    Eg, Ragnhild; Behne, Dawn M

    2015-01-01

    In well-controlled laboratory experiments, researchers have found that humans can perceive delays between auditory and visual signals as short as 20 ms. Conversely, other experiments have shown that humans can tolerate audiovisual asynchrony that exceeds 200 ms. This seeming contradiction in human temporal sensitivity can be attributed to a number of factors such as experimental approaches and precedence of the asynchronous signals, along with the nature, duration, location, complexity and repetitiveness of the audiovisual stimuli, and even individual differences. In order to better understand how temporal integration of audiovisual events occurs in the real world, we need to close the gap between the experimental setting and the complex setting of everyday life. With this work, we aimed to contribute one brick to the bridge that will close this gap. We compared perceived synchrony for long-running and eventful audiovisual sequences to shorter sequences that contain a single audiovisual event, for three types of content: action, music, and speech. The resulting windows of temporal integration showed that participants were better at detecting asynchrony for the longer stimuli, possibly because the long-running sequences contain multiple corresponding events that offer audiovisual timing cues. Moreover, the points of subjective simultaneity differ between content types, suggesting that the nature of a visual scene could influence the temporal perception of events. An expected outcome from this type of experiment was the rich variation among participants' distributions and the derived points of subjective simultaneity. Hence, the designs of similar experiments call for more participants than traditional psychophysical studies. Heeding this caution, we conclude that existing theories on multisensory perception are ready to be tested on more natural and representative stimuli.

  10. Speech-perception-in-noise and bilateral spatial abilities in adults with delayed sequential cochlear implantation

    Directory of Open Access Journals (Sweden)

    Ilze Oosthuizen

    2012-12-01

    Objective: To determine speech-perception-in-noise (with speech and noise spatially distinct and coincident) and the bilateral spatial benefits of head-shadow effect, summation, squelch, and spatial release of masking in adults with delayed sequential cochlear implants. Study design: A cross-sectional, one-group, post-test-only exploratory design was employed. Eleven adults (mean age 47 years; range 21–69 years) of the Pretoria Cochlear Implant Programme (PCIP) in South Africa with bilateral severe-to-profound sensorineural hearing loss were recruited. Prerecorded Everyday Speech Sentences of the Central Institute for the Deaf (CID) were used to evaluate participants’ speech-in-noise perception at sentence level. An adaptive procedure was used to determine the signal-to-noise ratio (SNR, in dB) at which the participant’s speech reception threshold (SRT) was achieved. Specific calculations were used to estimate bilateral spatial benefit effects. Results: A minimal bilateral benefit for speech-in-noise perception was observed with noise directed to the first implant (CI 1; 1.69 dB) and in the spatially separated speech and noise listening condition (0.78 dB), but it was not statistically significant. The head-shadow effect at 180° was the most robust bilateral spatial benefit. An improvement in speech perception with spatially distinct speech and noise indicates that the contribution of the second implant (CI 2) to bilateral spatial benefit is greater than that of the first implant (CI 1). Conclusion: Bilateral benefit for delayed sequentially implanted adults is less than previously reported for simultaneously and sequentially implanted adults. Delayed sequential implantation benefit seems to relate to the availability of the ear with the most favourable SNR.

  11. Mandarin speech perception in combined electric and acoustic stimulation.

    Directory of Open Access Journals (Sweden)

    Yongxin Li

    For deaf individuals with residual low-frequency acoustic hearing, combined use of a cochlear implant (CI) and hearing aid (HA) typically provides better speech understanding than either device alone. Because of coarse spectral resolution, CIs do not provide fundamental frequency (F0) information that contributes to understanding of tonal languages such as Mandarin Chinese. The HA can provide good representation of F0 and, depending on the range of aided acoustic hearing, first and second formant (F1 and F2) information. In this study, Mandarin tone, vowel, and consonant recognition in quiet and noise was measured in 12 adult Mandarin-speaking bimodal listeners with the CI-only and with the CI+HA. Tone recognition was significantly better with the CI+HA in noise, but not in quiet. Vowel recognition was significantly better with the CI+HA in quiet, but not in noise. There was no significant difference in consonant recognition between the CI-only and the CI+HA in quiet or in noise. There was a wide range in bimodal benefit, with improvements often greater than 20 percentage points in some tests and conditions. The bimodal benefit was compared to CI subjects' HA-aided pure-tone average (PTA) thresholds between 250 and 2000 Hz; subjects were divided into two groups, "better" and "poorer" PTA, using a 50 dB HL cutoff. The bimodal benefit differed significantly between groups only for consonant recognition. The bimodal benefit for tone recognition in quiet was significantly correlated with CI experience, suggesting that bimodal CI users learn to better combine low-frequency spectro-temporal information from acoustic hearing with temporal envelope information from electric hearing. Given the small number of subjects in this study (n = 12), further research with Chinese bimodal listeners may provide more information regarding the contribution of acoustic and electric hearing to tonal language perception.

  12. The effect of short-term musical training on speech perception in noise

    Directory of Open Access Journals (Sweden)

    Chandni Jain

    2015-03-01

    The aim of the study was to assess the effect of short-term musical training on speech perception in noise. In the present study, speech perception in noise was measured before and after short-term musical training. The musical training involved auditory perceptual training for identification of two Carnatic ragas and was given over eight sessions. A total of 18 normal-hearing adults aged 18-25 years participated in the study: group 1 consisted of ten individuals who underwent musical training, and group 2 consisted of eight individuals who did not undergo any training. Results revealed that after training, speech perception in noise improved significantly in group 1, whereas group 2 did not show any change in speech perception scores. Thus, short-term musical training enhances speech perception in the presence of noise. However, generalization and long-term maintenance of these benefits need to be evaluated.

  13. The effects of bilingualism on children's perception of speech sounds

    NARCIS (Netherlands)

    Brasileiro, I.

    2009-01-01

    The general topic addressed by this dissertation is that of bilingualism, and more specifically, the topic of bilingual acquisition of speech sounds. The central question in this study is the following: does bilingualism affect children’s perceptual development of speech sounds? The term bilingual i

  14. Bimodal Hearing and Speech Perception with a Competing Talker

    Science.gov (United States)

    Pyschny, Verena; Landwehr, Markus; Hahn, Moritz; Walger, Martin; von Wedel, Hasso; Meister, Hartmut

    2011-01-01

    Purpose: The objective of the study was to investigate the influence of bimodal stimulation upon hearing ability for speech recognition in the presence of a single competing talker. Method: Speech recognition was measured in 3 listening conditions: hearing aid (HA) alone, cochlear implant (CI) alone, and both devices together (CI + HA). To examine…

  15. Impact of a moving noise masker on speech perception in cochlear implant users

    OpenAIRE

    Tobias Weissgerber; Tobias Rader; Uwe Baumann

    2015-01-01

    Objectives: Previous studies investigating speech perception in noise have typically been conducted with static masker positions. The aim of this study was to investigate the effect of spatial separation of source and masker (spatial release from masking, SRM) in a moving masker setup and to evaluate the impact of adaptive beamforming in comparison with fixed directional microphones in cochlear implant (CI) users. Design: Speech reception thresholds (SRT) were measured in S0N0 and in a mov...

  16. Brain electric activity during the preattentive perception of speech sounds in tonal languages

    OpenAIRE

    Naiphinich Kotchabhakdi; Chittin Chindaduangratn; Wichian Sittiprapaporn

    2004-01-01

    The present study was intended to make electrophysiological investigations into the preattentive perception of native and non-native speech sounds. We recorded the mismatch negativity, elicited by single syllable change of both native and non-native speech-sound contrasts in tonal languages. EEGs were recorded and low-resolution brain electromagnetic tomography (LORETA) was utilized to explore the neural electrical activity. Our results suggested that the left hemisphere was predominant in th...

  17. Reading fluency and speech perception speed of beginning readers with persistent reading problems: the perception of initial stop consonants and consonant clusters

    NARCIS (Netherlands)

    P. Snellings; A. van der Leij; H. Blok; P.F. de Jong

    2010-01-01

    This study investigated the role of speech perception accuracy and speed in fluent word decoding of reading disabled (RD) children. A same-different phoneme discrimination task with natural speech tested the perception of single consonants and consonant clusters by young but persistent RD children.

  18. Speech misperception: speaking and seeing interfere differently with hearing.

    Directory of Open Access Journals (Sweden)

    Takemi Mochida

    Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] was degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But it was not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.

  19. Speech misperception: speaking and seeing interfere differently with hearing.

    Science.gov (United States)

    Mochida, Takemi; Kimura, Toshitaka; Hiroya, Sadao; Kitagawa, Norimichi; Gomi, Hiroaki; Kondo, Tadahisa

    2013-01-01

    Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] was degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But it was not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech. PMID:23844227

  20. Are mirror neurons the basis of speech perception? Evidence from five cases with damage to the purported human mirror system.

    Science.gov (United States)

    Rogalsky, Corianne; Love, Tracy; Driscoll, David; Anderson, Steven W; Hickok, Gregory

    2011-01-01

    The discovery of mirror neurons in macaque has led to a resurrection of motor theories of speech perception. Although the majority of lesion and functional imaging studies have associated perception with the temporal lobes, it has also been proposed that the 'human mirror system', which prominently includes Broca's area, is the neurophysiological substrate of speech perception. Although numerous studies have demonstrated a tight link between sensory and motor speech processes, few have directly assessed the critical prediction of mirror neuron theories of speech perception, namely that damage to the human mirror system should cause severe deficits in speech perception. The present study measured speech perception abilities of patients with lesions involving motor regions in the left posterior frontal lobe and/or inferior parietal lobule (i.e., the proposed human 'mirror system'). Performance was at or near ceiling in patients with fronto-parietal lesions. It is only when the lesion encroaches on auditory regions in the temporal lobe that perceptual deficits are evident. This suggests that 'mirror system' damage does not disrupt speech perception, but rather that auditory systems are the primary substrate for speech perception.

  1. Speech-in-Noise Perception Deficit in Adults with Dyslexia: Effects of Background Type and Listening Configuration

    Science.gov (United States)

    Dole, Marjorie; Hoen, Michel; Meunier, Fanny

    2012-01-01

    Developmental dyslexia is associated with impaired speech-in-noise perception. The goal of the present research was to further characterize this deficit in dyslexic adults. In order to specify the mechanisms and processing strategies used by adults with dyslexia during speech-in-noise perception, we explored the influence of background type,…

  2. Speech-in-noise perception deficit in adults with dyslexia: effects of background type and listening configuration.

    OpenAIRE

    Dole, Marjorie; Hoen, Michel; Meunier, Fanny

    2012-01-01

    Developmental dyslexia is associated with impaired speech-in-noise perception. The goal of the present research was to further characterize this deficit in dyslexic adults. In order to specify the mechanisms and processing strategies used by adults with dyslexia during speech-in-noise perception, we explored the influence of background type, presenting single target-words against backgrounds made of cocktail party sounds, modulated speech-derived noise or stationary noise. We also evaluated t...

  3. Comparison of different speech coding strategies using a disability-based inventory and speech perception tests in quiet and in noise.

    NARCIS (Netherlands)

    Beynon, A.J.; Snik, A.F.M.; Broek, P. van den

    2003-01-01

    OBJECTIVE: Intraindividual comparison of two cochlear implant speech coding strategies implemented in the Nucleus 24M system (SPEAK versus ACE). Reasons for subjective preference were evaluated using a combination of speech perception scores and a disability-based inventory. STUDY DESIGN: Cross-over

  4. Dissociating speech perception and comprehension at reduced levels of awareness

    NARCIS (Netherlands)

    Davis, Matthew H.; Coleman, Martin R.; Absalom, Anthony R.; Rodd, Jennifer M.; Johnsrude, Ingrid S.; Matta, Basil F.; Owen, Adrian M.; Menon, David K.

    2007-01-01

    We used functional MRI and the anesthetic agent propofol to assess the relationship among neural responses to speech, successful comprehension, and conscious awareness. Volunteers were scanned while listening to sentences containing ambiguous words, matched sentences without ambiguous words, and sig...

  5. Predicting Individual Variation in Language From Infant Speech Perception Measures

    NARCIS (Netherlands)

    Cristia, Alejandrina; Seidl, Amanda; Junge, Caroline; Soderstrom, Melanie; Hagoort, Peter

    2014-01-01

    There are increasing reports that individual variation in behavioral and neurophysiological measures of infant speech processing predicts later language outcomes, and specifically concurrent or subsequent vocabulary size. If such findings are held up under scrutiny, they could both illuminate theore...

  6. Predicting individual variation in language from infant speech perception measures

    NARCIS (Netherlands)

    A. Cristia; A. Seidl; C. Junge; M. Soderstrom; P. Hagoort

    2013-01-01

    There are increasing reports that individual variation in behavioral and neurophysiological measures of infant speech processing predicts later language outcomes, and specifically concurrent or subsequent vocabulary size. If such findings are held up under scrutiny, they could both illuminate theore...

  7. Neurophysiological evidence that musical training influences the recruitment of right hemispheric homologues for speech perception.

    Science.gov (United States)

    Jantzen, McNeel G; Howe, Bradley M; Jantzen, Kelly J

    2014-01-01

    Musicians have a more accurate temporal and tonal representation of auditory stimuli than their non-musician counterparts (Musacchia et al., 2007; Parbery-Clark et al., 2009a; Zendel and Alain, 2009; Kraus and Chandrasekaran, 2010). Musicians who are adept at the production and perception of music are also more sensitive to key acoustic features of speech such as voice onset timing and pitch. Together, these data suggest that musical training may enhance the processing of acoustic information for speech sounds. In the current study, we sought to provide neural evidence that musicians process speech and music in a similar way. We hypothesized that for musicians, right hemisphere areas traditionally associated with music are also engaged for the processing of speech sounds. In contrast, we predicted that in non-musicians, processing of speech sounds would be localized to traditional left hemisphere language areas. Speech stimuli differing in voice onset time were presented using a dichotic listening paradigm. Subjects either indicated aural location for a specified speech sound or identified a specific speech sound from a directed aural location. Musical training effects and organization of acoustic features were reflected by activity in source generators of the P50. This included greater activation of right middle temporal gyrus and superior temporal gyrus in musicians. The findings demonstrate recruitment of right hemisphere in musicians for discriminating speech sounds and a putative broadening of their language network. Musicians appear to have an increased sensitivity to acoustic features and enhanced selective attention to temporal features of speech that is facilitated by musical training and supported, in part, by right hemisphere homologues of established speech processing regions of the brain. PMID:24624107

  8. Tuning in and tuning out: Speech perception in native- and foreign-talker babble

    Science.gov (United States)

    van Heukelem, Kristin; Bradlow, Ann R.

    2005-09-01

    Studies on speech perception in multitalker babble have revealed asymmetries in the effects of noise on native versus foreign-accented speech intelligibility for native listeners [Rogers et al., Lang Speech 47(2), 139-154 (2004)] and on sentence-in-noise perception by native versus non-native listeners [Mayo et al., J. Speech Lang. Hear. Res., 40, 686-693 (1997)], suggesting that the linguistic backgrounds of talkers and listeners contribute to the effects of noise on speech perception. However, little attention has been paid to the language of the babble. This study tested whether the language of the noise also has asymmetrical effects on listeners. Replicating previous findings [e.g., Bronkhorst and Plomp, J. Acoust. Soc. Am., 92, 3132-3139 (1992)], the results showed poorer English sentence recognition by native English listeners in six-talker babble than in two-talker babble regardless of the language of the babble, demonstrating the effect of increased psychoacoustic/energetic masking. In addition, the results showed that in the two-talker babble condition, native English listeners were more adversely affected by English than Chinese babble. These findings demonstrate informational/cognitive masking on sentence-in-noise recognition in the form of linguistic competition. Whether this competition is at the lexical or sublexical level and whether it is modulated by the phonetic similarity between the target and noise languages remain to be determined.

  9. Age of second-language acquisition and perception of speech in noise.

    Science.gov (United States)

    Mayo, L H; Florentine, M; Buus, S

    1997-06-01

    To determine how age of acquisition influences perception of second-language speech, the Speech Perception in Noise (SPIN) test was administered to native Mexican-Spanish-speaking listeners who learned fluent English before age 6 (early bilinguals) or after age 14 (late bilinguals) and monolingual American-English speakers (monolinguals). Results show that the levels of noise at which the speech was intelligible were significantly higher and the benefit from context was significantly greater for monolinguals and early bilinguals than for late bilinguals. These findings indicate that learning a second language at an early age is important for the acquisition of efficient high-level processing of it, at least in the presence of noise. PMID:9210123

  10. Bullying in Children Who Stutter: Speech-Language Pathologists' Perceptions and Intervention Strategies

    Science.gov (United States)

    Blood, Gordon W.; Boyle, Michael P.; Blood, Ingrid M.; Nalesnik, Gina R.

    2010-01-01

    Bullying in school-age children is a global epidemic. School personnel play a critical role in eliminating this problem. The goals of this study were to examine speech-language pathologists' (SLPs) perceptions of bullying, endorsement of potential strategies for dealing with bullying, and associations among SLPs' responses and specific demographic…

  11. Is the Sensorimotor Cortex Relevant for Speech Perception and Understanding? An Integrative Review

    Science.gov (United States)

    Schomers, Malte R.; Pulvermüller, Friedemann

    2016-01-01

    In the neuroscience of language, phonemes are frequently described as multimodal units whose neuronal representations are distributed across perisylvian cortical regions, including auditory and sensorimotor areas. A different position views phonemes primarily as acoustic entities with posterior temporal localization, which are functionally independent from frontoparietal articulatory programs. To address this current controversy, we here discuss experimental results from functional magnetic resonance imaging (fMRI) as well as transcranial magnetic stimulation (TMS) studies. At first glance, a mixed picture emerges, with earlier research documenting neurofunctional distinctions between phonemes in both temporal and frontoparietal sensorimotor systems, but some recent work seemingly failing to replicate the latter. Detailed analysis of methodological differences between studies reveals that the way experiments are set up explains whether sensorimotor cortex maps phonological information during speech perception or not. In particular, acoustic noise during the experiment and ‘motor noise’ caused by button press tasks work against the frontoparietal manifestation of phonemes. We highlight recent studies using sparse imaging and passive speech perception tasks along with multivariate pattern analysis (MVPA) and especially representational similarity analysis (RSA), which succeeded in separating acoustic-phonological from general-acoustic processes and in mapping specific phonological information on temporal and frontoparietal regions. The question about a causal role of sensorimotor cortex in speech perception and understanding is addressed by reviewing recent TMS studies. We conclude that frontoparietal cortices, including ventral motor and somatosensory areas, reflect phonological information during speech perception and exert a causal influence on language understanding. PMID:27708566

  12. Effects of Removing Low-Frequency Electric Information on Speech Perception with Bimodal Hearing

    Science.gov (United States)

    Fowler, Jennifer R.; Eggleston, Jessica L.; Reavis, Kelly M.; McMillan, Garnett P.; Reiss, Lina A. J.

    2016-01-01

    Purpose: The objective was to determine whether speech perception could be improved for bimodal listeners (those using a cochlear implant [CI] in one ear and hearing aid in the contralateral ear) by removing low-frequency information provided by the CI, thereby reducing acoustic-electric overlap. Method: Subjects were adult CI subjects with at…

  13. The Effects of Corrective Feedback on Instructed L2 Speech Perception

    Science.gov (United States)

    Lee, Andrew H.; Lyster, Roy

    2016-01-01

    To what extent do second language (L2) learners benefit from instruction that includes corrective feedback (CF) on L2 speech perception? This article addresses this question by reporting the results of a classroom-based experimental study conducted with 32 young adult Korean learners of English. An instruction-only group and an instruction + CF…

  14. Gender and Speech Rate in the Perception of Competence and Social Attractiveness.

    Science.gov (United States)

    Feldstein, Stanley; Dohm, Faith-Anne; Crown, Cynthia L.

    2001-01-01

    Presents a study that explores (1) whether listeners regard speakers with similar global speech rates as more competent and attractive and (2) the influence of gender on their perceptions. Explains that the judges consisted of 17 male and 28 female listeners. (CMK)

  15. Putative mechanisms mediating tolerance for audiovisual stimulus onset asynchrony.

    Science.gov (United States)

    Bhat, Jyoti; Miller, Lee M; Pitt, Mark A; Shahin, Antoine J

    2015-03-01

    Audiovisual (AV) speech perception is robust to temporal asynchronies between visual and auditory stimuli. We investigated the neural mechanisms that facilitate tolerance for audiovisual stimulus onset asynchrony (AVOA) with EEG. Individuals were presented with AV words that were asynchronous in onsets of voice and mouth movement and judged whether they were synchronous or not. Behaviorally, individuals tolerated (perceived as synchronous) longer AVOAs when mouth movement preceded the speech (V-A) stimuli than when the speech preceded mouth movement (A-V). Neurophysiologically, the P1-N1-P2 auditory evoked potentials (AEPs), time-locked to sound onsets and known to arise in and surrounding the primary auditory cortex (PAC), were smaller for the in-sync than the out-of-sync percepts. Spectral power of oscillatory activity in the beta band (14-30 Hz) following the AEPs was larger during the in-sync than out-of-sync perception for both A-V and V-A conditions. However, alpha power (8-14 Hz), also following AEPs, was larger for the in-sync than out-of-sync percepts only in the V-A condition. These results demonstrate that AVOA tolerance is enhanced by inhibiting low-level auditory activity (e.g., AEPs representing generators in and surrounding PAC) that code for acoustic onsets. By reducing sensitivity to acoustic onsets, visual-to-auditory onset mapping is weakened, allowing for greater AVOA tolerance. In contrast, beta and alpha results suggest the involvement of higher-level neural processes that may code for language cues (phonetic, lexical), selective attention, and binding of AV percepts, allowing for wider neural windows of temporal integration, i.e., greater AVOA tolerance. PMID:25505102

  16. Melodic Contour Training and Its Effect on Speech in Noise, Consonant Discrimination, and Prosody Perception for Cochlear Implant Recipients

    Directory of Open Access Journals (Sweden)

    Chi Yhun Lo

    2015-01-01

    Cochlear implant (CI) recipients generally have good perception of speech in quiet environments but difficulty perceiving speech in noisy conditions, reduced sensitivity to speech prosody, and difficulty appreciating music. Auditory training has been proposed as a method of improving speech perception for CI recipients, and recent efforts have focussed on the potential benefits of music-based training. This study evaluated two melodic contour training programs and their relative efficacy as measured on a number of speech perception tasks. These melodic contours were simple 5-note sequences formed into 9 contour patterns, such as “rising” or “rising-falling.” One training program controlled difficulty by manipulating interval sizes, the other by note durations. Sixteen adult CI recipients (aged 26–86 years) and twelve normal-hearing (NH) adult listeners (aged 21–42 years) were tested on a speech perception battery at baseline and then after 6 weeks of melodic contour training. Results indicated some benefits on speech perception tasks for CI recipients after melodic contour training. Specifically, consonant perception in quiet and question/statement prosody perception improved. In comparison, NH listeners performed at ceiling for these tasks. There was no significant difference in posttraining results between the two training programs, suggesting that both conferred benefits for training CI recipients to better perceive speech.

  17. Testing Speech Recognition in Spanish-English Bilingual Children with the Computer-Assisted Speech Perception Assessment (CASPA): Initial Report.

    Science.gov (United States)

    García, Paula B; Rosado Rogers, Lydia; Nishi, Kanae

    2016-01-01

    This study evaluated the English version of Computer-Assisted Speech Perception Assessment (E-CASPA) with Spanish-English bilingual children. E-CASPA has been evaluated with monolingual English speakers ages 5 years and older, but it is unknown whether a separate norm is necessary for bilingual children. Eleven Spanish-English bilingual and 12 English monolingual children (6 to 12 years old) with normal hearing participated. Responses were scored by word, phoneme, consonant, and vowel. Regardless of scoring method, performance across three signal-to-noise ratio conditions was similar between groups, suggesting that the same norm can be used for both bilingual and monolingual children.

  18. Visual Speech Perception in Children with Language Learning Impairments

    Science.gov (United States)

    Knowland, Victoria C. P.; Evans, Sam; Snell, Caroline; Rosen, Stuart

    2016-01-01

    Purpose: The purpose of the study was to assess the ability of children with developmental language learning impairments (LLIs) to use visual speech cues from the talking face. Method: In this cross-sectional study, 41 typically developing children (mean age: 8 years 0 months, range: 4 years 5 months to 11 years 10 months) and 27 children with…

  19. Speech and sign perception in deaf children with cochlear implants

    NARCIS (Netherlands)

    M.R. Giezen

    2011-01-01

    Although a cochlear implant (CI) restores access to sound and speech for profoundly deaf children, there is substantial inter-individual variation in outcomes and many children with a CI continue to be delayed in their spoken language development. This suggests that they may benefit from alternative...

  20. Influence of musical training on perception of L2 speech

    NARCIS (Netherlands)

    Sadakata, M.; Zanden, L.D.T. van der; Sekiyama, K.

    2010-01-01

    The current study reports specific cases in which a positive transfer of perceptual ability from the music domain to the language domain occurs. We tested whether musical training enhances discrimination and identification performance of L2 speech sounds (timing features, nasal consonants and vowels...

  1. Speaker's hand gestures modulate speech perception through phase resetting of ongoing neural oscillations.

    Science.gov (United States)

    Biau, Emmanuel; Torralba, Mireia; Fuentemilla, Lluis; de Diego Balaguer, Ruth; Soto-Faraco, Salvador

    2015-07-01

    Speakers often accompany speech with spontaneous beat gestures in natural spoken communication. These gestures are usually aligned with lexical stress and can modulate the saliency of their affiliate words. Here we addressed the consequences of beat gestures on the neural correlates of speech perception. Previous studies have highlighted the role played by theta oscillations in temporal prediction of speech. We hypothesized that the sight of beat gestures may influence ongoing low-frequency neural oscillations around the onset of the corresponding words. Electroencephalographic (EEG) recordings were acquired while participants watched a continuous, naturally recorded discourse. The phase-locking value (PLV) at word onset was calculated from the EEG from pairs of identical words that had been pronounced with and without a concurrent beat gesture in the discourse. We observed an increase in PLV in the 5-6 Hz theta range as well as a desynchronization in the 8-10 Hz alpha band around the onset of words preceded by a beat gesture. These findings suggest that beats help tune low-frequency oscillatory activity at relevant moments during natural speech perception, providing a new insight into how speech and paralinguistic information are integrated. PMID:25595613

  2. Speech perception of young children using nucleus 22-channel or CLARION cochlear implants.

    Science.gov (United States)

    Young, N M; Grohne, K M; Carrasco, V N; Brown, C

    1999-04-01

    This study compares the auditory perceptual skill development of 23 congenitally deaf children who received the Nucleus 22-channel cochlear implant with the SPEAK speech coding strategy, and 20 children who received the CLARION Multi-Strategy Cochlear Implant with the Continuous Interleaved Sampler (CIS) speech coding strategy. All were under 5 years old at implantation. Preimplantation, there were no significant differences between the groups in age, length of hearing aid use, or communication mode. Auditory skills were assessed at 6 months and 12 months after implantation. Postimplantation, the mean scores on all speech perception tests were higher for the Clarion group. These differences were statistically significant for the pattern perception and monosyllable subtests of the Early Speech Perception battery at 6 months, and for the Glendonald Auditory Screening Procedure at 12 months. Multiple regression analysis revealed that device type accounted for the greatest variance in performance after 12 months of implant use. We conclude that children using the CIS strategy implemented in the Clarion implant may develop better auditory perceptual skills during the first year postimplantation than children using the SPEAK strategy with the Nucleus device. PMID:10214811

  3. Speech perception and quality of life of open-fit hearing aid users

    Science.gov (United States)

    Garcia, Tatiana Manfrini; Jacob, Regina Tangerino de Souza; Mondelli, Maria Fernanda Capoani Garcia

    2016-01-01

    Objective: To relate the speech perception performance of individuals with high-frequency hearing loss to their quality of life before and after the fitting of an open-fit hearing aid (HA). Methods: The WHOQOL-BREF was administered before the fitting and after 90 days of HA use. The Hearing in Noise Test (HINT) was conducted in two phases: (1) at the time of fitting, without an HA (situation A) and with an HA (situation B); (2) with an HA 90 days after fitting (situation C). Study Sample: Thirty subjects with high-frequency sensorineural hearing loss. Results: An analysis of variance with Tukey's test comparing the three HINT situations in quiet and noisy environments showed an improvement after the HA fitting. The WHOQOL-BREF results showed an improvement in quality of life after the HA fitting (paired t-test). The relationship between speech perception and quality of life before the HA fitting indicated a significant relationship between speech recognition in noisy environments and the social relations domain after the HA fitting (Pearson's correlation coefficient). Conclusions: Auditory stimulation improved the speech perception and quality of life of these individuals. PMID:27383708

  4. Predicting individual variation in language from infant speech perception measures

    OpenAIRE

    Cristia, A.; Seidl, A.; Junge, C.; Soderstrom, M.; Hagoort, P.

    2014-01-01

    There are increasing reports that individual variation in behavioral and neurophysiological measures of infant speech processing predicts later language outcomes, and specifically concurrent or subsequent vocabulary size. If such findings are held up under scrutiny, they could both illuminate theoretical models of language development and contribute to the prediction of communicative disorders. A qualitative, systematic review of this emergent literature illustrated the variety of approaches ...

  5. Dissociating speech perception and comprehension at reduced levels of awareness

    OpenAIRE

    Davis, Matthew H.; Coleman, Martin R.; Absalom, Anthony R.; Rodd, Jennifer M.; Johnsrude, Ingrid S.; Matta, Basil F.; Owen, Adrian M.; Menon, David K.

    2007-01-01

    We used functional MRI and the anesthetic agent propofol to assess the relationship among neural responses to speech, successful comprehension, and conscious awareness. Volunteers were scanned while listening to sentences containing ambiguous words, matched sentences without ambiguous words, and signal-correlated noise (SCN). During three scanning sessions, participants were nonsedated (awake), lightly sedated (a slowed response to conversation), and deeply sedated (no conversational response...

  6. Accent, intelligibility, and comprehensibility in the perception of foreign-accented Lombard speech

    Science.gov (United States)

    Li, Chi-Nin

    2003-10-01

    Speech produced in noise (Lombard speech) has been reported to be more intelligible than speech produced in quiet (normal speech). This study examined the perception of non-native Lombard speech in terms of intelligibility, comprehensibility, and degree of foreign accent. Twelve Cantonese speakers and a comparison group of English speakers read simple true and false English statements in quiet and in 70 dB of masking noise. Lombard and normal utterances were mixed with noise at a constant signal-to-noise ratio, and presented along with noise-free stimuli to eight new English listeners who provided transcription scores, comprehensibility ratings, and accent ratings. Analyses showed that, as expected, utterances presented in noise were less well perceived than were noise-free sentences, and that the Cantonese speakers' productions were more accented, but less intelligible and less comprehensible than those of the English speakers. For both groups of speakers, the Lombard sentences were correctly transcribed more often than their normal utterances in noisy conditions. However, the Cantonese-accented Lombard sentences were not rated as easier to understand than was the normal speech in all conditions. The assigned accent ratings were similar throughout all listening conditions. Implications of these findings will be discussed.

  7. The impact of phonetic dissimilarity on the perception of foreign accented speech

    Science.gov (United States)

    Weil, Shawn A.

    2003-10-01

    Non-normative speech (i.e., synthetic speech, pathological speech, foreign accented speech) is more difficult to process for native listeners than is normative speech. Does perceptual dissimilarity affect only intelligibility, or are there other costs to processing? The current series of experiments investigates both the intelligibility and time course of foreign accented speech (FAS) perception. Native English listeners heard single English words spoken by both native English speakers and non-native speakers (Mandarin or Russian). Words were chosen based on the similarity between the phonetic inventories of the respective languages. Three experimental designs were used: a cross-modal matching task, a word repetition (shadowing) task, and two subjective ratings tasks which measured impressions of accentedness and effortfulness. The results replicate previous investigations that have found that FAS significantly lowers word intelligibility. Furthermore, FAS affects perceptual effort as well as intelligibility: in the word repetition task, correct responses are slower to accented words than to nonaccented words. An analysis indicates that both intelligibility and reaction time are, in part, functions of the similarity between the talker's utterance and the listener's representation of the word.

  8. Evaluation of Speech-Perception Training for Hearing Aid Users: A Multisite Study in Progress.

    Science.gov (United States)

    Miller, James D; Watson, Charles S; Dubno, Judy R; Leek, Marjorie R

    2015-11-01

    Following an overview of theoretical issues in speech-perception training and of previous efforts to enhance hearing aid use through training, a multisite study, designed to evaluate the efficacy of two types of computerized speech-perception training for adults who use hearing aids, is described. One training method focuses on the identification of 109 syllable constituents (45 onsets, 28 nuclei, and 36 codas) in quiet and in noise, and on the perception of words in sentences presented in various levels of noise. In a second type of training, participants listen to 6- to 7-minute narratives in noise and are asked several questions about each narrative. Two groups of listeners are trained, each using one of these types of training, performed in a laboratory setting. The training for both groups is preceded and followed by a series of speech-perception tests. Subjects listen in a sound field while wearing their hearing aids at their usual settings. The training continues over 15 to 20 visits, with subjects completing at least 30 hours of focused training with one of the two methods. The two types of training are described in detail, together with a summary of other perceptual and cognitive measures obtained from all participants. PMID:27587914

  9. Communication Between Speech Production and Perception Within the Brain--Observation and Simulation

    Institute of Scientific and Technical Information of China (English)

    Jianwu Dang; Masato Akagi; Kiyoshi Honda

    2006-01-01

    Realization of an intelligent human-machine interface requires us to investigate human mechanisms and learn from them. This study focuses on communication between speech production and perception within the human brain and on realizing it in an artificial system. A physiological research study based on electromyographic signals (Honda, 1996) suggested that speech communication in the human brain might be based on a topological mapping between speech production and perception, according to an analogous topology between motor and sensory representations. Following this hypothesis, this study first investigated the topologies of the vowel system across the motor, kinematic, and acoustic spaces by means of a model simulation, and then examined the linkage between vowel production and perception in terms of a transformed auditory feedback (TAF) experiment. The model simulation indicated that there exists an invariant mapping from muscle activations (motor space) to articulations (kinematic space) via a coordinate consisting of force-dependent equilibrium positions, and the mapping from the motor space to kinematic space is unique. The motor-kinematic-acoustic deduction in the model simulation showed that the topologies were compatible from one space to another. In the TAF experiment, vowel production exhibited a compensatory response for a perturbation in the feedback sound. This implied that vowel production is controlled in reference to perception monitoring.

  10. Audiovisual Styling and the Film Experience

    DEFF Research Database (Denmark)

    Langkjær, Birger

    2015-01-01

    Approaches to music and audiovisual meaning in film appear to be very different in nature and scope when considered from the point of view of experimental psychology or humanistic studies. Nevertheless, this article argues that experimental studies square with ideas of audiovisual perception and ...

  11. Acoustic Features and Perceptive Cues of Songs and Dialogues in Whistled Speech: Convergences with Sung Speech

    CERN Document Server

    Meyer, Julien

    2007-01-01

    Whistled speech is a little studied local use of language shaped by several cultures of the world either for distant dialogues or for rendering traditional songs. This practice consists of an emulation of the voice thanks to a simple modulated pitch. It is therefore the result of a transformation of the vocal signal that implies simplifications in the frequency domain. The whistlers adapt their productions to the way each language combines the qualities of height perceived simultaneously by the human ear in the complex frequency spectrum of the spoken or sung voice (pitch, timbre). As a consequence, this practice underlines key acoustic cues for the intelligibility of the concerned languages. The present study provides an analysis of the acoustic and phonetic features selected by whistled speech in several traditions either in purely oral whistles (Spanish, Turkish, Mazatec) or in whistles produced with an instrument like a leaf (Akha, Hmong). It underlines the convergences with the strategies of the singing ...

  12. Acoustic Features and Perceptive Cues of Songs and Dialogues in Whistled Speech: Convergences with Sung Speech

    OpenAIRE

    Meyer, Julien

    2007-01-01

    Whistled speech is a little studied local use of language shaped by several cultures of the world either for distant dialogues or for rendering traditional songs. This practice consists of an emulation of the voice thanks to a simple modulated pitch. It is therefore the result of a transformation of the vocal signal that implies simplifications in the frequency domain. The whistlers adapt their productions to the way each language combines the qualities of height per...

  13. An understanding of how rhetoric, metaphors and dehumanization in political speeches affect the image and perception of Muslims and IS

    OpenAIRE

    Clausen, Thomas Wolff

    2016-01-01

    This paper includes a theoretical understanding of the effects of rhetoric, metaphors and dehumanisations in political speeches. This theoretical framework is used to analyse selected speeches of Barack Obama, David Cameron, Donald Trump and Hillary Clinton. The analysis is done in order to get a comprehension of how rhetoric, metaphors and dehumanisations, in the analysed speeches, are influencing the image and perception of Muslims and IS. An understanding of what affects the modern m...

  14. Reading fluency and speech perception speed of beginning readers with persistent reading problems: the perception of initial stop consonants and consonant clusters

    OpenAIRE

    Snellings, P.; Leij, van der, A.R.; Blok, H.; Jong, de, P.F.

    2010-01-01

    This study investigated the role of speech perception accuracy and speed in fluent word decoding of reading disabled (RD) children. A same-different phoneme discrimination task with natural speech tested the perception of single consonants and consonant clusters by young but persistent RD children. RD children were slower than chronological age (CA) controls in recognizing identical sounds, suggesting less distinct phonemic categories. In addition, after controlling for phonetic similarity Ta...

  15. Keeping time in the brain: Autism spectrum disorder and audiovisual temporal processing.

    Science.gov (United States)

    Stevenson, Ryan A; Segers, Magali; Ferber, Susanne; Barense, Morgan D; Camarata, Stephen; Wallace, Mark T

    2016-07-01

    A growing area of interest and relevance in the study of autism spectrum disorder (ASD) focuses on the relationship between multisensory temporal function and the behavioral, perceptual, and cognitive impairments observed in ASD. Atypical sensory processing is becoming increasingly recognized as a core component of autism, with evidence of atypical processing across a number of sensory modalities. These deviations from typical processing underscore the value of interpreting ASD within a multisensory framework. Furthermore, converging evidence illustrates that these differences in audiovisual processing may be specifically related to temporal processing. This review seeks to bridge the connection between temporal processing and audiovisual perception, and to elaborate on emerging data showing differences in audiovisual temporal function in autism. We also discuss the consequence of such changes, the specific impact on the processing of different classes of audiovisual stimuli (e.g. speech vs. nonspeech, etc.), and the presumptive brain processes and networks underlying audiovisual temporal integration. Finally, possible downstream behavioral implications and possible remediation strategies are outlined. Autism Res 2016, 9: 720-738. © 2015 International Society for Autism Research, Wiley Periodicals, Inc. PMID:26402725

  16. The relationship between the neural computations for speech and music perception is context-dependent: an activation likelihood estimate study

    OpenAIRE

    LaCroix, Arianna N.; Diaz, Alvaro F.; Rogalsky, Corianne

    2015-01-01

    The relationship between the neurobiology of speech and music has been investigated for more than a century. There remains no widespread agreement regarding how (or to what extent) music perception utilizes the neural circuitry that is engaged in speech processing, particularly at the cortical level. Prominent models such as Patel's Shared Syntactic Integration Resource Hypothesis (SSIRH) and Koelsch's neurocognitive model of music perception suggest a high degree of overlap, particularly in ...

  17. A Multidimensional Scaling Study of Native and Non-Native Listeners' Perception of Second Language Speech.

    Science.gov (United States)

    Foote, Jennifer A; Trofimovich, Pavel

    2016-04-01

    Second language speech learning is predicated on learners' ability to notice differences between their own language output and that of their interlocutors. Because many learners interact primarily with other second language users, it is crucial to understand which dimensions underlie the perception of second language speech by learners, compared to native speakers. For this study, 15 non-native and 10 native English speakers rated 30-s language audio-recordings from controlled reading and interview tasks for dissimilarity, using all pairwise combinations of recordings. PROXSCAL multidimensional scaling analyses revealed fluency and aspects of speakers' pronunciation as components underlying listener judgments but showed little agreement across listeners. Results contribute to an understanding of why second language speech learning is difficult and provide implications for language training. PMID:27166328

  18. A speech perception test for children in classrooms

    Science.gov (United States)

    Feijoo, Sergio; Fernandez, Santiago; Alvarez, Jose Manuel

    2002-11-01

    The combined effects of excessive ambient noise and reverberation in classrooms interfere with speech recognition and tend to degrade the learning process of young children. This paper reports a detailed analysis of a speech recognition test carried out with two populations of children, aged 8-9 and 10-11. Unlike English, Spanish has few minimal pairs that can be used for closed-set phoneme recognition. The test consisted of a series of two-syllable nonsense words formed by combining all possible syllables in Spanish. The test was administered to the children as a dictation task in which they had to write down the words spoken by their female teacher. The test was administered in two blocks on different days and later repeated to analyze its consistency. The rationale for this procedure was that (a) the test should reproduce normal academic situations, (b) all phonological and lexical context effects should be avoided, and (c) errors in both words and phonemes should be scored to reveal any possible acoustic basis for them. Although word recognition scores were similar across age groups and repetitions, phoneme errors showed high variability, calling into question the validity of such a test for classroom assessment.

  1. A speech-perception training tool to improve phonetic transcription

    Science.gov (United States)

    Padgitt, Noelle R.; Munson, Benjamin; Carney, Edward J.

    2005-09-01

    University instruction in phonetics requires students to associate a set of quasialphabetic symbols and diacritics with speech sounds. In the case of narrow phonetic transcription, students are required to associate symbols with sounds that do not function contrastively in the language. This learning task is challenging, given that students must discriminate among different variants of sounds that are not used to convey differences in lexical meaning. Consequently, many students fail to learn phonetic transcription to the level of proficiency needed for practical application (B. Munson and K. N. Brinkman, Am. J. Speech Lang. Path. [2004]). In an effort to improve students' phonetic transcription skills, a computerized training program was developed to train students' discrimination and identification of selected phonetic contrasts. The design of the training tool was based on similar tools that have been used to train phonetic contrasts in second-language learners of English (e.g., A. Bradlow et al., J. Acoust. Soc. Am. 102, 3115 [1997]). It consists of multiple stages (bombardment, discrimination, identification) containing phonetic contrasts that students have identified as particularly difficult to perceive. This presentation will provide a demonstration of the training tool, and will present preliminary data on the efficacy of this tool in improving students' phonetic transcription abilities.

  2. Are Speech Perception Deficits Associated with Developmental Dyslexia?

    Science.gov (United States)

    Manis, Franklin R.; And Others

    1997-01-01

    Administered phonological awareness and phoneme identification tasks to dyslexic children and chronological age (CA) and reading-level (RL) comparison groups. Found no real differences in categorical perception between dyslexic and RL groups; however, more dyslexics (7 of 25) had abnormal identification functions. Results suggest that some…

  3. Adaptive plasticity in speech perception: Effects of external information and internal predictions.

    Science.gov (United States)

    Guediche, Sara; Fiez, Julie A; Holt, Lori L

    2016-07-01

    When listeners encounter speech under adverse listening conditions, adaptive adjustments in perception can improve comprehension over time. In some cases, these adaptive changes require the presence of external information that disambiguates the distorted speech signals, whereas in other cases mere exposure is sufficient. Both external (e.g., written feedback) and internal (e.g., prior word knowledge) sources of information can be used to generate predictions about the correct mapping of a distorted speech signal. We hypothesize that these predictions provide a basis for determining the discrepancy between the expected and actual speech signal that can be used to guide adaptive changes in perception. This study provides the first empirical investigation that manipulates external and internal factors through (a) the availability of explicit external disambiguating information via the presence or absence of postresponse orthographic information paired with a repetition of the degraded stimulus, and (b) the accuracy of internally generated predictions; an acoustic distortion is introduced either abruptly or incrementally. The results demonstrate that the impact of external information on adaptive plasticity is contingent upon whether the intelligibility of the stimuli permits accurate internally generated predictions during exposure. External information sources enhance adaptive plasticity only when input signals are severely degraded and cannot reliably access internal predictions. This is consistent with a computational framework for adaptive plasticity in which error-driven supervised learning relies on the ability to compute sensory prediction error signals from both internal and external sources of information. PMID:26854531

  4. Speech perception in the child brain: cortical timing and its relevance to literacy acquisition.

    Science.gov (United States)

    Parviainen, Tiina; Helenius, Päivi; Poskiparta, Elisa; Niemi, Pekka; Salmelin, Riitta

    2011-12-01

    Speech processing skills go through intensive development during mid-childhood, also providing a basis for literacy acquisition. The sequence of auditory cortical processing of speech has been characterized in adults, but very little is known about the neural representation of speech sound perception in the developing brain. We used whole-head magnetoencephalography (MEG) to record neural responses to speech and nonspeech sounds in first-graders (7-8 years old) and compared the activation sequence to that in adults. In children, the general location of neural activity in the superior temporal cortex was similar to that in adults, but in the time domain the sequence of activation was strikingly different. Cortical differentiation between sound types emerged in a prolonged response pattern at about 250 ms after sound onset, in both hemispheres, clearly later than the corresponding effect at about 100 ms in adults, which was detected specifically in the left hemisphere. Better reading skills were linked with shorter-lasting neural activation, suggesting interdependence between the maturing neural processes of auditory perception and developing linguistic skills. This study uniquely used the potential of MEG to compare both spatial and temporal characteristics of neural activation between adults and children. Besides depicting the group-typical features of cortical auditory processing, the results revealed marked interindividual variability in children.

  5. Influence of anesthesia techniques of caesarean section on memory, perception and speech

    Directory of Open Access Journals (Sweden)

    Volkov O.O.

    2014-06-01

    In obstetrics, postoperative cognitive dysfunction may occur after caesarean section and vaginal delivery, with poor outcomes for both mother and child. The goal was to study the influence of the anesthesia technique used for caesarean section on memory, perception and speech. After approval by the local ethics committee and with informed consent, pregnant women were divided into 2 groups depending on the anesthesia method: the 1st group (n=31) had spinal anesthesia, the 2nd group (n=34) total intravenous anesthesia (TIVA). Spinal anesthesia: 1.8-2.2 mL of hyperbaric 0.5% bupivacaine. TIVA: thiopental sodium (4 mg kg-1) and succinylcholine (1-1.5 mg kg-1). Fentanyl (10-5-3 µg kg-1 hour) and diazepam (10 mg) were used after extraction of the newborn. Memory was assessed with Luria's test, perception with the "recognition of time" test, and speech with the "name of fingers" test. Control points: 1 - before surgery, 2 - 24 h after caesarean section, 3 - on day 3 after surgery, 4 - at discharge from hospital (day 5-7). The study showed that the initially reduced memory level in expectant mothers recovered over time after caesarean section. Memory was restored 3 days after surgery regardless of anesthesia technique. With spinal anesthesia, memory level on postoperative days 5-7 exceeded that observed with total intravenous anesthesia. Perception and speech did not depend on the postoperative time point. Anesthesia technique does not influence the restoration of perception and speech after caesarean section.

  6. Learning to perceive speech: How fricative perception changes, and how it stays the same

    OpenAIRE

    Nittrouer, Susan

    2002-01-01

    A part of becoming a mature perceiver involves learning what signal properties provide relevant information about objects and events in the environment. Regarding speech perception, evidence supports the position that allocation of attention to various signal properties changes as children gain experience with their native language, and so learn what information is relevant to recognizing phonetic structure in that language. However, one weakness in that work has been that data have largely c...

  7. Frame rate of motion picture and its influence on speech perception

    Science.gov (United States)

    Nakazono, Kaoru

    1996-03-01

    The preservation of QoS for multimedia traffic through a data network is a difficult problem. We focus our attention on video frame rate and study its influence on speech perception. When sound and picture are discrepant (e.g., acoustic `ba' combined with visual `ga'), subjects perceive a different sound (such as `da'). This phenomenon is known as the McGurk effect. In this paper, the influence of degraded video frame rate on speech perception was studied. It was shown that when frame rate decreases, correct hearing is improved for discrepant stimuli and is degraded for congruent (voice and picture are the same) stimuli. Furthermore, we studied the case where lip closure was always captured by the synchronization of sampling time and lip position. In this case, frame rate has little effect on mishearing for congruent stimuli. For discrepant stimuli, mishearing is decreased with degraded frame rate. These results indicate that stiff motion of lips resulting from low frame rate cannot give enough labial information for speech perception. In addition, the effect of delaying the picture to correct for low frame rate was studied. The results, however, were not as definitive as expected because of compound effects related to the synchronization of sound and picture.

  8. From vibration to perception: using Large Multi-Actuator Panels (LaMAPs) to create coherent audio-visual environments

    OpenAIRE

    Rébillat, Marc; Corteel, Etienne; Katz, Brian; Boutillon, Xavier

    2012-01-01

    Virtual reality aims at providing users with audio-visual worlds where they will behave and learn as if they were in the real world. In this context, specific acoustic transducers are needed to fulfill simultaneous spatial requirements on visual and audio rendering in order to make them coherent. Large multi-actuator panels (LaMAPs) allow for the combined construction of a projection screen and loudspeaker array, and thus allow for the coherent creation of an audio ...

  9. Musicians have enhanced audiovisual multisensory binding: experience-dependent effects in the double-flash illusion.

    Science.gov (United States)

    Bidelman, Gavin M

    2016-10-01

    Musical training is associated with behavioral and neurophysiological enhancements in auditory processing for both musical and nonmusical sounds (e.g., speech). Yet, whether the benefits of musicianship extend beyond enhancements to auditory-specific skills and impact multisensory (e.g., audiovisual) processing has yet to be fully validated. Here, we investigated multisensory integration of auditory and visual information in musicians and nonmusicians using a double-flash illusion, whereby the presentation of multiple auditory stimuli (beeps) concurrent with a single visual object (flash) induces an illusory perception of multiple flashes. We parametrically varied the onset asynchrony between auditory and visual events (leads and lags of ±300 ms) to quantify participants' "temporal window" of integration, i.e., stimuli in which auditory and visual cues were fused into a single percept. Results show that musically trained individuals were both faster and more accurate at processing concurrent audiovisual cues than their nonmusician peers; nonmusicians had a higher susceptibility for responding to audiovisual illusions and perceived double flashes over an extended range of onset asynchronies compared to trained musicians. Moreover, temporal window estimates indicated that musicians' windows (audiovisual binding. Collectively, findings indicate a more refined binding of auditory and visual cues in musically trained individuals. We conclude that experience-dependent plasticity of intensive musical experience extends beyond simple listening skills, improving multimodal processing and the integration of multiple sensory systems in a domain-general manner.

  10. The neural dynamics of speech perception: Dissociable networks for processing linguistic content and monitoring speaker turn-taking.

    Science.gov (United States)

    Foti, Dan; Roberts, Felicia

    2016-01-01

    The neural circuitry for speech perception is well-characterized, yet the temporal dynamics therein are largely unknown. This timing information is critical in that spoken language almost always occurs in the context of joint speech (i.e., conversations) where effective communication requires the precise timing of speaker turn-taking, a core aspect of prosody. Here, we used event-related potentials to characterize neural activity elicited by conversation stimuli within a large, unselected adult sample (N=115). We focused on two stages of speech perception: inter-speaker gaps and speaker responses. We found activation in two known speech perception networks, with functional and neuroanatomical specificity: silence during inter-speaker gaps primarily activated the posterior pathway involving the supramarginal gyrus and premotor cortex, whereas hearing speaker responses primarily activated the anterior pathway involving the superior temporal gyrus. These data provide the first direct evidence that the posterior pathway is uniquely involved in monitoring speaker turn-taking. PMID:27177112

  11. Impact of a moving noise masker on speech perception in cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Tobias Weissgerber

    Previous studies investigating speech perception in noise have typically been conducted with static masker positions. The aim of this study was to investigate the effect of spatial separation of source and masker (spatial release from masking, SRM) in a moving masker setup and to evaluate the impact of adaptive beamforming in comparison with fixed directional microphones in cochlear implant (CI) users. Speech reception thresholds (SRT) were measured in S0N0 and in a moving masker setup (S0Nmove) in 12 normal hearing participants and 14 CI users (7 subjects bilateral, 7 bimodal with a hearing aid in the contralateral ear). Speech processor settings were a moderately directional microphone, a fixed beamformer, or an adaptive beamformer. The moving noise source was generated by means of wave field synthesis and was smoothly moved along a half-circle from one ear to the contralateral ear. Noise was presented in either of two conditions: continuous or modulated. SRTs in the S0Nmove setup were significantly improved compared to the S0N0 setup for both the normal hearing control group and the bilateral group in continuous noise, and for the control group in modulated noise. There was no effect of subject group. A significant effect of directional sensitivity was found in the S0Nmove setup. In the bilateral group, the adaptive beamformer achieved lower SRTs than the fixed beamformer setting. Adaptive beamforming substantially improved SRT in both CI user groups, by about 3 dB (bimodal group) and 8 dB (bilateral group) depending on masker type. CI users showed SRM that was comparable to that of normal hearing subjects. In everyday listening situations with spatial separation of source and masker, directional microphones significantly improved speech perception, with individual improvements of up to 15 dB SNR. Users of bilateral speech processors with both directional microphones obtained the highest benefit.

  12. Perceptions of The Seriousness of Mispronunciations of English Speech Sounds

    Directory of Open Access Journals (Sweden)

    Moedjito Moedjito

    2006-01-01

    The present study attempts to investigate Indonesian EFL teachers' and native English speakers' perceptions of mispronunciations of English sounds by Indonesian EFL learners. For this purpose, a paper questionnaire covering 32 target mispronunciations was distributed to Indonesian secondary school teachers of English and to native English speakers. An analysis of the respondents' perceptions revealed that 14 of the 32 target mispronunciations are pedagogically significant in pronunciation instruction. A further analysis of the reasons for these major mispronunciations reconfirmed the prevalence of interference from learners' native language in their English pronunciation as a major cause of mispronunciations. It also revealed Indonesian EFL teachers' tendency to overestimate the seriousness of their learners' mispronunciations. Based on these findings, the study makes suggestions for better English pronunciation teaching in Indonesia and other EFL countries.

  13. No Lexical-Prelexical Feedback during Speech Perception or: Is It Time to Stop Playing Those Christmas Tapes?

    Science.gov (United States)

    McQueen, James M.; Jesse, Alexandra; Norris, Dennis

    2009-01-01

    The strongest support for feedback in speech perception comes from evidence of apparent lexical influence on prelexical fricative-stop compensation for coarticulation. Lexical knowledge (e.g., that the ambiguous final fricative of "Christma?" should be [s]) apparently influences perception of following stops. We argue that all such previous…

  14. The relationship between the neural computations for speech and music perception is context-dependent: an activation likelihood estimate study

    Directory of Open Access Journals (Sweden)

    Arianna LaCroix

    2015-08-01

    The relationship between the neurobiology of speech and music has been investigated for more than a century. There remains no widespread agreement regarding how (or to what extent) music perception utilizes the neural circuitry that is engaged in speech processing, particularly at the cortical level. Prominent models such as Patel's Shared Syntactic Integration Resource Hypothesis (SSIRH) and Koelsch's neurocognitive model of music perception suggest a high degree of overlap, particularly in the frontal lobe, but also perhaps more distinct representations in the temporal lobe with hemispheric asymmetries. The present meta-analysis study used activation likelihood estimate analyses to identify the brain regions consistently activated for music as compared to speech across the functional neuroimaging (fMRI and PET) literature. Eighty music and 91 speech neuroimaging studies of healthy adult control subjects were analyzed. Peak activations reported in the music and speech studies were divided into four paradigm categories: passive listening, discrimination tasks, error/anomaly detection tasks and memory-related tasks. We then compared activation likelihood estimates within each category for music versus speech, and each music condition with passive listening. We found that listening to music and to speech preferentially activate distinct temporo-parietal bilateral cortical networks. We also found music and speech to have shared resources in the left pars opercularis but speech-specific resources in the left pars triangularis. The extent to which music recruited speech-activated frontal resources was modulated by task. While there are certainly limitations to meta-analysis techniques particularly regarding sensitivity, this work suggests that the extent of shared resources between speech and music may be task-dependent and highlights the need to consider how task effects may be affecting conclusions regarding the neurobiology of speech and music.

  15. Talker-specific learning in amnesia: Insight into mechanisms of adaptive speech perception.

    Science.gov (United States)

    Trude, Alison M; Duff, Melissa C; Brown-Schmidt, Sarah

    2014-05-01

    A hallmark of human speech perception is the ability to comprehend speech quickly and effortlessly despite enormous variability across talkers. However, current theories of speech perception do not make specific claims about the memory mechanisms involved in this process. To examine whether declarative memory is necessary for talker-specific learning, we tested the ability of amnesic patients with severe declarative memory deficits to learn and distinguish the accents of two unfamiliar talkers by monitoring their eye-gaze as they followed spoken instructions. Analyses of the time-course of eye fixations showed that amnesic patients rapidly learned to distinguish these accents and tailored perceptual processes to the voice of each talker. These results demonstrate that declarative memory is not necessary for this ability and point to the involvement of non-declarative memory mechanisms. These results are consistent with findings that other social and accommodative behaviors are preserved in amnesia and contribute to our understanding of the interactions of multiple memory systems in the use and understanding of spoken language. PMID:24657480

  16. A positron emission tomography study of the neural basis of informational and energetic masking effects in speech perception

    Science.gov (United States)

    Scott, Sophie K.; Rosen, Stuart; Wickham, Lindsay; Wise, Richard J. S.

    2004-02-01

    Positron emission tomography (PET) was used to investigate the neural basis of the comprehension of speech in unmodulated noise ("energetic" masking, dominated by effects at the auditory periphery), and when presented with another speaker ("informational" masking, dominated by more central effects). Each type of signal was presented at four different signal-to-noise ratios (SNRs) (+3, 0, -3, -6 dB for the speech-in-speech, +6, +3, 0, -3 dB for the speech-in-noise), with listeners instructed to listen for meaning to the target speaker. Consistent with behavioral studies, there was SNR-dependent activation associated with the comprehension of speech in noise, with no SNR-dependent activity for the comprehension of speech-in-speech (at low or negative SNRs). There was, in addition, activation in bilateral superior temporal gyri which was associated with the informational masking condition. The extent to which this activation of classical "speech" areas of the temporal lobes might delineate the neural basis of the informational masking is considered, as is the relationship of these findings to the interfering effects of unattended speech and sound on more explicit working memory tasks. This study is a novel demonstration of candidate neural systems involved in the perception of speech in noisy environments, and of the processing of multiple speakers in the dorso-lateral temporal lobes.

  17. Improved perception of speech in noise and Mandarin tones with acoustic simulations of harmonic coding for cochlear implants.

    Science.gov (United States)

    Li, Xing; Nie, Kaibao; Imennov, Nikita S; Won, Jong Ho; Drennan, Ward R; Rubinstein, Jay T; Atlas, Les E

    2012-11-01

    Harmonic and temporal fine structure (TFS) information are important cues for speech perception in noise and music perception. However, due to the inherently coarse spectral and temporal resolution in electric hearing, the question of how to deliver harmonic and TFS information to cochlear implant (CI) users remains unresolved. A harmonic-single-sideband-encoder [(HSSE); Nie et al. (2008). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing; Lie et al., (2010). Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing] strategy has been proposed that explicitly tracks the harmonics in speech and transforms them into modulators conveying both amplitude modulation and fundamental frequency information. For unvoiced speech, HSSE transforms the TFS into a slowly varying yet still noise-like signal. To investigate its potential, four- and eight-channel vocoder simulations of HSSE and the continuous-interleaved-sampling (CIS) strategy were implemented, respectively. Using these vocoders, five normal-hearing subjects' speech recognition performance was evaluated under different masking conditions; another five normal-hearing subjects' Mandarin tone identification performance was also evaluated. Additionally, the neural discharge patterns evoked by HSSE- and CIS-encoded Mandarin tone stimuli were simulated using an auditory nerve model. All subjects scored significantly higher with HSSE than with CIS vocoders. The modeling analysis demonstrated that HSSE can convey temporal pitch cues better than CIS. Overall, the results suggest that HSSE is a promising strategy to enhance speech perception with CIs. PMID:23145619

  18. Audiovisual Integration in High Functioning Adults with Autism

    Science.gov (United States)

    Keane, Brian P.; Rosenthal, Orna; Chun, Nicole H.; Shams, Ladan

    2010-01-01

    Autism involves various perceptual benefits and deficits, but it is unclear if the disorder also involves anomalous audiovisual integration. To address this issue, we compared the performance of high-functioning adults with autism and matched controls on experiments investigating the audiovisual integration of speech, spatiotemporal relations, and…

  19. Audio-visual gender recognition

    Science.gov (United States)

    Liu, Ming; Xu, Xun; Huang, Thomas S.

    2007-11-01

    Combining different modalities for pattern recognition tasks is a very promising field. Humans routinely fuse information from different modalities to recognize objects, draw inferences, and so on. Audio-visual gender recognition is one of the most common tasks in human social communication. Humans can identify gender by facial appearance, by speech, and also by body gait. Indeed, human gender recognition is a multimodal data acquisition and processing procedure. However, computational multimodal gender recognition has not been extensively investigated in the literature. In this paper, speech and facial images are fused to perform multimodal gender recognition, in order to explore the improvement gained by combining different modalities.

  20. Bayesian model of categorical effects in L1 and L2 speech perception

    Science.gov (United States)

    Kronrod, Yakov

    In this dissertation I present a model that captures categorical effects in both first language (L1) and second language (L2) speech perception. In L1 perception, categorical effects range from extremely strong for consonants to nearly continuous perception of vowels. I treat the problem of speech perception as a statistical inference problem, and by quantifying categoricity I obtain a unified model of both strong and weak categorical effects. In this optimal inference mechanism, the listener uses their knowledge of categories and the acoustics of the signal to infer the intended productions of the speaker. The model splits speech variability into meaningful category variance and perceptual noise variance. The ratio of these two variances, which I call Tau, directly correlates with the degree of categorical effects for a given phoneme or continuum. By fitting the model to behavioral data from different phonemes, I show how a single parametric quantitative variation can lead to the different degrees of categorical effects seen in perception experiments with different phonemes. In L2 perception, L1 categories have been shown to affect how L2 sounds are identified and how well the listener is able to discriminate them. Various models have been developed to relate the state of L1 categories to both the initial and eventual ability to process the L2. These models largely lacked a formalized metric to measure perceptual distance, a means of making a priori predictions of behavior for a new contrast, and a way of describing non-discrete gradient effects. In the second part of my dissertation, I apply the same computational model that I used to unify L1 categorical effects to the examination of L2 perception. I show that we can use the model to make the same type of predictions as other SLA models, but also provide a quantitative framework while formalizing all measures of similarity and bias. Further, I show how using this model to consider L2 learners at
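
The abstract gives no equations, so the following is only a minimal sketch of the kind of optimal-inference computation it describes. It assumes Gaussian categories with equal priors and a shared category variance, Gaussian perceptual noise, and Tau defined as category variance divided by noise variance (the abstract says only "the ratio of these two variances", so the orientation is an assumption). The function names are hypothetical.

```python
import math


def _normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def perceived_target(stimulus: float, category_means: list[float],
                     category_var: float, noise_var: float) -> float:
    """Posterior mean E[T | S] of the intended production T given a heard stimulus S.

    Assumes equal-prior Gaussian categories T ~ N(mu_c, category_var) and
    Gaussian perceptual noise S | T ~ N(T, noise_var).
    """
    tau = category_var / noise_var  # assumed orientation of the variance ratio
    # Under equal priors, P(c | S) is proportional to N(S; mu_c, category_var + noise_var).
    likelihoods = [_normal_pdf(stimulus, m, category_var + noise_var) for m in category_means]
    total = sum(likelihoods)
    # Within category c, E[T | S, c] = (tau * S + mu_c) / (tau + 1):
    # a weighted average that pulls the percept toward the category mean.
    return sum((lik / total) * (tau * stimulus + m) / (tau + 1.0)
               for lik, m in zip(likelihoods, category_means))
```

With a single category at 0 and a stimulus at 1, `perceived_target(1.0, [0.0], 0.25, 1.0)` returns 0.2 and `perceived_target(1.0, [0.0], 4.0, 1.0)` returns 0.8. In this sketch, a smaller ratio pulls percepts more strongly toward the category mean, which is one way a single parameter can produce graded degrees of categorical warping.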

  1. Development of Sensitivity to Audiovisual Temporal Asynchrony during Midchildhood

    Science.gov (United States)

    Kaganovich, Natalya

    2016-01-01

    Temporal proximity is one of the key factors determining whether events in different modalities are integrated into a unified percept. Sensitivity to audiovisual temporal asynchrony has been studied in adults in great detail. However, how such sensitivity matures during childhood is poorly understood. We examined perception of audiovisual temporal…

  2. Visual Contribution to Speech Perception: Measuring the Intelligibility of Animated Talking Heads

    Directory of Open Access Journals (Sweden)

    Slim Ouni

    2006-10-01

    Animated agents are becoming increasingly frequent in research and applications in speech science. An important challenge is to evaluate the effectiveness of the agent in terms of the intelligibility of its visible speech. In three experiments, we extend and test the Sumby and Pollack (1954) metric to allow the comparison of an agent relative to a standard or reference, and also propose a new metric based on the fuzzy logical model of perception (FLMP) to describe the benefit provided by a synthetic animated face relative to the benefit provided by a natural face. A valid metric would allow direct comparisons across different experiments and would give measures of the benefit of a synthetic animated face relative to a natural face (or indeed any two conditions) and of how this benefit varies as a function of the type of synthetic face, the test items (e.g., syllables versus sentences), different individuals, and applications.
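
The two building blocks named in this abstract can be sketched as follows. This assumes the commonly cited form of the Sumby and Pollack normalized visual gain, (AV - A) / (1 - A) on proportion-correct scores, and the standard two-alternative FLMP integration rule. The `relative_benefit` ratio is a hypothetical illustration of comparing a synthetic face with a natural one, not the metric the authors propose.

```python
def visual_benefit(av_correct: float, a_correct: float) -> float:
    """Normalized visual gain (AV - A) / (1 - A), on proportion-correct scores."""
    if not 0.0 <= a_correct < 1.0:
        raise ValueError("auditory-only score must be in [0, 1)")
    return (av_correct - a_correct) / (1.0 - a_correct)


def relative_benefit(av_synthetic: float, av_natural: float, a_correct: float) -> float:
    """Hypothetical ratio: synthetic-face gain as a fraction of natural-face gain.

    Undefined (division by zero) when the natural face gives no gain.
    """
    return visual_benefit(av_synthetic, a_correct) / visual_benefit(av_natural, a_correct)


def flmp_two_alternative(a_support: float, v_support: float) -> float:
    """FLMP probability of one response, given auditory and visual support values.

    Supports are truth values in (0, 1); multiplicative integration is followed
    by a relative-goodness decision against the alternative.
    """
    both = a_support * v_support
    return both / (both + (1.0 - a_support) * (1.0 - v_support))
```

For example, raising accuracy from 0.5 auditory-only to 0.8 audiovisual gives a normalized gain of about 0.6. In the FLMP rule, an ambiguous auditory input (support 0.5) leaves the response probability equal to the visual support.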

  3. On the role of phonetic inventory in the perception of foreign-accented speech

    Science.gov (United States)

    Sereno, Joan; McCall, Joyce; Jongman, Allard; Dijkstra, Ton; van Heuven, Walter

    2002-05-01

    The current study investigates the effect of phonetic inventory on perception of foreign-accented speech. The perception of native English speech was compared to the perception of foreign-accented English (Dutch-accented English), with selection of stimuli determined on the basis of phonetic inventory. Half of the stimuli contained phonemes that are unique to English and do not occur in Dutch (e.g., [θ] and [æ]), and the other half contained only phonemes that are similar in both English and Dutch (e.g., [s], [i]). Both word and nonword stimuli were included to investigate the role of lexical status. A native speaker of English and a native speaker of Dutch recorded all stimuli. Stimuli were then presented to 40 American listeners using a randomized blocked design in a lexical decision experiment. Results reveal an interaction between speaker (native English versus native Dutch) and phonetic inventory (unique versus common phonemes). Specifically, Dutch-accented stimuli with common phonemes were recognized faster and more accurately than Dutch-accented stimuli with unique phonemes. Results will be discussed in terms of the influence of foreign accent on word recognition processes.

  4. Speech perception and reading: two parallel modes of understanding language and implications for acquiring literacy naturally.

    Science.gov (United States)

    Massaro, Dominic W

    2012-01-01

    I review 2 seminal research reports published in this journal during its second decade more than a century ago. Given psychology's subdisciplines, they would not normally be reviewed together because one involves reading and the other speech perception. The small amount of interaction between these domains might have limited research and theoretical progress. In fact, the 2 early research reports revealed common processes involved in these 2 forms of language processing. Their illustration of the role of Wundt's apperceptive process in reading and speech perception anticipated descriptions of contemporary theories of pattern recognition, such as the fuzzy logical model of perception. Based on the commonalities between reading and listening, one can question why they have been viewed so differently. It is commonly believed that learning to read requires formal instruction and schooling, whereas spoken language is acquired from birth onward through natural interactions with people who talk. Most researchers and educators believe that spoken language is acquired naturally from birth onward and even prenatally. Learning to read, on the other hand, is not possible until the child has acquired spoken language, reaches school age, and receives formal instruction. If an appropriate form of written text is made available early in a child's life, however, the current hypothesis is that reading will also be learned inductively and emerge naturally, with no significant negative consequences. If this proposal is true, it should soon be possible to create an interactive system, Technology Assisted Reading Acquisition, to allow children to acquire literacy naturally. PMID:22953690

  5. Decoding speech perception by native and non-native speakers using single-trial electrophysiological data.

    Directory of Open Access Journals (Sweden)

    Alex Brandmeyer

    Brain-computer interfaces (BCIs) are systems that use real-time analysis of neuroimaging data to determine the mental state of their user for purposes such as providing neurofeedback. Here, we investigate the feasibility of a BCI based on speech perception. Multivariate pattern classification methods were applied to single-trial EEG data collected during speech perception by native and non-native speakers. Two principal questions were asked: (1) Can differences in the perceived categories of pairs of phonemes be decoded at the single-trial level? (2) Can these same categorical differences be decoded across participants, within or between native-language groups? Results indicated that classification performance progressively increased with respect to the categorical status (within, boundary, or across) of the stimulus contrast, and was also influenced by the native language of individual participants. Classifier performance showed strong relationships with traditional event-related potential measures and behavioral responses. The results of the cross-participant analysis indicated an overall increase in average classifier performance when trained on data from all participants (native and non-native). A second cross-participant classifier trained only on data from native speakers led to an overall improvement in performance for native speakers, but a reduction in performance for non-native speakers. We also found that the native language of a given participant could be decoded on the basis of EEG data with accuracy above 80%. These results indicate that electrophysiological responses underlying speech perception can be decoded at the single-trial level, and that decoding performance systematically reflects graded changes in the responses related to the phonological status of the stimuli. This approach could be used in extensions of the BCI paradigm to support perceptual learning during second language acquisition.

  6. Multisensory and Modality Specific Processing of Visual Speech in Different Regions of the Premotor Cortex

    Directory of Open Access Journals (Sweden)

    Daniel Callan

    2014-05-01

    Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex has been shown to be active during both observation and execution of action (‘Mirror System’ properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker’s articulating face and heard her voice), visual only (only saw the speaker’s articulating face), and audio only (only heard the speaker’s voice) conditions with varying audio signal-to-noise ratios in order to determine the regions of the premotor cortex involved with multisensory and modality specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli to control for task difficulty and differences in intelligibility. The results of the fMRI analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and premotor cortex. The left ventral inferior premotor cortex showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas more superior and dorsal regions of the premotor cortex are involved with mapping unimodal (in this case visual) sensory features of the speech signal with

  7. Evaluating proposed dorsal and ventral route functions in speech perception and phonological short-term memory: Evidence from aphasia

    Directory of Open Access Journals (Sweden)

    Heather Raye Dial

    2015-04-01

    When the lexical and sublexical stimuli were matched in discriminability, scores were highly correlated and no individual demonstrated substantially better performance on lexical than sublexical perception (Figures 1a-c). However, when the word discriminations were easier (as in prior studies; e.g., Miceli et al., 1980), patients with impaired syllable discrimination were within the control range on word discrimination (Figure 1d). Finally, digit matching showed no significant relation to perception tasks (e.g., Figure 1e). Moreover, there was a wide range of digit matching spans for patients performing well on speech perception tasks (e.g., > 1.5 on syllable discrimination and digit matching ranging from 3.6 to 6.0). These data fail to support dual route claims, suggesting that lexical processing depends on sublexical perception and that phonological STM depends on a buffer separate from speech perception mechanisms.

  8. Speech perception and language acquisition in the first year of life.

    Science.gov (United States)

    Gervain, Judit; Mehler, Jacques

    2010-01-01

    During the first year of life, infants pass important milestones in language development. We review some of the experimental evidence concerning these milestones in the domains of speech perception, phonological development, word learning, morphosyntactic acquisition, and bilingualism, emphasizing their interactions. We discuss them in the context of their biological underpinnings, introducing the most recent advances not only in language development, but also in neighboring areas such as genetics and the comparative research on animal communication systems. We argue for a theory of language acquisition that integrates behavioral, cognitive, neural, and evolutionary considerations and proposes to unify previously opposing theoretical stances, such as statistical learning, rule-based nativist accounts, and perceptual learning theories.

  9. A Psychophysical Imaging Method Evidencing Auditory Cue Extraction during Speech Perception: A Group Analysis of Auditory Classification Images

    OpenAIRE

    Varnet, Léo; Knoblauch, Kenneth; Serniclaes, Willy; Meunier, Fanny; Hoen, Michel

    2015-01-01

    Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remain undetermined. This gap in knowledge is partly due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique t...
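
The abstract is cut off before it describes how the authors estimate their images, and their procedure may differ. The sketch below instead shows the classic reverse-correlation idea behind classification images, applied to a simulated observer. The template, the linear observer, the noise level and all names are hypothetical. On every trial the stimulus is pure Gaussian noise, and the image is the mean noise on "A" responses minus the mean noise on "B" responses.

```python
import random


def classification_image(template, n_trials=5000, internal_noise=1.0, seed=0):
    """Reverse-correlation estimate of a hypothetical observer's template.

    Each trial presents pure Gaussian noise. The simulated linear observer
    answers "A" when template . noise + internal noise > 0, and "B" otherwise.
    Returns mean(noise | "A") - mean(noise | "B") for each stimulus dimension.
    """
    rng = random.Random(seed)
    dim = len(template)
    sum_a, sum_b = [0.0] * dim, [0.0] * dim
    n_a = n_b = 0
    for _ in range(n_trials):
        noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        decision = sum(t * x for t, x in zip(template, noise)) + rng.gauss(0.0, internal_noise)
        if decision > 0:
            sum_a = [s + x for s, x in zip(sum_a, noise)]
            n_a += 1
        else:
            sum_b = [s + x for s, x in zip(sum_b, noise)]
            n_b += 1
    return [sa / n_a - sb / n_b for sa, sb in zip(sum_a, sum_b)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5
```

With a template such as `[1.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]`, the recovered image should correlate closely with the template. This illustrates how response-conditioned averages of the noise reveal which stimulus regions drive the observer's decisions.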

  10. Evaluation of an audiovisual-FM system: speechreading performance as a function of distance.

    Science.gov (United States)

    Gagné, Jean-Pierre; Charest, Monique; Le Monday, K; Desbiens, C

    2006-05-01

    A research program was undertaken to evaluate the efficacy of an audiovisual-FM system as a speechreading aid. The present study investigated the effects of the distance between the talker and the speechreader on a visual-speech perception task. Sentences were recorded simultaneously with a conventional Hi8 mm video camera, and with the microcamera of an audiovisual-FM system. The recordings were obtained from two talkers at three different distances: 1.83 m, 3.66 m, and 7.32 m. Sixteen subjects completed a visual-keyword recognition task. The main results of the investigation were as follows: For the recordings obtained with the conventional video camera, there was a significant decrease in speechreading performance as the distance between the talker and the camera increased. For the recordings obtained with the microcamera of the audiovisual-FM system, there were no differences in speechreading as a function of the test distances. The findings of the investigation confirm that in a classroom setting the use of an audiovisual-FM system may constitute an effective way of overcoming the deleterious effects of distance on speechreading performance. PMID:16717020

  11. Audiovisual Interaction

    Science.gov (United States)

    Möttönen, Riikka; Sams, Mikko

    Information about the objects and events in the external world is received via multiple sense organs, especially via eyes and ears. For example, a singing bird can be heard and seen. Typically, audiovisual objects are detected, localized and identified more rapidly and accurately than objects which are perceived via only one sensory system (see, e.g. Welch and Warren, 1986; Stein and Meredith, 1993; de Gelder and Bertelson, 2003; Calvert et al., 2004). The ability of the central nervous system to utilize sensory inputs mediated by different sense organs is called multisensory processing.

  12. An audiovisual emotion recognition system

    Science.gov (United States)

    Han, Yi; Wang, Guoyin; Yang, Yong; He, Kun

    2007-12-01

    Human emotions can be expressed through many biological cues, of which speech and facial expression are two. Both carry emotional information, which plays an important role in human-computer interaction. Building on our previous studies of emotion recognition, we develop and present an audiovisual emotion recognition system in this paper. The system is designed for real-time use and is built from several integrated modules: speech enhancement for eliminating noise, rapid face detection for locating the face in the background image, example-based shape learning for facial feature alignment, and an optical-flow-based tracking algorithm for facial feature tracking. Irrelevant features and high data dimensionality are known to hurt classifier performance, and rough-set-based feature selection is a good method for dimension reduction. Accordingly, 13 of 37 speech features and 10 of 33 facial features are selected to represent emotional information, and 52 audiovisual features are selected because of the synchronization required when speech and video are fused. The experimental results demonstrate that this system performs well in real-time use and has a high recognition rate. Our results also indicate that multimodal fused recognition will become the trend in emotion recognition.

  13. Visual anticipatory information modulates multisensory interactions of artificial audiovisual stimuli.

    Science.gov (United States)

    Vroomen, Jean; Stekelenburg, Jeroen J

    2010-07-01

    The neural activity of speech sound processing (the N1 component of the auditory ERP) can be suppressed if a speech sound is accompanied by concordant lip movements. Here we demonstrate that this audiovisual interaction is neither speech specific nor linked to humanlike actions, but can be observed with artificial stimuli if their timing is made predictable. In Experiment 1, a pure tone synchronized with a deformation of a rectangle induced a smaller auditory N1 than auditory-only presentations if the temporal occurrence of this audiovisual event was made predictable by two moving disks that touched the rectangle. Local autoregressive average source estimation indicated that this audiovisual interaction may be related to integrative processing in auditory areas. When the moving disks did not precede the audiovisual stimulus, making the onset unpredictable, there was no N1 reduction. In Experiment 2, the predictability of the leading visual signal was manipulated by introducing a temporal asynchrony between the audiovisual event and the collision of moving disks. Audiovisual events occurred at the moment of, before (too "early"), or after (too "late") the collision of the disks with the rectangle. When asynchronies varied from trial to trial, rendering the moving disks unreliable temporal predictors of the audiovisual event, the N1 reduction was abolished. These results demonstrate that the N1 suppression is induced by visual information that both precedes and reliably predicts audiovisual onset, without a necessary link to human action-related neural mechanisms.

  15. Knowledge and attitudes of teachers regarding the impact of classroom acoustics on speech perception and learning.

    Science.gov (United States)

    Ramma, Lebogang

    2009-01-01

    This study investigated the knowledge and attitudes of primary school teachers regarding the impact of poor classroom acoustics on learners' speech perception and learning in class. Classrooms with excessive background noise and reflective surfaces could be a barrier to learning, and it is important that teachers are aware of this. There is currently limited research data about teachers' knowledge regarding the topic of classroom acoustics. Seventy teachers from three Johannesburg primary schools participated in this study. A survey by way of a structured self-administered questionnaire was the primary data collection method. The findings of this study showed that most of the participants did not have adequate knowledge of classroom acoustics. Most of the participants were also unaware of the impact that classrooms with poor acoustic environments can have on speech perception and learning. These results are discussed in relation to the practical implications of empowering teachers to manage the acoustic environment of their classrooms, the limitations of the study, and implications for future research.

  16. Auditory Perception, Suprasegmental Speech Processing, and Vocabulary Development in Chinese Preschoolers.

    Science.gov (United States)

    Wang, Hsiao-Lan S; Chen, I-Chen; Chiang, Chun-Han; Lai, Ying-Hui; Tsao, Yu

    2016-10-01

    The current study examined the associations between basic auditory perception, speech prosodic processing, and vocabulary development in Chinese kindergartners, specifically, whether early basic auditory perception may be related to linguistic prosodic processing in Chinese Mandarin vocabulary acquisition. A series of language, auditory, and linguistic prosodic tests were given to 100 preschool children who had not yet learned how to read Chinese characters. The results suggested that lexical tone sensitivity and intonation production were significantly correlated with children's general vocabulary abilities. In particular, tone awareness was associated with comprehensive language development, whereas intonation production was associated with both comprehensive and expressive language development. Regression analyses revealed that tone sensitivity accounted for 36% of the unique variance in vocabulary development, whereas intonation production accounted for 6% of the variance in vocabulary development. Moreover, auditory frequency discrimination was significantly correlated with lexical tone sensitivity, syllable duration discrimination, and intonation production in Mandarin Chinese. It also contributed significantly to tone sensitivity and intonation production. Auditory frequency discrimination may indirectly affect early vocabulary development through Chinese speech prosody. PMID:27519239

  17. Audiovisual quality assessment and prediction for videotelephony

    CERN Document Server

    Belmudez, Benjamin

    2015-01-01

    The work presented in this book focuses on modeling audiovisual quality as perceived by the users of IP-based solutions for video communication like videotelephony. It also extends the current framework for the parametric prediction of audiovisual call quality. The book addresses several aspects related to the quality perception of entire video calls, namely, the quality estimation of the single audio and video modalities in an interactive context, the audiovisual quality integration of these modalities and the temporal pooling of short sample-based quality scores to account for the perceptual quality impact of time-varying degradations.

  18. The socially-weighted encoding of spoken words: A dual-route approach to speech perception

    Directory of Open Access Journals (Sweden)

    Meghan Sumner

    2014-01-01

    Spoken words are highly variable. A single word may never be uttered the same way twice. As listeners, we regularly encounter speakers of different ages, genders, and accents, increasing the amount of variation we face. How listeners understand spoken words as quickly and adeptly as they do despite this variation remains an issue central to linguistic theory. We propose that learned acoustic patterns are mapped simultaneously to linguistic representations and to social representations. In doing so, we illuminate a paradox in the literature that results, we argue, from the focus on representations and the peripheral treatment of word-level phonetic variation. We consider phonetic variation more fully and highlight a growing body of work that is problematic for current theory: Words with different pronunciation variants are recognized equally well in immediate processing tasks, while an atypical, infrequent, but socially-idealized form is remembered better in the long-term. We suggest that the perception of spoken words is socially-weighted, resulting in sparse, but high-resolution clusters of socially-idealized episodes that are robust in immediate processing and are more strongly encoded, predicting memory inequality. Our proposal includes a dual-route approach to speech perception in which listeners map acoustic patterns in speech to linguistic and social representations in tandem. This approach makes novel predictions about the extraction of information from the speech signal, and provides a framework with which we can ask new questions. We propose that language comprehension, broadly, results from the integration of both linguistic and social information.

  19. Instruction of foreign language pragmatics: the teaching and acquisition of multiple speech acts using an explicit focus on forms approach, audiovisual input and conversation analysis tools

    OpenAIRE

    Rodríguez Peñarroja, Manuel

    2016-01-01

    This thesis describes the teaching and learning of multiple speech acts from an interlanguage pragmatics perspective, because existing materials for that purpose have been considered impoverished in terms of reflecting the use of language in its context. The first chapter, "Pragmatics and Speech Act theory", describes Pragmatics as the main area of study on which this thesis is based. It also describes concepts related to pragmatics, such as speec...

  20. Audiovisual Interaction

    DEFF Research Database (Denmark)

    Karandreas, Theodoros-Alexandros

    Product sound quality evaluation aims to identify relevant attributes and assess their influence on the overall auditory impression. This results in an accurate representation of the product in a singular modality, usually the one primarily associated with the product's main function. However, any given product is rarely perceived in isolation, but rather judged within a global context which includes information from all modalities (senses). This PhD thesis investigates the relative importance of audio and visual information in subjective evaluations of a product. A multimodal setup was developed in a manner that allowed the subjective audiovisual evaluation of loudspeakers under controlled conditions. Additionally, unimodal audio and visual evaluations were used as a baseline for comparison. The same procedure was applied in the investigation of the validity of less than optimal stimuli presentations...

  1. The neural processing of foreign-accented speech and its relationship to listener bias

    Directory of Open Access Journals (Sweden)

    Han-Gyol Yi

    2014-10-01

    Foreign-accented speech often presents a challenging listening condition. In addition to deviations from the target speech norms related to the inexperience of the nonnative speaker, listener characteristics may play a role in determining intelligibility levels. We have previously shown that an implicit visual bias for associating East Asian faces and foreignness predicts the listeners’ perceptual ability to process Korean-accented English audiovisual speech (Yi et al., 2013). Here, we examine the neural mechanism underlying the influence of listener bias to foreign faces on speech perception. In a functional magnetic resonance imaging (fMRI) study, native English speakers listened to native- and Korean-accented English sentences, with or without faces. The participants’ Asian-foreign association was measured using an implicit association test (IAT), conducted outside the scanner. We found that foreign-accented speech evoked greater activity in the bilateral primary auditory cortices and the inferior frontal gyri, potentially reflecting greater computational demand. Higher IAT scores, indicating greater bias, were associated with increased BOLD response to foreign-accented speech with faces in the primary auditory cortex, the early node for spectrotemporal analysis. We conclude the following: (1) foreign-accented speech perception places greater demand on the neural systems underlying speech perception; (2) the face of the talker can exaggerate the perceived foreignness of foreign-accented speech; (3) implicit Asian-foreign association is associated with decreased neural efficiency in early spectrotemporal processing.

  2. Auditory, Visual, and Auditory-Visual Speech Perception by Individuals with Cochlear Implants versus Individuals with Hearing Aids

    Science.gov (United States)

    Most, Tova; Rothem, Hilla; Luntz, Michal

    2009-01-01

    The researchers evaluated the contribution of cochlear implants (CIs) to speech perception by a sample of prelingually deaf individuals implanted after age 8 years. This group was compared with a group with profound hearing impairment (HA-P), and with a group with severe hearing impairment (HA-S), both of which used hearing aids. Words and…

  3. Auditory Sensitivity, Speech Perception, L1 Chinese, and L2 English Reading Abilities in Hong Kong Chinese Children

    Science.gov (United States)

    Zhang, Juan; McBride-Chang, Catherine

    2014-01-01

    A 4-stage developmental model, in which auditory sensitivity is fully mediated by speech perception at both the segmental and suprasegmental levels, which are further related to word reading through their associations with phonological awareness, rapid automatized naming, verbal short-term memory and morphological awareness, was tested with…

  4. Individual Differences in Language Ability Are Related to Variation in Word Recognition, Not Speech Perception: Evidence from Eye Movements

    Science.gov (United States)

    McMurray, Bob; Munson, Cheyenne; Tomblin, J. Bruce

    2014-01-01

    Purpose: The authors examined speech perception deficits associated with individual differences in language ability, contrasting auditory, phonological, or lexical accounts by asking whether lexical competition is differentially sensitive to fine-grained acoustic variation. Method: Adolescents with a range of language abilities (N = 74, including…

  5. Thinking outside the (Voice) Box: A Case Study of Students' Perceptions of the Relevance of Anatomy to Speech Pathology

    Science.gov (United States)

    Weir, Kristy A.

    2008-01-01

    Speech pathology students readily identify the importance of a sound understanding of anatomical structures central to their intended profession. In contrast, they often do not recognize the relevance of a broader understanding of structure and function. This study aimed to explore students' perceptions of the relevance of anatomy to speech…

  6. The Neurobiology of Speech Perception and Production-Can Functional Imaging Tell Us Anything We Did Not Already Know?

    Science.gov (United States)

    Scott, Sophie K.

    2012-01-01

    Our understanding of the neurobiological basis for human speech production and perception has benefited from insights from psychology, neuropsychology and neurology. In this overview, I outline some of the ways that functional imaging has added to this knowledge and argue that, as a neuroanatomical tool, functional imaging has led to some…

  7. Comparison of Word-, Sentence-, and Phoneme-Based Training Strategies in Improving the Perception of Spectrally Distorted Speech

    Science.gov (United States)

    Stacey, Paula C.; Summerfield, A. Quentin

    2008-01-01

    Purpose: To compare the effectiveness of 3 self-administered strategies for auditory training that might improve speech perception by adult users of cochlear implants. The strategies are based, respectively, on discriminating isolated words, words in sentences, and phonemes in nonsense syllables. Method: Participants were 18 normal-hearing adults…

  8. The Effect of Frequency Transposition on Speech Perception in Adolescents and Young Adults with Profound Hearing Loss

    Science.gov (United States)

    Gou, J.; Smith, J.; Valero, J.; Rubio, I.

    2011-01-01

    This paper reports on a clinical trial evaluating outcomes of a frequency-lowering technique for adolescents and young adults with severe to profound hearing impairment. Outcomes were defined by changes in aided thresholds, speech perception, and acceptance. The participants comprised seven young people aged between 13 and 25 years. They were…

  9. Susceptibility to a multisensory speech illusion in older persons is driven by perceptual processes

    Directory of Open Access Journals (Sweden)

    Annalisa Setti

    2013-09-01

    Full Text Available Recent studies suggest that multisensory integration is enhanced in older adults, but it is not known whether this enhancement is solely driven by perceptual processes or also affected by cognitive processes. Using the ‘McGurk illusion’, in Experiment 1 we found that audio-visual integration of incongruent audio-visual words was higher in older adults than in younger adults, although the recognition of either audio- or visual-only presented words was the same across groups. In Experiment 2 we tested recall of sentences within which an incongruent audio-visual speech word was embedded. The overall semantic meaning of the sentence was compatible with either one of the unisensory components of the target word and/or with the illusory percept. Older participants recalled more illusory audio-visual words in sentences than younger adults; however, there was no differential effect of word compatibility on recall for the two groups. Our findings suggest that the relatively high susceptibility to the audio-visual speech illusion in older participants is due more to perceptual than cognitive processing.

  10. Modeling auditory processing and speech perception in hearing-impaired listeners

    DEFF Research Database (Denmark)

    Jepsen, Morten Løve

    … It was shown that an accurate simulation of cochlear input-output functions, in addition to the audiogram, played a major role in accounting for both sensitivity and supra-threshold processing. Finally, the model was used as a front-end in a framework developed to predict consonant discrimination … It was shown that most observations in the measured consonant discrimination error patterns were predicted by the model, although error rates were systematically underestimated by the model for a few particular acoustic-phonetic features. These results reflect a relation between basic auditory processing deficits and reduced speech perception performance in the listeners with cochlear hearing loss. Overall, this work suggests a possible explanation of the variability in consequences of cochlear hearing loss. The proposed model might be an interesting tool for, e.g., evaluation of hearing-aid signal processing.

  11. Historia audiovisual para una sociedad audiovisual

    Directory of Open Access Journals (Sweden)

    Julio Montero Díaz

    2013-04-01

    Full Text Available This article analyzes the possibilities of presenting an audiovisual history in a society in which audiovisual media have progressively gained greater prominence. We analyze specific cases of films and historical documentaries and assess the difficulties faced by historians in understanding the keys of audiovisual language, and by filmmakers in understanding and incorporating history into their productions. We conclude that it would not be possible to disseminate history in the western world without audiovisual resources circulated through various types of screens (cinema, television, computer, mobile phone, video games).

  12. Temporal dynamics of sensorimotor integration in speech perception and production: Independent component analysis of EEG data

    Directory of Open Access Journals (Sweden)

    David Jenson

    2014-07-01

    Full Text Available Activity in premotor and sensorimotor cortices is found in speech production and some perception tasks. Yet, how sensorimotor integration supports these functions is unclear due to a lack of data examining the timing of activity from these regions. Beta (~20 Hz) and alpha (~10 Hz) spectral power within the EEG µ rhythm are considered indices of motor and somatosensory activity, respectively. In the current study, perception conditions required discrimination (same/different) of syllable pairs (/ba/ and /da/) in quiet and noisy conditions. Production conditions required covert and overt syllable productions and overt word production. Independent component analysis was performed on EEG data obtained during these conditions to (1) identify clusters of µ components common to all conditions and (2) examine real-time event-related spectral perturbations (ERSP) within alpha and beta bands. Seventeen and 15 of the 20 participants produced left and right µ components, respectively, localized to the precentral gyri. Discrimination conditions were characterized by significant (pFDR < .05) early alpha event-related synchronization (ERS) prior to and during stimulus presentation and later alpha event-related desynchronization (ERD) following stimulus offset. Beta ERD began early and gained strength across time. Differences were found between quiet and noisy discrimination conditions. Both overt syllable and word productions yielded similar alpha/beta ERD that began prior to production and was strongest during muscle activity. Findings during covert production were weaker than during overt production. One explanation for these findings is that µ-beta ERD indexes early predictive coding (e.g., internal modeling) and/or overt and covert attentional/motor processes. µ-alpha ERS may index inhibitory input to the premotor cortex from sensory regions prior to and during discrimination, while µ-alpha ERD may index re-afferent sensory feedback during speech rehearsal and production.

  13. Effects of language experience and stimulus context on the neural organization and categorical perception of speech.

    Science.gov (United States)

    Bidelman, Gavin M; Lee, Chia-Cheng

    2015-10-15

    Categorical perception (CP) represents a fundamental process in converting continuous speech acoustics into invariant percepts. Using scalp-recorded event-related brain potentials (ERPs), we investigated how tone-language experience and stimulus context influence the CP for lexical tones, the pitch patterns used by a majority of the world's languages to signal word meaning. Stimuli were vowel pairs overlaid with a high-level tone (T1) followed by a pitch continuum spanning between the dipping (T3) and rising (T2) contours of the Mandarin tonal space. To vary context, T1 either preceded or followed the critical T2/T3 continuum. Behaviorally, native Chinese listeners showed stronger CP than native English-speaking controls, as evidenced by their steeper, more dichotomous psychometric functions and faster identification of linguistic pitch patterns. Stimulus context shifted the categorical boundary in both groups, but the shift was more exaggerated in native listeners. Analysis of source activity extracted from primary auditory cortex revealed overall stronger neural encoding of tone in Chinese than in English listeners, indicating experience-dependent plasticity in cortical pitch processing. More critically, "neurometric" functions derived from multidimensional scaling and clustering of source ERPs established that (i) early auditory cortical activity could accurately predict listeners' psychometric speech identification and contextual shifts in the perceptual boundary, and (ii) neurometric profiles were organized more categorically in native speakers. Our data show that tone-language experience refines early auditory cortical brain representations so as to supply more faithful templates to neural mechanisms subserving lexical pitch categorization. We infer that contextual influence on the CP for tones is determined by language experience and the frequency of pitch patterns as they occur in listeners' native lexicon. PMID:26146197
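    The "steeper, more dichotomous psychometric functions" and boundary shifts described above are commonly summarized with a logistic curve: its 50% point is the categorical boundary, and its slope indexes how categorical identification is. The following is a minimal Python sketch, assuming a two-parameter logistic model and a simple least-squares fit on logit-transformed proportions. The abstract does not state the authors' fitting procedure. The function names, the 7-step continuum, and the parameter values in the demo are hypothetical, not data from the study.

    ```python
    import math
    from typing import Sequence, Tuple


    def logistic(x: float, boundary: float, slope: float) -> float:
        """Two-parameter logistic psychometric function (hypothetical helper).

        Returns the probability of one response category (e.g. a 'T2' label)
        at continuum step x; `boundary` is the 50% point, `slope` the steepness.
        """
        return 1.0 / (1.0 + math.exp(-slope * (x - boundary)))


    def fit_logit_linear(steps: Sequence[float],
                         proportions: Sequence[float]) -> Tuple[float, float]:
        """Estimate (boundary, slope) by least squares on logit-transformed proportions.

        Relies on logit(p) = slope * (x - boundary), a straight line in x.
        Proportions must lie strictly between 0 and 1. This is a simplification:
        real response counts are usually fitted by maximum likelihood instead.
        """
        if len(steps) != len(proportions) or len(steps) < 2:
            raise ValueError("need at least two matched (step, proportion) pairs")
        if any(not 0.0 < p < 1.0 for p in proportions):
            raise ValueError("proportions must be strictly between 0 and 1")
        logits = [math.log(p / (1.0 - p)) for p in proportions]
        n = len(steps)
        mean_x = sum(steps) / n
        mean_y = sum(logits) / n
        sxx = sum((x - mean_x) ** 2 for x in steps)
        if sxx == 0.0:
            raise ValueError("steps must not all be equal")
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(steps, logits))
        slope = sxy / sxx
        if slope == 0.0:
            raise ValueError("flat identification function: boundary undefined")
        intercept = mean_y - slope * mean_x
        # The fitted line is slope * x + intercept, and intercept = -slope * boundary.
        return -intercept / slope, slope


    if __name__ == "__main__":
        steps = [1, 2, 3, 4, 5, 6, 7]  # hypothetical 7-step T2/T3 continuum
        # Hypothetical listener profiles, generated from known parameters:
        for label, true_slope in [("steep (more categorical)", 2.0), ("shallow", 0.5)]:
            props = [logistic(x, 4.0, true_slope) for x in steps]
            boundary, slope = fit_logit_linear(steps, props)
            print(f"{label}: boundary={boundary:.2f}, slope={slope:.2f}")
    ```

    In this framing, a context effect like the one reported would appear as a change in the fitted `boundary`, and stronger CP would appear as a larger `slope`.
    
    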

  15. Compliments in Audiovisual Translation – issues in character identity

    Directory of Open Access Journals (Sweden)

    Isabel Fernandes Silva

    2011-12-01

    Full Text Available Over the last decades, audiovisual translation has gained increased significance in Translation Studies, as well as becoming an interdisciplinary subject within other fields (media, cinema studies, etc.). Although many articles have been published on communicative aspects of translation such as politeness, only recently have scholars taken an interest in the translation of compliments. This study will focus on both these areas from a multimodal and pragmatic perspective, emphasizing the links between these fields and how this multidisciplinary approach will evidence the polysemiotic nature of the translation process. In audiovisual translation both text and image are at play; therefore, the translation of the characters' speech may either omit information (because it is provided by visual-gestual signs) or emphasize it. A selection was made of the compliments present in the film What Women Want, focusing on subtitles that did not successfully convey the compliment expressed in the source text and analyzing the reasons for this, namely differences in register, culture-specific items and repetitions. These differences lead to a different portrayal/identity/perception of the main character in the English version (original soundtrack) and in the subtitled versions in Portuguese and Italian.

  16. Discrimination of static and dynamic spectral patterns by children and young adults in relationship to speech perception in noise

    Directory of Open Access Journals (Sweden)

    Hanin Rayes

    2014-03-01

    Full Text Available Past work has shown a relationship between the ability to discriminate spectral patterns and measures of speech intelligibility. The purpose of this study was to investigate the ability of both children and young adults to discriminate static and dynamic spectral patterns, comparing performance between the two groups and evaluating within-group results in terms of their relationship to speech-in-noise perception. Data were collected from normal-hearing children (age range: 5.4-12.8 years) and young adults (mean age: 22.8 years) on two spectral discrimination tasks and speech-in-noise perception. The first discrimination task, involving static spectral profiles, measured the ability to detect a change in the phase of a low-density sinusoidal spectral ripple of wideband noise. Using dynamic spectral patterns, the second task determined the signal-to-noise ratio needed to discriminate the temporal pattern of frequency fluctuation imposed by stochastic low-rate frequency modulation (FM). Children performed significantly worse than young adults on both discrimination tasks. For children, a significant correlation between speech-in-noise perception and spectral-pattern discrimination was obtained only with the dynamic patterns of the FM condition, with partial correlation suggesting that factors related to the children’s age mediated the relationship.

  17. Sources of Variability in Consonant Perception and Implications for Speech Perception Modeling

    DEFF Research Database (Denmark)

    Zaar, Johannes; Dau, Torsten

    2016-01-01

    The present study investigated the influence of various sources of response variability in consonant perception. A distinction was made between source-induced variability and receiver-related variability. The former refers to perceptual differences induced by differences in the … and of similar magnitude. Even time-shifts in the waveforms of white masking noise produced a significant effect, which was well above the within-listener variability (the smallest effect). Two auditory…

  18. Neural networks for learning and prediction with applications to remote sensing and speech perception

    Science.gov (United States)

    Gjaja, Marin N.

    1997-11-01

    Neural networks for supervised and unsupervised learning are developed and applied to problems in remote sensing, continuous map learning, and speech perception. Adaptive Resonance Theory (ART) models are real-time neural networks for category learning, pattern recognition, and prediction. Unsupervised fuzzy ART networks synthesize fuzzy logic and neural networks, and supervised ARTMAP networks incorporate ART modules for prediction and classification. New ART and ARTMAP methods resulting from analyses of data structure, parameter specification, and category selection are developed. Architectural modifications providing flexibility for a variety of applications are also introduced and explored. A new methodology for automatic mapping from Landsat Thematic Mapper (TM) and terrain data, based on fuzzy ARTMAP, is developed. System capabilities are tested on a challenging remote sensing problem, prediction of vegetation classes in the Cleveland National Forest from spectral and terrain features. After training at the pixel level, performance is tested at the stand level, using sites not seen during training. Results are compared to those of maximum likelihood classifiers, back propagation neural networks, and K-nearest neighbor algorithms. Best performance is obtained using a hybrid system based on a convex combination of fuzzy ARTMAP and maximum likelihood predictions. This work forms the foundation for additional studies exploring fuzzy ARTMAP's capability to estimate class mixture composition for non-homogeneous sites. Exploratory simulations apply ARTMAP to the problem of learning continuous multidimensional mappings. A novel system architecture retains basic ARTMAP properties of incremental and fast learning in an on-line setting while adding components to solve this class of problems. The perceptual magnet effect is a language-specific phenomenon arising early in infant speech development that is characterized by a warping of speech sound perception. An

  19. Parents and Speech Therapist Perception of Parental Involvement in Kailila Therapy Center, Jakarta, Indonesia

    Science.gov (United States)

    Jane, Griselda; Tunjungsari, Harini

    2015-01-01

    Parental involvement in speech therapy has not been prioritized in most therapy centers in Indonesia. One of the therapy centers that has recognized the importance of parental involvement is Kailila Speech Therapy Center. At Kailila Speech Therapy Center, parental involvement in children's speech therapy is an obligation that has been…

  20. Language and Speech Processing

    CERN Document Server

    Mariani, Joseph

    2008-01-01

    Speech processing addresses various scientific and technological areas. It includes speech analysis and variable rate coding, in order to store or transmit speech. It also covers speech synthesis, especially from text, speech recognition, including speaker and language identification, and spoken language understanding. This book covers the following topics: how to realize speech production and perception systems, how to synthesize and understand speech using state-of-the-art methods in signal processing, pattern recognition, stochastic modelling, computational linguistics and human factor studi

  1. On the matching of top-down knowledge with sensory input in the perception of ambiguous speech

    Directory of Open Access Journals (Sweden)

    Hannemann R

    2010-06-01

    Full Text Available Abstract. Background: How does the brain repair obliterated speech and cope with acoustically ambiguous situations? A widely discussed possibility is to use top-down information for solving the ambiguity problem. In the case of speech, this may lead to a match of bottom-up sensory input with lexical expectations, resulting in resonant states which are reflected in the induced gamma-band activity (GBA). Methods: In the present EEG study, we compared the subjects' pre-attentive GBA responses to obliterated speech segments presented after a series of correct words. The words were a minimal pair in German and differed with respect to the degree of specificity of segmental phonological information. Results: The induced GBA was larger when the expected lexical information was phonologically fully specified compared to the underspecified condition. Thus, the degree of specificity of phonological information in the mental lexicon correlates with the intensity of the matching process of bottom-up sensory input with lexical information. Conclusions: These results, together with those of a behavioural control experiment, support the notion of multi-level mechanisms involved in the repair of deficient speech. The delineated alignment of pre-existing knowledge with sensory input is in accordance with recent ideas about the role of internal forward models in speech perception.

  2. Autonomic nervous system responses during perception of masked speech may reflect constructs other than subjective listening effort

    Directory of Open Access Journals (Sweden)

    Alexander L. Francis

    2016-03-01

    Full Text Available Typically, understanding speech seems effortless and automatic. However, a variety of factors may, independently or interactively, make listening more effortful. Physiological measures may help to distinguish between the application of different cognitive mechanisms whose operation is perceived as effortful. In the present study, physiological and behavioral measures associated with task demand were collected along with behavioral measures of performance while participants listened to and repeated sentences. The goal was to measure psychophysiological reactivity associated with three degraded listening conditions, each of which differed in terms of the source of the difficulty (distortion, energetic masking, and informational masking) and was therefore expected to engage different cognitive mechanisms. These conditions were chosen to be matched for overall performance (keywords correct) and were compared to listening to unmasked speech produced by a natural voice. The three degraded conditions were: (1) unmasked speech produced by a computer speech synthesizer, (2) speech produced by a natural voice and masked by speech-shaped noise, and (3) speech produced by a natural voice and masked by two-talker babble. Both masked conditions were presented at a -8 dB signal-to-noise ratio (SNR), a level shown in previous research to result in comparable levels of performance for these stimuli and maskers. Performance was measured in terms of proportion of key words identified correctly, and task demand or effort was quantified subjectively by self-report. Measures of psychophysiological reactivity included electrodermal (skin conductance) response frequency and amplitude, blood pulse amplitude and pulse rate. Results suggest that the two masked conditions evoked stronger psychophysiological reactivity than did the two unmasked conditions, even when behavioral measures of listening performance and listeners’ subjective perception of task demand were comparable.
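    The -8 dB SNR in the study above can be made concrete. Under the common RMS-based definition, SNR in dB is 20·log10(RMS_speech / RMS_noise). At -8 dB, the masker's power is therefore 10^0.8, or about 6.3 times, the speech power. The following is a minimal Python sketch of scaling a masker to a target SNR, assuming whole-signal RMS levels. The abstract does not say how the study calibrated its levels. The helper names are hypothetical, and the sine tone and white Gaussian noise in the demo are stand-ins, not the study's speech, speech-shaped noise, or two-talker babble.

    ```python
    import math
    import random
    from typing import List, Sequence


    def rms(signal: Sequence[float]) -> float:
        """Root-mean-square level of a non-empty sample sequence."""
        if not signal:
            raise ValueError("signal must be non-empty")
        return math.sqrt(sum(s * s for s in signal) / len(signal))


    def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
        """SNR in dB from whole-signal RMS levels: 20 * log10(rms_speech / rms_noise)."""
        return 20.0 * math.log10(rms(speech) / rms(noise))


    def scale_noise_to_snr(speech: Sequence[float], noise: Sequence[float],
                           target_db: float) -> List[float]:
        """Return a copy of `noise` scaled so that snr_db(speech, result) == target_db."""
        target_noise_rms = rms(speech) / (10.0 ** (target_db / 20.0))
        gain = target_noise_rms / rms(noise)
        return [gain * n for n in noise]


    def mix(speech: Sequence[float], noise: Sequence[float]) -> List[float]:
        """Sample-wise sum of two equal-length signals."""
        if len(speech) != len(noise):
            raise ValueError("speech and noise must have the same length")
        return [s + n for s, n in zip(speech, noise)]


    if __name__ == "__main__":
        fs = 16_000  # hypothetical sample rate (Hz)
        # Stand-ins: a 1 kHz tone for speech, white Gaussian noise for the masker.
        speech = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(fs // 10)]
        rng = random.Random(0)
        noise = [rng.gauss(0.0, 1.0) for _ in range(len(speech))]

        masker = scale_noise_to_snr(speech, noise, -8.0)
        mixture = mix(speech, masker)
        print(f"achieved SNR: {snr_db(speech, masker):.2f} dB")
        print(f"noise/speech power ratio: {(rms(masker) / rms(speech)) ** 2:.3f}")
    ```

    Scaling the masker instead of the speech keeps the target level fixed across conditions. Equating overall performance across different maskers, as the study did, is a separate empirical step that this sketch does not model.
    
    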

  3. How musical expertise shapes speech perception: Evidence from auditory classification images

    OpenAIRE

    Léo Varnet; Tianyun Wang; Chloe Peter; Fanny Meunier; Michel Hoen

    2015-01-01

    It is now well established that extensive musical training percolates to higher levels of cognition, such as speech processing. However, the lack of a precise technique to investigate the specific listening strategy involved in speech comprehension has made it difficult to determine how musicians’ higher performance in non-speech tasks contributes to their enhanced speech comprehension. The recently developed Auditory Classification Image approach reveals the precise time-frequency regions us...

  4. Using auditory-visual speech to probe the basis of noise-impaired consonant-vowel perception in dyslexia and auditory neuropathy

    Science.gov (United States)

    Ramirez, Joshua; Mann, Virginia

    2005-08-01

    Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.

  5. The effect of speaking rate on perception of syllables in second-language speech

    Science.gov (United States)

    Tajima, Keiichi; Akahane-Yamada, Reiko

    2005-04-01

    Past studies on second-language (L2) speech perception have suggested that L2 learners have difficulty exploiting contextual information when perceiving L2 utterances, and that they exhibit greater difficulty than native listeners when faced with variability in temporal context. The present study investigated the extent to which native Japanese listeners, who are known to have difficulties perceiving English syllables, are influenced by changes in speaking rate when asked to count syllables in spoken English words. The stimuli consisted of a set of English words and nonwords varying in syllable structure, spoken at three rates by a native English speaker. The stimuli produced at the three rates were presented to native Japanese listeners in a random order. Results indicated that listeners' identification accuracy did not vary as a function of speaking rate, although it decreased significantly as the syllable structure of the stimuli became more complex. Moreover, even though speaking rate varied from trial to trial, Japanese listeners' performance did not decline compared to a condition in which the speaking rate was fixed. Theoretical and practical implications of these findings will be discussed. [Work supported by JSPS and NICT.]

  6. Change in Speech Perception and Auditory Evoked Potentials over Time after Unilateral Cochlear Implantation in Postlingually Deaf Adults.

    Science.gov (United States)

    Purdy, Suzanne C; Kelly, Andrea S

    2016-02-01

    Speech perception varies widely across cochlear implant (CI) users and typically improves over time after implantation. There is also some evidence for improved auditory evoked potentials (shorter latencies, larger amplitudes) after implantation, but few longitudinal studies have examined the relationship between behavioral and evoked potential measures after implantation in postlingually deaf adults. The relationship between speech perception and auditory evoked potentials was investigated in newly implanted cochlear implant users from the day of implant activation to 9 months postimplantation, on five occasions, in 10 adults aged 27 to 57 years who had been bilaterally profoundly deaf for 1 to 30 years prior to receiving a unilateral CI24 cochlear implant. Changes over time in middle latency response (MLR), mismatch negativity, and obligatory cortical auditory evoked potentials and word and sentence speech perception scores were examined. Speech perception improved significantly over the 9-month period. MLRs varied and showed no consistent change over time. Three participants in their 50s had absent MLRs. The pattern of change in N1 amplitudes over the five visits varied across participants. P2 area increased significantly for 1,000- and 4,000-Hz tones but not for 250 Hz. The greatest change in P2 area occurred after 6 months of implant experience. Although there was a trend for mismatch negativity peak latency to reduce and width to increase after 3 months of implant experience, there was considerable variability and these changes were not significant. Only 60% of participants had a detectable mismatch initially; this increased to 100% at 9 months. The continued change in P2 area over the period evaluated, with a trend for greater change for right hemisphere recordings, is consistent with the pattern of incremental change in speech perception scores over time. MLR, N1, and mismatch negativity changes were inconsistent, and hence P2 may be a more robust measure.

  7. Mapping the Developmental Trajectory and Correlates of Enhanced Pitch Perception on Speech Processing in Adults with ASD.

    Science.gov (United States)

    Mayer, Jennifer L; Hannent, Ian; Heaton, Pamela F

    2016-05-01

    Whilst enhanced perception has been widely reported in individuals with Autism Spectrum Disorders (ASDs), relatively little is known about the developmental trajectory and impact of atypical auditory processing on speech perception in intellectually high-functioning adults with ASD. This paper presents data on perception of complex tones and speech pitch in adult participants with high-functioning ASD and typical development, and compares these with pre-existing data using the same paradigm with groups of children and adolescents with and without ASD. As perceptual processing abnormalities are likely to influence behavioural performance, regression analyses were carried out on the adult data set. The findings revealed markedly different pitch discrimination trajectories and language correlates across diagnostic groups. While pitch discrimination increased with age and correlated with receptive vocabulary in groups without ASD, it was enhanced in childhood and stable across development in ASD. Pitch discrimination scores did not correlate with receptive vocabulary scores in the ASD group, and for adults with ASD superior pitch perception was associated with sensory atypicalities and diagnostic measures of symptom severity. We conclude that the development of pitch discrimination, and its associated mechanisms, markedly distinguish those with and without ASD. PMID:25106823

  8. Autonomic Nervous System Responses During Perception of Masked Speech may Reflect Constructs other than Subjective Listening Effort.

    Science.gov (United States)

    Francis, Alexander L; MacPherson, Megan K; Chandrasekaran, Bharath; Alvar, Ann M

    2016-01-01

    Typically, understanding speech seems effortless and automatic. However, a variety of factors may, independently or interactively, make listening more effortful. Physiological measures may help to distinguish between the application of different cognitive mechanisms whose operation is perceived as effortful. In the present study, physiological and behavioral measures associated with task demand were collected along with behavioral measures of performance while participants listened to and repeated sentences. The goal was to measure psychophysiological reactivity associated with three degraded listening conditions, each differing in the source of the difficulty (distortion, energetic masking, and informational masking) and therefore expected to engage different cognitive mechanisms. These conditions were chosen to be matched for overall performance (keywords correct) and were compared to listening to unmasked speech produced by a natural voice. The three degraded conditions were: (1) unmasked speech produced by a computer speech synthesizer, (2) speech produced by a natural voice and masked by speech-shaped noise, and (3) speech produced by a natural voice and masked by two-talker babble. Both masked conditions were presented at a -8 dB signal-to-noise ratio (SNR), a level shown in previous research to yield comparable performance for these stimuli and maskers. Performance was measured as the proportion of keywords identified correctly, and task demand or effort was quantified subjectively by self-report. Measures of psychophysiological reactivity included electrodermal (skin conductance) response frequency and amplitude, blood pulse amplitude, and pulse rate. Results suggest that the two masked conditions evoked stronger psychophysiological reactivity than did the two unmasked conditions, even when behavioral measures of listening performance and listeners' subjective perception of task demand were comparable across the three
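    The masked conditions in this study are specified only by their signal-to-noise ratio. The sketch below shows what a -8 dB mix implies numerically. It assumes, since the abstract does not say, that SNR is computed from RMS levels over the whole stimulus and that the masker (not the speech) is rescaled; the helper names `rms`, `snr_db`, and `mix_at_snr` are hypothetical and do not describe the authors' stimulus pipeline.

```python
import math


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def snr_db(signal, noise):
    """SNR in dB, defined here (an assumption) as a ratio of RMS levels."""
    return 20 * math.log10(rms(signal) / rms(noise))


def mix_at_snr(speech, masker, target_snr_db):
    """Rescale `masker` so that the speech/masker SNR equals `target_snr_db`,
    then add it to `speech` sample by sample.

    Returns (mixture, scaled_masker). Inputs must have equal length and
    the masker must not be silent.
    """
    if len(speech) != len(masker):
        raise ValueError("speech and masker must have the same length")
    masker_level = rms(masker)
    if masker_level == 0:
        raise ValueError("masker is silent")
    # A negative target SNR means the masker ends up louder than the speech.
    target_masker_rms = rms(speech) / 10 ** (target_snr_db / 20)
    gain = target_masker_rms / masker_level
    scaled = [gain * m for m in masker]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled
```

    Under this RMS definition, a -8 dB SNR places the masker's RMS level at 10^(8/20), roughly 2.5 times that of the speech, for both the speech-shaped noise and the two-talker babble.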

  9. Non-speech Sounds Affect the Perception of Speech Sounds in Chinese Listeners

    Institute of Scientific and Technical Information of China (English)

    刘文理; 乐国安

    2012-01-01

    Using a priming paradigm with Chinese listeners as participants, this study examined whether non-speech sounds affect the perception of speech sounds. Experiment 1 examined the effect of pure tones on the perception of a consonant category continuum and found that pure tones influenced perception of the continuum, showing a spectral contrast effect. Experiment 2 examined the effect of pure tones and complex tones on vowel perception and found that pure or complex tones matching the vowel formant frequencies sped up vowel identification, showing a priming effect. Both experiments consistently found that non-speech sounds can affect the perception of speech sounds, indicating that speech perception also involves a pre-linguistic stage of spectral feature analysis, consistent with the auditory theory of speech perception.

    A long-standing debate in the field of speech perception concerns whether specialized processing mechanisms are necessary to perceive speech sounds. The motor theory argues that speech perception is a special process and that non-speech sounds do not affect the perception of speech sounds. The auditory theory suggests that speech perception can be understood in terms of general auditory processes shared with the perception of non-speech sounds. Findings from English-speaking participants indicate that the processing of non-speech sounds affects the perception of speech sounds, but few studies have been conducted with Chinese listeners. The present study conducted two experiments to examine whether the processing of non-speech sounds could affect the perception of speech segments in Chinese listeners. In Experiment 1, the speech sounds were a synthesized consonant continuum ranging from /ba/ to /da/. The non-speech sounds were two sine-wave tones with frequencies equal to the F2 onset frequencies of /ba/ and /da/, respectively. The tones were followed by the /ba/-/da/ series with a 50-ms ISI. Undergraduate participants were asked to identify the speech sounds. The results showed that the non-speech tones influenced identification of the speech targets: when the frequency of the tone was equal to the F2 onset frequency of /ba/, participants were more likely to identify consonant

  10. Reduced audiovisual recalibration in the elderly.

    Science.gov (United States)

    Chan, Yu Man; Pianta, Michael J; McKendrick, Allison M

    2014-01-01

    Perceived synchrony of visual and auditory signals can be altered by exposure to a stream of temporally offset stimulus pairs. Previous literature suggests that adapting to audiovisual temporal offsets is an important recalibration for correctly combining audiovisual stimuli into a single percept across a range of source distances. With healthy aging, visual and auditory signals are perceived as synchronous over a wider range of temporal offsets, independent of age-related unisensory declines in visual and auditory sensitivity. However, the impact of aging on audiovisual recalibration is unknown. Audiovisual synchrony perception for sound-lead and sound-lag stimuli was measured for 15 younger (22-32 years old) and 15 older (64-74 years old) healthy adults using the method of constant stimuli, after adapting to a stream of visual and auditory pairs. The adaptation pairs were either synchronous or asynchronous (sound-lag of 230 ms). The adaptation effect for each observer was computed as the shift in the mean of the individually fitted psychometric functions after adapting to asynchrony. After adaptation to synchrony, the younger and older observers had average window widths (± standard deviation) of 326 (± 80) and 448 (± 105) ms, respectively. There was no adaptation effect for sound-lead pairs. Both the younger and older observers, however, perceived more sound-lag pairs as synchronous. The magnitude of the adaptation effect in the older observers was not correlated with how often they saw the adapting sound-lag stimuli as asynchronous. Our findings demonstrate that audiovisual synchrony perception adapts less with advancing age.
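    The adaptation measure in this study is the shift in the mean of each observer's fitted psychometric function. The abstract gives neither the function form nor the fitting method, so the sketch below assumes a descending cumulative Gaussian fitted to the sound-lag side by a least-squares grid search (a stand-in for a proper maximum-likelihood fit). The names `p_synchronous`, `fit_boundary`, and `adaptation_effect`, the grid ranges, and any proportions passed in are hypothetical.

```python
import math


def p_synchronous(soa_ms, mu, sigma):
    """Descending cumulative Gaussian: probability of judging a sound-lag
    pair 'synchronous' at a given SOA (ms). `mu` is the 50% point."""
    return 0.5 * (1 - math.erf((soa_ms - mu) / (sigma * math.sqrt(2))))


def fit_boundary(soas, proportions,
                 mu_grid=range(0, 501), sigma_grid=range(10, 151, 5)):
    """Least-squares grid search for (mu, sigma) over hypothetical grids (ms)."""
    best = None
    for mu in mu_grid:
        for sigma in sigma_grid:
            sse = sum((p - p_synchronous(x, mu, sigma)) ** 2
                      for x, p in zip(soas, proportions))
            if best is None or sse < best[0]:
                best = (sse, mu, sigma)
    return best[1], best[2]


def adaptation_effect(soas, after_sync, after_async):
    """Shift (ms) in the fitted mean after adapting to asynchrony.

    A positive value means the synchrony boundary moved toward larger
    sound lags, i.e. more sound-lag pairs were judged synchronous.
    """
    mu_sync, _ = fit_boundary(soas, after_sync)
    mu_async, _ = fit_boundary(soas, after_async)
    return mu_async - mu_sync
```

    Fitting each adaptation condition separately and differencing the means mirrors the per-observer computation described in the abstract; the grid search is chosen only to keep the sketch dependency-free.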

  11. Speech perception with interaction-compensated simultaneous stimulation and long pulse durations in cochlear implant users.

    Science.gov (United States)

    Schatzer, Reinhold; Koroleva, Inna; Griessner, Andreas; Levin, Sergey; Kusovkov, Vladislav; Yanov, Yuri; Zierhofer, Clemens

    2015-04-01

    Early multi-channel designs in the history of cochlear implant development were based on vocoder-type processing of frequency channels and presented bands of compressed analog stimulus waveforms simultaneously on multiple tonotopically arranged electrodes. The realization that the direct summation of electrical fields resulting from simultaneous electrode stimulation exacerbates interactions among stimulation channels and limits cochlear implant outcomes led to a breakthrough in cochlear implant development: the continuous interleaved sampling (CIS) coding strategy. By interleaving stimulation pulses across electrodes, CIS activates only a single electrode at any point in time, preventing direct summation of electrical fields and hence the primary component of channel interaction. In this paper we show that a previously presented approach of simultaneous stimulation with channel interaction compensation (CIC) may also ameliorate the deleterious effects of simultaneous channel interaction on speech perception. In an acute study of eleven experienced MED-EL implant users, configurations involving simultaneous stimulation with CIC and doubled pulse phase durations were investigated. Because pairs of electrodes were activated simultaneously while pulse phase durations were doubled, carrier rates remained the same. Comparison conditions involved both CIS and fine structure (FS) strategies, with either strictly sequential or paired-simultaneous stimulation. Results showed no statistically significant difference in the perception of sentences in noise and monosyllables between sequential stimulation and paired-simultaneous stimulation with doubled phase durations. This suggests that CIC can largely compensate for the effects of simultaneous channel interaction, for both CIS and FS coding strategies. A simultaneous stimulation paradigm has a number of potential advantages over a traditional sequential interleaved design. The flexibility gained when dropping the requirement of
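    The timing argument above (pairing electrodes while doubling phase duration leaves the carrier rate unchanged) can be checked with simple arithmetic. The sketch below is a hypothetical pulse-timing model only: it assumes biphasic pulses with no interphase or interpulse gaps, pairs electrode e with electrode e + n/2 purely for illustration (the abstract does not say how electrodes were paired), and does not model the CIC amplitude compensation itself. The function names are hypothetical.

```python
def cis_schedule(n_electrodes, phase_us):
    """Strictly sequential (CIS-style) timing: one biphasic pulse at a time.

    Returns ({electrode: start_time_us}, frame_period_us).
    """
    pulse_us = 2 * phase_us  # two phases, no gaps (assumption)
    starts = {e: e * pulse_us for e in range(n_electrodes)}
    return starts, n_electrodes * pulse_us


def paired_schedule(n_electrodes, phase_us):
    """Paired-simultaneous timing with doubled phase duration.

    Electrode e and e + n/2 start together (hypothetical pairing rule).
    """
    if n_electrodes % 2:
        raise ValueError("paired schedule needs an even number of electrodes")
    pulse_us = 2 * (2 * phase_us)  # each phase is twice as long
    half = n_electrodes // 2
    starts = {}
    for slot in range(half):
        starts[slot] = slot * pulse_us
        starts[slot + half] = slot * pulse_us  # simultaneous partner
    return starts, half * pulse_us


def carrier_rate_pps(frame_us):
    """Per-channel pulse rate (pulses per second) for a given frame period."""
    return 1e6 / frame_us
```

    With 12 electrodes and a 30-µs phase, both schedules repeat every 720 µs (about 1,389 pulses per second per channel): halving the number of time slots exactly offsets doubling each pulse's duration.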

  12. Revisiting Neil Armstrong's Moon-Landing Quote: Implications for Speech Perception, Function Word Reduction, and Acoustic Ambiguity.

    Science.gov (United States)

    Baese-Berk, Melissa M; Dilley, Laura C; Schmidt, Stephanie; Morrill, Tuuli H; Pitt, Mark A

    2016-01-01

    Neil Armstrong insisted that his quote upon landing on the moon was misheard, and that he had said "one small step for a man," instead of "one small step for man." What he said is unclear in part because function words like "a" can be reduced and spectrally indistinguishable from the preceding context. Therefore, their presence can be ambiguous, and they may disappear perceptually depending on the rate of surrounding speech. Two experiments are presented examining production and perception of reduced tokens of "for" and "for a" in spontaneous speech. Experiment 1 investigates the distributions of several acoustic features of "for" and "for a". The results suggest that the distributions of "for" and "for a" overlap substantially, in both temporal and spectral characteristics. Experiment 2 examines perception of these same tokens when the context speaking rate differs. The perceptibility of the function word "a" varies as a function of this context speaking rate. These results demonstrate that substantial ambiguity exists in the original quote from Armstrong, and that this ambiguity may be understood through context speaking rate. PMID:27603209

  15. Audiovisual integration facilitates unconscious visual scene processing.

    Science.gov (United States)

    Tan, Jye-Sheng; Yeh, Su-Ling

    2015-10-01

    Meanings of masked complex scenes can be extracted without awareness; however, it remains unknown whether audiovisual integration occurs with an invisible complex visual scene. The authors examine whether a scenery soundtrack can facilitate unconscious processing of a subliminal visual scene. The continuous flash suppression paradigm was used to render a complex scene picture invisible, and the picture was paired with a semantically congruent or incongruent scenery soundtrack. Participants were asked to respond as quickly as possible if they detected any part of the scene. Release-from-suppression time was used as an index of unconscious processing of the complex scene, which was shorter in the audiovisual congruent condition than in the incongruent condition (Experiment 1). The possibility that participants adopted different detection criteria for the 2 conditions was excluded (Experiment 2). The audiovisual congruency effect did not occur for objects-only (Experiment 3) and background-only (Experiment 4) pictures, and it did not result from consciously mediated conceptual priming (Experiment 5). The congruency effect was replicated when catch trials without scene pictures were added to exclude participants with high false-alarm rates (Experiment 6). This is the first study demonstrating unconscious audiovisual integration with subliminal scene pictures, and it suggests expansions of scene-perception theories to include unconscious audiovisual integration.

  16. A Case Study of Parental Perceptions of Literacy Skill Development for Severe Speech Impairments

    Science.gov (United States)

    Sweat, Karen

    2014-01-01

    Students with speech deficits may lack the skills or support structures necessary for adequate literacy development, although mixed results from past research indicate that some students with speech impairments have the capacity to gain appropriate literacy skills. The purpose of the qualitative holistic…

  17. The Effect of Hearing Loss on the Perception of Infant- and Adult-Directed Speech

    Science.gov (United States)

    Robertson, Susie; von Hapsburg, Deborah; Hay, Jessica S.

    2013-01-01

    Purpose: Infant-directed speech (IDS) facilitates language learning in infants with normal hearing (NH), compared to adult-directed speech (ADS). It is well established that infants with NH prefer to listen to IDS over ADS. The purpose of this study was to determine whether infants with hearing impairment (HI), like their NH peers, show a…

  18. Speech across species: on the mechanistic fundamentals of vocal production and perception

    NARCIS (Netherlands)

    Ohms, Verena Regina

    2011-01-01

    Birdsong and human speech are both complex behaviours which show striking similarities mainly thought to be present in the area of development and learning. The most important parameters in human speech are vocal tract resonances, called formants. Different formant patterns characterize different vo

  19. Audiovisual Simultaneity Judgment and Rapid Recalibration throughout the Lifespan.

    Science.gov (United States)

    Noel, Jean-Paul; De Niear, Matthew; Van der Burg, Erik; Wallace, Mark T

    2016-01-01

    It is well established that multisensory interactions confer an array of perceptual and behavioral benefits. One of the key features of multisensory interactions is the temporal structure of the combined stimuli. In an effort to better characterize how temporal factors influence multisensory interactions across the lifespan, we examined audiovisual simultaneity judgment and the degree of rapid recalibration to paired audiovisual stimuli (Flash-Beep and Speech) in a sample of 220 participants ranging from 7 to 86 years of age. Results demonstrate a surprisingly protracted developmental time-course for both audiovisual simultaneity judgment and rapid recalibration, with neither reaching maturity until well into adolescence. Interestingly, correlational analyses revealed that audiovisual simultaneity judgments (i.e., the size of the audiovisual temporal window of simultaneity) and rapid recalibration significantly co-varied as a function of age. Together, our results represent the most complete description of age-related changes in audiovisual simultaneity judgments to date, as well as being the first to describe changes in the degree of rapid recalibration as a function of age. We propose that the developmental time-course of rapid recalibration scaffolds the maturation of more durable audiovisual temporal representations.

  20. Evaluation of temporal difference limen in preoperative non-invasive ear canal audiometry as a predictive factor for speech perception after cochlear implantation

    Directory of Open Access Journals (Sweden)

    Saku T. Sinkkonen

    2014-03-01

    The temporal difference limen (TDL) can be measured with noninvasive electrical ear canal stimulation. The objective of the study was to determine the role of preoperative TDL measurements in predicting patients' speech perception after cochlear implantation. We carried out a retrospective chart analysis of fifty-four cochlear implant (CI) patients with preoperative TDL and postoperative bisyllabic word recognition measurements at Helsinki University Central Hospital between March 1994 and March 2011. Our results show no correlation between TDL and postoperative speech perception. However, patients' advancing age correlates with longer TDL, but not directly with poorer speech perception. The results are in line with previous findings on the lack of predictive value of preoperative TDL measurements in CI patients.

  1. Hearing (Rivaling) Lips and Seeing Voices: How Audiovisual Interactions Modulate Perceptual Stabilization in Binocular Rivalry

    Directory of Open Access Journals (Sweden)

    Manuel Vidal

    2014-09-01

    In binocular rivalry (BR), sensory input remains the same, yet subjective experience fluctuates continually between two mutually exclusive representations. We investigated the perceptual stabilization effect of an additional sound on BR dynamics using speech stimuli known to involve robust audiovisual (AV) interactions at several cortical levels. Subjects sensitive to the McGurk effect were presented with looping videos of rivaling faces uttering /aba/ and /aga/, respectively, while synchronously hearing the voice /aba/. They continuously reported the dominant percept, either observing passively or actively trying to promote one of the faces. The few studies that have investigated the influence of information from an external modality on perceptual competition reported results that seem, at first sight, inconsistent. Since these differences could stem from how well the modalities matched, we addressed this by comparing two levels of AV congruence: real (/aba/ viseme) vs. illusory (/aga/ viseme producing the /ada/ McGurk fusion). First, adding the voice /aba/ stabilized both the real and the illusory congruent lip percepts. Second, real congruence of the added voice improved volitional control whereas illusory congruence did not, suggesting a graded contribution to the top-down sensitivity control of selective attention. In conclusion, a congruent sound considerably enhanced attentional control over perceptual outcome selection; however, differences between passive stabilization and active control according to AV congruency suggest that these are governed by two distinct mechanisms. Based on existing theoretical models of BR, selective attention, and AV interaction in speech perception, we provide a general interpretation of our findings.

  2. Digital audiovisual archives

    CERN Document Server

    Stockinger, Peter

    2013-01-01

    Today, huge quantities of digital audiovisual resources are already available, everywhere and at any time, through Web portals, online archives and libraries, and video blogs. One central question with respect to this huge amount of audiovisual data is how it can be used in specific (social, pedagogical, etc.) contexts and what its potential interest is for target groups (communities, professionals, students, researchers, etc.). This book examines the question of the (creative) exploitation of digital audiovisual archives from a theoretical, methodological, technical and practical

  3. Impact of second-language experience in infancy: brain measures of first- and second-language speech perception.

    Science.gov (United States)

    Conboy, Barbara T; Kuhl, Patricia K

    2011-03-01

    Language experience 'narrows' speech perception by the end of infants' first year, reducing discrimination of non-native phoneme contrasts while improving native-contrast discrimination. Previous research showed that declines in non-native discrimination were reversed by second-language experience provided at 9-10 months, but it is not known whether second-language experience affects first-language speech sound processing. Using event-related potentials (ERPs), we examined learning-related changes in brain activity to Spanish and English phoneme contrasts in monolingual English-learning infants pre- and post-exposure to Spanish from 9.5-10.5 months of age. Infants showed a significant discriminatory ERP response to the Spanish contrast at 11 months (post-exposure), but not at 9 months (pre-exposure). The English contrast elicited an earlier discriminatory response at 11 months than at 9 months, suggesting improvement in native-language processing. The results show that infants rapidly encode new phonetic information, and that improvement in native speech processing can occur during second-language learning in infancy.

  5. A simulation framework for auditory discrimination experiments: Revealing the importance of across-frequency processing in speech perception.

    Science.gov (United States)

    Schädler, Marc René; Warzybok, Anna; Ewert, Stephan D; Kollmeier, Birger

    2016-05-01

    A framework for simulating auditory discrimination experiments, based on an approach from Schädler, Warzybok, Hochmuth, and Kollmeier [(2015). Int. J. Audiol. 54, 100-107] which was originally designed to predict speech recognition thresholds, is extended to also predict psychoacoustic thresholds. The proposed framework is used to assess the suitability of different auditory-inspired feature sets for a range of auditory discrimination experiments that included psychoacoustic as well as speech recognition experiments in noise. The considered experiments were 2 kHz tone-in-broadband-noise simultaneous masking depending on the tone length, spectral masking with simultaneously presented tone signals and narrow-band noise maskers, and German Matrix sentence test reception threshold in stationary and modulated noise. The employed feature sets included spectro-temporal Gabor filter bank features, Mel-frequency cepstral coefficients, logarithmically scaled Mel-spectrograms, and the internal representation of the Perception Model from Dau, Kollmeier, and Kohlrausch [(1997). J. Acoust. Soc. Am. 102(5), 2892-2905]. The proposed framework was successfully employed to simulate all experiments with a common parameter set and to obtain objective thresholds with fewer assumptions than traditional modeling approaches. Depending on the feature set, the simulated reference-free thresholds were found to agree with, and hence to predict, empirical data from the literature. Across-frequency processing was found to be crucial for accurately modeling the lower speech reception threshold in modulated noise conditions than in stationary noise conditions. PMID:27250164

  6. Music training improves speech-in-noise perception: Longitudinal evidence from a community-based music program.

    Science.gov (United States)

    Slater, Jessica; Skoe, Erika; Strait, Dana L; O'Connell, Samantha; Thompson, Elaine; Kraus, Nina

    2015-09-15

    Music training may strengthen auditory skills that help children not only in musical performance but in everyday communication. Comparisons of musicians and non-musicians across the lifespan have provided some evidence for a "musician advantage" in understanding speech in noise, although reports have been mixed. Controlled longitudinal studies are essential to disentangle effects of training from pre-existing differences, and to determine how much music training is necessary to confer benefits. We followed a cohort of elementary school children for 2 years, assessing their ability to perceive speech in noise before and after musical training. After the initial assessment, participants were randomly assigned to one of two groups: one group began music training right away and completed 2 years of training, while the second group waited a year and then received 1 year of music training. Outcomes provide the first longitudinal evidence that speech-in-noise perception improves after 2 years of group music training. The children were enrolled in an established and successful community-based music program and followed the standard curriculum; therefore, these findings provide an important link between laboratory-based research and real-world assessment of the impact of music training on everyday communication skills. PMID:26005127

  7. Speech perception and production of L2 oral reading

    Institute of Scientific and Technical Information of China (English)

    黎素薇

    2011-01-01

    Research on the development of learners' phonological competence has been conducted mainly from the perspectives of the physical properties of phonology and the interlanguage of L2 acquisition, ignoring the role of speech perception and production. Based on theories of cognition and psychology, this paper explores the properties and patterns of speech perception in L2 oral reading. It indicates that the learner is the agent of L2 oral reading speech perception, which is constrained by the speech organs, cognitive ability, and L1 speech perception patterns. In addition, L2 oral reading differs from that of native speakers in phonological perception, affective perception, and conceptual perception. L2 oral reading is essentially a physical and cognitive (embodied) experience, and this property forms the basis for constructing an experiential-cognitive teaching model that integrates listening, reading, and speaking.

  8. On the perception of speech in primary school classrooms: ranking of noise interference and of age influence.

    Science.gov (United States)

    Prodi, Nicola; Visentin, Chiara; Feletti, Alice

    2013-01-01

    It is well documented that the interference of noise in the classroom puts younger pupils at a disadvantage for speech perception tasks. Nevertheless, the dependence of this phenomenon on the type of noise, and the way it is realized for each class by a specific combination of intelligibility and effort, have not been fully investigated. Following on a previous laboratory study on "listening efficiency," which stems from a combination of accuracy and latency measures, this work tackles the problems above to better understand the basic mechanisms governing the speech perception performance of pupils in noisy classrooms. Listening tests were conducted in real classrooms with a sizable number of students, and tests in quiet were also developed. The statistical analysis is based on stochastic ordering and is able to clarify the behavior of the classes and the different impacts of noises on performance. It is found that the joint babble and activity noise has the worst effect on performance, whereas tapping and external traffic noises are less disruptive. PMID:23297900

  9. The development of multisensory speech perception continues into the late childhood years

    OpenAIRE

    Ross, Lars A.; Molholm, Sophie; Blanco, Daniella; Gomez-Ramirez, Manuel; Saint-Amour, Dave; Foxe, John J.

    2011-01-01

    Observing a speaker’s articulations substantially improves intelligibility of spoken speech, especially under noisy listening conditions. This multisensory integration of speech inputs is crucial to effective communication. Appropriate development of this ability has major implications for children in classroom and social settings, and deficits in it have been linked to a number of neurodevelopmental disorders, especially autism. It is clear from structural imaging studies that there is a pro...

  10. The perception of speech modulation cues in lexical tones is guided by early language-specific experience

    Directory of Open Access Journals (Sweden)

    Laurianne Cabrera

    2015-08-01

    A number of studies have shown that infants reorganize their perception of speech sounds according to their native language categories during their first year of life. Still, information is lacking about the contribution of basic auditory mechanisms to this process. This study aimed to evaluate when native language experience starts to noticeably affect the perceptual processing of basic acoustic cues (i.e., frequency-modulation (FM) and amplitude-modulation (AM) information) known to be crucial for speech perception in adults. The discrimination of a lexical-tone contrast (rising versus low) was assessed in 6- and 10-month-old infants learning either French or Mandarin, using a visual habituation paradigm. The lexical tones were presented in two conditions designed either to keep intact or to severely degrade the FM and fine spectral cues needed to accurately perceive voice-pitch trajectory. A third condition was designed to assess the discrimination of the same voice-pitch trajectories using click trains containing only the FM cues related to the fundamental frequency (F0) in French- and Mandarin-learning 10-month-old infants. Results showed that the younger infants of both language groups and the Mandarin-learning 10-month-olds discriminated the intact lexical-tone contrast, while French-learning 10-month-olds failed. However, only the French 10-month-olds discriminated degraded lexical tones when FM, and thus voice-pitch cues, were reduced. Moreover, Mandarin-learning 10-month-olds discriminated the pitch trajectories presented in click trains better than French-learning infants did. Altogether, these results reveal that the perceptual reorganization occurring during the first year of life for lexical tones is coupled with changes in the auditory ability to use speech modulation cues.

  11. Sequencing at the syllabic and supra-syllabic levels during speech perception: an fMRI study

    Directory of Open Access Journals (Sweden)

    Isabelle Deschamps

    2014-07-01

    The processing of fluent speech involves complex computational steps that begin with the segmentation of the continuous flow of speech sounds into syllables and words. One question that naturally arises pertains to the type of syllabic information that speech processes act upon. Here, we used functional magnetic resonance imaging, with a combination of whole-brain and exploratory anatomical region-of-interest (ROI) approaches, to profile regions that were sensitive to syllabic information during speech perception, parametrically manipulating syllabic complexity along two dimensions: (1) individual syllable complexity, and (2) sequence complexity (supra-syllabic). We manipulated syllable complexity by taking the simplest syllable template, a consonant and vowel (CV), and inserting an additional consonant to create a complex onset (CCV). Supra-syllabic complexity was manipulated by creating sequences composed of the same syllable repeated six times (e.g., /pa-pa-pa-pa-pa-pa/) and sequences of three different syllables each repeated twice (e.g., /pa-ta-ka-pa-ta-ka/). This parametric design allowed us to identify brain regions sensitive to (1) syllabic complexity independent of supra-syllabic complexity, (2) supra-syllabic complexity independent of syllabic complexity, and (3) both syllabic and supra-syllabic complexity. High-resolution scans were acquired for 15 healthy adults. An exploratory anatomical ROI analysis of the supratemporal plane (STP) identified bilateral regions within the anterior two-thirds of the planum temporale, the primary auditory cortices, and the anterior two-thirds of the superior temporal gyrus that showed different patterns of sensitivity to syllabic and supra-syllabic information. These findings demonstrate that during passive listening to syllable sequences, sublexical information is processed automatically, and sensitivity to syllabic and supra-syllabic information is localized almost exclusively within the STP.

  12. Sequencing at the syllabic and supra-syllabic levels during speech perception: an fMRI study.

    Science.gov (United States)

    Deschamps, Isabelle; Tremblay, Pascale

    2014-01-01

    The processing of fluent speech involves complex computational steps that begin with the segmentation of the continuous flow of speech sounds into syllables and words. One question that naturally arises pertains to the type of syllabic information that speech processes act upon. Here, we used functional magnetic resonance imaging, with a combination of whole-brain and exploratory anatomical region-of-interest (ROI) approaches, to profile regions that were sensitive to syllabic information during speech perception, parametrically manipulating syllabic complexity along two dimensions: (1) individual syllable complexity, and (2) sequence complexity (supra-syllabic). We manipulated syllable complexity by taking the simplest syllable template, a consonant and vowel (CV), and inserting an additional consonant to create a complex onset (CCV). Supra-syllabic complexity was manipulated by creating sequences composed of the same syllable repeated six times (e.g., /pa-pa-pa-pa-pa-pa/) and sequences of three different syllables each repeated twice (e.g., /pa-ta-ka-pa-ta-ka/). This parametric design allowed us to identify brain regions sensitive to (1) syllabic complexity independent of supra-syllabic complexity, (2) supra-syllabic complexity independent of syllabic complexity, and (3) both syllabic and supra-syllabic complexity. High-resolution scans were acquired for 15 healthy adults. An exploratory anatomical ROI analysis of the supratemporal plane (STP) identified bilateral regions within the anterior two-thirds of the planum temporale, the primary auditory cortices, and the anterior two-thirds of the superior temporal gyrus that showed different patterns of sensitivity to syllabic and supra-syllabic information. These findings demonstrate that during passive listening to syllable sequences, sublexical information is processed automatically, and sensitivity to syllabic and supra-syllabic information is localized almost exclusively within the STP.

  13. Temporal Fine-Structure Coding and Lateralized Speech Perception in Normal-Hearing and Hearing-Impaired Listeners

    Science.gov (United States)

    Pedersen, Julie H.; Laugesen, Søren; Santurette, Sébastien; Dau, Torsten; MacDonald, Ewen N.

    2016-01-01

    This study investigated the relationship between speech perception performance in spatially complex, lateralized listening scenarios and temporal fine-structure (TFS) coding at low frequencies. Young normal-hearing (NH) and two groups of elderly hearing-impaired (HI) listeners with mild or moderate hearing loss above 1.5 kHz participated in the study. Speech reception thresholds (SRTs) were estimated in the presence of either speech-shaped noise, two-, four-, or eight-talker babble played reversed, or a nonreversed two-talker masker. Target audibility was ensured by applying individualized linear gains to the stimuli, which were presented over headphones. The target and masker streams were lateralized to the same or to opposite sides of the head by introducing 0.7-ms interaural time differences between the ears. TFS coding was assessed by measuring frequency discrimination thresholds and interaural phase difference thresholds at 250 Hz. NH listeners had clearly better SRTs than the HI listeners. However, when maskers were spatially separated from the target, the amount of SRT benefit due to binaural unmasking differed only slightly between the groups. Frequency discrimination thresholds did not correlate with the SRTs, and interaural phase difference thresholds did not correlate with the amount of masking release due to binaural unmasking. The results suggest that, although HI listeners with normal hearing thresholds below 1.5 kHz experienced difficulties with speech understanding in spatially complex environments, these limitations were unrelated to TFS coding abilities and were only weakly associated with a reduction in binaural-unmasking benefit for spatially separated competing sources. PMID:27601071

  14. Positron Emission Tomography Imaging Reveals Auditory and Frontal Cortical Regions Involved with Speech Perception and Loudness Adaptation.

    Directory of Open Access Journals (Sweden)

    Georg Berding

    Considerable progress has been made in the treatment of hearing loss with auditory implants. However, many implanted patients still experience hearing deficiencies, such as limited speech understanding or vanishing perception with continuous stimulation (i.e., abnormal loudness adaptation). The present study aims to identify specific patterns of cerebral cortex activity involved with such deficiencies. We performed O-15-water positron emission tomography (PET) in patients implanted with electrodes within the cochlea, brainstem, or midbrain to investigate the pattern of cortical activation in response to speech or continuous multi-tone stimuli fed directly into the implant processor, which then delivered electrical patterns through those electrodes. Statistical parametric mapping was performed on a single-subject basis. Better speech understanding was correlated with a larger extent of bilateral auditory cortex activation. In contrast to speech, the continuous multi-tone stimulus elicited mainly unilateral auditory cortical activity, in which greater loudness adaptation corresponded to weaker activation and even deactivation. Interestingly, greater loudness adaptation was correlated with stronger activity within the ventral prefrontal cortex, which could be up-regulated to suppress irrelevant or aberrant signals entering the auditory cortex. The ability to detect these specific cortical patterns and differences across patients and stimuli demonstrates the potential of PET for diagnosing auditory function or dysfunction in implant patients, which in turn could guide the development of appropriate stimulation strategies for improving hearing rehabilitation. Beyond hearing restoration, our study also reveals a potential role of the frontal cortex in suppressing irrelevant or aberrant activity within the auditory cortex, and thus may be relevant for understanding and treating tinnitus.

  15. Positron Emission Tomography Imaging Reveals Auditory and Frontal Cortical Regions Involved with Speech Perception and Loudness Adaptation.

    Science.gov (United States)

    Berding, Georg; Wilke, Florian; Rode, Thilo; Haense, Cathleen; Joseph, Gert; Meyer, Geerd J; Mamach, Martin; Lenarz, Minoo; Geworski, Lilli; Bengel, Frank M; Lenarz, Thomas; Lim, Hubert H

    2015-01-01

    Considerable progress has been made in the treatment of hearing loss with auditory implants. However, many implanted patients still experience hearing deficiencies, such as limited speech understanding or vanishing perception with continuous stimulation (i.e., abnormal loudness adaptation). The present study aims to identify specific patterns of cerebral cortex activity involved with such deficiencies. We performed O-15-water positron emission tomography (PET) in patients implanted with electrodes within the cochlea, brainstem, or midbrain to investigate the pattern of cortical activation in response to speech or continuous multi-tone stimuli fed directly into the implant processor, which then delivered electrical patterns through those electrodes. Statistical parametric mapping was performed on a single-subject basis. Better speech understanding was correlated with a larger extent of bilateral auditory cortex activation. In contrast to speech, the continuous multi-tone stimulus elicited mainly unilateral auditory cortical activity, in which greater loudness adaptation corresponded to weaker activation and even deactivation. Interestingly, greater loudness adaptation was correlated with stronger activity within the ventral prefrontal cortex, which could be up-regulated to suppress irrelevant or aberrant signals entering the auditory cortex. The ability to detect these specific cortical patterns and differences across patients and stimuli demonstrates the potential of PET for diagnosing auditory function or dysfunction in implant patients, which in turn could guide the development of appropriate stimulation strategies for improving hearing rehabilitation. Beyond hearing restoration, our study also reveals a potential role of the frontal cortex in suppressing irrelevant or aberrant activity within the auditory cortex, and thus may be relevant for understanding and treating tinnitus. PMID:26046763

  16. Temporal Fine-Structure Coding and Lateralized Speech Perception in Normal-Hearing and Hearing-Impaired Listeners.

    Science.gov (United States)

    Lőcsei, Gusztáv; Pedersen, Julie H; Laugesen, Søren; Santurette, Sébastien; Dau, Torsten; MacDonald, Ewen N

    2016-01-01

    This study investigated the relationship between speech perception performance in spatially complex, lateralized listening scenarios and temporal fine-structure (TFS) coding at low frequencies. Young normal-hearing (NH) and two groups of elderly hearing-impaired (HI) listeners with mild or moderate hearing loss above 1.5 kHz participated in the study. Speech reception thresholds (SRTs) were estimated in the presence of either speech-shaped noise, two-, four-, or eight-talker babble played reversed, or a nonreversed two-talker masker. Target audibility was ensured by applying individualized linear gains to the stimuli, which were presented over headphones. The target and masker streams were lateralized to the same or to opposite sides of the head by introducing 0.7-ms interaural time differences between the ears. TFS coding was assessed by measuring frequency discrimination thresholds and interaural phase difference thresholds at 250 Hz. NH listeners had clearly better SRTs than the HI listeners. However, when maskers were spatially separated from the target, the amount of SRT benefit due to binaural unmasking differed only slightly between the groups. Frequency discrimination thresholds did not correlate with the SRTs, and interaural phase difference thresholds did not correlate with the amount of masking release due to binaural unmasking. The results suggest that, although HI listeners with normal hearing thresholds below 1.5 kHz experienced difficulties with speech understanding in spatially complex environments, these limitations were unrelated to TFS coding abilities and were only weakly associated with a reduction in binaural-unmasking benefit for spatially separated competing sources. PMID:27601071

  17. Media and journalism as forms of knowledge: a methodology for critical reading of journalistic audiovisual narratives

    OpenAIRE

    Beatriz Becker

    2012-01-01

    The work presents a methodology for the analysis of journalistic audiovisual narratives, an instrument for the critical reading of news content and formats that use audiovisual language and multimedia resources on TV and on the web. It assumes that understanding the dynamic combinations of the elements that constitute the audiovisual text contributes to a better perception of the meanings of the news, and that uses of the digital tools in a critical and creative way can collabora...

  18. Superior temporal activation in response to dynamic audio-visual emotional cues

    OpenAIRE

    Robins, Diana L.; Hunyadi, Elinora; Schultz, Robert T.

    2008-01-01

    Perception of emotion is critical for successful social interaction, yet the neural mechanisms underlying the perception of dynamic, audiovisual emotional cues are poorly understood. Evidence from language and sensory paradigms suggests that the superior temporal sulcus and gyrus (STS/STG) play a key role in the integration of auditory and visual cues. Emotion perception research has focused on static facial cues; however, dynamic audiovisual (AV) cues mimic real-world social cues more accura...

  19. The Effect of Short-Term Auditory Training on Speech in Noise Perception and Cortical Auditory Evoked Potentials in Adults with Cochlear Implants.

    Science.gov (United States)

    Barlow, Nathan; Purdy, Suzanne C; Sharma, Mridula; Giles, Ellen; Narne, Vijay

    2016-02-01

    This study investigated whether a short intensive psychophysical auditory training program is associated with speech perception benefits and changes in cortical auditory evoked potentials (CAEPs) in adult cochlear implant (CI) users. Ten adult implant recipients trained for approximately 7 hours on psychophysical tasks (Gap-in-Noise Detection, Frequency Discrimination, Spectral Rippled Noise [SRN], Iterated Rippled Noise, Temporal Modulation). Speech performance was assessed before and after training using Lexical Neighborhood Test (LNT) words in quiet and in eight-speaker babble. CAEPs evoked by a natural speech stimulus /baba/ with varying syllable stress were assessed pre- and post-training, in quiet and in noise. SRN psychophysical thresholds showed a significant improvement (78% on average) over the training period, but performance on other psychophysical tasks did not change. LNT scores in noise improved significantly post-training by 11% on average compared with three pretraining baseline measures. N1P2 amplitude changed post-training for /baba/ in quiet (p = 0.005, visit 3 pretraining versus visit 4 post-training). CAEP changes did not correlate with behavioral measures. CI recipients' clinical records indicated a plateau in speech perception performance prior to participation in the study. A short period of intensive psychophysical training produced small but significant gains in speech perception in noise and spectral discrimination ability. There remain questions about the most appropriate type of training and the duration or dosage of training that provides the most robust outcomes for adults with CIs. PMID:27587925

  20. Speech perception of sine-wave signals by children with cochlear implants.

    Science.gov (United States)

    Nittrouer, Susan; Kuess, Jamie; Lowenstein, Joanna H

    2015-05-01

    Children need to discover linguistically meaningful structures in the acoustic speech signal. Being attentive to recurring, time-varying formant patterns helps in that process. However, that kind of acoustic structure may not be available to children with cochlear implants (CIs), thus hindering development. The major goal of this study was to examine whether children with CIs are as sensitive to time-varying formant structure as children with normal hearing (NH) by asking them to recognize sine-wave speech. The same materials were also presented as speech in noise to evaluate whether any group differences might simply reflect general perceptual deficits on the part of children with CIs. Vocabulary knowledge, phonemic awareness, and "top-down" language effects were assessed as well. Finally, treatment factors were examined as possible predictors of outcomes. Results showed that children with CIs were as accurate as children with NH at recognizing sine-wave speech, but poorer at recognizing speech in noise. Phonemic awareness was related to that recognition. Top-down effects were similar across groups. Having had a period of bimodal stimulation near the time of receiving a first CI facilitated these effects. Results suggest that children with CIs have access to the important time-varying structure of vocal-tract formants. PMID:25994709

  1. Tone classification of syllable-segmented Thai speech based on multilayer perception

    Science.gov (United States)

    Satravaha, Nuttavudh; Klinkhachorn, Powsiri; Lass, Norman

    2002-05-01

    Thai is a monosyllabic tonal language that uses tone to convey lexical information about the meaning of a syllable. Thus, to fully recognize a spoken Thai syllable, a speech recognition system must identify not only the base syllable but also its tone, which makes tone classification an essential part of a Thai speech recognition system. Thai has five distinctive tones ("mid," "low," "falling," "high," and "rising"), and each tone is represented by a single fundamental frequency (F0) pattern. However, several factors, including tonal coarticulation, stress, intonation, and speaker variability, affect the F0 pattern of a syllable in continuous Thai speech. In this study, an efficient method for tone classification of syllable-segmented Thai speech, which incorporates the effects of tonal coarticulation, stress, and intonation, was developed together with a method for automatic syllable segmentation. Acoustic parameters were used as the main discriminating features. The F0 contour of a segmented syllable was normalized using a z-score transformation before being presented to a tone classifier. The proposed system was evaluated on 920 test utterances spoken by 8 speakers and achieved a recognition rate of 91.36%.
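
    The z-score step described in this abstract can be illustrated with a minimal Python sketch. It assumes that a segmented syllable's F0 contour is available as a list of values in Hz and that the population standard deviation is used; the abstract specifies neither detail, and `zscore_f0` is a hypothetical helper, not part of the authors' system.

```python
from statistics import mean, pstdev


def zscore_f0(contour):
    """Z-score normalize one syllable's F0 contour (hypothetical helper).

    Assumption: population standard deviation; the original study may differ.
    """
    mu = mean(contour)
    sigma = pstdev(contour)
    if sigma == 0:
        # A flat contour has no shape to preserve; map every frame to 0.0.
        return [0.0] * len(contour)
    return [(f0 - mu) / sigma for f0 in contour]


# A rising contour keeps its shape but loses its absolute pitch level.
print(zscore_f0([100.0, 110.0, 120.0]))
```

    Because each contour is centered on its own mean and scaled by its own spread, the same rising shape produced at 100 to 120 Hz or at 200 to 240 Hz yields the same normalized values, which is one plausible reason to apply such a step before classification.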

  2. Joint evaluation of communication quality and user experience in an audio-visual virtual reality meeting

    DEFF Research Database (Denmark)

    Møller, Anders Kalsgaard; Hoffmann, Pablo F.; Carrozzino, Marcello;

    2013-01-01

    State-of-the-art speech intelligibility tests are designed to evaluate acoustic communication devices, not audio-visual virtual reality systems. This paper presents a novel method to evaluate a communication situation based on both the speech intelligibility...

  3. The Acquisitional Value of Recasts in Instructed Second Language Speech Learning: Teaching the Perception and Production of English /ɹ/ to Adult Japanese Learners

    Science.gov (United States)

    Saito, Kazuya

    2013-01-01

    The current study investigated the impact of recasts together with form-focused instruction (FFI) on the development of second language speech perception and production of English /ɹ/ by Japanese learners. Forty-five learners were randomly assigned to three groups (FFI recasts, FFI only, and Control) and exposed to four hours of communicatively…

  4. Perception of Filtered Speech by Children with Developmental Dyslexia and Children with Specific Language Impairment

    Directory of Open Access Journals (Sweden)

    Usha Goswami

    2016-05-01

    Here we use two filtered speech tasks to investigate children’s processing of slow (<4 Hz) versus faster (~33 Hz) temporal modulations in speech. We compare groups of children with either developmental dyslexia (Experiment 1) or speech and language impairments (SLIs; Experiment 2) to groups of typically developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their modulation frequencies were either low-pass filtered (<4 Hz) or band-pass filtered (22–40 Hz). Recognition of the filtered nursery rhymes was tested in a picture-recognition multiple-choice paradigm. Children with dyslexia aged 10 years showed overall recognition equivalent to TD controls for both the low-pass and band-pass filtered stimuli, but showed significantly impaired acoustic learning from low-pass filtered targets during the experiment. Children with oral speech and language impairments (SLIs) aged 9 years showed significantly poorer recognition of band-pass filtered targets than their TD controls, and showed acoustic learning effects comparable to those of TD children during the experiment. The SLI sample was also divided into children with and without phonological difficulties. The children with both SLI and phonological difficulties were impaired in recognising both kinds of filtered speech. These data suggest impaired temporal sampling of the speech signal at different modulation rates by children with different kinds of developmental language disorder. Both SLI and dyslexic samples showed impaired discrimination of amplitude rise times. Implications of these findings for a temporal sampling framework for understanding developmental language disorders are discussed.

  5. The unity assumption facilitates cross-modal binding of musical, non-speech stimuli: The role of spectral and amplitude envelope cues.

    Science.gov (United States)

    Chuen, Lorraine; Schutz, Michael

    2016-07-01

    An observer's inference that multimodal signals originate from a common underlying source facilitates cross-modal binding. This 'unity assumption' causes asynchronous auditory and visual speech streams to seem simultaneous (Vatakis & Spence, Perception & Psychophysics, 69(5), 744-756, 2007). Subsequent tests of non-speech stimuli such as musical and impact events found no evidence for the unity assumption, suggesting the effect is speech-specific (Vatakis & Spence, Acta Psychologica, 127(1), 12-23, 2008). However, the role of amplitude envelope (the changes in energy of a sound over time) was not previously appreciated within this paradigm. Here, we explore whether previous findings suggesting speech-specificity of the unity assumption were confounded by similarities in the amplitude envelopes of the contrasted auditory stimuli. Experiment 1 used natural events with clearly differentiated envelopes: single notes played on either a cello (bowing motion) or a marimba (striking motion). Participants performed an unspeeded temporal order judgment task: they viewed audio-visually matched (e.g., marimba auditory with marimba video) and mismatched (e.g., cello auditory with marimba video) versions of the stimuli at various stimulus onset asynchronies and indicated which modality was presented first. As predicted, participants were less sensitive to temporal order in matched conditions, demonstrating that the unity assumption can facilitate the perception of synchrony outside of speech stimuli. Results from Experiments 2 and 3 revealed that when spectral information was removed from the original auditory stimuli, amplitude envelope alone was not sufficient to elicit the audiovisual unity effect. We propose that amplitude envelope and spectral acoustic cues both affect the percept of audiovisual unity, working in concert to help an observer determine when to integrate across modalities.

  6. The unity assumption facilitates cross-modal binding of musical, non-speech stimuli: The role of spectral and amplitude envelope cues.

    Science.gov (United States)

    Chuen, Lorraine; Schutz, Michael

    2016-07-01

    An observer's inference that multimodal signals originate from a common underlying source facilitates cross-modal binding. This 'unity assumption' causes asynchronous auditory and visual speech streams to seem simultaneous (Vatakis & Spence, Perception & Psychophysics, 69(5), 744-756, 2007). Subsequent tests of non-speech stimuli such as musical and impact events found no evidence for the unity assumption, suggesting the effect is speech-specific (Vatakis & Spence, Acta Psychologica, 127(1), 12-23, 2008). However, the role of amplitude envelope (the changes in energy of a sound over time) was not previously appreciated within this paradigm. Here, we explore whether previous findings suggesting speech-specificity of the unity assumption were confounded by similarities in the amplitude envelopes of the contrasted auditory stimuli. Experiment 1 used natural events with clearly differentiated envelopes: single notes played on either a cello (bowing motion) or a marimba (striking motion). Participants performed an unspeeded temporal order judgment task: they viewed audio-visually matched (e.g., marimba auditory with marimba video) and mismatched (e.g., cello auditory with marimba video) versions of the stimuli at various stimulus onset asynchronies and indicated which modality was presented first. As predicted, participants were less sensitive to temporal order in matched conditions, demonstrating that the unity assumption can facilitate the perception of synchrony outside of speech stimuli. Results from Experiments 2 and 3 revealed that when spectral information was removed from the original auditory stimuli, amplitude envelope alone was not sufficient to elicit the audiovisual unity effect. We propose that amplitude envelope and spectral acoustic cues both affect the percept of audiovisual unity, working in concert to help an observer determine when to integrate across modalities. PMID:27084701

  7. Top-Down Modulation on the Perception and Categorization of Identical Pitch Contours in Speech and Music.

    Science.gov (United States)

    Weidema, Joey L; Roncaglia-Denissen, M P; Honing, Henkjan

    2016-01-01

    Whether pitch in language and music is governed by domain-specific or domain-general cognitive mechanisms is contentiously debated. The aim of the present study was to investigate whether mechanisms governing pitch contour perception operate differently when pitch information is interpreted as either speech or music. By modulating listening mode, this study aimed to demonstrate that pitch contour perception relies on domain-specific cognitive mechanisms, which are regulated by top-down influences from language and music. Three groups of participants (Mandarin speakers, Dutch-speaking non-musicians, and Dutch musicians) were exposed to identical pitch contours and tested on their ability to identify these contours in a linguistic and a musical context. Stimuli consisted of disyllabic words spoken in Mandarin and melodic tonal analogs, embedded in a linguistic and a melodic carrier phrase, respectively. Participants classified identical pitch contours as significantly different depending on listening mode. Top-down influences from language appeared to alter the perception of pitch contour in speakers of Mandarin. This was not the case for non-musician speakers of Dutch. Moreover, this effect was lacking in Dutch-speaking musicians. The classification patterns of pitch contours in language and music seem to suggest that domain-specific categorization is modulated by top-down influences from language and music.

  8. Top–Down Modulation on the Perception and Categorization of Identical Pitch Contours in Speech and Music

    Science.gov (United States)

    Weidema, Joey L.; Roncaglia-Denissen, M. P.; Honing, Henkjan

    2016-01-01

    Whether pitch in language and music is governed by domain-specific or domain-general cognitive mechanisms is contentiously debated. The aim of the present study was to investigate whether mechanisms governing pitch contour perception operate differently when pitch information is interpreted as either speech or music. By modulating listening mode, this study aimed to demonstrate that pitch contour perception relies on domain-specific cognitive mechanisms, which are regulated by top–down influences from language and music. Three groups of participants (Mandarin speakers, Dutch-speaking non-musicians, and Dutch musicians) were exposed to identical pitch contours and tested on their ability to identify these contours in a linguistic and a musical context. Stimuli consisted of disyllabic words spoken in Mandarin and melodic tonal analogs, embedded in a linguistic and a melodic carrier phrase, respectively. Participants classified identical pitch contours as significantly different depending on listening mode. Top–down influences from language appeared to alter the perception of pitch contour in speakers of Mandarin. This was not the case for non-musician speakers of Dutch. Moreover, this effect was lacking in Dutch-speaking musicians. The classification patterns of pitch contours in language and music seem to suggest that domain-specific categorization is modulated by top–down influences from language and music. PMID:27313552

  9. Top-Down Modulation on the Perception and Categorization of Identical Pitch Contours in Speech and Music

    Directory of Open Access Journals (Sweden)

    Joey L. Weidema

    2016-06-01

    Whether pitch in language and music is governed by domain-specific or domain-general cognitive mechanisms is contentiously debated. The aim of the present study was to investigate whether mechanisms governing pitch contour perception operate differently when pitch information is interpreted as either speech or music. By modulating listening mode, this study aimed to demonstrate that pitch contour perception relies on domain-specific cognitive mechanisms, which are regulated by top-down influences from language and music. Three groups of participants (Mandarin speakers, Dutch-speaking non-musicians, and Dutch musicians) were exposed to identical pitch contours and tested on their ability to identify these contours in a linguistic and a musical context. Stimuli consisted of disyllabic words spoken in Mandarin and melodic tonal analogues, embedded in a linguistic and a melodic carrier phrase, respectively. Participants classified identical pitch contours as significantly different depending on listening mode. Top-down influences from language appeared to alter the perception of pitch contour in speakers of Mandarin. This was not the case for non-musician speakers of Dutch. Moreover, this effect was lacking in Dutch-speaking musicians. The classification patterns of pitch contours in language and music seem to suggest that domain-specific categorization is modulated by top-down influences from language and music.

  10. Top-Down Modulation on the Perception and Categorization of Identical Pitch Contours in Speech and Music.

    Science.gov (United States)

    Weidema, Joey L; Roncaglia-Denissen, M P; Honing, Henkjan

    2016-01-01

    Whether pitch in language and music is governed by domain-specific or domain-general cognitive mechanisms is contentiously debated. The aim of the present study was to investigate whether mechanisms governing pitch contour perception operate differently when pitch information is interpreted as either speech or music. By modulating listening mode, this study aimed to demonstrate that pitch contour perception relies on domain-specific cognitive mechanisms, which are regulated by top-down influences from language and music. Three groups of participants (Mandarin speakers, Dutch-speaking non-musicians, and Dutch musicians) were exposed to identical pitch contours and tested on their ability to identify these contours in a linguistic and a musical context. Stimuli consisted of disyllabic words spoken in Mandarin and melodic tonal analogs, embedded in a linguistic and a melodic carrier phrase, respectively. Participants classified identical pitch contours as significantly different depending on listening mode. Top-down influences from language appeared to alter the perception of pitch contour in speakers of Mandarin. This was not the case for non-musician speakers of Dutch. Moreover, this effect was lacking in Dutch-speaking musicians. The classification patterns of pitch contours in language and music seem to suggest that domain-specific categorization is modulated by top-down influences from language and music. PMID:27313552

  11. Read My Lips: Brain Dynamics Associated with Audiovisual Integration and Deviance Detection.

    Science.gov (United States)

    Tse, Chun-Yu; Gratton, Gabriele; Garnsey, Susan M; Novak, Michael A; Fabiani, Monica

    2015-09-01

    Information from different modalities is initially processed in different brain areas, yet real-world perception often requires the integration of multisensory signals into a single percept. An example is the McGurk effect, in which people viewing a speaker whose lip movements do not match the utterance perceive the spoken sounds incorrectly, hearing them as more similar to those signaled by the visual rather than the auditory input. This indicates that audiovisual integration is important for generating the phoneme percept. Here we asked when and where the audiovisual integration process occurs, providing spatial and temporal boundaries for the processes generating phoneme perception. Specifically, we wanted to separate audiovisual integration from other processes, such as simple deviance detection. Building on previous work employing ERPs, we used an oddball paradigm in which task-irrelevant audiovisually deviant stimuli were embedded in strings of non-deviant stimuli. We also recorded the event-related optical signal, an imaging method combining spatial and temporal resolution, to investigate the time course and neuroanatomical substrate of audiovisual integration. We found that audiovisual deviants elicit a short duration response in the middle/superior temporal gyrus, whereas audiovisual integration elicits a more extended response involving also inferior frontal and occipital regions. Interactions between audiovisual integration and deviance detection processes were observed in the posterior/superior temporal gyrus. These data suggest that dynamic interactions between inferior frontal cortex and sensory regions play a significant role in multimodal integration.

  12. Influences of written forms on auditory perception: the case of Hindi-speaking learners of French

    OpenAIRE

    Chadee, Tania

    2013-01-01

    It is widely accepted today that speech perception is more effective in an audiovisual context than in an auditory-only one (Benoît, Mohamadi and Kandel, 1994; Schwartz, Berthommier and Savariaux, 2004). Visual information in this situation often consists of the speaker’s articulatory and facial gestures provided by face-to-face interaction. However, when learning a foreign language, another type of visual aid is generally available for identifying oral forms: their written forms. And yet, in the...

  13. Perception of Foreign Accent Syndrome Speech and Its Relation to Segmental Characteristics

    Science.gov (United States)

    Dankovicova, Jana; Hunt, Claire

    2011-01-01

    Foreign accent syndrome (FAS) is an acquired neurogenic disorder characterized by altered speech that sounds foreign-accented. This study presents a British subject perceived to speak with an Italian (or Greek) accent after a brainstem (pontine) stroke. Native English listeners rated the strength of foreign accent and impairment they perceived in…

  14. Native Speakers' Perceptions of Fluency and Accent in L2 Speech

    Science.gov (United States)

    Pinget, Anne-France; Bosker, Hans Rutger; Quené, Hugo; de Jong, Nivja H.

    2014-01-01

    Oral fluency and foreign accent distinguish L2 from L1 speech production. In language testing practices, both fluency and accent are usually assessed by raters. This study investigates what exactly native raters of fluency and accent take into account when judging L2 speech. Our aim is to explore the relationship between objectively measured temporal,…

  15. Listening with an Accent: Speech Perception in a Second Language by Late Bilinguals

    Science.gov (United States)

    Leikin, Mark; Ibrahim, Raphiq; Eviatar, Zohar; Sapir, Shimon

    2009-01-01

    The goal of the present study was to examine the functioning of late bilinguals in their second language. Specifically, we asked how native and non-native Hebrew-speaking listeners perceive accented and native-accented Hebrew speech. To achieve this goal we used the gating paradigm to explore the ability of healthy late fluent bilinguals (Russian and…

  16. Compensation for Complete Assimilation in Speech Perception: The Case of Korean Labial-to-Velar Assimilation

    Science.gov (United States)

    Mitterer, Holger; Kim, Sahyang; Cho, Taehong

    2013-01-01

    In connected speech, phonological assimilation to neighboring words can lead to pronunciation variants (e.g., "garden bench" → "gardem bench"). A large body of literature suggests that listeners use the phonetic context to reconstruct the intended word for assimilation types that often lead to incomplete assimilations (e.g., a…

  17. Familiarity Breeds Support: Speech-Language Pathologists' Perceptions of Bullying of Students with Autism Spectrum Disorders

    Science.gov (United States)

    Blood, Gordon W.; Blood, Ingrid M.; Coniglio, Amy D.; Finke, Erinn H.; Boyle, Michael P.

    2013-01-01

    Children with autism spectrum disorders (ASD) are primary targets for bullies and victimization. Research shows school personnel may be uneducated about bullying and ways to intervene. Speech-language pathologists (SLPs) in schools often work with children with ASD and may have victims of bullying on their caseloads. These victims may feel most…

  18. School hearing health actions in the municipality of Sobral-CE: perception of speech therapists

    Directory of Open Access Journals (Sweden)

    Rafaela Bezerra Façanha Correia

    2012-06-01

    Objective: To evaluate the school hearing health actions developed in the Listen Sobral Project. Methods: Qualitative study conducted at the Department of Hearing Health Care (SASA) of the city of Sobral-CE, Brazil, from April to June 2010. Study participants were the Listen Sobral Project’s coordinator and four speech therapists attending the Multidisciplinary Residency in Family Health and working in partnership with the project. Data were collected through semi-structured interviews and examined by content analysis according to the convergence of speech, from which the following categories emerged: school hearing health actions; benefits of the actions; difficulties in developing the actions; and changes to improve the actions. Results: According to the speech therapists' accounts, school hearing health actions are centered on health promotion, prevention, and early identification of hearing loss. However, weak points were identified, especially regarding teacher training; partnership between schools and speech therapists; ear, nose and throat care; and suitable facilities. Conclusion: School hearing health actions have become part of reality in the city of Sobral, although they are not yet fully implemented. It is therefore necessary to maintain these actions, but with changes toward a more organized structure, in order to promote higher-quality care for schoolchildren.

  19. Cognitive Compensation of Speech Perception With Hearing Impairment, Cochlear Implants, and Aging

    Science.gov (United States)

    Clarke, Jeanne; Pals, Carina; Benard, Michel R.; Bhargava, Pranesh; Saija, Jefta; Sarampalis, Anastasios; Wagner, Anita; Gaudrain, Etienne

    2016-01-01

    External degradations in incoming speech reduce understanding, and hearing impairment further compounds the problem. While cognitive mechanisms alleviate some of the difficulties, their effectiveness may change with age. In our research, reviewed here, we investigated cognitive compensation with hearing impairment, cochlear implants, and aging, via (a) phonemic restoration as a measure of top-down filling of missing speech, (b) listening effort and response times as a measure of increased cognitive processing, and (c) visual world paradigm and eye gazing as a measure of the use of context and its time course. Our results indicate that between speech degradations and their cognitive compensation, there is a fine balance that seems to vary greatly across individuals. Hearing impairment or inadequate hearing device settings may limit compensation benefits. Cochlear implants seem to allow the effective use of sentential context, but likely at the cost of delayed processing. Linguistic and lexical knowledge, which play an important role in compensation, may be successfully employed in advanced age, as some compensatory mechanisms seem to be preserved. These findings indicate that cognitive compensation in hearing impairment can be highly complicated—not always absent, but also not easily predicted by speech intelligibility tests only.

  20. Cortical oscillations in auditory perception and speech: evidence for two temporal windows in human auditory cortex

    Directory of Open Access Journals (Sweden)

    Huan Luo

    2012-05-01

    Natural sounds, including vocal communication sounds, contain critical information at multiple time scales. Two essential temporal modulation rates in speech have been argued to be in the low gamma band (~20-80 ms duration information) and the theta band (~150-300 ms), corresponding to segmental and syllabic modulation rates, respectively. On one hypothesis, auditory cortex implements temporal integration using time constants closely related to these values. The neural correlates of a proposed dual temporal window mechanism in human auditory cortex remain poorly understood. We recorded MEG responses from participants listening to non-speech auditory stimuli with different temporal structures, created by concatenating frequency-modulated segments of varied segment durations. We show that these non-speech stimuli with temporal structure matching speech-relevant scales (~25 ms and ~200 ms) elicit reliable phase tracking in the corresponding associated oscillatory frequencies (low gamma and theta bands). In contrast, stimuli with non-matching temporal structure do not. Furthermore, the topography of theta band phase tracking shows rightward lateralization while gamma band phase tracking occurs bilaterally. The results support the hypothesis that there exists multi-time resolution processing in cortex on discontinuous scales and provide evidence for an asymmetric organization of temporal analysis (asymmetrical sampling in time, AST). The data argue for a macroscopic-level neural mechanism underlying multi-time resolution processing: the sliding and resetting of intrinsic temporal windows on privileged time scales.

  1. Speech Perception and Production by Sequential Bilingual Children: A Longitudinal Study of Voice Onset Time Acquisition

    Science.gov (United States)

    McCarthy, Kathleen M.; Mahon, Merle; Rosen, Stuart; Evans, Bronwen G.

    2014-01-01

    The majority of bilingual speech research has focused on simultaneous bilinguals. Yet, in immigrant communities, children are often initially exposed to their family language (L1), before becoming gradually immersed in the host country's language (L2). This is typically referred to as sequential bilingualism. Using a longitudinal design, this…

  2. Perception of Filtered Speech by Children with Developmental Dyslexia and Children with Specific Language Impairments.

    Science.gov (United States)

    Goswami, Usha; Cumming, Ruth; Chait, Maria; Huss, Martina; Mead, Natasha; Wilson, Angela M; Barnes, Lisa; Fosker, Tim

    2016-01-01

    Here we use two filtered speech tasks to investigate children's processing of slow […] dyslexia (Experiment 1) or speech and language impairments (SLIs, Experiment 2) to groups of typically developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their modulation frequencies were either low-pass filtered […] dyslexia aged 10 years showed equivalent recognition overall to TD controls for both the low-pass and band-pass filtered stimuli, but showed significantly impaired acoustic learning during the experiment from low-pass filtered targets. Children with oral SLIs aged 9 years showed significantly poorer recognition of band-pass filtered targets compared to their TD controls, and showed comparable acoustic learning effects to TD children during the experiment. The SLI samples were also divided into children with and without phonological difficulties. The children with both SLI and phonological difficulties were impaired in recognizing both kinds of filtered speech. These data are suggestive of impaired temporal sampling of the speech signal at different modulation rates by children with different kinds of developmental language disorder. Both SLI and dyslexic samples showed impaired discrimination of amplitude rise times. Implications of these findings for a temporal sampling framework for understanding developmental language disorders are discussed. PMID:27303348

  4. The Effect of Talker and Intonation Variability on Speech Perception in Noise in Children with Dyslexia

    Science.gov (United States)

    Hazan, Valerie; Messaoud-Galusi, Souhila; Rosen, Stuart

    2013-01-01

    Purpose: In this study, the authors aimed to determine whether children with dyslexia (hereafter referred to as "DYS children") are more affected than children with average reading ability (hereafter referred to as "AR children") by talker and intonation variability when perceiving speech in noise. Method: Thirty-four DYS and 25 AR children were…

  5. Attention to Facial Regions in Segmental and Prosodic Visual Speech Perception Tasks.

    Science.gov (United States)

    Lansing, Charissa R.; McConkie, George W.

    1999-01-01

    Two experiments were conducted to test the hypothesis that visual information related to segmental versus prosodic aspects of speech is distributed differently on the face of the talker. Results indicate that information in the upper part of the talker's face is more critical for intonation pattern decisions than for decisions about word segments…

  6. Universal and language-specific sublexical cues in speech perception: a novel electroencephalography-lesion approach.

    Science.gov (United States)

    Obrig, Hellmuth; Mentzel, Julia; Rossi, Sonja

    2016-06-01

    See Cappa (doi:10.1093/brain/aww090) for a scientific commentary on this article. The phonological structure of speech supports the highly automatic mapping of sound to meaning. While it is uncontroversial that phonotactic knowledge acts upon lexical access, it is unclear at what stage these combinatorial rules, governing phonological well-formedness in a given language, shape speech comprehension. Moreover, few studies have investigated the neuronal network affording this important step in speech comprehension. We therefore asked 70 participants, half of whom had a chronic left-hemispheric lesion, to listen to 252 different monosyllabic pseudowords. The material models universal preferences of phonotactic well-formedness by including naturally spoken pseudowords and digitally reversed exemplars. The latter partially violate the phonological structure of all human speech and are rich in universally dispreferred phoneme sequences while preserving basic auditory parameters. Language-specific constraints were modelled in that half of the naturally spoken pseudowords complied with the phonotactics of the native language of the monolingual participants (German) while the other half did not. To ensure universal well-formedness and naturalness, the latter stimuli complied with Slovak phonotactics, and all stimuli were produced by an early bilingual speaker. To maximally attenuate lexico-semantic influences, transparent pseudowords were avoided and participants had to detect immediate repetitions, a task orthogonal to the contrasts of interest. The results show that phonological 'well-formedness' modulates implicit processing of speech at different levels: universally dispreferred phonological structure elicits early, medium and late latency differences in the evoked potential. By contrast, the language-specific phonotactic contrast selectively modulates a medium-latency component of the event-related potentials around 400 ms. Using a novel event-related potential

  7. Lexico-semantic and acoustic-phonetic processes in the perception of noise-vocoded speech: implications for cochlear implantation.

    Directory of Open Access Journals (Sweden)

    Carolyn McGettigan

    2014-02-01

    Noise-vocoding is a transformation which, when applied to speech, severely reduces spectral resolution and eliminates periodicity, yielding a stimulus that sounds like a harsh whisper (Scott, Blank et al. 2000). This process simulates a cochlear implant, where the activity of many thousands of hair cells in the inner ear is replaced by direct stimulation of the auditory nerve by a small number of tonotopically arranged electrodes. Although a cochlear implant offers a powerful means of restoring some degree of hearing to profoundly deaf individuals, the outcomes for spoken communication are highly variable (Moore and Shannon 2009). Some variability may arise from differences in peripheral representation (e.g., the degree of residual nerve survival), but some may reflect differences in higher-order linguistic processing. To explore this possibility, we used noise-vocoding to examine speech recognition and perceptual learning in normal-hearing listeners tested across several levels of the linguistic hierarchy: segments (consonants and vowels), single words, and sentences. Listeners improved significantly on all tasks across two test sessions. In the first session, individual-differences analyses revealed two independently varying sources of variability: one lexico-semantic in nature and implicating the recognition of words and sentences, and the other an acoustic-phonetic factor associated with words and segments. However, after learning, by the second session there was a more uniform covariance pattern across all stimulus types. A further analysis of phonetic feature recognition allowed greater insight into learning-related changes in perception and showed that, surprisingly, participants did not make full use of cues that were preserved in the stimuli (e.g., vowel duration). We discuss these findings in relation to cochlear implantation, and suggest auditory training strategies to maximise speech recognition performance in the absence of

  8. The effect of visual apparent motion on audiovisual simultaneity.

    Science.gov (United States)

    Kwon, Jinhwan; Ogawa, Ken-ichiro; Miyake, Yoshihiro

    2014-01-01

    Visual motion information from dynamic environments is important in multisensory temporal perception. However, it is unclear how visual motion information influences the integration of multisensory temporal perceptions. We investigated whether visual apparent motion affects audiovisual temporal perception. Visual apparent motion is a phenomenon in which two flashes presented in sequence in different positions are perceived as continuous motion. Across three experiments, participants performed temporal order judgment (TOJ) tasks. Experiment 1 was a TOJ task conducted in order to assess audiovisual simultaneity during perception of apparent motion. The results showed that the point of subjective simultaneity (PSS) was shifted toward a sound-lead stimulus, and the just noticeable difference (JND) was reduced compared with a normal TOJ task with a single flash. This indicates that visual apparent motion affects audiovisual simultaneity and improves temporal discrimination in audiovisual processing. Experiment 2 was a TOJ task conducted in order to remove the influence of the amount of flash stimulation from Experiment 1. The PSS and JND during perception of apparent motion were almost identical to those in Experiment 1, but differed from those for successive perception when long temporal intervals were included between two flashes without motion. This showed that the result obtained under the apparent motion condition was unaffected by the amount of flash stimulation. Because apparent motion was produced by a constant interval between two flashes, the results may be accounted for by specific prediction. In Experiment 3, we eliminated the influence of prediction by randomizing the intervals between the two flashes. However, the PSS and JND did not differ from those in Experiment 1. It became clear that the results obtained for the perception of visual apparent motion were not attributable to prediction. Our findings suggest that visual apparent motion changes temporal

  9. Compensation for complete assimilation in speech perception: The case of Korean labial-to-velar assimilation

    OpenAIRE

    Mitterer, H.; Kim, S.; Cho, T.

    2013-01-01

    In connected speech, phonological assimilation to neighboring words can lead to pronunciation variants (e.g., "garden bench" → "gardem bench"). A large body of literature suggests that listeners use the phonetic context to reconstruct the intended word for assimilation types that often lead to incomplete assimilations (e.g., a pronunciation of "garden" that carries cues for both a labial [m] and an alveolar [n]). In the current paper, we show that a similar context effect is observed for an as...

  10. Development of a topic-related sentence corpus for speech perception research

    Science.gov (United States)

    Helfer, Karen S.; Freyman, Richard L.

    2001-05-01

    A large sentence corpus has been developed for use in speech recognition research. Sentences (n=881, three scoring words per sentence) were developed under 23 topics. In the first phase of development, subjects rated each individual scoring word for relatedness to its given topic on a Likert scale. Next, two groups of young, normal-hearing listeners (n=16/group) listened and responded to the recordings of the sentences (spoken by a female talker) presented with one of two types of maskers: steady-state noise (S:N=-13 dB) or two other females speaking random sentences (S:N=-8 dB). Each subject responded to half of the sentences with topic supplied and half with no topic supplied. Data analyses focused on addressing two questions: whether supplementation of topic would be more important in the presence of the speech masker versus the noise masker, and how the degree of relatedness of each key word to the topic influenced the effect of topic on recognition. The data showed little difference in how beneficial the topic was for speech versus noise maskers. Moreover, there was a complex relationship between effect of topic, type of masker, and position of the word in the sentence. [Work supported by NIDCD DC01625.]

  11. Speech perception at positive signal-to-noise ratios using adaptive adjustment of time compression.

    Science.gov (United States)

    Schlueter, Anne; Brand, Thomas; Lemke, Ulrike; Nitzschner, Stefan; Kollmeier, Birger; Holube, Inga

    2015-11-01

    Positive signal-to-noise ratios (SNRs) characterize listening situations most relevant for hearing-impaired listeners in daily life and should therefore be considered when evaluating hearing aid algorithms. For this, a speech-in-noise test was developed and evaluated, in which the background noise is presented at fixed positive SNRs and the speech rate (i.e., the time compression of the speech material) is adaptively adjusted. In total, 29 younger and 12 older normal-hearing, as well as 24 older hearing-impaired listeners took part in repeated measurements. Younger normal-hearing and older hearing-impaired listeners completed one of two adaptive methods which differed in adaptive procedure and step size. Analysis of the measurements with regard to list length and estimation strategy for thresholds resulted in a practical method measuring the time compression for 50% recognition. This method uses time-compression adjustment and step sizes according to Versfeld and Dreschler [(2002). J. Acoust. Soc. Am. 111, 401-408], with sentence scoring, lists of 30 sentences, and a maximum likelihood method for threshold estimation. Evaluation of the procedure showed that older participants obtained higher test-retest reliability compared to younger participants. Depending on the group of listeners, one or two lists are required for training prior to data collection. PMID:26627804

  12. Neural substrates of figurative language during natural speech perception: an fMRI study

    Directory of Open Access Journals (Sweden)

    Arne Nagels

    2013-09-01

    Many figurative expressions are fully conventionalized in everyday speech. Regarding the neural basis of figurative language processing, research has predominantly focused on metaphoric expressions in minimal semantic context. It remains unclear to what extent metaphoric expressions during continuous text comprehension activate neural networks similar to those engaged by isolated metaphors. We therefore investigated the processing of similes (figurative language, e.g., "He smokes like a chimney!") occurring in a short story. Sixteen healthy, male, native German speakers listened to similes that came about naturally in a short story, while blood-oxygenation-level-dependent (BOLD) responses were measured with functional magnetic resonance imaging (fMRI). For the event-related analysis, similes were contrasted with non-figurative control sentences. The stimuli differed with respect to figurativeness, while they were matched for word frequency, number of syllables, plausibility and comprehensibility. Similes contrasted with control sentences resulted in enhanced BOLD responses in the left inferior frontal gyrus (IFG) and adjacent middle frontal gyrus. Concrete control sentences as compared to similes activated the bilateral middle temporal gyri as well as the right precuneus and the left middle frontal gyrus. Activation of the left IFG for similes in a short story is consistent with results on single-sentence metaphor processing. The findings strengthen the importance of the left inferior frontal region in the processing of abstract figurative speech during continuous, ecologically valid speech comprehension; the processing of concrete semantic contents goes along with a down-regulation of bilateral temporal regions.

  13. The benefit obtained from visually displayed text from an automatic speech recognizer during listening to speech presented in noise

    NARCIS (Netherlands)

    Zekveld, A.A.; Kramer, S.E.; Kessens, J.M.; Vlaming, M.S.M.G.; Houtgast, T.

    2008-01-01

    OBJECTIVES: The aim of this study was to evaluate the benefit that listeners obtain from visually presented output from an automatic speech recognition (ASR) system during listening to speech in noise. DESIGN: Auditory-alone and audiovisual speech reception thresholds (SRTs) were measured. The SRT i

  14. Utilizing New Audiovisual Resources

    Science.gov (United States)

    Miller, Glen

    1975-01-01

    The University of Arizona's Agriculture Department has found that video cassette systems and 8 mm films are excellent audiovisual aids to classroom instruction at the high school level in small gasoline engines. Each system is capable of improving the instructional process for motor skill development. (MW)

  15. Perception of Sentence Stress in Speech Correlates with the Temporal Unpredictability of Prosodic Features

    Science.gov (United States)

    Kakouros, Sofoklis; Räsänen, Okko

    2016-01-01

    Numerous studies have examined the acoustic correlates of sentential stress and its underlying linguistic functionality. However, the mechanism that connects stress cues to the listener's attentional processing has remained unclear. Also, the learnability versus innateness of stress perception has not been widely discussed. In this work, we…

  16. Allophonic Mode of Speech Perception in Dutch Children at Risk for Dyslexia: A Longitudinal Study

    Science.gov (United States)

    Noordenbos, M. W.; Segers, E.; Serniclaes, W.; Mitterer, H.; Verhoeven, L.

    2012-01-01

    There is ample evidence that individuals with dyslexia have a phonological deficit. A growing body of research also suggests that individuals with dyslexia have problems with categorical perception, as evidenced by weaker discrimination of between-category differences and better discrimination of within-category differences compared to average…

  17. Perceptions of Speech-Pathology and Audiology Students Concerning Death and Dying: A Preliminary Study

    Science.gov (United States)

    Rivers, Kenyatta O.; Perkins, Rosalie A.; Carson, Cecyle P.

    2009-01-01

    Background: Formal training in dealing with death and dying issues is not a standard content area in communication sciences and disorders programmes' curricula. At the same time, it cannot be presumed that pre-professional students' personal background equips them to deal with these issues. Aim: To investigate the perceptions of pre-professional…

  18. The Audiovisual Temporal Binding Window Narrows in Early Childhood

    Science.gov (United States)

    Lewkowicz, David J.; Flom, Ross

    2014-01-01

    Binding is key in multisensory perception. This study investigated the audio-visual (A-V) temporal binding window in 4-, 5-, and 6-year-old children (total N = 120). Children watched a person uttering a syllable whose auditory and visual components were either temporally synchronized or desynchronized by 366, 500, or 666 ms. They were asked…

  19. Perception of Music and Speech in Adolescents with Cochlear Implants – A Pilot Study on Effects of Intensive Musical Ear Training

    DEFF Research Database (Denmark)

    Petersen, Bjørn; Sørensen, Stine Derdau; Pedersen, Ellen Raben;

    … their standard school schedule and received no music training. Before and after the intervention period, both groups completed a set of tests for perception of music, speech and emotional prosody. In addition, the participants filled out a questionnaire which examined music listening habits and enjoyment. … Results: CI users significantly improved their overall music perception and discrimination of melodic contour and rhythm in particular. No effect of the music training was found on discrimination of emotional prosody or speech. The CI users described levels of music engagement and enjoyment that were … combined with their positive feedback suggests that music training could form part of future rehabilitation programs as a strong, motivational and beneficial method of improving auditory skills in adolescent CI users. …

  20. Student performance and their perception of a patient-oriented problem-solving approach with audiovisual aids in teaching pathology: a comparison with traditional lectures

    Directory of Open Access Journals (Sweden)

    Arjun Singh

    2010-12-01

    Arjun Singh, Department of Pathology, Sri Venkateshwara Medical College Hospital and Research Centre, Pondicherry, India. Purpose: We use different methods to train our undergraduates. The patient-oriented problem-solving (POPS) system is an innovative teaching–learning method that imparts knowledge, enhances intrinsic motivation, promotes self-learning, encourages clinical reasoning, and develops long-lasting memory. The aim of this study was to develop POPS for teaching pathology, assess its effectiveness, and assess students' preference for POPS over didactic lectures. Method: One hundred fifty second-year MBBS students were divided into two groups, A and B. Group A was taught by POPS, while group B was taught by traditional lectures. Pre- and post-test numerical scores of both groups were evaluated and compared. Students then completed a self-structured feedback questionnaire for analysis. Results: The mean (SD) difference between pre- and post-test scores was 15.98 (3.18) for group A and 7.79 (2.52) for group B. The difference between the group A and group B teaching methods was significant by the z-test (z = 16.62, P < 0.0001). Improvement in post-test performance of group A was significantly greater than that of group B, demonstrating the effectiveness of POPS. Students responded that POPS facilitates self-learning, helps in understanding topics, creates interest, and is a scientific approach to teaching. Feedback on POPS was strong in 57.52% of students, moderate in 35.67%, and negative in only 6.81%, showing that 93.19% of students favored POPS over simple lectures. Conclusion: It is not feasible to enforce the PBL method of teaching throughout the entire curriculum; however, POPS can be incorporated along with audiovisual aids to break the monotony of didactic lectures and as an alternative to PBL. Keywords: medical education, problem-solving exercise, problem-based learning
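
    The abstract reports a z-test on the mean pre/post-test gains of the two groups but gives neither the group sizes nor the exact formula. As a minimal sketch, assuming two independent groups and the usual large-sample z statistic computed from summary data, the comparison and the feedback percentages can be checked as follows. The function names are illustrative, and without the actual group sizes this does not reconstruct the reported value of 16.62.

```python
import math


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """z statistic for the difference between two independent group means.

    Each group's sample SD stands in for its population SD, as is usual
    when a z-test (rather than a t-test) is applied to large groups.
    """
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


def favourable_share(strong_pct, moderate_pct):
    """Percentage of respondents whose feedback was strong or moderate."""
    return strong_pct + moderate_pct
```

    For example, `two_sample_z(15.98, 3.18, n_a, 7.79, 2.52, n_b)` needs the two group sizes, which the abstract does not state; `favourable_share(57.52, 35.67)` gives about 93.19, consistent with 100 - 6.81.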

  1. Audio-visual perception of 3D cinematography: an fMRI study using condition-based and computation-based analyses.

    Directory of Open Access Journals (Sweden)

    Akitoshi Ogawa

    The use of naturalistic stimuli to probe sensory functions in the human brain is gaining increasing interest. Previous imaging studies examined brain activity associated with the processing of cinematographic material using both standard "condition-based" designs, as well as "computational" methods based on the extraction of time-varying features of the stimuli (e.g. motion). Here, we exploited both approaches to investigate the neural correlates of complex visual and auditory spatial signals in cinematography. In the first experiment, the participants watched a piece of a commercial movie presented in four blocked conditions: 3D vision with surround sounds (3D-Surround), 3D with monaural sound (3D-Mono), 2D-Surround, and 2D-Mono. In the second experiment, they watched two different segments of the movie both presented continuously in 3D-Surround. The blocked presentation served for standard condition-based analyses, while all datasets were submitted to computation-based analyses. The latter assessed where activity co-varied with visual disparity signals and the complexity of auditory multi-sources signals. The blocked analyses associated 3D viewing with the activation of the dorsal and lateral occipital cortex and superior parietal lobule, while the surround sounds activated the superior and middle temporal gyri (S/MTG). The computation-based analyses revealed the effects of absolute disparity in dorsal occipital and posterior parietal cortices and of disparity gradients in the posterior middle temporal gyrus plus the inferior frontal gyrus. The complexity of the surround sounds was associated with activity in specific sub-regions of S/MTG, even after accounting for changes of sound intensity. These results demonstrate that the processing of naturalistic audio-visual signals entails an extensive set of visual and auditory areas, and that computation-based analyses can track the contribution of complex spatial aspects characterizing such life

  2. Audio-visual perception of 3D cinematography: an fMRI study using condition-based and computation-based analyses.

    Science.gov (United States)

    Ogawa, Akitoshi; Bordier, Cecile; Macaluso, Emiliano

    2013-01-01

    The use of naturalistic stimuli to probe sensory functions in the human brain is gaining increasing interest. Previous imaging studies examined brain activity associated with the processing of cinematographic material using both standard "condition-based" designs, as well as "computational" methods based on the extraction of time-varying features of the stimuli (e.g. motion). Here, we exploited both approaches to investigate the neural correlates of complex visual and auditory spatial signals in cinematography. In the first experiment, the participants watched a piece of a commercial movie presented in four blocked conditions: 3D vision with surround sounds (3D-Surround), 3D with monaural sound (3D-Mono), 2D-Surround, and 2D-Mono. In the second experiment, they watched two different segments of the movie both presented continuously in 3D-Surround. The blocked presentation served for standard condition-based analyses, while all datasets were submitted to computation-based analyses. The latter assessed where activity co-varied with visual disparity signals and the complexity of auditory multi-sources signals. The blocked analyses associated 3D viewing with the activation of the dorsal and lateral occipital cortex and superior parietal lobule, while the surround sounds activated the superior and middle temporal gyri (S/MTG). The computation-based analyses revealed the effects of absolute disparity in dorsal occipital and posterior parietal cortices and of disparity gradients in the posterior middle temporal gyrus plus the inferior frontal gyrus. The complexity of the surround sounds was associated with activity in specific sub-regions of S/MTG, even after accounting for changes of sound intensity. These results demonstrate that the processing of naturalistic audio-visual signals entails an extensive set of visual and auditory areas, and that computation-based analyses can track the contribution of complex spatial aspects characterizing such life-like stimuli. PMID
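
    The "computation-based" analyses relate brain activity to time-varying stimulus features such as visual disparity. The abstract does not give the model; a common approach, sketched here under that assumption, convolves a feature time course (one value per fMRI volume) with a canonical hemodynamic response and fits it by least squares. The double-gamma shape parameters and all function names are illustrative assumptions, not the authors' pipeline.

```python
import math

import numpy as np


def double_gamma_hrf(tr, duration=32.0):
    """Approximate canonical HRF sampled every `tr` seconds.

    A gamma bump with its mode near 5 s minus a smaller gamma undershoot
    with its mode near 15 s (shape parameters 6 and 16, ratio 1/6).
    """
    t = np.arange(0.0, duration, tr)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()


def parametric_regressor(feature, tr):
    """Convolve a per-volume stimulus feature with the HRF, trimmed to length."""
    return np.convolve(feature, double_gamma_hrf(tr))[: len(feature)]


def fit_glm(voxel, regressor):
    """Ordinary least squares fit of voxel ~ beta * regressor + intercept."""
    design = np.column_stack([regressor, np.ones_like(regressor)])
    betas, *_ = np.linalg.lstsq(design, voxel, rcond=None)
    return betas
```

    A voxel whose time course tracks the feature gets a large regressor weight; in a full analysis this weight would be tested across voxels and participants.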

  3. Using Cortical Auditory Evoked Potentials as a predictor of speech perception ability in Auditory Neuropathy Spectrum Disorder and conditions with ANSD-like clinical presentation

    OpenAIRE

    Stirling, Francesca

    2015-01-01

    Auditory Neuropathy Spectrum Disorder (ANSD) is diagnosed by the presence of outer hair cell function, and absence or severe abnormality of the auditory brainstem response (ABR). Within the spectrum of ANSD, level of severity varies greatly in two domains: hearing thresholds can range from normal levels to a profound hearing loss, and degree of speech perception impairment also varies. The latter gives a meaningful indication of severity in ANSD. As the ABR does not relate to functional perfo...

  4. Multilevel Analysis in Analyzing Speech Data

    Science.gov (United States)

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by the human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  5. Speech perception in older hearing impaired listeners: benefits of perceptual training.

    Directory of Open Access Journals (Sweden)

    David L Woods

    Hearing aids (HAs) only partially restore the ability of older hearing impaired (OHI) listeners to understand speech in noise, due in large part to persistent deficits in consonant identification. Here, we investigated whether adaptive perceptual training would improve consonant identification in noise in sixteen aided OHI listeners who underwent 40 hours of computer-based training in their homes. Listeners identified 20 onset and 20 coda consonants in 9,600 consonant-vowel-consonant (CVC) syllables containing different vowels (/ɑ/, /i/, or /u/) and spoken by four different talkers. Consonants were presented at three consonant-specific signal-to-noise ratios (SNRs) spanning a 12 dB range. Noise levels were adjusted over training sessions based on d' measures. Listeners were tested before and after training to measure (1) changes in consonant-identification thresholds using syllables spoken by familiar and unfamiliar talkers, and (2) sentence reception thresholds (SeRTs) using two different sentence tests. Consonant-identification thresholds improved gradually during training. Laboratory tests of d' thresholds showed an average improvement of 9.1 dB, with 94% of listeners showing statistically significant training benefit. Training normalized consonant confusions and improved the thresholds of some consonants into the normal range. Benefits were equivalent for onset and coda consonants, syllables containing different vowels, and syllables presented at different SNRs. Greater training benefits were found for hard-to-identify consonants and for consonants spoken by familiar than unfamiliar talkers. SeRTs, tested with simple sentences, showed less elevation than consonant-identification thresholds prior to training and failed to show significant training benefit, although SeRT improvements did correlate with improvements in consonant thresholds. We argue that the lack of SeRT improvement reflects the dominant role of top-down semantic processing in
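
    The training adjusted noise levels using d' and reported d' thresholds, but the abstract does not say how d' was computed for a 20-alternative identification task. As a sketch under an assumption, the standard yes/no sensitivity index treats each consonant as a "target versus everything else" decision, so that hits and false alarms can be tallied per consonant:

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def d_prime(hit_rate, false_alarm_rate):
    """Yes/no sensitivity index: z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1.
    """
    z = _STANDARD_NORMAL.inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def d_prime_from_counts(hits, misses, false_alarms, correct_rejections):
    """d' from trial counts with a log-linear correction (add 0.5 per cell)
    so that perfect or zero scores do not yield infinite z values."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return d_prime(hit_rate, fa_rate)
```

    The log-linear correction is one common choice among several; how the study handled extreme rates is not stated.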

  6. The psychology of corporate rights: Perception of corporate versus individual rights to religious liberty, privacy, and free speech.

    Science.gov (United States)

    Mentovich, Avital; Huq, Aziz; Cerf, Moran

    2016-04-01

    The U.S. Supreme Court has increasingly expanded the scope of constitutional rights granted to corporations and other collective entities. Although this tendency receives widespread public and media attention, little empirical research examines how people ascribe rights, commonly thought to belong to natural persons, to corporations. This article explores this issue in 3 studies focusing on different rights (religious liberty, privacy, and free speech). We examined participants' willingness to grant a given right while manipulating the type of entity at stake (from small businesses, to larger corporations, to for-profit and nonprofit companies), and the identity of the right holder (from employees, to owners, to the company itself as a separate entity). We further examined the role of political ideology in perceptions of rights. Results indicated a significant decline in the degree of recognition of entities' rights (the company itself) in comparison to natural persons' rights (owners and employees). Results also demonstrated an effect of the type of entity at stake: Larger, for-profit businesses were less likely to be viewed as rights holders compared with nonprofit entities. Although both tendencies persisted across the ideological spectrum, ideological differences emerged in the relations between corporate and individual rights: these were positively related among conservatives but negatively related among liberals. Finally, we found that the desire to protect citizens (compared with businesses) underlies individuals' willingness to grant rights to companies. These findings show that people (rather than corporations) are more appropriate recipients of rights, and can explain public backlash to judicial expansions of corporate rights.

  7. Base-language effects on word identification in bilingual speech: evidence from categorical perception experiments.

    Science.gov (United States)

    Bürki-Cohen, J; Grosjean, F; Miller, J L

    1989-01-01

    The categorical perception paradigm was used to investigate whether French-English bilinguals categorize a code-switched word as French or English on the basis of its acoustic-phonetic information alone or whether they are influenced by the base-language context in which the word occurs, that is, by the language in which the majority of words are spoken. Subjects identified stimuli from computer-edited series that ranged from an English to a French word as either the English or the French endpoint. The stimuli were preceded by either an English or a French context sentence. In accord with previous studies (Grosjean, 1988), it was found that the base language had a contrastive effect on the perception of a code-switched word when the endpoints of the between-language series were phonetically marked as English and French, respectively. When the endpoints of the series were phonetically unmarked and thus compatible with either language, however, no effect of the base language was found; in particular, we failed to find the assimilative effect that has been observed with other paradigms (Grosjean, 1988; Soares and Grosjean, 1984; Macnamara and Kushnir, 1971). The current results provide confirming evidence that the perception of a code-switched word is influenced by the base-language context in which it occurs and, moreover, that the nature of the effect depends on the acoustic-phonetic characteristics of the code-switched word. In addition, the finding that a contrastive effect occurs across all paradigms used to date, but that an assimilative effect occurs in only some paradigms, suggests that these two context effects may arise at different stages of processing. PMID:2485850

  8. Psychoacoustic Performance and Music and Speech Perception in Prelingually Deafened Children with Cochlear Implants

    Science.gov (United States)

    Jung, Kyu Hwan; Won, Jong Ho; Drennan, Ward R.; Jameyson, Elyse; Miyasaki, Gary; Norton, Susan J.; Rubinstein, Jay T.

    2012-01-01

    The number of pediatric cochlear implant (CI) recipients has increased substantially over the past 10 years, making it more important to understand the mechanisms underlying the variable outcomes in this population. In this study, psychoacoustic measures of spectral-ripple and Schroeder-phase discrimination, the Clinical Assessment of Music Perception, consonant-nucleus-consonant (CNC) word recognition in quiet, and the spondee reception threshold (SRT) in noise were administered to 11 prelingually deafened CI users, aged 8–16 years, with at least 5 years of CI experience. The children's performance was compared with previously reported results from postlingually deafened adult CI users. The average spectral-ripple threshold (n = 10) was 2.08 ripples/octave. The average Schroeder-phase discrimination was 67.3% for 50 Hz and 56.5% for 200 Hz (n = 9). The Clinical Assessment of Music Perception test showed that the average complex pitch direction discrimination was 2.98 semitones. The mean melody score was at chance level, and the mean timbre score was 34.1% correct. The mean CNC word recognition score was 68.6%, and the mean SRT in steady noise was −8.5 dB SNR. The children's spectral-ripple resolution, CNC word recognition, and SRT in noise were, within statistical bounds, the same as those of postlingually deafened adult CI users. However, Schroeder-phase discrimination and music perception were generally poorer than in the adults. This poorer performance may be partly accounted for by delayed maturation of the children's temporal processing ability; as a result, their performance may have been driven more by their spectral sensitivity. PMID:22398954
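
    Schroeder-phase discrimination asks listeners to tell apart two harmonic complexes whose component phases follow Schroeder's formula with opposite signs. Because only the phases differ, the two stimuli share the same magnitude spectrum and differ in their within-period temporal structure. A minimal sketch, assuming one commonly used form of the phase rule, theta_n = ±pi·n(n+1)/N, with equal-amplitude cosine harmonics; the harmonic count and sample rate below are illustrative, not the study's stimulus parameters:

```python
import math


def schroeder_phases(n_harmonics, sign=1):
    """Schroeder phases theta_n = sign * pi * n * (n + 1) / N for n = 1..N."""
    return [sign * math.pi * n * (n + 1) / n_harmonics
            for n in range(1, n_harmonics + 1)]


def harmonic_complex(f0, n_harmonics, sign, sample_rate, n_samples, start=0):
    """Sum of equal-amplitude cosine harmonics of f0 with Schroeder phases.

    Samples are taken at t = i / sample_rate for i = start .. start + n_samples - 1.
    """
    phases = schroeder_phases(n_harmonics, sign)
    samples = []
    for i in range(start, start + n_samples):
        t = i / sample_rate
        samples.append(sum(math.cos(2 * math.pi * n * f0 * t + phi)
                           for n, phi in enumerate(phases, start=1)))
    return samples


# Illustrative 50 Hz pair (one of the two fundamentals mentioned in the abstract).
positive = harmonic_complex(50, 20, +1, 8000, 10)
negative = harmonic_complex(50, 20, -1, 8000, 10, start=-9)
```

    Since cosine is even, the negative-phase waveform is the time reverse of the positive-phase one, which is why the pair differs only in temporal fine structure.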

  9. Normal-Hearing Listeners' and Cochlear Implant Users' Perception of Pitch Cues in Emotional Speech.

    Science.gov (United States)

    Gilbers, Steven; Fuller, Christina; Gilbers, Dicky; Broersma, Mirjam; Goudbeek, Martijn; Free, Rolien; Başkent, Deniz

    2015-10-01

    In cochlear implants (CIs), acoustic speech cues, especially for pitch, are delivered in a degraded form. This study aimed to assess whether, because of degraded pitch cues, normal-hearing listeners and CI users employ different perceptual strategies to recognize vocal emotions and, if so, how these strategies differ. Voice actors were recorded pronouncing a nonce word in four different emotions: anger, sadness, joy, and relief. The pitch cues of these recordings were analyzed phonetically. The recordings were used to test emotion recognition in 20 normal-hearing listeners and 20 CI users. Consistent with previous studies, high-arousal emotions had a higher mean pitch, a wider pitch range, and more dominant pitches than low-arousal emotions. Regarding pitch, speakers differentiated emotions by arousal rather than by valence. Normal-hearing listeners outperformed CI users in emotion recognition, even when presented with CI-simulated stimuli. However, only normal-hearing listeners recognized one particular actor's emotions worse than the other actors'. The groups behaved differently when presented with similar input, showing that they had to employ differing strategies. Considering that speaker's deviating pronunciation, it appears that for normal-hearing listeners, mean pitch is a more salient cue than pitch range, whereas CI users are biased toward pitch-range cues. PMID:27648210
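
    The study contrasts mean pitch with pitch range as cues to emotional arousal. The abstract does not specify how these were measured; a minimal sketch, assuming an F0 contour in Hz in which unvoiced frames are coded as 0 and pitch range is expressed in semitones, is:

```python
import math
from statistics import mean


def semitones(f_high, f_low):
    """Interval between two frequencies in semitones (12 per octave)."""
    return 12 * math.log2(f_high / f_low)


def pitch_summary(f0_hz):
    """Mean F0 (Hz) and F0 range (semitones) over the voiced frames of a contour.

    Frames with F0 <= 0 are treated as unvoiced (an assumed convention).
    """
    voiced = [f for f in f0_hz if f > 0]
    return mean(voiced), semitones(max(voiced), min(voiced))
```

    Expressing range in semitones rather than Hz keeps it comparable across speakers with different baseline pitch, since the scale is logarithmic.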

  10. The relation between categorical perception of speech stimuli and reading skills in children

    Science.gov (United States)

    Breier, Joshua; Fletcher, Jack; Klaas, Patricia; Gray, Lincoln

    2005-09-01

    Children aged 7 to 14 years listened to seven tokens, /ga/ to /ka/, synthesized in equal steps from 0 to 60 ms along the voice onset time (VOT) continuum and played in a continuous rhythm. All 21 possible changes between the seven tokens were presented seven times each at random intervals, maintaining the rhythm. Children were asked to press a button as soon as they detected a change. Maps of the seven tokens, constructed from multidimensional scaling of reaction times, indicated two salient dimensions: one phonological and the other acoustic/phonetic. Better reading, spelling, and phonological processing skills were associated with greater relative weighting of the phonological dimension compared with the acoustic dimension, suggesting that children with reading difficulty and associated deficits may underweight the phonological and/or overweight the acoustic information in speech signals. This task required no training and only momentary memory of the tokens. That an analysis of a simple task coincides with more complex reading tests suggests a low-level deficit (or shift in listening strategy). Compared with control children, children with reading disabilities may pay more attention to subtle details in these signals and less attention to the global pattern or attribute. [Supported by NIH Grant 1 RO1 HD35938 to JIB.]
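
    The stimulus design reduces to simple arithmetic: seven tokens spanning 0 to 60 ms in equal steps are 10 ms apart, and 21 changes is the number of unordered pairs of seven tokens (ordered pairs would give 42). A short sketch of that enumeration, with an illustrative function name:

```python
from itertools import combinations


def vot_continuum(start_ms=0, end_ms=60, n_tokens=7):
    """Equally spaced voice onset times from /ga/ (short VOT) to /ka/ (long VOT)."""
    step = (end_ms - start_ms) / (n_tokens - 1)
    return [start_ms + i * step for i in range(n_tokens)]


tokens = vot_continuum()
# Each unordered pair of distinct tokens counts as one possible change.
changes = list(combinations(tokens, 2))
# Each change was presented seven times.
n_change_trials = len(changes) * 7
```

    With seven repetitions per change this gives 147 change trials per child.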

  11. Electrical stimulation of the auditory nerve: the coding of frequency, the perception of pitch and the development of cochlear implant speech processing strategies for profoundly deaf people.

    Science.gov (United States)

    Clark, G M

    1996-09-01

    … The most recent development, however, presents temporal frequency information as amplitude variations at a constant rate of stimulation. 8. As additional speech frequencies have been encoded as place of stimulation, the mean speech perception scores have continued to increase and are now better than the average scores that severely-profoundly deaf adults and children with some residual hearing obtain with a hearing aid. PMID:8911712
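
    The fragment above describes coding temporal frequency information as amplitude variations on a pulse train delivered at a constant stimulation rate. A minimal single-channel sketch, assuming a hypothetical envelope function and illustrative rate and duration values (not the parameters of any clinical processor), shows that pulse timing stays fixed while only pulse amplitude follows the envelope:

```python
import math


def constant_rate_pulses(envelope, rate_hz, duration_s):
    """Sample an envelope at a fixed stimulation rate.

    Returns (time_s, amplitude) pairs: the inter-pulse interval never
    changes; only the amplitude of each pulse follows the envelope.
    """
    period = 1.0 / rate_hz
    n_pulses = round(duration_s * rate_hz)
    return [(k * period, envelope(k * period)) for k in range(n_pulses)]


def envelope(t):
    """Hypothetical envelope: 100 Hz amplitude modulation around a constant level."""
    return 0.5 + 0.5 * math.sin(2 * math.pi * 100 * t)


pulses = constant_rate_pulses(envelope, rate_hz=1000, duration_s=0.02)
```

    Here the 100 Hz periodicity is carried entirely by the amplitude pattern across pulses, not by the pulse rate.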

  12. Audiovisual integration of stimulus transients

    DEFF Research Database (Denmark)

    Andersen, Tobias; Mamassian, Pascal

    2008-01-01

    leaving only unsigned stimulus transients as the basis for audiovisual integration. Facilitation of luminance detection occurred even with varying audiovisual stimulus onset asynchrony and even when the sound lagged behind the luminance change by 75 ms, supporting the interpretation that perceptual...

  13. The Fungible Audio-Visual Mapping and its Experience

    Directory of Open Access Journals (Sweden)

    Adriana Sa

    2014-12-01

    This article takes a perceptual approach to audio-visual mapping. Clearly perceivable cause-and-effect relationships can be problematic if one desires the audience to experience the music. Indeed, perception would bias those sonic qualities that fit previous concepts of causation, subordinating other sonic qualities, which may form the relations between the sounds themselves. The question is: how can an audio-visual mapping produce a sense of causation and simultaneously confound the actual cause-effect relationships? We call this a fungible audio-visual mapping. Our aim here is to glean its constitution and aspect. We report a study that draws upon methods from experimental psychology to inform audio-visual instrument design and composition. The participants are shown several audio-visual mapping prototypes, after which we pose quantitative and qualitative questions regarding their sense of causation and their sense of understanding the cause-effect relationships. The study shows that a fungible mapping requires both synchronized and seemingly non-related components, with sufficient complexity to be confusing. As the specific cause-effect concepts remain inconclusive, the sense of causation embraces the whole.

  14. Summarizing Audiovisual Contents of a Video Program

    Directory of Open Access Journals (Sweden)

    Gong Yihong

    2003-01-01

    In this paper, we focus on video programs that are intended to disseminate information and knowledge, such as news, documentaries, and seminars, and present an audiovisual summarization system that summarizes the audio and visual contents of a given video separately and then integrates the two summaries with a partial alignment. The audio summary is created by selecting spoken sentences that best present the main content of the audio speech, while the visual summary is created by eliminating duplicates and redundancies and preserving visually rich content in the image stream. The alignment operation aims to synchronize each spoken sentence in the audio summary with its corresponding speaker's face and to preserve the rich content in the visual summary. A bipartite graph-based audiovisual alignment algorithm is developed to efficiently find the best alignment solution that satisfies these alignment requirements. With the proposed system, we strive to produce a video summary that (1) provides a natural visual and audio content overview, and (2) maximizes the coverage of both the audio and visual contents of the original video without sacrificing either of them.
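
    The abstract does not detail the bipartite graph-based alignment algorithm. The underlying problem, pairing each audio-summary sentence with one visual segment so that the total fit is maximal, can be framed as maximum-weight bipartite matching. A brute-force sketch, assuming a hypothetical score matrix and small inputs (larger cases would use the Hungarian algorithm, e.g. SciPy's `linear_sum_assignment`), is:

```python
from itertools import permutations


def best_alignment(scores):
    """Exhaustive maximum-weight matching of sentences (rows) to segments (columns).

    scores[i][j] is a hypothetical fit score for spoken sentence i and visual
    segment j. Requires at least as many columns as rows; each segment is
    used at most once. Runtime grows factorially, so use only for tiny inputs.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    best_total, best_pairs = float("-inf"), []
    for cols in permutations(range(n_cols), n_rows):
        total = sum(scores[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_total, best_pairs
```

    The paper's own method may add constraints (for example, preserving visually rich content) that a plain matching like this does not capture.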

  15. Stream Weight Training Based on MCE for Audio-Visual LVCSR

    Institute of Scientific and Technical Information of China (English)

    LIU Peng; WANG Zuoying

    2005-01-01

    In this paper we address the problem of audio-visual speech recognition in the framework of the multi-stream hidden Markov model. Stream weight training based on the minimum classification error (MCE) criterion is discussed for use in large vocabulary continuous speech recognition (LVCSR). We present lattice re-scoring and Viterbi approaches for calculating the loss function of continuous speech. The experimental results show that with clean audio, state-based stream weights trained by a Viterbi approach reduce the word error rate by 36.1% relative to an audio-only speech recognition system. Further experimental results demonstrate that our audio-visual LVCSR system is significantly more robust in noisy environments.
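
    In a multi-stream HMM, the emission score of a state is typically the product of per-stream likelihoods raised to stream weights, which amounts to a weighted sum of log-likelihoods; MCE training tunes those weights. A minimal sketch of that combination and of the relative word error rate reduction the abstract reports, assuming two streams whose weights sum to one (a common but not universal constraint) and illustrative function names:

```python
import math


def multistream_log_likelihood(log_b_audio, log_b_video, audio_weight):
    """Weighted combination of per-stream state log-likelihoods.

    audio_weight is the audio stream exponent; the video stream receives
    1 - audio_weight. With state-based weights, each HMM state has its own value.
    """
    return audio_weight * log_b_audio + (1.0 - audio_weight) * log_b_video


def relative_wer_reduction(baseline_wer, new_wer):
    """Fractional reduction in word error rate relative to the baseline."""
    return (baseline_wer - new_wer) / baseline_wer
```

    For example, a hypothetical drop from 20% to 12.78% WER is a relative reduction of 0.361, i.e. 36.1%.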

  16. Categorization of natural dynamic audiovisual scenes.

    Directory of Open Access Journals (Sweden)

    Olli Rummukainen

    This work analyzed the perceptual attributes of natural dynamic audiovisual scenes. We presented thirty participants with 19 natural scenes in a similarity categorization task, followed by a semi-structured interview. The scenes were reproduced with an immersive audiovisual display. Natural scene perception has been studied mainly in unimodal settings, which have identified motion as one of the most salient attributes of visual scenes, and sound intensity along with pitch trajectories as salient attributes of auditory scenes. However, controlled laboratory experiments with natural multimodal stimuli are still scarce. Our results show that humans pay attention to similar perceptual attributes in natural scenes, and a two-dimensional perceptual map of the stimulus scenes and perceptual attributes was obtained. The exploratory results show the amount of movement, perceived noisiness, and eventfulness of the scene to be the most important perceptual attributes in naturalistically reproduced real-world urban environments. We found that the scene gist properties openness and expansion remain important factors in scenes with no salient auditory or visual events. We propose that the study of scene perception should move toward a better understanding of the processes behind multimodal scene processing in real-world environments. We publish our stimulus scenes as spherical video recordings and sound field recordings in a publicly available database.
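
    The abstract says a two-dimensional perceptual map was obtained from a similarity categorization task but does not name the technique. One common route, assumed here, turns each participant's grouping into a dissimilarity matrix (the fraction of participants who put two scenes in different groups) and embeds it with classical multidimensional scaling. The function names are illustrative:

```python
import numpy as np


def co_occurrence_dissimilarity(sortings, n_items):
    """Fraction of participants who placed each pair of items in different groups.

    sortings holds one list of group labels (length n_items) per participant.
    """
    together = np.zeros((n_items, n_items))
    for groups in sortings:
        labels = np.asarray(groups)
        together += labels[:, None] == labels[None, :]
    return 1.0 - together / len(sortings)


def classical_mds(dissimilarities, n_dims=2):
    """Torgerson (classical) MDS: coordinates whose Euclidean distances
    approximate the given dissimilarity matrix."""
    d2 = np.asarray(dissimilarities, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering        # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]   # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```

    The axes of such a map have no fixed meaning; they are interpreted afterwards, for instance against interview attributes such as movement or noisiness.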

  17. Cross-Modal and Intra-Modal Characteristics of Visual Function and Speech Perception Performance in Postlingually Deafened, Cochlear Implant Users.

    Directory of Open Access Journals (Sweden)

    Min-Beom Kim

    Evidence of visual-auditory cross-modal plasticity in deaf individuals has been widely reported. Superior visual abilities of deaf individuals have been shown to result in enhanced reactivity to visual events and/or enhanced peripheral spatial attention. The goal of this study was to investigate the association between visual-auditory cross-modal plasticity and speech perception in post-lingually deafened, adult cochlear implant (CI) users. Post-lingually deafened adults with CIs (N = 14) and a group of normal hearing, adult controls (N = 12) participated in this study. The CI participants were divided into a good performer group (good CI, N = 7) and a poor performer group (poor CI, N = 7) based on word recognition scores. Visual evoked potentials (VEP) were recorded from the temporal and occipital cortex to assess reactivity. Visual field (VF) testing was used to assess spatial attention, and Goldmann perimetry measures were analyzed to identify differences across groups in the VF. The association of the amplitude of the P1 VEP response over the right temporal or occipital cortex among three groups (control, good CI, poor CI) was analyzed. In addition, the association between VF by different stimuli and word perception score was evaluated. The P1 VEP amplitude recorded from the right temporal cortex was larger in the group of poorly performing CI users than in the group of good performers. The P1 amplitude recorded from electrodes near the occipital cortex was smaller for the poor performing group. P1 VEP amplitude in the right temporal lobe was negatively correlated with speech perception outcomes for the CI participants (r = -0.736, P = 0.003). However, P1 VEP amplitude measures recorded from near the occipital cortex had a positive correlation with speech perception outcome in the CI participants (r = 0.775, P = 0.001). In VF analysis, CI users showed a narrowed central VF (VF to low intensity stimuli). However, their far peripheral VF (VF to high intensity

  18. The use of auditory and visual context in speech perception by listeners with normal hearing and listeners with cochlear implants

    Directory of Open Access Journals (Sweden)

    Matthew Winn

    2013-11-01

    There is a wide range of acoustic and visual variability across different talkers and different speaking contexts. Listeners with normal hearing accommodate that variability in ways that facilitate efficient perception, but it is not known whether listeners with cochlear implants can do the same. In this study, listeners with normal hearing (NH) and listeners with cochlear implants (CIs) were tested for accommodation to auditory and visual phonetic contexts created by gender-driven speech differences as well as vowel coarticulation and lip rounding in both consonants and vowels. Accommodation was measured as the shifting of perceptual boundaries between /s/ and /ʃ/ sounds in various contexts, as modeled by mixed-effects logistic regression. Owing to the spectral contrasts thought to underlie these context effects, CI listeners were predicted to perform poorly, but showed considerable success. Listeners with cochlear implants not only showed sensitivity to auditory cues to gender, they were also able to use visual cues to gender (i.e., faces) as a supplement or proxy for information in the acoustic domain, in a pattern that was not observed for listeners with normal hearing. Spectrally-degraded stimuli heard by listeners with normal hearing generally did not elicit strong context effects, underscoring the limitations of noise vocoders and/or the importance of experience with electric hearing. Visual cues for consonant lip rounding and vowel lip rounding were perceived in a manner consistent with coarticulation and were generally used more heavily by listeners with CIs. Results suggest that listeners with cochlear implants are able to accommodate various sources of acoustic variability either by attending to appropriate acoustic cues or by inferring them via the visual signal.
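
    Accommodation here is a shift in the /s/ versus /ʃ/ category boundary estimated by logistic regression. In a logistic model the boundary is the continuum step where the linear predictor is zero (p = 0.5), so a context term shifts it by -b_context·context/b1. A minimal sketch, assuming fixed effects only (the study's random effects are omitted), a continuum coded so that larger steps favor /ʃ/, and hypothetical coefficient names:

```python
import math


def p_sh(x, b0, b1, b_context=0.0, context=0.0):
    """Probability of an /ʃ/ response at continuum step x (fixed-effects logistic model)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x + b_context * context)))


def boundary(b0, b1, b_context=0.0, context=0.0):
    """Continuum step at which /s/ and /ʃ/ responses are equally likely."""
    return -(b0 + b_context * context) / b1
```

    Comparing `boundary(...)` with and without a context term (for example, a face cue coded 0 or 1) gives the boundary shift that serves as the accommodation measure.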

  19. The psychology of corporate rights: Perception of corporate versus individual rights to religious liberty, privacy, and free speech.

    Science.gov (United States)

    Mentovich, Avital; Huq, Aziz; Cerf, Moran

    2016-04-01

    The U.S. Supreme Court has increasingly expanded the scope of constitutional rights granted to corporations and other collective entities. Although this tendency receives widespread public and media attention, little empirical research examines how people ascribe rights, commonly thought to belong to natural persons, to corporations. This article explores this issue in 3 studies focusing on different rights (religious liberty, privacy, and free speech). We examined participants' willingness to grant a given right while manipulating the type of entity at stake (from small businesses, to larger corporations, to for-profit and nonprofit companies), and the identity of the right holder (from employees, to owners, to the company itself as a separate entity). We further examined the role of political ideology in perceptions of rights. Results indicated a significant decline in the degree of recognition of entities' rights (the company itself) in comparison to natural persons' rights (owners and employees). Results also demonstrated an effect of the type of entity at stake: Larger, for-profit businesses were less likely to be viewed as rights holders compared with nonprofit entities. Although both tendencies persisted across the ideological spectrum, ideological differences emerged in the relations between corporate and individual rights: these were positively related among conservatives but negatively related among liberals. Finally, we found that the desire to protect citizens (compared with businesses) underlies individuals' willingness to grant rights to companies. These findings show that people (rather than corporations) are more appropriate recipients of rights, and can explain public backlash to judicial expansions of corporate rights. PMID:26502001

  20. Speech perception in infants in their first year of life

    Directory of Open Access Journals (Sweden)

    Rosana Maria Tristão

    2003-12-01

    Human speech is a highly complex sound whose perceptual processing, production, and relations to language and cognition require an integrated analysis, both in terms of the available knowledge and of its methodological specificities. This article presents a brief review of the literature on the main acquisitions and development of language in the first year of life of normally developing infants, with emphasis on speech perception. It also analyzes the occurrence of auditory disorders in the first year of life that could impair speech perception, with possible implications for prelinguistic development. Special attention is given to the development of speech perception and language in infants with Down syndrome. The predisposition of this population to audiological problems, its relation to impaired language development, and the tendency shown in the first year of life toward differential patterns of attention to speech are analyzed.

  1. Audiovisual quality assessment in communications applications: Current status, trends and challenges

    DEFF Research Database (Denmark)

    Korhonen, Jari

    2010-01-01

    Audiovisual quality assessment is one of the major challenges in multimedia communications. Traditionally, algorithm-based (objective) assessment methods have focused primarily on compression artifacts. However, compression is only one of the numerous factors influencing the perception [...] addressed in practical quality metrics is the co-impact of audio and video qualities. This paper provides an overview of the current trends and challenges in objective audiovisual quality assessment, with emphasis on communication applications...

  2. Visual-Auditory Integration during Speech Imitation in Autism

    Science.gov (United States)

    Williams, Justin H. G.; Massaro, Dominic W.; Peel, Natalie J.; Bosseler, Alexis; Suddendorf, Thomas

    2004-01-01

    Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional "mirror neuron" systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a "virtual" head (Baldi), delivered speech stimuli for…

  3. Effects of Attention on the Strength of Lexical Influences on Speech Perception: Behavioral Experiments and Computational Mechanisms

    Science.gov (United States)

    Mirman, Daniel; McClelland, James L.; Holt, Lori L.; Magnuson, James S.

    2008-01-01

    The effects of lexical context on phonological processing are pervasive and there have been indications that such effects may be modulated by attention. However, attentional modulation in speech processing is neither well documented nor well understood. Experiment 1 demonstrated attentional modulation of lexical facilitation of speech sound…

  4. Boosting pitch encoding with audiovisual interactions in congenital amusia.

    Science.gov (United States)

    Albouy, Philippe; Lévêque, Yohana; Hyde, Krista L; Bouchet, Patrick; Tillmann, Barbara; Caclin, Anne

    2015-01-01

    The combination of information across senses can enhance perception, as revealed for example by decreased reaction times or improved stimulus detection. Interestingly, these facilitatory effects have been shown to be maximal when responses to unisensory modalities are weak. The present study investigated whether audiovisual facilitation can be observed in congenital amusia, a music-specific disorder primarily ascribed to impairments of pitch processing. Amusic individuals and their matched controls performed two tasks. In Task 1, they were required to detect auditory, visual, or audiovisual stimuli as rapidly as possible. In Task 2, they were required to detect, as accurately and as rapidly as possible, a pitch change within an otherwise monotonic 5-tone sequence that was presented either auditorily only (A condition) or simultaneously with a temporally congruent but otherwise uninformative visual stimulus (AV condition). Results of Task 1 showed that amusics exhibit typical auditory and visual detection, and typical audiovisual integration capacities: both amusics and controls exhibited shorter response times for audiovisual stimuli than for either auditory stimuli or visual stimuli. Results of Task 2 revealed that both groups benefited from simultaneous uninformative visual stimuli to detect pitch changes: accuracy was higher and response times shorter in the AV condition than in the A condition. The audiovisual improvements in response times were observed for different pitch interval sizes depending on the group. These results suggest that both typical listeners and amusic individuals can benefit from multisensory integration to improve their pitch processing abilities and that this benefit varies as a function of task difficulty. These findings constitute a first step toward exploiting multisensory paradigms to reduce pitch-related deficits in congenital amusia, notably by suggesting that audiovisual paradigms are effective in an appropriate

  5. Action-outcome learning and prediction shape the window of simultaneity of audiovisual outcomes.

    Science.gov (United States)

    Desantis, Andrea; Haggard, Patrick

    2016-08-01

    To form a coherent representation of the objects around us, the brain must group the different sensory features composing these objects. Here, we investigated whether actions contribute to this grouping process. In particular, we assessed whether action-outcome learning and prediction contribute to audiovisual temporal binding. Participants were presented with two audiovisual pairs: one pair was triggered by a left action, and the other by a right action. In a later test phase, the audio and visual components of these pairs were presented at different onset times. Participants judged whether they were simultaneous or not. To assess the role of action-outcome prediction on audiovisual simultaneity, each action triggered either the same audiovisual pair as in the learning phase ('predicted' pair), or the pair that had previously been associated with the other action ('unpredicted' pair). We found that the time window within which auditory and visual events appeared simultaneous increased for predicted compared to unpredicted pairs. However, no change in audiovisual simultaneity was observed when audiovisual pairs followed visual cues, rather than voluntary actions. This suggests that only action-outcome learning promotes temporal grouping of audio and visual effects. In a second experiment we observed that changes in audiovisual simultaneity depend not only on our ability to predict what outcomes our actions generate, but also on learning the delay between the action and the multisensory outcome. When participants learned that the delay between action and audiovisual pair was variable, the window of audiovisual simultaneity for predicted pairs increased, relative to a fixed action-outcome pair delay. This suggests that participants learn action-based predictions of audiovisual outcome, and adapt their temporal perception of outcome events based on such predictions. PMID:27131076

  7. The spatial reliability of task-irrelevant sounds modulates bimodal audiovisual integration: An event-related potential study.

    Science.gov (United States)

    Li, Qi; Yu, Hongtao; Wu, Yan; Gao, Ning

    2016-08-26

    The integration of multiple sensory inputs is essential for perception of the external world. The spatial factor is a fundamental property of multisensory audiovisual integration. Previous studies of the spatial constraints on bimodal audiovisual integration have mainly focused on the spatial congruity of audiovisual information. However, the effect of spatial reliability within audiovisual information on bimodal audiovisual integration remains unclear. In this study, we used event-related potentials (ERPs) to examine the effect of spatial reliability of task-irrelevant sounds on audiovisual integration. Three relevant ERP components emerged: the first at 140-200 ms over a wide central area, the second at 280-320 ms over the fronto-central area, and a third at 380-440 ms over the parieto-occipital area. Our results demonstrate that ERP amplitudes elicited by audiovisual stimuli with reliable spatial relationships are larger than those elicited by stimuli with inconsistent spatial relationships. In addition, we hypothesized that spatial reliability within an audiovisual stimulus enhances feedback projections to the primary visual cortex from multisensory integration regions. Overall, our findings suggest that the spatial linking of visual and auditory information depends on spatial reliability within an audiovisual stimulus and occurs at a relatively late stage of processing. PMID:27392755

  9. On the matching of top-down knowledge with sensory input in the perception of ambiguous speech

    OpenAIRE

    Hannemann R; Eulitz C

    2010-01-01

    Background: How does the brain repair obliterated speech and cope with acoustically ambivalent situations? A widely discussed possibility is to use top-down information for solving the ambiguity problem. In the case of speech, this may lead to a match of bottom-up sensory input with lexical expectations, resulting in resonant states that are reflected in the induced gamma-band activity (GBA). Methods: In the present EEG study, we compared the subject's pre-attentive GBA responses to ob...

  10. Voice Onset Time and the Perception of Japanese Voicing Contrasts

    OpenAIRE

    Wilson, Ian; Hashimoto, Yurika

    2013-01-01

    Much crosslinguistic research exists on the production and perception of voice onset time (VOT). However, most research on the perception of VOT uses synthetic stimuli instead of natural speech stimuli. Effects of synthetic speech on the perception of VOT are not known, but more research needs to be done to see if there are differences between perception using synthetic speech and perception using natural speech. This pilot study uses natural speech to investigate perception of Japanese VO...

  11. Effects of Production Training and Perception Training on Lexical Tone Perception--Are the Effects Domain General or Domain Specific?

    Science.gov (United States)

    Lu, Shuang

    2013-01-01

    The relationship between speech perception and production has been debated for a long time. The Motor Theory of speech perception (Liberman et al., 1989) claims that perceiving speech is identifying the intended articulatory gestures rather than perceiving the sound patterns. It seems to suggest that speech production precedes speech perception,…

  12. An ERP study of good production vis-a-vis poor perception of tones in Cantonese: implications for top-down speech processing.

    Directory of Open Access Journals (Sweden)

    Sam-Po Law

    This study investigated a theoretically challenging dissociation between good production and poor perception of tones among neurologically unimpaired native speakers of Cantonese. The dissociation is referred to as the near-merger phenomenon in sociolinguistic studies of sound change. In a passive oddball paradigm, lexical and nonlexical syllables of the T1/T6 and T4/T6 contrasts were presented to elicit the mismatch negativity (MMN) and P3a from two groups of participants, those who could produce and distinguish all tones in the language (Control) and those who could produce all tones but specifically failed to distinguish between T4 and T6 in perception (Dissociation). The presence of MMN to T1/T6 and null response to T4/T6 of lexical syllables in the dissociation group confirmed the near-merger phenomenon. The observation that the control participants exhibited a statistically reliable MMN to lexical syllables of T1/T6, weaker responses to nonlexical syllables of T1/T6 and lexical syllables of T4/T6, and finally null response to nonlexical syllables of T4/T6, suggests the involvement of top-down processing in speech perception. Furthermore, the stronger P3a response of the control group, compared with the dissociation group in the same experimental conditions, may be taken to indicate higher cognitive capability in attention switching, auditory attention or memory in the control participants. This cognitive difference, together with our speculation that constant top-down predictions without complete bottom-up analysis of acoustic signals in speech recognition may reduce one's sensitivity to small acoustic contrasts, accounts for the occurrence of dissociation in some individuals but not others.

  13. An ERP study of good production vis-à-vis poor perception of tones in Cantonese: implications for top-down speech processing.

    Science.gov (United States)

    Law, Sam-Po; Fung, Roxana; Kung, Carmen

    2013-01-01

    This study investigated a theoretically challenging dissociation between good production and poor perception of tones among neurologically unimpaired native speakers of Cantonese. The dissociation is referred to as the near-merger phenomenon in sociolinguistic studies of sound change. In a passive oddball paradigm, lexical and nonlexical syllables of the T1/T6 and T4/T6 contrasts were presented to elicit the mismatch negativity (MMN) and P3a from two groups of participants, those who could produce and distinguish all tones in the language (Control) and those who could produce all tones but specifically failed to distinguish between T4 and T6 in perception (Dissociation). The presence of MMN to T1/T6 and null response to T4/T6 of lexical syllables in the dissociation group confirmed the near-merger phenomenon. The observation that the control participants exhibited a statistically reliable MMN to lexical syllables of T1/T6, weaker responses to nonlexical syllables of T1/T6 and lexical syllables of T4/T6, and finally null response to nonlexical syllables of T4/T6, suggests the involvement of top-down processing in speech perception. Furthermore, the stronger P3a response of the control group, compared with the dissociation group in the same experimental conditions, may be taken to indicate higher cognitive capability in attention switching, auditory attention or memory in the control participants. This cognitive difference, together with our speculation that constant top-down predictions without complete bottom-up analysis of acoustic signals in speech recognition may reduce one's sensitivity to small acoustic contrasts, accounts for the occurrence of dissociation in some individuals but not others. PMID:23342146

  14. McGurk illusion recalibrates subsequent auditory perception.

    Science.gov (United States)

    Lüttke, Claudia S; Ekman, Matthias; van Gerven, Marcel A J; de Lange, Floris P

    2016-01-01

    Visual information can alter auditory perception. This is clearly illustrated by the well-known McGurk illusion, where an auditory /aba/ and a visual /aga/ are merged into the percept 'ada'. It is less clear, however, whether such a change in perception may recalibrate subsequent perception. Here we asked whether the altered auditory perception due to the McGurk illusion affects subsequent auditory perception, i.e. whether this process of fusion may cause a recalibration of the auditory boundaries between phonemes. Participants categorized auditory and audiovisual speech stimuli as /aba/, /ada/ or /aga/ while activity patterns in their auditory cortices were recorded using fMRI. Interestingly, following a McGurk illusion, an auditory /aba/ was more often misperceived as 'ada'. Furthermore, we observed a neural counterpart of this recalibration in the early auditory cortex. When the auditory input /aba/ was perceived as 'ada', activity patterns bore stronger resemblance to activity patterns elicited by /ada/ sounds than when they were correctly perceived as /aba/. Our results suggest that upon experiencing the McGurk illusion, the brain shifts the neural representation of an /aba/ sound towards /ada/, culminating in a recalibration in perception of subsequent auditory input. PMID:27611960

  16. The Influence of Speech Perception, Oral Language Ability, the Home Literacy Environment, and Pre-Reading Knowledge on the Growth of Phonological Sensitivity: A One-Year Longitudinal Investigation.

    Science.gov (United States)

    Burgess, Stephen R.

    2002-01-01

    Examines the influences of speech perception, oral language ability, emergent literacy, and the home literacy environment on the growth of phonological sensitivity. Finds, overall, the combination of predictors explained a significant proportion of the variance in phonological sensitivity and its growth. Discusses results in terms of their…

  17. Blacklist Established in Chinese Audiovisual Market

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    The Chinese audiovisual market is to impose a ban on audiovisual product dealers whose licenses have been revoked for violating the law. This ban will prohibit them from dealing in audiovisual products for ten years. Their names are to be included on a blacklist made known to the public.

  18. Audio-Visual Aids: Historians in Blunderland.

    Science.gov (United States)

    Decarie, Graeme

    1988-01-01

    A history professor relates his experiences producing and using audio-visual material and warns teachers not to rely on audio-visual aids for classroom presentations. Includes examples of popular audio-visual aids on Canada that communicate unintended, inaccurate, or unclear ideas. Urges teachers to exercise caution in the selection and use of…

  19. [Audio-visual aids and tropical medicine].

    Science.gov (United States)

    Morand, J J

    1989-01-01

    The author presents a list of audio-visual productions on tropical medicine, together with their main characteristics. He argues that educational audio-visual productions are often dissociated from their promotion, and therefore invites future creators to submit their work to the Audio-Visual Health Committee.

  20. Recognition of Isolated Words using Zernike and MFCC features for Audio Visual Speech Recognition

    OpenAIRE

    Borde, Prashant; Varpe, Amarsinh; Manza, Ramesh; Yannawar, Pravin

    2014-01-01

    Automatic speech recognition (ASR) by machine is an attractive research topic in the signal processing domain and has drawn contributions from many researchers. In recent years, there have been many advances in automatic speech reading systems through the inclusion of audio and visual speech features to recognize words under noisy conditions. The objective of an audio-visual speech recognition system is to improve recognition accuracy. In this paper we computed visual features using Zernike m...

  1. Musical expertise induces audiovisual integration of abstract congruency rules.

    Science.gov (United States)

    Paraskevopoulos, Evangelos; Kuchenbuch, Anja; Herholz, Sibylle C; Pantev, Christo

    2012-12-12

    Perception of everyday life events relies mostly on multisensory integration. Hence, studying the neural correlates of the integration of multiple senses constitutes an important tool in understanding perception within an ecologically valid framework. The present study used magnetoencephalography in human subjects to identify the neural correlates of an audiovisual incongruency response that arises not from incongruency of the unisensory physical characteristics of the stimulation but from the violation of an abstract congruency rule. The chosen rule, "the higher the pitch of the tone, the higher the position of the circle," was comparable to music reading. In parallel, the plasticity effects of long-term musical training on this response were investigated by comparing musicians to non-musicians. The applied paradigm was based on an appropriate modification of the multifeatured oddball paradigm incorporating, within one run, deviants based on a multisensory audiovisual incongruent condition and two unisensory mismatch conditions: an auditory and a visual one. Results indicated the presence of an audiovisual incongruency response, generated mainly in frontal regions, an auditory mismatch negativity, and a visual mismatch response. Moreover, results revealed that long-term musical training generates plastic changes in frontal, temporal, and occipital areas that affect this multisensory incongruency response as well as the unisensory auditory and visual mismatch responses. PMID:23238733

  2. Visual Target Localization, the Effect of Allocentric Audiovisual Reference Frame

    Directory of Open Access Journals (Sweden)

    David Hartnagel

    2011-10-01

    Visual allocentric reference frames (contextual cues) affect visual space perception (Diedrichsen et al., 2004; Walter et al., 2006). On the other hand, experiments have shown a change of visual perception induced by binaural stimuli (Chandler, 1961; Carlile et al., 2001). In the present study we investigate the effect of visual and audiovisual allocentric reference frames on visual localization and straight-ahead pointing. Participants faced a black part-spherical screen (92 cm radius). The head was kept aligned with the body. Participants wore headphones and a glove with motion capture markers. A red laser point was displayed straight ahead as a fixation point. The visual target was a 100 ms green laser point. After a short delay, the green laser reappeared and participants had to localize the target with a trackball. Straight-ahead blind pointing was required before and after series of 48 trials. The visual part of the bimodal allocentric reference frame was provided by a vertical red laser line (15° left or 15° right), and the auditory part was provided by 3D sound. Five conditions were tested: no reference, visual reference (left/right), and audiovisual reference (left/right). Results show that the bimodal audiovisual reference has a significant effect that does not differ from that of the visual reference.

  3. Perception of Suprasegmental Speech Features via Bimodal Stimulation: Cochlear Implant on One Ear and Hearing Aid on the Other

    Science.gov (United States)

    Most, Tova; Harel, Tamar; Shpak, Talma; Luntz, Michal

    2011-01-01

    Purpose: The purpose of the study was to evaluate the contribution of acoustic hearing to the perception of suprasegmental features by adults who use a cochlear implant (CI) and a hearing aid (HA) in opposite ears. Method: 23 adults participated in this study. Perception of suprasegmental features--intonation, syllable stress, and word…

  4. How is the McGurk effect modulated by Cued Speech in deaf and hearing adults?

    Science.gov (United States)

    Bayard, Clémence; Colin, Cécile; Leybaert, Jacqueline

    2014-01-01

    Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combine both types of information in order to get one coherent percept. In this study, we examined how audio-visual (AV) integration is affected by the presence of manual cues and on which form of information (auditory, labial or manual) CS receivers primarily rely. To address this issue, we designed a unique experiment that implemented the use of AV McGurk stimuli (audio /pa/ and lip-reading /ka/) which were produced with or without manual cues. The manual cue was congruent with either auditory information, lip information or the expected fusion. Participants were asked to repeat the perceived syllable aloud. Their responses were then classified into four categories: audio (when the response was /pa/), lip-reading (when the response was /ka/), fusion (when the response was /ta/) and other (when the response was something other than /pa/, /ka/ or /ta/). Data were collected from hearing-impaired individuals who were experts in CS (all of whom had either cochlear implants or binaural hearing aids; N = 8), hearing individuals who were experts in CS (N = 14) and hearing individuals who were completely naïve to CS (N = 15). Results confirmed that, like hearing people, deaf people can merge auditory and lip-reading information into a single unified percept. Without manual cues, McGurk stimuli induced the same percentage of fusion responses in both groups. Results also suggest that manual cues can modify AV integration and that their impact differs between hearing and deaf people.
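
    The four-way response coding described in this abstract can be illustrated with a short sketch. This is a hypothetical Python illustration, not the authors' analysis code: the function names (classify_response, category_percentages) and the plain-text syllable spellings ("pa", "ka", "ta") are assumptions made for the example.

```python
from collections import Counter

# Hypothetical mapping, assumed for illustration: each repeated syllable is
# coded by which input it matches (auditory /pa/, lip-read /ka/, fused /ta/).
CATEGORY_BY_SYLLABLE = {
    "pa": "audio",        # response matches the auditory token
    "ka": "lip-reading",  # response matches the visual (lip) token
    "ta": "fusion",       # response is the expected McGurk fusion
}
CATEGORIES = ["audio", "lip-reading", "fusion", "other"]


def classify_response(syllable: str) -> str:
    """Map one repeated syllable to one of the four response categories."""
    # Anything not listed above (e.g. "ba") falls into the "other" category.
    return CATEGORY_BY_SYLLABLE.get(syllable.strip().lower(), "other")


def category_percentages(responses: list[str]) -> dict[str, float]:
    """Return the percentage of responses falling into each category."""
    counts = Counter(classify_response(r) for r in responses)
    total = len(responses)
    # Guard against an empty response list to avoid division by zero.
    return {c: 100.0 * counts[c] / total if total else 0.0 for c in CATEGORIES}
```

    For example, category_percentages(["ta", "pa", "ta", "ba"]) returns 25% audio, 0% lip-reading, 50% fusion and 25% other responses.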

  5. How is the McGurk effect modulated by Cued Speech in deaf and hearing adults?

    Directory of Open Access Journals (Sweden)

    Clémence Bayard

    2014-05-01

    Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combine both types of information in order to get one coherent percept. In this study, we examined how audio-visual (AV) integration is affected by the presence of manual cues and on which form of information (auditory, labial or manual) CS receivers primarily rely. To address this issue, we designed a unique experiment that implemented the use of AV McGurk stimuli (audio /pa/ and lip-reading /ka/) which were produced with or without manual cues. The manual cue was congruent with either auditory information, lip information or the expected fusion. Participants were asked to repeat the perceived syllable aloud. Their responses were then classified into four categories: audio (when the response was /pa/), lip-reading (when the response was /ka/), fusion (when the response was /ta/) and other (when the response was something other than /pa/, /ka/ or /ta/). Data were collected from hearing-impaired individuals who were experts in CS (all of whom had either cochlear implants or binaural hearing aids; N = 8), hearing individuals who were experts in CS (N = 14) and hearing individuals who were completely naïve to CS (N = 15). Results confirmed that, like hearing people, deaf people can merge auditory and lip-reading information into a single unified percept. Without manual cues, McGurk stimuli induced the same percentage of fusion responses in both groups. Results also suggest that manual cues can modify AV integration and that their impact differs between hearing and deaf people.

  6. How is the McGurk effect modulated by Cued Speech in deaf and hearing adults?

    Science.gov (United States)

    Bayard, Clémence; Colin, Cécile; Leybaert, Jacqueline

    2014-01-01

    Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combine both types of information in order to get one coherent percept. In this study, we examined how audio-visual (AV) integration is affected by the presence of manual cues and on which form of information (auditory, labial or manual) the CS receptors primarily rely. To address this issue, we designed a unique experiment that implemented the use of AV McGurk stimuli (audio /pa/ and lip-reading /ka/) which were produced with or without manual cues. The manual cue was congruent with either auditory information, lip information or the expected fusion. Participants were asked to repeat the perceived syllable aloud. Their responses were then classified into four categories: audio (when the response was /pa/), lip-reading (when the response was /ka/), fusion (when the response was /ta/) and other (when the response was something other than /pa/, /ka/ or /ta/). Data were collected from hearing-impaired individuals who were experts in CS (all of whom had either cochlear implants or binaural hearing aids; N = 8), hearing individuals who were experts in CS (N = 14) and hearing individuals who were completely naïve to CS (N = 15). Results confirmed that, like hearing people, deaf people can merge auditory and lip-reading information into a single unified percept. Without manual cues, McGurk stimuli induced the same percentage of fusion responses in both groups.
Results also suggest that manual cues can modify the AV integration and that their impact differs between hearing and deaf people.
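
    The response-coding scheme described in this abstract maps each repeated syllable to one of four categories (audio /pa/, lip-reading /ka/, fusion /ta/, other). A minimal Python sketch of that tallying step, assuming responses are transcribed as plain syllable strings; the function names and the example responses below are hypothetical illustrations, not the authors' procedure or data.

```python
from collections import Counter

# Hypothetical coding table for the McGurk stimulus audio /pa/ + lip-reading /ka/,
# following the four categories described in the abstract.
CATEGORY_BY_SYLLABLE = {"pa": "audio", "ka": "lip-reading", "ta": "fusion"}
CATEGORIES = ("audio", "lip-reading", "fusion", "other")


def classify(response: str) -> str:
    """Map one repeated syllable to a response category ("other" if unlisted)."""
    return CATEGORY_BY_SYLLABLE.get(response.strip().lower(), "other")


def category_percentages(responses: list[str]) -> dict[str, float]:
    """Return the percentage of responses falling into each category."""
    total = len(responses)
    if total == 0:
        return {c: 0.0 for c in CATEGORIES}
    counts = Counter(classify(r) for r in responses)
    return {c: 100.0 * counts[c] / total for c in CATEGORIES}


if __name__ == "__main__":
    # Made-up responses for illustration only (not data from the study).
    example = ["ta", "pa", "ta", "ka", "ba", "ta", "pa", "ta"]
    print(category_percentages(example))
```

    Unlisted syllables such as /ba/ fall into "other", mirroring the abstract's residual category; fusion percentages per group and cue condition would then be compared across participants.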

  7. HIFI-AV: An Audio-visual Corpus for Spoken Language Human-Machine Dialogue Research in Spanish

    OpenAIRE

    Fernández Martínez, Fernando; Lucas Cuesta, Juan Manuel; Barra Chicote, Roberto; Ferreiros López, Javier; Macías Guarasa, Javier

    2010-01-01

    In this paper, we describe a new multi-purpose audio-visual database in the context of speech interfaces for controlling household electronic devices. The database comprises speech and video recordings of 19 speakers interacting with a HIFI audio box by means of a spoken dialogue system. Dialogue management is based on Bayesian Networks and the system is provided with contextual information handling strategies. Each speaker was requested to fulfil different sets of specific goals following pred...

  8. Exploring the role of low level visual processing in letter-speech sound integration: a visual MMN study

    Directory of Open Access Journals (Sweden)

    Dries Froyen

    2010-04-01

    In contrast with, for example, audiovisual speech, the relation between visual and auditory properties of letters and speech sounds is artificial and learned only by explicit instruction. The arbitrariness of the audiovisual link together with the widespread usage of letter–speech sound pairs in alphabetic languages makes those audiovisual objects a unique subject for cross-modal research. Brain imaging evidence has indicated that heteromodal areas in superior temporal, as well as modality-specific auditory cortex are involved in letter–speech sound processing. The role of low-level visual areas, however, remains unclear. In this study the visual counterpart of the auditory mismatch negativity (MMN) is used to investigate the influences of speech sounds on letter processing. Letter and non-letter deviants were infrequently presented in a train of standard letters, either in isolation or simultaneously with speech sounds. Although previous findings showed that letters systematically modulate speech sound processing (reflected by auditory MMN amplitude modulation), the reverse does not seem to hold: our results did not show evidence for an automatic influence of speech sounds on letter processing (no visual MMN amplitude modulation). This apparent asymmetric recruitment of low-level sensory cortices during letter–speech sound processing contrasts with the symmetric involvement of these cortices in audiovisual speech processing, and is possibly due to the arbitrary nature of the link between letters and speech sounds.

  9. Exploring the Role of Low Level Visual Processing in Letter–Speech Sound Integration: A Visual MMN Study

    Science.gov (United States)

    Froyen, Dries; van Atteveldt, Nienke; Blomert, Leo

    2009-01-01

    In contrast with, for example, audiovisual speech, the relation between visual and auditory properties of letters and speech sounds is artificial and learned only by explicit instruction. The arbitrariness of the audiovisual link together with the widespread usage of letter–speech sound pairs in alphabetic languages makes those audiovisual objects a unique subject for crossmodal research. Brain imaging evidence has indicated that heteromodal areas in superior temporal, as well as modality-specific auditory cortex are involved in letter–speech sound processing. The role of low-level visual areas, however, remains unclear. In this study the visual counterpart of the auditory mismatch negativity (MMN) is used to investigate the influences of speech sounds on letter processing. Letter and non-letter deviants were infrequently presented in a train of standard letters, either in isolation or simultaneously with speech sounds. Although previous findings showed that letters systematically modulate speech sound processing (reflected by auditory MMN amplitude modulation), the reverse does not seem to hold: our results did not show evidence for an automatic influence of speech sounds on letter processing (no visual MMN amplitude modulation). This apparent asymmetric recruitment of low-level sensory cortices during letter–speech sound processing contrasts with the symmetric involvement of these cortices in audiovisual speech processing, and is possibly due to the arbitrary nature of the link between letters and speech sounds. PMID:20428501

  10. Exploring the Role of Low Level Visual Processing in Letter-Speech Sound Integration: A Visual MMN Study.

    Science.gov (United States)

    Froyen, Dries; van Atteveldt, Nienke; Blomert, Leo

    2010-01-01

    In contrast with, for example, audiovisual speech, the relation between visual and auditory properties of letters and speech sounds is artificial and learned only by explicit instruction. The arbitrariness of the audiovisual link together with the widespread usage of letter-speech sound pairs in alphabetic languages makes those audiovisual objects a unique subject for crossmodal research. Brain imaging evidence has indicated that heteromodal areas in superior temporal, as well as modality-specific auditory cortex are involved in letter-speech sound processing. The role of low-level visual areas, however, remains unclear. In this study the visual counterpart of the auditory mismatch negativity (MMN) is used to investigate the influences of speech sounds on letter processing. Letter and non-letter deviants were infrequently presented in a train of standard letters, either in isolation or simultaneously with speech sounds. Although previous findings showed that letters systematically modulate speech sound processing (reflected by auditory MMN amplitude modulation), the reverse does not seem to hold: our results did not show evidence for an automatic influence of speech sounds on letter processing (no visual MMN amplitude modulation). This apparent asymmetric recruitment of low-level sensory cortices during letter-speech sound processing contrasts with the symmetric involvement of these cortices in audiovisual speech processing, and is possibly due to the arbitrary nature of the link between letters and speech sounds.

  11. THE IMPROVEMENT OF AUDIO-VISUAL BASED DANCE APPRECIATION LEARNING AMONG PRIMARY TEACHER EDUCATION STUDENTS OF MAKASSAR STATE UNIVERSITY

    OpenAIRE

    Wahira

    2014-01-01

    This research aimed to improve the dance appreciation skills of students in the Primary Teacher Education program of Makassar State University, to improve their perception of audio-visual based art appreciation, to increase their interest in the audio-visual based art education subject, and to increase their responses to the subject. This was classroom action research using the research design of Kemmis & McTaggart, conducted with 42 students of Prim...

  12. Lateralized speech perception in normal-hearing and hearing-impaired listeners and its relationship to temporal processing

    DEFF Research Database (Denmark)

    Locsei, Gusztav; Pedersen, Julie Hefting; Laugesen, Søren;

    2016-01-01

    This study investigated the role of temporal fine structure (TFS) coding in spatially complex, lateralized listening tasks. Speech reception thresholds (SRTs) were measured in young normal-hearing (NH) and two groups of elderly hearing-impaired (HI) listeners in the presence of speech-shaped noise... ...HI listeners, group differences in binaural benefit due to spatial separation of the maskers from the target remained small. Neither the FDT nor the IPDT tasks showed a clear correlation pattern with the SRTs or with the amount of binaural benefit, respectively. The results suggest that, although HI listeners with normal hearing in the low-frequency range might have elevated SRTs, the binaural benefit they experience due to spatial separation of competing sources can remain similar to that of NH listeners.

  13. Preventive Maintenance Handbook. Audiovisual Equipment.

    Science.gov (United States)

    Educational Products Information Exchange Inst., Stony Brook, NY.

    The preventive maintenance system for audiovisual equipment presented in this handbook is designed by specialists so that it can be used by nonspecialists in school sites. The report offers specific advice on safety factors and also lists major problems that should not be handled by nonspecialists. Other aspects of a preventive maintenance system…

  14. Search in audiovisual broadcast archives

    NARCIS (Netherlands)

    B. Huurnink

    2010-01-01

    Documentary makers, journalists, news editors, and other media professionals routinely require previously recorded audiovisual material for new productions. For example, a news editor might wish to reuse footage from overseas services for the evening news, or a documentary maker describing the histo

  15. Audiovisual system for recognition of commands

    Directory of Open Access Journals (Sweden)

    Alexander Ceballos

    2011-08-01

    We present the development of an automatic audiovisual speech recognition system focused on command recognition. The audio was represented using Mel cepstral coefficients and their first two time derivatives. To characterize the video, high-level visual features were tracked automatically throughout the entire sequence. The algorithm was initialized automatically using color transformations and active contours with gradient vector flow information ("GVF snakes") over the lip region, while tracking relied on similarity measures between neighborhoods and morphological constraints defined in the MPEG-4 standard. First, the design of an automatic speech recognition system using only audio information (ASR) is presented, based on Hidden Markov Models (HMMs) and an isolated-word approach; next, the designs of systems using only video features (VSR) and using combined audio and video features (AVSR) are described. Finally, the results of the three systems are compared on the authors' own database in Spanish and French, and the influence of acoustic noise is examined, showing that the AVSR system is more robust than ASR and VSR.
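
    The audio front end in this record uses Mel cepstral coefficients plus their first and second time derivatives. A minimal Python/NumPy sketch of how such derivative ("delta") features are commonly computed, assuming an MFCC matrix of shape (frames, coefficients) is already available; the regression formula, the window width of 2 frames, edge padding, and the function names are common conventions chosen for illustration, not details reported by the authors.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based time derivative of a (frames x coeffs) feature matrix.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    where frames beyond either end repeat the first/last frame (edge padding).
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    num_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        # padded[width + t + n] holds c_{t+n}; padded[width + t - n] holds c_{t-n}
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_with_deltas(mfcc: np.ndarray, width: int = 2) -> np.ndarray:
    """Append first- and second-order deltas to static MFCCs (13 -> 39 dims)."""
    d1 = delta(mfcc, width)
    d2 = delta(d1, width)  # second derivative = delta of the delta
    return np.hstack([mfcc, d1, d2])
```

    On a linear ramp the interior deltas equal the slope, while the edge frames are attenuated by the padding; stacking 13 static coefficients with both derivatives yields the familiar 39-dimensional frame vector that an HMM recognizer would consume.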

  16. Benefits to Speech Perception in Noise From the Binaural Integration of Electric and Acoustic Signals in Simulated Unilateral Deafness

    OpenAIRE

    Ma, Ning; Morris, Saffron; Kitterick, Pádraig Thomas

    2015-01-01

    Objectives: This study used vocoder simulations with normal-hearing (NH) listeners to (1) measure their ability to integrate speech information from an NH ear and a simulated cochlear implant (CI), and (2) investigate whether binaural integration is disrupted by a mismatch in the delivery of spectral information between the ears arising from a misalignment in the mapping of frequency to place. Design: Eight NH volunteers participated in the study and listened to sentences embedded in backgrou...

  17. Multiple concurrent temporal recalibrations driven by audiovisual stimuli with apparent physical differences.

    Science.gov (United States)

    Yuan, Xiangyong; Bi, Cuihua; Huang, Xiting

    2015-05-01

    Out-of-synchrony experiences can easily recalibrate one's subjective simultaneity point in the direction of the experienced asynchrony. Although temporal adjustment of multiple audiovisual stimuli has been recently demonstrated to be spatially specific, perceptual grouping processes that organize separate audiovisual stimuli into distinctive "objects" may play a more important role in forming the basis for subsequent multiple temporal recalibrations. We investigated whether apparent physical differences between audiovisual pairs that make them distinct from each other can independently drive multiple concurrent temporal recalibrations regardless of spatial overlap. Experiment 1 verified that reducing the physical difference between two audiovisual pairs diminishes the multiple temporal recalibrations by exposing observers to two utterances with opposing temporal relationships spoken by one single speaker rather than two distinct speakers at the same location. Experiment 2 found that increasing the physical difference between two stimuli pairs can promote multiple temporal recalibrations by complicating their non-temporal dimensions (e.g., disks composed of two rather than one attribute and tones generated by multiplying two frequencies); however, these recalibration aftereffects were subtle. Experiment 3 further revealed that making the two audiovisual pairs differ in temporal structures (one transient and one gradual) was sufficient to drive concurrent temporal recalibration. These results confirm that the more audiovisual pairs physically differ, especially in temporal profile, the more likely multiple temporal perception adjustments will be content-constrained regardless of spatial overlap. These results indicate that multiple temporal recalibrations are based secondarily on the outcome of perceptual grouping processes.

  18. Music and speech prosody: a common rhythm.

    Science.gov (United States)

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).

  19. Music and speech prosody: A common rhythm

    Directory of Open Access Journals (Sweden)

    Maija Hausen

    2013-09-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).

  20. Motor Equivalence in Speech Production

    OpenAIRE

    Perrier, Pascal; Fuchs, Susanne

    2015-01-01

    The first section provides a description of the concepts of “motor equivalence” and “degrees of freedom”. It is illustrated with a few examples of motor tasks in general and of speech production tasks in particular. In the second section, the methodology used to investigate experimentally motor equivalence phenomena in speech production is presented. It is mainly based on paradigms that perturb the perception-action loop during on-going speech, either by limiting the...