WorldWideScience

Sample records for integrating audiovisual speech

  1. Speech cues contribute to audiovisual spatial integration.

    Directory of Open Access Journals (Sweden)

    Christopher W Bishop

    Speech is the most important form of human communication but ambient sounds and competing talkers often degrade its acoustics. Fortunately the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech-cues interact with audiovisual spatial integration mechanisms. Here, we combine two well established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech-cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral 'what' and dorsal 'where' pathways.

  2. Audiovisual integration in speech perception: a multi-stage process

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2011-01-01

    investigate whether the integration of auditory and visual speech observed in these two audiovisual integration effects are specific traits of speech perception. We further ask whether audiovisual integration is undertaken in a single processing stage or multiple processing stages....

  3. Speech-specificity of two audiovisual integration effects

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2010-01-01

    Seeing the talker’s articulatory mouth movements can influence the auditory speech percept both in speech identification and detection tasks. Here we show that these audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers...... often fail to perceive as speech. While audiovisual integration in the identification task only occurred when observers were informed of the speech-like nature of SWS, integration occurred in the detection task both for informed and naïve observers. This shows that both speech-specific and general...... mechanisms underlie audiovisual integration of speech....

  4. Multistage audiovisual integration of speech: dissociating identification and detection.

    Science.gov (United States)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias S

    2011-02-01

    Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion where seeing the talking face influences the auditory phonetic percept and by the audiovisual detection advantage where seeing the talking face influences the detectability of the acoustic speech signal. Here, we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers informed of its speech-like nature recognize as speech. While the McGurk illusion only occurred for informed observers, the audiovisual detection advantage occurred for naïve observers as well. This finding supports a multistage account of audiovisual integration of speech in which the many attributes of the audiovisual speech signal are integrated by separate integration processes.

  5. Multistage audiovisual integration of speech: dissociating identification and detection

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2011-01-01

    Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion where seeing the talking face influences the auditory phonetic percept and by the audiovisual detection advantage where seeing the talking face influences the detectability of the acoustic speech...... signal. Here we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers...... informed of its speech-like nature recognize as speech. While the McGurk illusion only occurred for informed observers the audiovisual detection advantage occurred for naïve observers as well. This finding supports a multi-stage account of audiovisual integration of speech in which the many attributes...

  6. Electrophysiological evidence for speech-specific audiovisual integration.

    Science.gov (United States)

    Baart, Martijn; Stekelenburg, Jeroen J; Vroomen, Jean

    2014-01-01

    Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency. In order to disentangle speech-specific (phonetic) integration from non-speech integration, we used Sine-Wave Speech (SWS) that was perceived as speech by half of the participants (they were in speech-mode), while the other half was in non-speech mode. Results showed that the N1 obtained with audiovisual stimuli peaked earlier than the N1 evoked by auditory-only stimuli. This lip-read induced speeding up of the N1 occurred for listeners in speech and non-speech mode. In contrast, if listeners were in speech-mode, lip-read speech also modulated the auditory P2, but not if listeners were in non-speech mode, thus revealing speech-specific audiovisual binding. Comparing ERPs for phonetically congruent audiovisual stimuli with ERPs for incongruent stimuli revealed an effect of phonetic stimulus congruency that started at ~200 ms after (in)congruence became apparent. Critically, akin to the P2 suppression, congruency effects were only observed if listeners were in speech mode, and not if they were in non-speech mode. Using identical stimuli, we thus confirm that audiovisual binding involves (partially) different neural mechanisms for sound processing in speech and non-speech mode. © 2013 Published by Elsevier Ltd.

  7. Electrophysiological evidence for speech-specific audiovisual integration

    NARCIS (Netherlands)

    Baart, M.; Stekelenburg, J.J.; Vroomen, J.

    2014-01-01

    Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were

  8. Electrophysiological assessment of audiovisual integration in speech perception

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Dau, Torsten

    Speech perception integrates signal from ear and eye. This is witnessed by a wide range of audiovisual integration effects, such as ventriloquism and the McGurk illusion. Some behavioral evidence suggests that audiovisual integration of specific aspects is special for speech perception. However, our...... knowledge of such bimodal integration would be strengthened if the phenomena could be investigated by objective, neurally based methods. One key question of the present work is if perceptual processing of audiovisual speech can be gauged with a specific signature of neurophysiological activity...... on the auditory speech percept? In two experiments, which both combine behavioral and neurophysiological measures, an uncovering of the relation between perception of faces and of audiovisual integration is attempted. Behavioral findings suggest a strong effect of face perception, whereas the MMN results are less...

  9. Audiovisual integration of speech in a patient with Broca's Aphasia

    Science.gov (United States)

    Andersen, Tobias S.; Starrfelt, Randi

    2015-01-01

    Lesions to Broca's area cause aphasia characterized by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca's area is also involved in speech perception. While these studies have focused on auditory speech perception other studies have shown that Broca's area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca's aphasia did not experience the McGurk illusion suggesting that an intact Broca's area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca's area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke's aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing that are not specific to Broca's aphasia. PMID:25972819

  10. Modeling the Development of Audiovisual Cue Integration in Speech Perception.

    Science.gov (United States)

    Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C

    2017-03-21

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.
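    The modelling approach described above lends itself to a compact illustration. The sketch below fits a Gaussian mixture model to simulated two-dimensional audiovisual cue data and then reads out category posteriors for a test stimulus; the cue names, distributions, and parameter values are illustrative assumptions, not the simulation details reported by the authors.

```python
# Hypothetical sketch: statistical learning of audiovisual cue categories with a GMM.
# Cue names and distributions are illustrative placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Simulate two phonological categories, each defined by an auditory cue (VOT-like)
# and a visual cue (lip-aperture-like).
cat_a = rng.normal(loc=[10.0, 0.2], scale=[5.0, 0.1], size=(500, 2))
cat_b = rng.normal(loc=[50.0, 0.8], scale=[8.0, 0.1], size=(500, 2))
cues = np.vstack([cat_a, cat_b])

# Unsupervised learning: the model discovers two clusters from the cue distributions alone.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(cues)

# Categorize an ambiguous auditory cue paired with a clearer visual cue.
test = np.array([[30.0, 0.75]])
print("posterior over categories:", gmm.predict_proba(test).round(3))
```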

  11. Audiovisual integration of speech falters under high attention demands.

    Science.gov (United States)

    Alsius, Agnès; Navarra, Jordi; Campbell, Ruth; Soto-Faraco, Salvador

    2005-05-10

    One of the most commonly cited examples of human multisensory integration occurs during exposure to natural speech, when the vocal and the visual aspects of the signal are integrated in a unitary percept. Audiovisual association of facial gestures and vocal sounds has been demonstrated in nonhuman primates and in prelinguistic children, arguing for a general basis for this capacity. One critical question, however, concerns the role of attention in such multisensory integration. Although both behavioral and neurophysiological studies have converged on a preattentive conceptualization of audiovisual speech integration, this mechanism has rarely been measured under conditions of high attentional load, when the observers' attention resources are depleted. We tested the extent to which audiovisual integration was modulated by the amount of available attentional resources by measuring the observers' susceptibility to the classic McGurk illusion in a dual-task paradigm. The proportion of visually influenced responses was severely, and selectively, reduced if participants were concurrently performing an unrelated visual or auditory task. In contrast with the assumption that crossmodal speech integration is automatic, our results suggest that these multisensory binding processes are subject to attentional demands.

  12. Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech and music

    Directory of Open Access Journals (Sweden)

    Hwee Ling eLee

    2014-08-01

    This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogues of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past three years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and to a marginally significant degree to natural speech.
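    A temporal integration window of the kind estimated here is commonly summarized by fitting a function of stimulus onset asynchrony to the proportion of "synchronous" responses and taking its width. The sketch below fits a Gaussian to made-up response proportions; the data values and the symmetric Gaussian form are assumptions, not the analysis reported in the paper.

```python
# Hedged sketch: estimate a temporal integration window by fitting a Gaussian to the
# proportion of "synchronous" responses across audiovisual onset asynchronies (SOAs).
# The response proportions below are invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

soas = np.array([-360, -300, -240, -180, -120, -60, 0, 60, 120, 180, 240, 300, 360], float)
p_sync = np.array([0.05, 0.10, 0.20, 0.45, 0.75, 0.92, 0.97,
                   0.90, 0.70, 0.40, 0.18, 0.08, 0.04])

def gauss(soa, amp, mu, sigma):
    return amp * np.exp(-0.5 * ((soa - mu) / sigma) ** 2)

(amp, mu, sigma), _ = curve_fit(gauss, soas, p_sync, p0=[1.0, 0.0, 120.0])

# Window width at half height (FWHM); a narrower window means finer AV temporal sensitivity.
fwhm = 2.355 * sigma
print(f"centre = {mu:.1f} ms, width (FWHM) = {fwhm:.1f} ms")
```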

  13. Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

    DEFF Research Database (Denmark)

    Andersen, Tobias; Starrfelt, Randi

    2015-01-01

    's area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical......, which could be due to Broca's area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke's aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing...

  14. Atypical audiovisual speech integration in infants at risk for autism.

    Directory of Open Access Journals (Sweden)

    Jeanne A Guiraud

    The language difficulties often seen in individuals with autism might stem from an inability to integrate audiovisual information, a skill important for language development. We investigated whether 9-month-old siblings of older children with autism, who are at an increased risk of developing autism, are able to integrate audiovisual speech cues. We used an eye-tracker to record where infants looked when shown a screen displaying two faces of the same model, where one face is articulating /ba/ and the other /ga/, with one face congruent with the syllable sound being presented simultaneously, the other face incongruent. This method was successful in showing that infants at low risk can integrate audiovisual speech: they looked for the same amount of time at the mouths in both the fusible visual /ga/ - audio /ba/ and the congruent visual /ba/ - audio /ba/ displays, indicating that the auditory and visual streams fuse into a McGurk-type of syllabic percept in the incongruent condition. It also showed that low-risk infants could perceive a mismatch between auditory and visual cues: they looked longer at the mouth in the mismatched, non-fusible visual /ba/ - audio /ga/ display compared with the congruent visual /ga/ - audio /ga/ display, demonstrating that they perceive an uncommon, and therefore interesting, speech-like percept when looking at the incongruent mouth (repeated ANOVA: displays x fusion/mismatch conditions interaction: F(1,16) = 17.153, p = 0.001). The looking behaviour of high-risk infants did not differ according to the type of display, suggesting difficulties in matching auditory and visual information (repeated ANOVA, displays x conditions interaction: F(1,25) = 0.09, p = 0.767), in contrast to low-risk infants (repeated ANOVA: displays x conditions x low/high-risk groups interaction: F(1,41) = 4.466, p = 0.041). In some cases this reduced ability might lead to the poor communication skills characteristic of autism.

  15. Audiovisual integration for speech during mid-childhood: Electrophysiological evidence

    Science.gov (United States)

    Kaganovich, Natalya; Schumaker, Jennifer

    2014-01-01

    Previous studies have demonstrated that the presence of visual speech cues reduces the amplitude and latency of the N1 and P2 event-related potential (ERP) components elicited by speech stimuli. However, the developmental trajectory of this effect is not yet fully mapped. We examined ERP responses to auditory, visual, and audiovisual speech in two groups of school-age children (7–8-year-olds and 10–11-year-olds) and in adults. Audiovisual speech led to the attenuation of the N1 and P2 components in all groups of participants, suggesting that the neural mechanisms underlying these effects are functional by early school years. Additionally, while the reduction in N1 was largest over the right scalp, the P2 attenuation was largest over the left and midline scalp. The difference in the hemispheric distribution of the N1 and P2 attenuation supports the idea that these components index at least somewhat disparate neural processes within the context of audiovisual speech perception. PMID:25463815

  16. Neural dynamics of audiovisual speech integration under variable listening conditions: an individual participant analysis.

    Science.gov (United States)

    Altieri, Nicholas; Wenger, Michael J

    2013-01-01

    Speech perception engages both auditory and visual modalities. Limitations of traditional accuracy-only approaches in the investigation of audiovisual speech perception have motivated the use of new methodologies. In an audiovisual speech identification task, we utilized capacity (Townsend and Nozawa, 1995), a dynamic measure of efficiency, to quantify audiovisual integration. Capacity was used to compare RT distributions from audiovisual trials to RT distributions from auditory-only and visual-only trials across three listening conditions: clear auditory signal, S/N ratio of -12 dB, and S/N ratio of -18 dB. The purpose was to obtain EEG recordings in conjunction with capacity to investigate how a late ERP co-varies with integration efficiency. Results showed efficient audiovisual integration for low auditory S/N ratios, but inefficient audiovisual integration when the auditory signal was clear. The ERP analyses showed evidence for greater audiovisual amplitude compared to the unisensory signals for lower auditory S/N ratios (higher capacity/efficiency) compared to the high S/N ratio (low capacity/inefficient integration). The data are consistent with an interactive framework of integration, where auditory recognition is influenced by speech-reading as a function of signal clarity.
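    The capacity coefficient of Townsend and Nozawa (1995) referenced above compares the integrated hazard of the redundant audiovisual condition with the sum of the unimodal integrated hazards, C(t) = H_AV(t) / (H_A(t) + H_V(t)), where H(t) = -log S(t); values above 1 indicate efficient (super-capacity) integration. A rough sketch with simulated response times is below; the RT values and the simple empirical survivor estimator are illustrative, not the authors' pipeline.

```python
# Rough sketch of the capacity coefficient C(t) = H_AV(t) / (H_A(t) + H_V(t)),
# where H(t) = -log S(t) is the cumulative hazard estimated from RT samples.
# The simulated RTs below are placeholders, not data from the study.
import numpy as np

rng = np.random.default_rng(1)
rt_av = rng.normal(550, 60, 200)   # audiovisual trials
rt_a = rng.normal(620, 70, 200)    # auditory-only trials
rt_v = rng.normal(700, 80, 200)    # visual-only trials

def cumulative_hazard(rts, t):
    """H(t) = -log S(t), with S(t) taken from the empirical survivor function."""
    s = np.mean(rts[:, None] > t[None, :], axis=0)
    return -np.log(np.clip(s, 1e-6, 1.0))

t = np.linspace(450, 750, 7)
capacity = cumulative_hazard(rt_av, t) / (cumulative_hazard(rt_a, t) + cumulative_hazard(rt_v, t))
for ti, ci in zip(t, capacity):
    print(f"t = {ti:5.0f} ms  C(t) = {ci:4.2f}  ({'super' if ci > 1 else 'limited'} capacity)")
```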

  17. Audiovisual integration in children listening to spectrally degraded speech.

    Science.gov (United States)

    Maidment, David W; Kang, Hi Jee; Stewart, Hannah J; Amitay, Sygal

    2015-02-01

    The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Children (n=69) and adults (n=15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in auditory-only or audiovisual conditions. The number of bands was adaptively varied to modulate the degradation of the auditory signal, with the number of bands required for approximately 79% correct identification calculated as the threshold. The youngest children (4- to 5-year-olds) did not benefit from accompanying visual information, in comparison to 6- to 11-year-old children and adults. Audiovisual gain also increased with age in the child sample. The current data suggest that children younger than 6 years of age do not fully utilize visual speech cues to enhance speech perception when the auditory signal is degraded. This evidence not only has implications for understanding the development of speech perception skills in children with normal hearing but may also inform the development of new treatment and intervention strategies that aim to remediate speech perception difficulties in pediatric cochlear implant users.
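    Noise-vocoded sentences of the kind used in this study are typically generated by splitting the signal into frequency bands, extracting each band's amplitude envelope, and using the envelopes to modulate band-limited noise. A hedged sketch of such a vocoder follows; the band spacing, filter orders, and envelope cutoff are illustrative choices rather than the study's exact parameters.

```python
# Hedged sketch of a noise vocoder; band layout and filter settings are illustrative.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_bands=8, f_lo=100.0, f_hi=5000.0, env_cutoff=30.0):
    rng = np.random.default_rng(0)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)  # logarithmically spaced band edges
    out = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, signal)
        # Amplitude envelope of the band, smoothed with a low-pass filter.
        env = np.abs(hilbert(band))
        env_sos = butter(4, env_cutoff, btype="lowpass", fs=fs, output="sos")
        env = np.maximum(sosfiltfilt(env_sos, env), 0.0)
        # Modulate band-limited noise with the envelope and add it to the output.
        noise = sosfiltfilt(band_sos, rng.standard_normal(len(signal)))
        out += env * noise
    return out

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    demo = np.sin(2 * np.pi * 300 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))  # stand-in for speech
    print(noise_vocode(demo, fs, n_bands=4).shape)
```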

  18. Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

    DEFF Research Database (Denmark)

    Andersen, Tobias; Starrfelt, Randi

    2015-01-01

    perception. While these studies have focused on auditory speech perception other studies have shown that Broca's area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca's aphasia did not experience the McGurk illusion suggesting that an intact Broca......Lesions to Broca's area cause aphasia characterized by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca's area is also involved in speech......'s area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical...

  19. Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

    Directory of Open Access Journals (Sweden)

    Tobias Søren Andersen

    2015-04-01

    Lesions to Broca’s area cause aphasia characterised by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca’s area is also involved in speech perception. While these studies have focused on auditory speech perception other studies have shown that Broca’s area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca’s aphasia did not experience the McGurk illusion suggesting that an intact Broca’s area is necessary for audiovisual integration of speech. Here we describe a patient with Broca’s aphasia who experienced the McGurk illusion. This indicates that an intact Broca’s area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca’s area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke’s aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing that are not specific to Broca’s aphasia.

  20. The level of audiovisual print-speech integration deficits in dyslexia.

    Science.gov (United States)

    Kronschnabel, Jens; Brem, Silvia; Maurer, Urs; Brandeis, Daniel

    2014-09-01

    The classical phonological deficit account of dyslexia is increasingly linked to impairments in grapho-phonological conversion, and to dysfunctions in superior temporal regions associated with audiovisual integration. The present study investigates mechanisms of audiovisual integration in typical and impaired readers at the critical developmental stage of adolescence. Congruent and incongruent audiovisual as well as unimodal (visual only and auditory only) material was presented. Audiovisual presentations were single letters and three-letter (consonant-vowel-consonant) stimuli accompanied by matching or mismatching speech sounds. Three-letter stimuli exhibited fast phonetic transitions as in real-life language processing and reading. Congruency effects, i.e. different brain responses to congruent and incongruent stimuli were taken as an indicator of audiovisual integration at a phonetic level (grapho-phonological conversion). Comparisons of unimodal and audiovisual stimuli revealed basic, more sensory aspects of audiovisual integration. By means of these two criteria of audiovisual integration, the generalizability of audiovisual deficits in dyslexia was tested. Moreover, it was expected that the more naturalistic three-letter stimuli are superior to single letters in revealing group differences. Electrophysiological and hemodynamic (EEG and fMRI) data were acquired simultaneously in a simple target detection task. Applying the same statistical models to event-related EEG potentials and fMRI responses allowed comparing the effects detected by the two techniques at a descriptive level. Group differences in congruency effects (congruent against incongruent) were observed in regions involved in grapho-phonological processing, including the left inferior frontal and angular gyri and the inferotemporal cortex. Importantly, such differences also emerged in superior temporal key regions. Three-letter stimuli revealed stronger group differences than single letters. No

  1. Electrophysiological evidence for a self-processing advantage during audiovisual speech integration.

    Science.gov (United States)

    Treille, Avril; Vilain, Coriandre; Kandel, Sonia; Sato, Marc

    2017-09-01

    Previous electrophysiological studies have provided strong evidence for early multisensory integrative mechanisms during audiovisual speech perception. From these studies, one unanswered issue is whether hearing our own voice and seeing our own articulatory gestures facilitate speech perception, possibly through a better processing and integration of sensory inputs with our own sensory-motor knowledge. The present EEG study examined the impact of self-knowledge during the perception of auditory (A), visual (V) and audiovisual (AV) speech stimuli that were previously recorded from the participant or from a speaker he/she had never met. Audiovisual interactions were estimated by comparing N1 and P2 auditory evoked potentials during the bimodal condition (AV) with the sum of those observed in the unimodal conditions (A + V). In line with previous EEG studies, our results revealed an amplitude decrease of P2 auditory evoked potentials in AV compared to A + V conditions. Crucially, a temporal facilitation of N1 responses was observed during the visual perception of self speech movements compared to those of another speaker. This facilitation was negatively correlated with the saliency of visual stimuli. These results provide evidence for a temporal facilitation of the integration of auditory and visual speech signals when the visual situation involves our own speech gestures.

  2. Neural networks supporting audiovisual integration for speech: A large-scale lesion study.

    Science.gov (United States)

    Hickok, Gregory; Rogalsky, Corianne; Matchin, William; Basilakos, Alexandra; Cai, Julia; Pillay, Sara; Ferrill, Michelle; Mickelsen, Soren; Anderson, Steven W; Love, Tracy; Binder, Jeffrey; Fridriksson, Julius

    2018-06-01

    Auditory and visual speech information are often strongly integrated resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched, the McGurk-MacDonald effect. Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure (auditory vs visual capture) can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor related frontal regions do not appear to play a role in AV speech integration. Copyright © 2018 Elsevier Ltd. All rights reserved.

  3. An ALE meta-analysis on the audiovisual integration of speech signals.

    Science.gov (United States)

    Erickson, Laura C; Heeg, Elizabeth; Rauschecker, Josef P; Turkeltaub, Peter E

    2014-11-01

    The brain improves speech processing through the integration of audiovisual (AV) signals. Situations involving AV speech integration may be crudely dichotomized into those where auditory and visual inputs contain (1) equivalent, complementary signals (validating AV speech) or (2) inconsistent, different signals (conflicting AV speech). This simple framework may allow the systematic examination of broad commonalities and differences between AV neural processes engaged by various experimental paradigms frequently used to study AV speech integration. We conducted an activation likelihood estimation meta-analysis of 22 functional imaging studies comprising 33 experiments, 311 subjects, and 347 foci examining "conflicting" versus "validating" AV speech. Experimental paradigms included content congruency, timing synchrony, and perceptual measures, such as the McGurk effect or synchrony judgments, across AV speech stimulus types (sublexical to sentence). Colocalization of conflicting AV speech experiments revealed consistency across at least two contrast types (e.g., synchrony and congruency) in a network of dorsal stream regions in the frontal, parietal, and temporal lobes. There was consistency across all contrast types (synchrony, congruency, and percept) in the bilateral posterior superior/middle temporal cortex. Although fewer studies were available, validating AV speech experiments were localized to other regions, such as ventral stream visual areas in the occipital and inferior temporal cortex. These results suggest that while equivalent, complementary AV speech signals may evoke activity in regions related to the corroboration of sensory input, conflicting AV speech signals recruit widespread dorsal stream areas likely involved in the resolution of conflicting sensory signals. Copyright © 2014 Wiley Periodicals, Inc.

  4. The early maximum likelihood estimation model of audiovisual integration in speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias

    2015-01-01

    integration to speech perception along with three model variations. In early MLE, integration is based on a continuous internal representation before categorization, which can make the model more parsimonious by imposing constraints that reflect experimental designs. The study also shows that cross......Speech perception is facilitated by seeing the articulatory mouth movements of the talker. This is due to perceptual audiovisual integration, which also causes the McGurk−MacDonald illusion, and for which a comprehensive computational account is still lacking. Decades of research have largely......-validation can evaluate models of audiovisual integration based on typical data sets taking both goodness-of-fit and model flexibility into account. All models were tested on a published data set previously used for testing the FLMP. Cross-validation favored the early MLE while more conventional error measures...
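    The maximum likelihood estimation framework referred to above combines the unimodal estimates by weighting each with its reliability (inverse variance), which also yields the variance of the fused estimate. A minimal sketch of that computation on a continuous internal phonetic dimension is shown below; the numbers are placeholders and the sketch does not reproduce the specific model variants compared in the paper.

```python
# Sketch of reliability-weighted (maximum likelihood) cue fusion on a continuous
# internal representation; categorization would follow fusion in an "early" model.
# Numeric values are placeholders for illustration only.
def mle_fuse(s_a, var_a, s_v, var_v):
    """Fuse auditory and visual estimates by inverse-variance weighting."""
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_v)
    w_v = 1.0 - w_a
    fused = w_a * s_a + w_v * s_v
    fused_var = (var_a * var_v) / (var_a + var_v)
    return fused, fused_var

# Example: an ambiguous auditory cue (halfway between /b/ at 0 and /d/ at 1) combined
# with a clearer visual cue pulls the fused estimate toward the visual category.
fused, fused_var = mle_fuse(s_a=0.5, var_a=0.04, s_v=0.9, var_v=0.01)
print(f"fused estimate = {fused:.2f}, fused variance = {fused_var:.4f}")
```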

  5. Audiovisual speech integration in the superior temporal region is dysfunctional in dyslexia.

    Science.gov (United States)

    Ye, Zheng; Rüsseler, Jascha; Gerth, Ivonne; Münte, Thomas F

    2017-07-25

    Dyslexia is an impairment of reading and spelling that affects both children and adults even after many years of schooling. Dyslexic readers have deficits in the integration of auditory and visual inputs but the neural mechanisms of the deficits are still unclear. This fMRI study examined the neural processing of auditorily presented German numbers 0-9 and videos of lip movements of a German native speaker voicing numbers 0-9 in unimodal (auditory or visual) and bimodal (always congruent) conditions in dyslexic readers and their matched fluent readers. We confirmed results of previous studies that the superior temporal gyrus/sulcus plays a critical role in audiovisual speech integration: fluent readers showed greater superior temporal activations for combined audiovisual stimuli than auditory-/visual-only stimuli. Importantly, such an enhancement effect was absent in dyslexic readers. Moreover, the auditory network (bilateral superior temporal regions plus medial PFC) was dynamically modulated during audiovisual integration in fluent, but not in dyslexic readers. These results suggest that superior temporal dysfunction may underlie poor audiovisual speech integration in readers with dyslexia. Copyright © 2017 IBRO. Published by Elsevier Ltd. All rights reserved.

  6. Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

    Science.gov (United States)

    Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

    2015-05-01

    The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although the brain network is supposed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering, such as single linkage distance, to analyze the connected component of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white-noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationship, similar connected components were observed in bimodal and unimodal speech conditions during filtration. However, during speech perception by congruent audiovisual stimuli, the tighter couplings of left anterior temporal gyrus-anterior insula component and right premotor-visual components were observed than auditory or visual speech cue conditions, respectively. Interestingly, visual speech is perceived under white noise by tight negative coupling in the left inferior frontal region-right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus, right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.
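    For zero-dimensional homology, the network filtration described above amounts to tracking how connected components merge as the connectivity threshold is relaxed, which is equivalent to single-linkage hierarchical clustering on a distance matrix. The sketch below illustrates this with a random correlation matrix standing in for real fMRI connectivity data.

```python
# Hedged sketch: 0-dimensional network filtration (connected components across all
# thresholds) via single-linkage clustering, as in persistent-homology analyses.
# The random correlation matrix is a stand-in for real connectivity data.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(2)
signals = rng.standard_normal((100, 10))       # 100 time points, 10 regions
corr = np.corrcoef(signals, rowvar=False)

# Convert correlations to distances; small distance = strong positive coupling.
dist = 1.0 - corr
np.fill_diagonal(dist, 0.0)
condensed = squareform(dist, checks=False)

# Single-linkage merge heights are the filtration values at which components die,
# i.e. the 0-dimensional persistence barcode of the network.
merges = linkage(condensed, method="single")
for i, (_, _, height, size) in enumerate(merges):
    print(f"merge {i}: components joined at threshold distance {height:.3f} (cluster size {int(size)})")
```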

  7. Audiovisual Asynchrony Detection in Human Speech

    Science.gov (United States)

    Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

    2011-01-01

    Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

  8. Audiovisual Integration in Children Listening to Spectrally Degraded Speech

    Science.gov (United States)

    Maidment, David W.; Kang, Hi Jee; Stewart, Hannah J.; Amitay, Sygal

    2015-01-01

    Purpose: The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Method: Children (n = 69) and adults (n = 15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in…

  9. Audiovisual Discrimination between Laughter and Speech

    NARCIS (Netherlands)

    Petridis, Stavros; Pantic, Maja

    Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audiovisual approach to distinguishing laughter from speech and we show that integrating the information from audio and video leads to an improved reliability of audiovisual approach in

  10. Some Behavioral and Neurobiological Constraints on Theories of Audiovisual Speech Integration: A Review and Suggestions for New Directions

    Science.gov (United States)

    Altieri, Nicholas; Pisoni, David B.; Townsend, James T.

    2012-01-01

    Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield’s feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration. PMID:21968081

  11. Ordinal models of audiovisual speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias

    2011-01-01

    Audiovisual information is integrated in speech perception. One manifestation of this is the McGurk illusion in which watching the articulating face alters the auditory phonetic percept. Understanding this phenomenon fully requires a computational model with predictive power. Here, we describe...

  12. Specialization in audiovisual speech perception: a replication study

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual as evidenced by bimodal integration in the McGurk effect. This integration effect may be specific to speech or be applied to all stimuli in general. To investigate this, Tuomainen et al. (2005) used sine-wave speech, which naïve observers may perceive as non......-speech, but hear as speech once informed of the linguistic origin of the signal. Combinations of sine-wave speech and incongruent video of the talker elicited a McGurk effect only for informed observers. This indicates that the audiovisual integration effect is specific to speech perception. However, observers...... that observers did look near the mouth. We conclude that eye-movements did not influence the results of Tuomainen et al. and that their results thus can be taken as evidence of a speech specific mode of audiovisual integration underlying the McGurk illusion....

  13. Prediction and constraint in audiovisual speech perception.

    Science.gov (United States)

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  14. Prediction and constraint in audiovisual speech perception

    Science.gov (United States)

    Peelle, Jonathan E.; Sommers, Mitchell S.

    2015-01-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported

  15. Large Scale Functional Brain Networks Underlying Temporal Integration of Audio-Visual Speech Perception: An EEG Study.

    Science.gov (United States)

    Kumar, G Vinodh; Halder, Tamesh; Jaiswal, Amit K; Mukherjee, Abhishek; Roy, Dipanjan; Banerjee, Arpan

    2016-01-01

    Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk-effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, whether and how the network across the whole brain participates in multisensory perceptual processing remains an open question. We posit that a large-scale functional connectivity among the neural population situated in distributed brain sites may provide valuable insights involved in processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent audio-visual (AV) speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs was computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception around a temporal window of 300-600 ms following onset of stimuli. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times along with pre-stimulus decreases of lower frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus, our
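    As a rough illustration of sensor-level coherence summaries of this kind, the sketch below computes band-limited pairwise coherence over sliding windows and averages it across all sensor pairs into a single time course. This is a simplified stand-in for the time-frequency global coherence used in the study; the window length, frequency band, and averaging step are assumptions.

```python
# Simplified stand-in for a "global coherence" time course: band-limited pairwise
# coherence over sliding windows, averaged across all sensor pairs.
import numpy as np
from itertools import combinations
from scipy.signal import coherence

rng = np.random.default_rng(3)
fs = 250
eeg = rng.standard_normal((8, 5 * fs))          # 8 sensors, 5 s of placeholder data

win, step, band = fs, fs // 2, (30, 45)         # 1 s windows, 0.5 s step, gamma band
for start in range(0, eeg.shape[1] - win + 1, step):
    seg = eeg[:, start:start + win]
    vals = []
    for i, j in combinations(range(seg.shape[0]), 2):
        f, cxy = coherence(seg[i], seg[j], fs=fs, nperseg=win // 4)
        vals.append(cxy[(f >= band[0]) & (f <= band[1])].mean())
    print(f"t = {start / fs:.1f}-{(start + win) / fs:.1f} s  mean gamma coherence = {np.mean(vals):.3f}")
```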

  16. The role of visual spatial attention in audiovisual speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias; Tiippana, K.; Laarni, J.

    2009-01-01

    Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre-attentive b...... from each of the faces and from the voice on the auditory speech percept. We found that directing visual spatial attention towards a face increased the influence of that face on auditory perception. However, the influence of the voice on auditory perception did not change suggesting that audiovisual...... integration did not change. Visual spatial attention was also able to select between the faces when lip reading. This suggests that visual spatial attention acts at the level of visual speech perception prior to audiovisual integration and that the effect propagates through audiovisual integration...

  17. Audiovisual speech facilitates voice learning.

    Science.gov (United States)

    Sheffert, Sonya M; Olson, Elizabeth

    2004-02-01

    In this research, we investigated the effects of voice and face information on the perceptual learning of talkers and on long-term memory for spoken words. In the first phase, listeners were trained over several days to identify voices from words presented auditorily or audiovisually. The training data showed that visual information about speakers enhanced voice learning, revealing cross-modal connections in talker processing akin to those observed in speech processing. In the second phase, the listeners completed an auditory or audiovisual word recognition memory test in which equal numbers of words were spoken by familiar and unfamiliar talkers. The data showed that words presented by familiar talkers were more likely to be retrieved from episodic memory, regardless of modality. Together, these findings provide new information about the representational code underlying familiar talker recognition and the role of stimulus familiarity in episodic word recognition.

  18. Speech-specific audiovisual perception affects identification but not detection of speech

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual as evidenced by the McGurk effect in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate...... of audiovisual integration specific to speech perception. However, the results of Tuomainen et al. might have been influenced by another effect. When observers were naïve, they had little motivation to look at the face. When informed, they knew that the face was relevant for the task and this could increase...... visual detection task. In our first experiment, observers presented with congruent and incongruent audiovisual sine-wave speech stimuli did only show a McGurk effect when informed of the speech nature of the stimulus. Performance on the secondary visual task was very good, thus supporting the finding...

  19. Effects of Audio-Visual Integration on the Detection of Masked Speech and Non-Speech Sounds

    Science.gov (United States)

    Eramudugolla, Ranmalee; Henderson, Rachel; Mattingley, Jason B.

    2011-01-01

    Integration of simultaneous auditory and visual information about an event can enhance our ability to detect that event. This is particularly evident in the perception of speech, where the articulatory gestures of the speaker's lips and face can significantly improve the listener's detection and identification of the message, especially when that…

  20. Audiovisual Speech Synchrony Measure: Application to Biometrics

    Directory of Open Access Journals (Sweden)

    Gérard Chollet

    2007-01-01

    Speech is a means of communication which is intrinsically bimodal: the audio signal originates from the dynamics of the articulators. This paper reviews recent works in the field of audiovisual speech, and more specifically techniques developed to measure the level of correspondence between audio and visual speech. It overviews the most common audio and visual speech front-end processing, transformations performed on audio, visual, or joint audiovisual feature spaces, and the actual measure of correspondence between audio and visual speech. Finally, the use of a synchrony measure for biometric identity verification based on talking faces is evaluated on the BANCA database.
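    Correspondence measures of the kind reviewed here typically project audio and visual feature streams into a shared space and score how strongly they co-vary. The sketch below uses canonical correlation analysis on synthetic feature streams as one such measure; the features and the choice of CCA are illustrative, not the specific front-ends or measures evaluated in the paper.

```python
# Illustrative audio-visual correspondence score using canonical correlation analysis
# (one of several possible synchrony measures); the features below are synthetic.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(4)
n_frames = 300
latent = rng.standard_normal((n_frames, 1))     # shared articulation dynamics
audio_feats = latent @ rng.standard_normal((1, 12)) + 0.5 * rng.standard_normal((n_frames, 12))
visual_feats = latent @ rng.standard_normal((1, 6)) + 0.5 * rng.standard_normal((n_frames, 6))

cca = CCA(n_components=1).fit(audio_feats, visual_feats)
u, v = cca.transform(audio_feats, visual_feats)
print(f"audio-visual synchrony score: {np.corrcoef(u[:, 0], v[:, 0])[0, 1]:.2f}")

# A mismatched pairing (e.g., shuffled visual frames) should yield a lower score,
# which is the basis for liveness checks in talking-face biometrics.
shuffled = visual_feats[rng.permutation(n_frames)]
u2, v2 = CCA(n_components=1).fit(audio_feats, shuffled).transform(audio_feats, shuffled)
print(f"shuffled pairing score: {np.corrcoef(u2[:, 0], v2[:, 0])[0, 1]:.2f}")
```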

  1. Influences of selective adaptation on perception of audiovisual speech

    Science.gov (United States)

    Dias, James W.; Cook, Theresa C.; Rosenblum, Lawrence D.

    2016-01-01

    Research suggests that selective adaptation in speech is a low-level process dependent on sensory-specific information shared between the adaptor and test-stimuli. However, previous research has only examined how adaptors shift perception of unimodal test stimuli, either auditory or visual. In the current series of experiments, we investigated whether adaptation to cross-sensory phonetic information can influence perception of integrated audio-visual phonetic information. We examined how selective adaptation to audio and visual adaptors shift perception of speech along an audiovisual test continuum. This test-continuum consisted of nine audio-/ba/-visual-/va/ stimuli, ranging in visual clarity of the mouth. When the mouth was clearly visible, perceivers “heard” the audio-visual stimulus as an integrated “va” percept 93.7% of the time (e.g., McGurk & MacDonald, 1976). As visibility of the mouth became less clear across the nine-item continuum, the audio-visual “va” percept weakened, resulting in a continuum ranging in audio-visual percepts from /va/ to /ba/. Perception of the test-stimuli was tested before and after adaptation. Changes in audiovisual speech perception were observed following adaptation to visual-/va/ and audiovisual-/va/, but not following adaptation to auditory-/va/, auditory-/ba/, or visual-/ba/. Adaptation modulates perception of integrated audio-visual speech by modulating the processing of sensory-specific information. The results suggest that auditory and visual speech information are not completely integrated at the level of selective adaptation. PMID:27041781

  2. Causal inference of asynchronous audiovisual speech

    Directory of Open Access Journals (Sweden)

    John F Magnotti

    2013-11-01

    During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
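    The core computation in a causal inference model of this kind is the posterior probability that the voice and face share a common cause, given a noisy measurement of their asynchrony and the prior plausibility of each causal structure. The schematic sketch below uses assumed Gaussian and uniform components with placeholder parameters; it is not the fitted model from the paper.

```python
# Schematic causal-inference sketch: posterior probability of a common cause given a
# measured audiovisual asynchrony. Distributional forms and parameters are assumptions.
import numpy as np
from scipy.stats import norm

def p_common_cause(measured_soa_ms, sensory_sd=60.0, common_sd=80.0,
                   soa_range=1000.0, prior_common=0.5):
    # Likelihood under a common cause: true asynchrony clusters near zero.
    like_c1 = norm.pdf(measured_soa_ms, loc=0.0, scale=np.hypot(sensory_sd, common_sd))
    # Likelihood under separate causes: asynchrony roughly uniform over a wide range.
    like_c2 = 1.0 / soa_range
    return like_c1 * prior_common / (like_c1 * prior_common + like_c2 * (1 - prior_common))

for soa in (0, 100, 200, 400):
    print(f"SOA = {soa:3d} ms -> P(common cause) = {p_common_cause(soa):.2f}")
```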

  3. Lip movements affect infants' audiovisual speech perception.

    Science.gov (United States)

    Yeung, H Henny; Werker, Janet F

    2013-05-01

    Speech is robustly audiovisual from early in infancy. Here we show that audiovisual speech perception in 4.5-month-old infants is influenced by sensorimotor information related to the lip movements they make while chewing or sucking. Experiment 1 consisted of a classic audiovisual matching procedure, in which two simultaneously displayed talking faces (visual [i] and [u]) were presented with a synchronous vowel sound (audio /i/ or /u/). Infants' looking patterns were selectively biased away from the audiovisual matching face when the infants were producing lip movements similar to those needed to produce the heard vowel. Infants' looking patterns returned to those of a baseline condition (no lip movements, looking longer at the audiovisual matching face) when they were producing lip movements that did not match the heard vowel. Experiment 2 confirmed that these sensorimotor effects interacted with the heard vowel, as looking patterns differed when infants produced these same lip movements while seeing and hearing a talking face producing an unrelated vowel (audio /a/). These findings suggest that the development of speech perception and speech production may be mutually informative.

  4. Brain responses and looking behaviour during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life.

    Directory of Open Access Journals (Sweden)

    Elena V Kushnerenko

    2013-07-01

    Full Text Available The use of visual cues during the processing of audiovisual speech is known to be less efficient in children and adults with language difficulties, and such difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6-9 months to 14-16 months of age. We used eye-tracking to examine whether individual differences in visual attention during audiovisual processing of speech in 6- to 9-month-old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6- to 9-month-old infants also participated in an event-related potential (ERP) audiovisual task within the same experimental session. Language development was then followed up at the age of 14-16 months, using two measures of language development, the Preschool Language Scale (PLS) and the Oxford Communicative Development Inventory (CDI). The results show that those infants who were less efficient in auditory speech processing at the age of 6-9 months had lower receptive language scores at 14-16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audio-visually incongruent stimuli at 6-9 months were both significantly associated with language development at 14-16 months. These findings add to the understanding of individual differences in neural signatures of audiovisual processing and associated looking behaviour in infants.

  5. Psychophysics of the McGurk and Other Audiovisual Speech Integration Effects

    Science.gov (United States)

    Jiang, Jintao; Bernstein, Lynne E.

    2011-01-01

    When the auditory and visual components of spoken audiovisual nonsense syllables are mismatched, perceivers produce four different types of perceptual responses: auditory correct, visual correct, fusion (the so-called McGurk effect), and combination (i.e., two consonants are reported). Here, quantitative measures were developed to account for the distribution of types of perceptual responses to 384 different stimuli from four talkers. The measures included the mutual information between the presented acoustic signal and the acoustic signal recorded with the presented video, and the correlation between the presented acoustic and video stimuli. In Experiment 1, open-set perceptual responses were obtained for acoustic /bA/ or /lA/ dubbed to video /bA, dA, gA, vA, zA, lA, wA, ΔA/. The talker, the video syllable, and the acoustic syllable significantly influenced the type of response. In Experiment 2, the best predictors of response category proportions were a subset of the physical stimulus measures, with the variance accounted for in the perceptual response category proportions ranging between 17% and 52%. That audiovisual stimulus relationships can account for response distributions supports the possibility that internal representations are based on modality-specific stimulus relationships. PMID:21574741

  6. Rapid, generalized adaptation to asynchronous audiovisual speech.

    Science.gov (United States)

    Van der Burg, Erik; Goodbourn, Patrick T

    2015-04-07

    The brain is adaptive. The speed of propagation through air, and of low-level sensory processing, differs markedly between auditory and visual stimuli; yet the brain can adapt to compensate for the resulting cross-modal delays. Studies investigating temporal recalibration to audiovisual speech have used prolonged adaptation procedures, suggesting that adaptation is sluggish. Here, we show that adaptation to asynchronous audiovisual speech occurs rapidly. Participants viewed a brief clip of an actor pronouncing a single syllable. The voice was either advanced or delayed relative to the corresponding lip movements, and participants were asked to make a synchrony judgement. Although we did not use an explicit adaptation procedure, we demonstrate rapid recalibration based on a single audiovisual event. We find that the point of subjective simultaneity on each trial is highly contingent upon the modality order of the preceding trial. We find compelling evidence that rapid recalibration generalizes across different stimuli, and different actors. Finally, we demonstrate that rapid recalibration occurs even when auditory and visual events clearly belong to different actors. These results suggest that rapid temporal recalibration to audiovisual speech is primarily mediated by basic temporal factors, rather than higher-order factors such as perceived simultaneity and source identity. © 2015 The Author(s) Published by the Royal Society. All rights reserved.

  7. Audiovisual Integration in High Functioning Adults with Autism

    Science.gov (United States)

    Keane, Brian P.; Rosenthal, Orna; Chun, Nicole H.; Shams, Ladan

    2010-01-01

    Autism involves various perceptual benefits and deficits, but it is unclear if the disorder also involves anomalous audiovisual integration. To address this issue, we compared the performance of high-functioning adults with autism and matched controls on experiments investigating the audiovisual integration of speech, spatiotemporal relations, and…

  8. Validating a Method to Assess Lipreading, Audiovisual Gain, and Integration During Speech Reception With Cochlear-Implanted and Normal-Hearing Subjects Using a Talking Head.

    Science.gov (United States)

    Schreitmüller, Stefan; Frenken, Miriam; Bentz, Lüder; Ortmann, Magdalene; Walger, Martin; Meister, Hartmut

    Watching a talker's mouth is beneficial for speech reception (SR) in many communication settings, especially in noise and when hearing is impaired. Measures for audiovisual (AV) SR can be valuable in the framework of diagnosing or treating hearing disorders. This study addresses the lack of standardized methods in many languages for assessing lipreading, AV gain, and integration. A new method is validated that supplements a German speech audiometric test with visualizations of the synthetic articulation of an avatar that was used, for it is feasible to lip-sync auditory speech in a highly standardized way. Three hypotheses were formed according to the literature on AV SR that used live or filmed talkers. It was tested whether respective effects could be reproduced with synthetic articulation: (1) cochlear implant (CI) users have a higher visual-only SR than normal-hearing (NH) individuals, and younger individuals obtain higher lipreading scores than older persons. (2) Both CI and NH gain from presenting AV over unimodal (auditory or visual) sentences in noise. (3) Both CI and NH listeners efficiently integrate complementary auditory and visual speech features. In a controlled, cross-sectional study with 14 experienced CI users (mean age 47.4) and 14 NH individuals (mean age 46.3, similar broad age distribution), lipreading, AV gain, and integration of a German matrix sentence test were assessed. Visual speech stimuli were synthesized by the articulation of the Talking Head system "MASSY" (Modular Audiovisual Speech Synthesizer), which displayed standardized articulation with respect to the visibility of German phones. In line with the hypotheses and previous literature, CI users had a higher mean visual-only SR than NH individuals (CI, 38%; NH, 12%; p < 0.001). Age was correlated with lipreading such that within each group, younger individuals obtained higher visual-only scores than older persons (rCI = -0.54; p = 0.046; rNH = -0.78; p < 0.001). Both CI and NH

  9. [Intermodal timing cues for audio-visual speech recognition].

    Science.gov (United States)

    Hashimoto, Masahiro; Kumashiro, Masaharu

    2004-06-01

    The purpose of this study was to investigate the limitations of lip-reading advantages for Japanese young adults by desynchronizing visual and auditory information in speech. In the experiment, audio-visual speech stimuli were presented under the six test conditions: audio-alone, and audio-visually with either 0, 60, 120, 240 or 480 ms of audio delay. The stimuli were video recordings of the face of a female Japanese speaker producing long and short Japanese sentences. The intelligibility of the audio-visual stimuli was measured as a function of audio delay in sixteen untrained young subjects. Speech intelligibility under the audio-delay conditions of less than 120 ms was significantly better than that under the audio-alone condition. On the other hand, the delay of 120 ms corresponded to the mean mora duration measured for the audio stimuli. The results implied that audio delays of up to 120 ms would not disrupt the lip-reading advantage, because visual and auditory information in speech seemed to be integrated on a syllabic time scale. Potential applications of this research include noisy workplaces in which a worker must extract relevant speech from competing noise.

  10. Perception of Intersensory Synchrony in Audiovisual Speech: Not that Special

    Science.gov (United States)

    Vroomen, Jean; Stekelenburg, Jeroen J.

    2011-01-01

    Perception of intersensory temporal order is particularly difficult for (continuous) audiovisual speech, as perceivers may find it difficult to notice substantial timing differences between speech sounds and lip movements. Here we tested whether this occurs because audiovisual speech is strongly paired ("unity assumption"). Participants made…

  11. The natural statistics of audiovisual speech.

    Directory of Open Access Journals (Sweden)

    Chandramouli Chandrasekaran

    2009-07-01

    Full Text Available Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain where it can guide the selection of appropriate actions. To simplify this process, it's been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both area of the mouth opening and the voice envelope are temporally modulated in the 2-7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver.
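
    The stimulus measurements summarized above (envelope extraction, mouth-audio correlation, lag analysis) can be approximated with a short script. The sketch below is an illustration of the general approach under stated assumptions rather than the authors' pipeline: it assumes SciPy is available, uses a Hilbert-transform envelope with an arbitrary 10 Hz low-pass cutoff, and resamples the envelope to the video frame rate by nearest-neighbour indexing.

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def acoustic_envelope(audio, fs_audio, cutoff_hz=10.0):
        """Wideband amplitude envelope: Hilbert magnitude, then low-pass filtered."""
        env = np.abs(hilbert(audio))
        b, a = butter(4, cutoff_hz / (fs_audio / 2.0), btype="low")
        return filtfilt(b, a, env)

    def mouth_envelope_correlation(mouth_area, audio, fs_video, fs_audio,
                                   max_lag_s=0.5):
        """Correlate mouth-opening area with the acoustic envelope at a range of
        lags (positive lag = envelope shifted later relative to the mouth)."""
        env = acoustic_envelope(audio, fs_audio)
        # Resample the envelope to one sample per video frame (nearest neighbour).
        frame_idx = np.linspace(0, len(env) - 1, num=len(mouth_area)).astype(int)
        env = env[frame_idx]
        a = (mouth_area - mouth_area.mean()) / mouth_area.std()
        e = (env - env.mean()) / env.std()
        max_lag = int(max_lag_s * fs_video)
        lags = np.arange(-max_lag, max_lag + 1)
        corrs = [np.corrcoef(a[max(0, -k):len(a) - max(0, k)],
                             e[max(0, k):len(e) - max(0, -k)])[0, 1] for k in lags]
        return lags / fs_video, np.array(corrs)
    ```

    The lag at which the correlation peaks gives a rough estimate of the mouth-to-voice timing relationship of the kind reported above.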

  12. Talker Variability in Audiovisual Speech Perception

    Directory of Open Access Journals (Sweden)

    Shannon eHeald

    2014-07-01

    Full Text Available A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker-variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker’s face, speech recognition is improved under adverse listening conditions (e.g., noise or distortion) that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target-word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to the audio-only condition. These results suggest that seeing a talker’s face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred.

  13. Neural correlates of audiovisual speech processing in a second language.

    Science.gov (United States)

    Barrós-Loscertales, Alfonso; Ventura-Campos, Noelia; Visser, Maya; Alsius, Agnès; Pallier, Christophe; Avila Rivera, César; Soto-Faraco, Salvador

    2013-09-01

    Neuroimaging studies of audiovisual speech processing have exclusively addressed listeners' native language (L1). Yet, several behavioural studies now show that AV processing plays an important role in non-native (L2) speech perception. The current fMRI study measured brain activity during auditory, visual, audiovisual congruent and audiovisual incongruent utterances in L1 and L2. BOLD responses to congruent AV speech in the pSTS were stronger than in either unimodal condition in both L1 and L2. Yet no differences in AV processing were expressed according to the language background in this area. Instead, the regions in the bilateral occipital lobe had a stronger congruency effect on the BOLD response (congruent higher than incongruent) in L2 as compared to L1. According to these results, language background differences are predominantly expressed in these unimodal regions, whereas the pSTS is similarly involved in AV integration regardless of language dominance. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. Electrophysiological evidence for differences between fusion and combination illusions in audiovisual speech perception.

    Science.gov (United States)

    Baart, Martijn; Lindborg, Alma; Andersen, Tobias S

    2017-11-01

    Incongruent audiovisual speech stimuli can lead to perceptual illusions such as fusions or combinations. Here, we investigated the underlying audiovisual integration process by measuring ERPs. We observed that visual speech-induced suppression of P2 amplitude (which is generally taken as a measure of audiovisual integration) for fusions was similar to suppression obtained with fully congruent stimuli, whereas P2 suppression for combinations was larger. We argue that these effects arise because the phonetic incongruency is solved differently for both types of stimuli. © 2017 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.

  15. Electrophysiological evidence for differences between fusion and combination illusions in audiovisual speech perception

    DEFF Research Database (Denmark)

    Baart, Martijn; Lindborg, Alma Cornelia; Andersen, Tobias S

    2017-01-01

    Incongruent audiovisual speech stimuli can lead to perceptual illusions such as fusions or combinations. Here, we investigated the underlying audiovisual integration process by measuring ERPs. We observed that visual speech-induced suppression of P2 amplitude (which is generally taken as a measure of audiovisual integration) for fusions was comparable to suppression obtained with fully congruent stimuli, whereas P2 suppression for combinations was larger. We argue that these effects arise because the phonetic incongruency is solved differently for both types of stimuli. This article is protected...

  16. Audiovisual Cues and Perceptual Learning of Spectrally Distorted Speech

    Science.gov (United States)

    Pilling, Michael; Thomas, Sharon

    2011-01-01

    Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…

  17. Audiovisual Temporal Recalibration for Speech in Synchrony Perception and Speech Identification

    Science.gov (United States)

    Asakawa, Kaori; Tanaka, Akihiro; Imai, Hisato

    We investigated whether audiovisual synchrony perception for speech could change after observation of an audiovisual temporal mismatch. Previous studies have revealed that audiovisual synchrony perception is recalibrated after exposure to a constant timing difference between auditory and visual signals in non-speech. In the present study, we examined whether this audiovisual temporal recalibration occurs at the perceptual level even for speech (monosyllables). In Experiment 1, participants performed an audiovisual simultaneity judgment task (i.e., a direct measure of audiovisual synchrony perception) on the speech signal after observation of speech stimuli that had a constant audiovisual lag. The results showed that the “simultaneous” responses (i.e., the proportion of responses for which participants judged the auditory and visual stimuli to be synchronous) at least partly depended on exposure lag. In Experiment 2, we adopted the McGurk identification task (i.e., an indirect measure of audiovisual synchrony perception), using stimuli identical to those of Experiment 1, to exclude the possibility that this modulation of synchrony perception was solely attributable to response strategy. The characteristics of the McGurk effect reported by participants depended on exposure lag. Thus, it was shown that audiovisual synchrony perception for speech could be modulated following exposure to constant lag in both direct and indirect measurements. Our results suggest that temporal recalibration occurs not only in non-speech signals but also in monosyllabic speech at the perceptual level.

  18. A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography.

    Science.gov (United States)

    Ozker, Muge; Schepers, Inga M; Magnotti, John F; Yoshor, Daniel; Beauchamp, Michael S

    2017-06-01

    Human speech can be comprehended using only auditory information from the talker's voice. However, comprehension is improved if the talker's face is visible, especially if the auditory information is degraded as occurs in noisy environments or with hearing loss. We explored the neural substrates of audiovisual speech perception using electrocorticography, direct recording of neural activity using electrodes implanted on the cortical surface. We observed a double dissociation in the responses to audiovisual speech with clear and noisy auditory component within the superior temporal gyrus (STG), a region long known to be important for speech perception. Anterior STG showed greater neural activity to audiovisual speech with clear auditory component, whereas posterior STG showed similar or greater neural activity to audiovisual speech in which the speech was replaced with speech-like noise. A distinct border between the two response patterns was observed, demarcated by a landmark corresponding to the posterior margin of Heschl's gyrus. To further investigate the computational roles of both regions, we considered Bayesian models of multisensory integration, which predict that combining the independent sources of information available from different modalities should reduce variability in the neural responses. We tested this prediction by measuring the variability of the neural responses to single audiovisual words. Posterior STG showed smaller variability than anterior STG during presentation of audiovisual speech with noisy auditory component. Taken together, these results suggest that posterior STG but not anterior STG is important for multisensory integration of noisy auditory and visual speech.
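
    The variance-reduction prediction invoked here is usually formalized with the standard maximum-likelihood cue-combination equations. The display below gives that textbook form (the symbols for the auditory and visual estimates and their variances are generic), not the specific Bayesian model fitted in this study:

    ```latex
    \hat{s}_{AV} = w_A\,\hat{s}_A + w_V\,\hat{s}_V,
    \qquad
    w_A = \frac{1/\sigma_A^{2}}{1/\sigma_A^{2} + 1/\sigma_V^{2}},
    \quad
    w_V = \frac{1/\sigma_V^{2}}{1/\sigma_A^{2} + 1/\sigma_V^{2}},
    \qquad
    \sigma_{AV}^{2} = \frac{\sigma_A^{2}\,\sigma_V^{2}}{\sigma_A^{2} + \sigma_V^{2}}
    \le \min\bigl(\sigma_A^{2},\, \sigma_V^{2}\bigr).
    ```

    Because the combined variance can never exceed the smaller unimodal variance, successful integration should reduce trial-to-trial response variability, which is the pattern reported here for posterior STG.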

  19. The contribution of dynamic visual cues to audiovisual speech perception.

    Science.gov (United States)

    Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador

    2015-08-01

    Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues; two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this aim, we measured word identification performance in noise using unimodal auditory stimuli and audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point light displays achieved via motion capture of the original talker. Point light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech. Copyright © 2015 Elsevier Ltd. All rights reserved.

  20. Cortical Integration of Audio-Visual Information

    Science.gov (United States)

    Vander Wyk, Brent C.; Ramsay, Gordon J.; Hudac, Caitlin M.; Jones, Warren; Lin, David; Klin, Ami; Lee, Su Mei; Pelphrey, Kevin A.

    2013-01-01

    We investigated the neural basis of audio-visual processing in speech and non-speech stimuli. Physically identical auditory stimuli (speech and sinusoidal tones) and visual stimuli (animated circles and ellipses) were used in this fMRI experiment. Relative to unimodal stimuli, each of the multimodal conjunctions showed increased activation in largely non-overlapping areas. The conjunction of Ellipse and Speech, which most resembles naturalistic audiovisual speech, showed higher activation in the right inferior frontal gyrus, fusiform gyri, left posterior superior temporal sulcus, and lateral occipital cortex. The conjunction of Circle and Tone, an arbitrary audio-visual pairing with no speech association, activated middle temporal gyri and lateral occipital cortex. The conjunction of Circle and Speech showed activation in lateral occipital cortex, and the conjunction of Ellipse and Tone did not show increased activation relative to unimodal stimuli. Further analysis revealed that middle temporal regions, although identified as multimodal only in the Circle-Tone condition, were more strongly active to Ellipse-Speech or Circle-Speech, but regions that were identified as multimodal for Ellipse-Speech were always strongest for Ellipse-Speech. Our results suggest that combinations of auditory and visual stimuli may together be processed by different cortical networks, depending on the extent to which speech or non-speech percepts are evoked. PMID:20709442

  1. Audiovisual speech perception development at varying levels of perceptual processing

    OpenAIRE

    Lalonde, Kaylah; Holt, Rachael Frush

    2016-01-01

    This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the le...

  2. Infants' preference for native audiovisual speech dissociated from congruency preference.

    Directory of Open Access Journals (Sweden)

    Kathleen Shaw

    Full Text Available Although infant speech perception is often studied in isolated modalities, infants' experience with speech is largely multimodal (i.e., speech sounds they hear are accompanied by articulating faces). Across two experiments, we tested infants' sensitivity to the relationship between the auditory and visual components of audiovisual speech in their native (English) and non-native (Spanish) language. In Experiment 1, infants' looking times were measured during a preferential looking task in which they saw two simultaneous visual speech streams articulating a story, one in English and the other in Spanish, while they heard either the English or the Spanish version of the story. In Experiment 2, looking times from another group of infants were measured as they watched single displays of congruent and incongruent combinations of English and Spanish audio and visual speech streams. Findings demonstrated an age-related increase in looking towards the native relative to the non-native visual speech stream when accompanied by the corresponding (native) auditory speech. This increase in native language preference did not appear to be driven by a difference in preference for native vs. non-native audiovisual congruence, as we observed no difference in looking times at the audiovisual streams in Experiment 2.

  3. Brain responses to audiovisual speech mismatch in infants are associated with individual differences in looking behaviour.

    Science.gov (United States)

    Kushnerenko, Elena; Tomalski, Przemyslaw; Ballieux, Haiko; Ribeiro, Helena; Potton, Anita; Axelsson, Emma L; Murphy, Elizabeth; Moore, Derek G

    2013-11-01

    Research on audiovisual speech integration has reported high levels of individual variability, especially among young infants. In the present study we tested the hypothesis that this variability results from individual differences in the maturation of audiovisual speech processing during infancy. A developmental shift in selective attention to audiovisual speech has been demonstrated between 6 and 9 months with an increase in the time spent looking to articulating mouths as compared to eyes (Lewkowicz & Hansen-Tift. (2012) Proc. Natl Acad. Sci. USA, 109, 1431-1436; Tomalski et al. (2012) Eur. J. Dev. Psychol., 1-14). In the present study we tested whether these changes in behavioural maturational level are associated with differences in brain responses to audiovisual speech across this age range. We measured high-density event-related potentials (ERPs) in response to videos of audiovisually matching and mismatched syllables /ba/ and /ga/, and subsequently examined visual scanning of the same stimuli with eye-tracking. There were no clear age-specific changes in ERPs, but the amplitude of audiovisual mismatch response (AVMMR) to the combination of visual /ba/ and auditory /ga/ was strongly negatively associated with looking time to the mouth in the same condition. These results have significant implications for our understanding of individual differences in neural signatures of audiovisual speech processing in infants, suggesting that they are not strictly related to chronological age but instead associated with the maturation of looking behaviour, and develop at individual rates in the second half of the first year of life. © 2013 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.

  4. Audiovisual discrimination between speech and laughter: Why and when visual information might help

    NARCIS (Netherlands)

    Petridis, Stavros; Pantic, Maja

    Past research on automatic laughter classification/detection has focused mainly on audio-based approaches. Here we present an audiovisual approach to distinguishing laughter from speech, and we show that integrating the information from audio and video channels may lead to improved performance over

  5. Modulations of 'late' event-related brain potentials in humans by dynamic audiovisual speech stimuli.

    Science.gov (United States)

    Lebib, Riadh; Papo, David; Douiri, Abdel; de Bode, Stella; Gillon Dowens, Margaret; Baudonnière, Pierre-Marie

    2004-11-30

    Lipreading reliably improves speech perception during face-to-face conversation. Within the range of good dubbing, however, adults tolerate some audiovisual (AV) discrepancies, and lipreading can then give rise to confusion. We used event-related brain potentials (ERPs) to study the perceptual strategies governing the intermodal processing of dynamic and bimodal speech stimuli, either congruently dubbed or not. Electrophysiological analyses revealed that non-coherent audiovisual dubbings modulated in amplitude an endogenous ERP component, the N300, which we compared to an 'N400-like effect' reflecting the difficulty of integrating these conflicting pieces of information. This result adds further support for the existence of a cerebral system underlying 'integrative processes' lato sensu. Further studies should take advantage of this 'N400-like effect' with AV speech stimuli to open new perspectives in the domain of psycholinguistics.

  6. Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?

    Directory of Open Access Journals (Sweden)

    Magnus eAlm

    2015-07-01

    Full Text Available Gender and age have been found to affect adults’ audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood of cognitive and sensory decline, which may confound positive effects of age-related AV experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently, both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20-30 years) and middle-aged adults (50-60 years), with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. Contrastingly, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females’ general AV perceptual strategy. Although young females’ speech-reading proficiency may not readily contribute to greater visual influence, between young and middle adulthood recurrent confirmation of the contribution of visual cues induced by speech-reading proficiency may gradually shift females’ AV perceptual strategy towards more visually dominated responses.

  7. Audiovisual speech perception development at varying levels of perceptual processing.

    Science.gov (United States)

    Lalonde, Kaylah; Holt, Rachael Frush

    2016-04-01

    This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the level of perceptual processing required to complete them. Adults and children demonstrated visual speech influence at all levels of perceptual processing. Whereas children demonstrated the same visual speech influence at each level of perceptual processing, adults demonstrated greater visual speech influence on tasks requiring higher levels of perceptual processing. These results support previous research demonstrating multiple mechanisms of AV speech processing (general perceptual and speech-specific mechanisms) with independent maturational time courses. The results suggest that adults rely on both general perceptual mechanisms that apply to all levels of perceptual processing and speech-specific mechanisms that apply when making phonetic decisions and/or accessing the lexicon. Six- to eight-year-old children seem to rely only on general perceptual mechanisms across levels. As expected, developmental differences in AV benefit on this and other recognition tasks likely reflect immature speech-specific mechanisms and phonetic processing in children.

  8. A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech.

    Directory of Open Access Journals (Sweden)

    John F Magnotti

    2017-02-01

    Full Text Available Audiovisual speech integration combines information from auditory speech (talker's voice) and visual speech (talker's mouth movements) to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk Effect, consists of incongruent audiovisual syllables, such as auditory "ba" + visual "ga" (AbaVga), that are integrated to produce a fused percept ("da"). This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba). We describe a simplified model of causal inference in multisensory speech perception (CIMS) that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others.
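
    The general causal-inference scheme that CIMS-style models build on can be written compactly. The formulation below is the standard one with generic symbols, not necessarily the exact parameterization of the CIMS model: the observer weighs the integrated and segregated estimates by the posterior probability of a common cause,

    ```latex
    p(C{=}1 \mid x_A, x_V) =
    \frac{p(x_A, x_V \mid C{=}1)\,\pi}
         {p(x_A, x_V \mid C{=}1)\,\pi + p(x_A, x_V \mid C{=}2)\,(1 - \pi)},
    \qquad
    \hat{s} = p(C{=}1 \mid x_A, x_V)\,\hat{s}_{C=1}
            + \bigl(1 - p(C{=}1 \mid x_A, x_V)\bigr)\,\hat{s}_{C=2},
    ```

    where x_A and x_V denote the noisy auditory and visual representations, pi the prior probability of a common cause, s_{C=1} the fused (reliability-weighted) estimate, and s_{C=2} the segregated estimate. A McGurk-type fusion then corresponds to a high common-cause posterior despite the phonetic incongruence.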

  9. Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

    Directory of Open Access Journals (Sweden)

    Petar S. Aleksic

    2002-11-01

    Full Text Available We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs) supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. Principal component analysis (PCA) was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR) experiments. Both single-stream and multistream hidden Markov models (HMMs) were used to model the ASR system, integrate audio and visual information, and perform relatively large vocabulary (approximately 1000 words) speech recognition experiments. The experiments performed use clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER) by 20% to 23% relative to audio-only speech recognition WERs, at various SNRs (0–30 dB) with additive white Gaussian noise, and by 19% relative to audio-only speech recognition WER under clean audio conditions.
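
    The visual-feature pipeline described above (FAP vectors reduced by PCA, then fused with acoustic features) can be sketched in a few lines. The snippet below is a rough illustration only: it assumes scikit-learn is available, uses random stand-in arrays in place of real FAP and MFCC trajectories, and omits the HMM training that the paper actually performs.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    # Stand-in data: T aligned frames of FAP vectors and acoustic features
    # (the paper's real features and frame rates differ).
    T = 500
    faps = np.random.randn(T, 68)   # hypothetical FAP trajectories
    mfcc = np.random.randn(T, 13)   # hypothetical acoustic features

    # Project the FAPs onto a handful of principal components to shrink the
    # visual feature vector, keeping the projection weights as visual features.
    pca = PCA(n_components=6)
    visual_feats = pca.fit_transform(faps)
    print("retained variance:", round(pca.explained_variance_ratio_.sum(), 3))

    # Single-stream ("early") fusion: concatenate per-frame audio and visual
    # features into one observation vector for a downstream recognizer.
    fused = np.hstack([mfcc, visual_feats])
    print(fused.shape)  # (500, 19)
    ```

    In a multistream variant, the audio and visual features would instead be modelled by separate streams whose log-likelihoods are combined with stream weights.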

  10. Degradation of labial information modifies audiovisual speech perception in cochlear-implanted children.

    Science.gov (United States)

    Huyse, Aurélie; Berthommier, Frédéric; Leybaert, Jacqueline

    2013-01-01

    The aim of the present study was to examine audiovisual speech integration in cochlear-implanted children and in normally hearing children exposed to degraded auditory stimuli. Previous studies have shown that speech perception in cochlear-implanted users is biased toward the visual modality when audition and vision provide conflicting information. Our main question was whether an experimentally designed degradation of the visual speech cue would increase the importance of audition in the response pattern. The impact of auditory proficiency was also investigated. A group of 31 children with cochlear implants and a group of 31 normally hearing children matched for chronological age were recruited. All children with cochlear implants had profound congenital deafness and had used their implants for at least 2 years. Participants had to perform an /aCa/ consonant-identification task in which stimuli were presented randomly in three conditions: auditory only, visual only, and audiovisual (congruent and incongruent McGurk stimuli). In half of the experiment, the visual speech cue was normal; in the other half (visual reduction) a degraded visual signal was presented, aimed at preventing lipreading of good quality. The normally hearing children received a spectrally reduced speech signal (simulating the input delivered by the cochlear implant). First, performance in visual-only and in congruent audiovisual modalities were decreased, showing that the visual reduction technique used here was efficient at degrading lipreading. Second, in the incongruent audiovisual trials, visual reduction led to a major increase in the number of auditory based responses in both groups. Differences between proficient and nonproficient children were found in both groups, with nonproficient children's responses being more visual and less auditory than those of proficient children. Further analysis revealed that differences between visually clear and visually reduced conditions and between

  11. No, there is no 150 ms lead of visual speech on auditory speech, but a range of audiovisual asynchronies varying from small audio lead to large audio lag.

    Directory of Open Access Journals (Sweden)

    Jean-Luc Schwartz

    2014-07-01

    Full Text Available An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech would be typically 150 ms ahead of auditory speech. It happens that the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases, for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call "preparatory gestures". However, when syllables are chained in sequences, as they are typically in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call "comodulatory gestures", which provide auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction.

  12. Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker's mouth movements and speech.

    Science.gov (United States)

    Shahin, Antoine J; Shen, Stanley; Kerlin, Jess R

    2017-01-01

    We examined the relationship between tolerance for audiovisual onset asynchrony (AVOA) and the spectrotemporal fidelity of the spoken words and the speaker's mouth movements. In two experiments that varied only in the temporal order of the sensory modalities, visual speech leading (exp1) or lagging (exp2) acoustic speech, participants watched intact and blurred videos of a speaker uttering trisyllabic words and nonwords that were noise vocoded with 4, 8, 16, and 32 channels. They judged whether the speaker's mouth movements and the speech sounds were in-sync or out-of-sync. Individuals perceived synchrony (tolerated AVOA) on more trials when the acoustic speech was more speech-like (8 channels and higher vs. 4 channels), and when visual speech was intact rather than blurred (exp1 only). These findings suggest that enhanced spectrotemporal fidelity of the audiovisual (AV) signal prompts the brain to widen the window of integration, promoting the fusion of temporally distant AV percepts.

  13. Neurophysiology underlying influence of stimulus reliability on audiovisual integration.

    Science.gov (United States)

    Shatzer, Hannah; Shen, Stanley; Kerlin, Jess R; Pitt, Mark A; Shahin, Antoine J

    2018-01-24

    We tested the predictions of the dynamic reweighting model (DRM) of audiovisual (AV) speech integration, which posits that spectrotemporally reliable (informative) AV speech stimuli induce a reweighting of processing from low-level to high-level auditory networks. This reweighting decreases sensitivity to acoustic onsets and in turn increases tolerance to AV onset asynchronies (AVOA). EEG was recorded while subjects watched videos of a speaker uttering trisyllabic nonwords that varied in spectrotemporal reliability and asynchrony of the visual and auditory inputs. Subjects judged the stimuli as in-sync or out-of-sync. Results showed that subjects exhibited greater AVOA tolerance for non-blurred than for blurred visual speech and for less degraded than for more degraded acoustic speech. Increased AVOA tolerance was reflected in reduced amplitude of the P1-P2 auditory evoked potentials, a neurophysiological indication of reduced sensitivity to acoustic onsets and successful AV integration. There was also sustained visual alpha band (8-14 Hz) suppression (desynchronization) following acoustic speech onsets for non-blurred vs. blurred visual speech, consistent with continuous engagement of the visual system as the speech unfolds. The current findings suggest that increased spectrotemporal reliability of acoustic and visual speech promotes robust AV integration, partly by suppressing sensitivity to acoustic onsets, in support of the DRM's reweighting mechanism. Increased visual signal reliability also sustains the engagement of the visual system with the auditory system to maintain alignment of information across modalities. © 2018 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.

  14. The effect of combined sensory and semantic components on audio-visual speech perception in older adults

    Directory of Open Access Journals (Sweden)

    Corrina eMaguinness

    2011-12-01

    Full Text Available Previous studies have found that perception in older people benefits from multisensory over uni-sensory information. As normal speech recognition is affected by both the auditory input and the visual lip movements of the speaker, we investigated the efficiency of audio and visual integration in an older population by manipulating the relative reliability of the auditory and visual information in speech. We also investigated the role of the semantic context of the sentence to assess whether audio-visual integration is affected by top-down semantic processing. We presented participants with audio-visual sentences in which the visual component was either blurred or not blurred. We found that there was a greater cost in recall performance for semantically meaningless speech in the audio-visual blur condition compared to the audio-visual no-blur condition, and this effect was specific to the older group. Our findings have implications for understanding how aging affects efficient multisensory integration for the perception of speech and suggest that multisensory inputs may benefit speech perception in older adults when the semantic content of the speech is unpredictable.

  15. ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION

    Directory of Open Access Journals (Sweden)

    D.V. Ivanko

    2016-05-01

    Full Text Available The paper deals with an analytical review covering the latest achievements in the field of audio-visual (AV) fusion (integration) of multimodal information. We discuss the main challenges and report on approaches to address them. One of the most important tasks of AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review we set out the basic principles of AV speech recognition and give a classification of the audio and visual features of speech. Special attention is paid to the systematization of the existing techniques and the AV data fusion methods. In the second part we provide a consolidated list of tasks and applications that use AV fusion, based on our analysis of the research area. We also indicate the methods, techniques, and audio and video features used. We propose a classification of AV integration approaches and discuss the advantages and disadvantages of the different approaches. We draw conclusions and offer our assessment of the future of the field of AV fusion. In further research we plan to implement a system for audio-visual Russian continuous speech recognition using advanced methods of multimodal fusion.
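
    As a concrete (and deliberately simplified) illustration of one family of methods such reviews cover, the snippet below shows decision-level ("late") fusion: per-modality classifier scores are combined with a reliability weight. The class probabilities and the weight are toy values, and nothing here is taken from the paper itself.

    ```python
    import numpy as np

    def late_fusion(log_p_audio, log_p_video, audio_weight=0.7):
        """Decision-level fusion: weighted sum of per-class log scores from
        separate audio and video classifiers; returns the winning class index."""
        lam = audio_weight
        fused = lam * log_p_audio + (1.0 - lam) * log_p_video
        return int(np.argmax(fused))

    # Toy example with three candidate words; in practice the weight would be
    # tuned to the estimated reliability (e.g., SNR) of the audio channel.
    log_p_a = np.log(np.array([0.5, 0.3, 0.2]))
    log_p_v = np.log(np.array([0.2, 0.2, 0.6]))
    print(late_fusion(log_p_a, log_p_v, audio_weight=0.4))
    ```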

  16. Spatial Frequency Requirements and Gaze Strategy in Visual-Only and Audiovisual Speech Perception

    Science.gov (United States)

    Wilson, Amanda H.; Alsius, Agnès; Parè, Martin; Munhall, Kevin G.

    2016-01-01

    Purpose: The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. Method: We presented vowel-consonant-vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent…

  17. Audiovisual Speech Perception and Eye Gaze Behavior of Adults with Asperger Syndrome

    Science.gov (United States)

    Saalasti, Satu; Katsyri, Jari; Tiippana, Kaisa; Laine-Hernandez, Mari; von Wendt, Lennart; Sams, Mikko

    2012-01-01

    Audiovisual speech perception was studied in adults with Asperger syndrome (AS), by utilizing the McGurk effect, in which conflicting visual articulation alters the perception of heard speech. The AS group perceived the audiovisual stimuli differently from age, sex and IQ matched controls. When a voice saying /p/ was presented with a face…

  18. Perception of audio-visual speech synchrony in Spanish-speaking children with and without specific language impairment.

    Science.gov (United States)

    Pons, Ferran; Andreu, Llorenç; Sanz-Torrent, Monica; Buil-Legaz, Lucía; Lewkowicz, David J

    2013-06-01

    Speech perception involves the integration of auditory and visual articulatory information, and thus requires the perception of temporal synchrony between this information. There is evidence that children with specific language impairment (SLI) have difficulty with auditory speech perception, but it is not known if this is also true for the integration of auditory and visual speech. Twenty Spanish-speaking children with SLI, twenty typically developing age-matched Spanish-speaking children, and twenty Spanish-speaking children matched for MLU-w participated in an eye-tracking study to investigate the perception of audiovisual speech synchrony. Results revealed that children with typical language development perceived an audiovisual asynchrony of 666 ms regardless of whether the auditory or visual speech attribute led the other one. Children with SLI only detected the 666 ms asynchrony when the auditory component preceded [corrected] the visual component. None of the groups perceived an audiovisual asynchrony of 366 ms. These results suggest that the speech processing difficulties of children with SLI also extend to integrating the auditory and visual aspects of speech perception.

  19. The organization and reorganization of audiovisual speech perception in the first year of life.

    Science.gov (United States)

    Danielson, D Kyle; Bruderer, Alison G; Kandhadai, Padmapriya; Vatikiotis-Bateson, Eric; Werker, Janet F

    2017-04-01

    The period between six and 12 months is a sensitive period for language learning during which infants undergo auditory perceptual attunement, and recent results indicate that this sensitive period may exist across sensory modalities. We tested infants at three stages of perceptual attunement (six, nine, and 11 months) to determine 1) whether they were sensitive to the congruence between heard and seen speech stimuli in an unfamiliar language, and 2) whether familiarization with congruent audiovisual speech could boost subsequent non-native auditory discrimination. Infants at six- and nine-, but not 11-months, detected audiovisual congruence of non-native syllables. Familiarization to incongruent, but not congruent, audiovisual speech changed auditory discrimination at test for six-month-olds but not nine- or 11-month-olds. These results advance the proposal that speech perception is audiovisual from early in ontogeny, and that the sensitive period for audiovisual speech perception may last somewhat longer than that for auditory perception alone.

  20. Auditory, Visual and Audiovisual Speech Processing Streams in Superior Temporal Sulcus.

    Science.gov (United States)

    Venezia, Jonathan H; Vaden, Kenneth I; Rong, Feng; Maddox, Dale; Saberi, Kourosh; Hickok, Gregory

    2017-01-01

    The human superior temporal sulcus (STS) is responsive to visual and auditory information, including sounds and facial cues during speech recognition. We investigated the functional organization of STS with respect to modality-specific and multimodal speech representations. Twenty younger adult participants were instructed to perform an oddball detection task and were presented with auditory, visual, and audiovisual speech stimuli, as well as auditory and visual nonspeech control stimuli in a block fMRI design. Consistent with a hypothesized anterior-posterior processing gradient in STS, auditory, visual and audiovisual stimuli produced the largest BOLD effects in anterior, posterior and middle STS (mSTS), respectively, based on whole-brain, linear mixed effects and principal component analyses. Notably, the mSTS exhibited preferential responses to multisensory stimulation, as well as speech compared to nonspeech. Within the mid-posterior and mSTS regions, response preferences changed gradually from visual, to multisensory, to auditory moving posterior to anterior. Post hoc analysis of visual regions in the posterior STS revealed that a single subregion bordering the mSTS was insensitive to differences in low-level motion kinematics yet distinguished between visual speech and nonspeech based on multi-voxel activation patterns. These results suggest that auditory and visual speech representations are elaborated gradually within anterior and posterior processing streams, respectively, and may be integrated within the mSTS, which is sensitive to more abstract speech information within and across presentation modalities. The spatial organization of STS is consistent with processing streams that are hypothesized to synthesize perceptual speech representations from sensory signals that provide convergent information from visual and auditory modalities.

  1. Visual-Auditory Integration during Speech Imitation in Autism

    Science.gov (United States)

    Williams, Justin H. G.; Massaro, Dominic W.; Peel, Natalie J.; Bosseler, Alexis; Suddendorf, Thomas

    2004-01-01

    Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional "mirror neuron" systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a "virtual" head (Baldi), delivered speech stimuli for…

  2. Neural initialization of audiovisual integration in prereaders at varying risk for developmental dyslexia.

    Science.gov (United States)

    Karipidis, Iliana I; Pleisch, Georgette; Röthlisberger, Martina; Hofstetter, Christoph; Dornbierer, Dario; Stämpfli, Philipp; Brem, Silvia

    2017-02-01

    Learning letter-speech sound correspondences is a major step in reading acquisition and is severely impaired in children with dyslexia. Up to now, it remains largely unknown how quickly neural networks adopt specific functions during audiovisual integration of linguistic information when prereading children learn letter-speech sound correspondences. Here, we simulated the process of learning letter-speech sound correspondences in 20 prereading children (6.13-7.17 years) at varying risk for dyslexia by training artificial letter-speech sound correspondences within a single experimental session. Subsequently, we simultaneously acquired event-related potentials (ERPs) and functional magnetic resonance imaging (fMRI) scans during implicit audiovisual presentation of trained and untrained pairs. Audiovisual integration of trained pairs correlated with individual learning rates in right superior temporal, left inferior temporal, and bilateral parietal areas and with phonological awareness in left temporal areas. In correspondence, a differential left-lateralized parietooccipitotemporal ERP at 400 ms for trained pairs correlated with learning achievement and familial risk. Finally, a late (650 ms) posterior negativity indicating audiovisual congruency of trained pairs was associated with increased fMRI activation in the left occipital cortex. Taken together, a short training session initializes audiovisual integration in neural systems that are responsible for processing linguistic information in proficient readers. To conclude, the ability to learn grapheme-phoneme correspondences, the familial history of reading disability, and phonological awareness of prereading children account for the degree of audiovisual integration in a distributed brain network. Such findings on emerging linguistic audiovisual integration could allow for distinguishing between children with typical and atypical reading development. Hum Brain Mapp 38:1038-1055, 2017. © 2016 Wiley Periodicals, Inc.

  3. Distributed neural signatures of natural audiovisual speech and music in the human auditory cortex.

    Science.gov (United States)

    Salmi, Juha; Koistinen, Olli-Pekka; Glerean, Enrico; Jylänki, Pasi; Vehtari, Aki; Jääskeläinen, Iiro P; Mäkelä, Sasu; Nummenmaa, Lauri; Nummi-Kuisma, Katarina; Nummi, Ilari; Sams, Mikko

    2017-08-15

    During a conversation or when listening to music, auditory and visual information are combined automatically into audiovisual objects. However, it is still poorly understood how specific types of visual information shape neural processing of sounds in lifelike stimulus environments. Here we applied multi-voxel pattern analysis to investigate how naturally matching visual input modulates supratemporal cortex activity during processing of naturalistic acoustic speech, singing and instrumental music. Bayesian logistic regression classifiers with sparsity-promoting priors were trained to predict whether the stimulus was audiovisual or auditory, and whether it contained piano playing, speech, or singing. Classifier performance was assessed with leave-one-participant-out cross-validation: the model was trained on the remaining 15 participants and tested on the held-out participant. The signature patterns associated with unimodal auditory stimuli encompassed distributed locations mostly in the middle and superior temporal gyrus (STG/MTG). A pattern regression analysis, based on a continuous acoustic model, revealed that activity in some of these MTG and STG areas was associated with acoustic features present in speech and music stimuli. Concurrent visual stimulus modulated activity in bilateral MTG (speech), the lateral aspect of right anterior STG (singing), and bilateral parietal opercular cortex (piano). Our results suggest that specific supratemporal brain areas are involved in processing complex natural speech, singing, and piano playing, and that other brain areas located in anterior (facial speech) and posterior (music-related hand actions) supratemporal cortex are influenced by related visual information. Those anterior and posterior supratemporal areas have been linked to stimulus identification and sensory-motor integration, respectively. Copyright © 2017 Elsevier Inc. All rights reserved.
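    As a rough sketch of this kind of decoding scheme, the example below runs a leave-one-participant-out classification on hypothetical voxel-pattern data. An L1-penalized logistic regression from scikit-learn stands in for the Bayesian sparsity-promoting classifier used in the study; the feature matrix, labels, and group assignments are random placeholders.

    ```python
    # Leave-one-participant-out decoding sketch; not the study's Bayesian classifier.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

    rng = np.random.default_rng(0)
    n_participants, n_trials, n_voxels = 16, 40, 500

    # Hypothetical data: one row of voxel-pattern features per trial.
    X = rng.standard_normal((n_participants * n_trials, n_voxels))
    y = rng.integers(0, 2, size=n_participants * n_trials)    # 0 = auditory, 1 = audiovisual
    groups = np.repeat(np.arange(n_participants), n_trials)   # participant label per trial

    # L1 penalty promotes sparse voxel weights, loosely analogous to a sparsity prior.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    scores = cross_val_score(clf, X, y, cv=LeaveOneGroupOut(), groups=groups)
    print(f"Mean leave-one-participant-out accuracy: {scores.mean():.2f}")
    ```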

  4. Crossmodal and incremental perception of audiovisual cues to emotional speech.

    Science.gov (United States)

    Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc

    2010-01-01

    In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues to emotion from a speaker's face relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests with video clips of emotional utterances collected via a variant of the well-known Velten method. More specifically, we recorded speakers who displayed positive or negative emotions, which were congruent or incongruent with the (emotional) lexical content of the uttered sentence. To address these questions, we conducted two experiments. The first experiment is a perception experiment in which Czech participants, who do not speak Dutch, rate the perceived emotional state of Dutch speakers in a bimodal (audiovisual) or a unimodal (audio- or vision-only) condition. It was found that incongruent emotional speech leads to significantly more extreme perceived emotion scores than congruent emotional speech, where the difference between congruent and incongruent emotional speech is larger for the negative than for the positive conditions. Interestingly, the largest overall differences between congruent and incongruent emotions were found for the audio-only condition, which suggests that posing an incongruent emotion has a particularly strong effect on the spoken realization of emotions. The second experiment uses a gating paradigm to test the recognition speed for various emotional expressions from a speaker's face. In this experiment participants were presented with the same clips as in Experiment I, but this time presented vision-only. The clips were shown in successive segments (gates) of increasing duration. Results show that participants are surprisingly accurate in their recognition of the various emotions, as they already reach high recognition scores in the first gate (after only 160 ms). Interestingly, the recognition scores

  5. Effect of attentional load on audiovisual speech perception: Evidence from ERPs

    Directory of Open Access Journals (Sweden)

    Agnès eAlsius

    2014-07-01

    Full Text Available Seeing articulatory movements influences perception of auditory speech. This is often reflected in a shortened latency of auditory event-related potentials (ERPs) generated in the auditory cortex. The present study addressed whether this early neural correlate of audiovisual interaction is modulated by attention. We recorded ERPs in 15 subjects while they were presented with auditory, visual and audiovisual spoken syllables. Audiovisual stimuli consisted of incongruent auditory and visual components known to elicit a McGurk effect, i.e. a visually driven alteration in the auditory speech percept. In a Dual task condition, participants were asked to identify spoken syllables whilst monitoring a rapid visual stream of pictures for targets, i.e., they had to divide their attention. In a Single task condition, participants identified the syllables without any other tasks, i.e., they were asked to ignore the pictures and focus their attention fully on the spoken syllables. The McGurk effect was weaker in the Dual task than in the Single task condition, indicating an effect of attentional load on audiovisual speech perception. Early auditory ERP components, N1 and P2, peaked earlier to audiovisual stimuli than to auditory stimuli when attention was fully focused on syllables, indicating neurophysiological audiovisual interaction. This latency decrement was reduced when attention was loaded, suggesting that attention influences early neural processing of audiovisual speech. We conclude that reduced attention weakens the interaction between vision and audition in speech.

  6. Effect of attentional load on audiovisual speech perception: evidence from ERPs.

    Science.gov (United States)

    Alsius, Agnès; Möttönen, Riikka; Sams, Mikko E; Soto-Faraco, Salvador; Tiippana, Kaisa

    2014-01-01

    Seeing articulatory movements influences perception of auditory speech. This is often reflected in a shortened latency of auditory event-related potentials (ERPs) generated in the auditory cortex. The present study addressed whether this early neural correlate of audiovisual interaction is modulated by attention. We recorded ERPs in 15 subjects while they were presented with auditory, visual, and audiovisual spoken syllables. Audiovisual stimuli consisted of incongruent auditory and visual components known to elicit a McGurk effect, i.e., a visually driven alteration in the auditory speech percept. In a Dual task condition, participants were asked to identify spoken syllables whilst monitoring a rapid visual stream of pictures for targets, i.e., they had to divide their attention. In a Single task condition, participants identified the syllables without any other tasks, i.e., they were asked to ignore the pictures and focus their attention fully on the spoken syllables. The McGurk effect was weaker in the Dual task than in the Single task condition, indicating an effect of attentional load on audiovisual speech perception. Early auditory ERP components, N1 and P2, peaked earlier to audiovisual stimuli than to auditory stimuli when attention was fully focused on syllables, indicating neurophysiological audiovisual interaction. This latency decrement was reduced when attention was loaded, suggesting that attention influences early neural processing of audiovisual speech. We conclude that reduced attention weakens the interaction between vision and audition in speech.

  7. BILINGUAL MULTIMODAL SYSTEM FOR TEXT-TO-AUDIOVISUAL SPEECH AND SIGN LANGUAGE SYNTHESIS

    Directory of Open Access Journals (Sweden)

    A. A. Karpov

    2014-09-01

    Full Text Available We present a conceptual model, architecture and software of a multimodal system for audio-visual speech and sign language synthesis by the input text. The main components of the developed multimodal synthesis system (signing avatar) are: automatic text processor for input text analysis; simulation 3D model of human's head; computer text-to-speech synthesizer; a system for audio-visual speech synthesis; simulation 3D model of human's hands and upper body; multimodal user interface integrating all the components for generation of audio, visual and signed speech. The proposed system performs automatic translation of input textual information into speech (audio information) and gestures (video information), information fusion and its output in the form of multimedia information. A user can input any grammatically correct text in Russian or Czech languages to the system; it is analyzed by the text processor to detect sentences, words and characters. Then this textual information is converted into symbols of the sign language notation. We apply the international «Hamburg Notation System» (HamNoSys), which describes the main differential features of each manual sign: hand shape, hand orientation, place and type of movement. On their basis the 3D signing avatar displays the elements of the sign language. The virtual 3D model of human's head and upper body has been created using the VRML virtual reality modeling language, and it is controlled by software based on the OpenGL graphics library. The developed multimodal synthesis system is a universal one since it is oriented for both regular users and disabled people (in particular, for the hard-of-hearing and visually impaired), and it serves for multimedia output (by audio and visual modalities) of input textual information.
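    As a minimal, hypothetical sketch of the architecture described above, the pipeline below chains a text-processing stage, a speech/viseme stage, and a sign-notation stage; all class and function names are illustrative placeholders and do not correspond to the actual system.

    ```python
    # Hypothetical outline of a text-to-audiovisual-speech-and-sign pipeline; all names
    # are illustrative placeholders, not the system's real components.
    from dataclasses import dataclass

    @dataclass
    class SynthesisResult:
        audio: bytes          # synthesized speech waveform
        visemes: list[str]    # lip-shape sequence driving the 3D talking head
        hamnosys: list[str]   # sign-notation symbols driving the signing avatar

    def text_to_audiovisual_sign(text: str) -> SynthesisResult:
        tokens = text.split()                         # stand-in for the text processor
        audio = b""                                   # stand-in for the TTS back end
        visemes = [f"viseme({t})" for t in tokens]    # stand-in for visual speech synthesis
        hamnosys = [f"HNS({t})" for t in tokens]      # stand-in for sign-notation conversion
        return SynthesisResult(audio, visemes, hamnosys)

    print(text_to_audiovisual_sign("hello world"))
    ```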

  8. Effects of Audio-Visual Information on the Intelligibility of Alaryngeal Speech

    Science.gov (United States)

    Evitts, Paul M.; Portugal, Lindsay; Van Dine, Ami; Holler, Aline

    2010-01-01

    Background: There is minimal research on the contribution of visual information on speech intelligibility for individuals with a laryngectomy (IWL). Aims: The purpose of this project was to determine the effects of mode of presentation (audio-only, audio-visual) on alaryngeal speech intelligibility. Method: Twenty-three naive listeners were…

  9. Atypical audio-visual speech perception and McGurk effects in children with specific language impairment.

    Science.gov (United States)

    Leybaert, Jacqueline; Macchi, Lucie; Huyse, Aurélie; Champoux, François; Bayard, Clémence; Colin, Cécile; Berthommier, Frédéric

    2014-01-01

    Audiovisual speech perception of children with specific language impairment (SLI) and children with typical language development (TLD) was compared in two experiments using /aCa/ syllables presented in the context of a masking release paradigm. Children had to repeat syllables presented in auditory alone, visual alone (speechreading), audiovisual congruent and incongruent (McGurk) conditions. Stimuli were masked by either stationary (ST) or amplitude modulated (AM) noise. Although children with SLI were less accurate in auditory and audiovisual speech perception, they showed a similar auditory masking release effect to children with TLD. Children with SLI also had fewer correct responses in speechreading than children with TLD, indicating impairment in phonemic processing of visual speech information. In response to McGurk stimuli, children with TLD showed more fusions in AM noise than in ST noise, a consequence of the auditory masking release effect and of the influence of visual information. Children with SLI did not show this effect systematically, suggesting they were less influenced by visual speech. However, when the visual cues were easily identified, the profile of responses to McGurk stimuli was similar in both groups, suggesting that children with SLI do not suffer from an impairment of audiovisual integration. An analysis of percent of information transmitted revealed a deficit in the children with SLI, particularly for the place of articulation feature. Taken together, the data support the hypothesis of an intact peripheral processing of auditory speech information, coupled with a supramodal deficit of phonemic categorization in children with SLI. Clinical implications are discussed.

  10. Deficits in audiovisual speech perception in normal aging emerge at the level of whole-word recognition.

    Science.gov (United States)

    Stevenson, Ryan A; Nelms, Caitlin E; Baum, Sarah H; Zurkovsky, Lilia; Barense, Morgan D; Newhouse, Paul A; Wallace, Mark T

    2015-01-01

    Over the next 2 decades, a dramatic shift in the demographics of society will take place, with a rapid growth in the population of older adults. One of the most common complaints with healthy aging is a decreased ability to successfully perceive speech, particularly in noisy environments. In such noisy environments, the presence of visual speech cues (i.e., lip movements) provides striking benefits for speech perception and comprehension, but previous research suggests that older adults gain less from such audiovisual integration than their younger peers. To determine at what processing level these behavioral differences arise in healthy-aging populations, we administered a speech-in-noise task to younger and older adults. We compared the perceptual benefits of having speech information available in both the auditory and visual modalities and examined both phoneme and whole-word recognition across varying levels of signal-to-noise ratio. For whole-word recognition, older adults relative to younger adults showed greater multisensory gains at intermediate SNRs but reduced benefit at low SNRs. By contrast, at the phoneme level both younger and older adults showed approximately equivalent increases in multisensory gain as signal-to-noise ratio decreased. Collectively, the results provide important insights into both the similarities and differences in how older and younger adults integrate auditory and visual speech cues in noisy environments and help explain some of the conflicting findings in previous studies of multisensory speech perception in healthy aging. These novel findings suggest that audiovisual processing is intact at more elementary levels of speech perception in healthy-aging populations and that deficits begin to emerge only at the more complex word-recognition level of speech signals. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Can you hear me yet? An intracranial investigation of speech and non-speech audiovisual interactions in human cortex.

    Science.gov (United States)

    Rhone, Ariane E; Nourski, Kirill V; Oya, Hiroyuki; Kawasaki, Hiroto; Howard, Matthew A; McMurray, Bob

    In everyday conversation, viewing a talker's face can provide information about the timing and content of an upcoming speech signal, resulting in improved intelligibility. Using electrocorticography, we tested whether human auditory cortex in Heschl's gyrus (HG) and on superior temporal gyrus (STG) and motor cortex on precentral gyrus (PreC) were responsive to visual/gestural information prior to the onset of sound and whether early stages of auditory processing were sensitive to the visual content (speech syllable versus non-speech motion). Event-related band power (ERBP) in the high gamma band was content-specific prior to acoustic onset on STG and PreC, and ERBP in the beta band differed in all three areas. Following sound onset, we found no evidence for content-specificity in HG, evidence for visual specificity in PreC, and specificity for both modalities in STG. These results support models of audio-visual processing in which sensory information is integrated in non-primary cortical areas.

  12. Robust audio-visual speech recognition under noisy audio-video conditions.

    Science.gov (United States)

    Stewart, Darryl; Seymour, Rowan; Pass, Adrian; Ming, Ji

    2014-02-01

    This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances with corruption added in either/both the video and audio streams using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and also compared to any fixed-weighted integration approach, both in clean conditions and when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
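    The general idea of frame-by-frame stream weighting can be sketched as follows. This is not the MWSP estimator itself; the per-frame weight here is a simple reliability heuristic, and the log-likelihood arrays are random placeholders.

    ```python
    # Generic per-frame weighted fusion of audio and video stream log-likelihoods.
    # Not the MWSP estimator; the weighting rule is a simple reliability heuristic.
    import numpy as np

    def combine_streams(audio_loglik, video_loglik):
        """Each input has shape (n_frames, n_states): per-stream state log-likelihoods."""
        # Streams whose frame-wise likelihoods are more peaked across states are
        # treated as more reliable and receive a larger weight for that frame.
        a_rel = audio_loglik.max(axis=1) - audio_loglik.mean(axis=1)
        v_rel = video_loglik.max(axis=1) - video_loglik.mean(axis=1)
        lam = a_rel / (a_rel + v_rel + 1e-9)          # per-frame audio weight in [0, 1]
        return lam[:, None] * audio_loglik + (1.0 - lam)[:, None] * video_loglik

    rng = np.random.default_rng(1)
    audio = rng.standard_normal((100, 10))            # placeholder per-frame log-likelihoods
    video = rng.standard_normal((100, 10))
    print(combine_streams(audio, video).shape)        # (100, 10)
    ```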

  13. Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation.

    Science.gov (United States)

    Banks, Briony; Gowen, Emma; Munro, Kevin J; Adank, Patti

    2015-01-01

    Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants' eye gaze was recorded to verify that they looked at the speaker's face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation.

  14. Audiovisual Speech Perception in Infancy: The Influence of Vowel Identity and Infants' Productive Abilities on Sensitivity to (Mis)Matches between Auditory and Visual Speech Cues

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias

    2016-01-01

    Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…

  15. Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention.

    Science.gov (United States)

    Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Li, Peijun; Fang, Fang; Sun, Pei

    2016-01-13

    An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how these two brain functions correlate with each other remains to be elucidated. In this functional magnetic resonance imaging (fMRI) study, we explored the neural mechanism by which feature-selective attention modulates audiovisual semantic integration. During the fMRI experiment, the subjects were presented with visual-only, auditory-only, or audiovisual dynamical facial stimuli and performed several feature-selective attention tasks. Our results revealed that a distribution of areas, including heteromodal areas and brain areas encoding attended features, may be involved in audiovisual semantic integration. Through feature-selective attention, the human brain may selectively integrate audiovisual semantic information from attended features by enhancing functional connectivity and thus regulating information flows from heteromodal areas to brain areas encoding the attended features.

  16. Design and realisation of an audiovisual speech activity detector

    NARCIS (Netherlands)

    Van Bree, K.C.

    2006-01-01

    For many speech telecommunication technologies a robust speech activity detector is important. An audio-only speech detector will give false positives when the interfering signal is speech or has speech characteristics. The video modality is suitable for solving this problem. In this report the approach

  17. Language/Culture Modulates Brain and Gaze Processes in Audiovisual Speech Perception.

    Science.gov (United States)

    Hisanaga, Satoko; Sekiyama, Kaoru; Igasaki, Tomohiko; Murayama, Nobuki

    2016-10-13

    Several behavioural studies have shown that the interplay between voice and face information in audiovisual speech perception is not universal. Native English speakers (ESs) are influenced by visual mouth movement to a greater degree than native Japanese speakers (JSs) when listening to speech. However, the biological basis of these group differences is unknown. Here, we demonstrate the time-varying processes of group differences in terms of event-related brain potentials (ERP) and eye gaze for audiovisual and audio-only speech perception. On a behavioural level, while congruent mouth movement shortened the ESs' response time for speech perception, the opposite effect was observed in JSs. Eye-tracking data revealed a gaze bias to the mouth for the ESs but not the JSs, especially before the audio onset. Additionally, the ERP P2 amplitude indicated that ESs processed multisensory speech more efficiently than auditory-only speech; however, the JSs exhibited the opposite pattern. Taken together, the ESs' early visual attention to the mouth was likely to promote phonetic anticipation, which was not the case for the JSs. These results clearly indicate the impact of language and/or culture on multisensory speech processing, suggesting that linguistic/cultural experiences lead to the development of unique neural systems for audiovisual speech perception.

  18. Text-to-audiovisual speech synthesizer for children with learning disabilities.

    Science.gov (United States)

    Mendi, Engin; Bayrak, Coskun

    2013-01-01

    Learning disabilities affect the ability of children to learn, despite their having normal intelligence. Assistive tools can highly increase functional capabilities of children with learning disorders such as writing, reading, or listening. In this article, we describe a text-to-audiovisual synthesizer that can serve as an assistive tool for such children. The system automatically converts an input text to audiovisual speech, providing synchronization of the head, eye, and lip movements of the three-dimensional face model with appropriate facial expressions and word flow of the text. The proposed system can enhance speech perception and help children having learning deficits to improve their chances of success.

  19. Absent Audiovisual Integration Elicited by Peripheral Stimuli in Parkinson's Disease.

    Science.gov (United States)

    Ren, Yanna; Suzuki, Keisuke; Yang, Weiping; Ren, Yanling; Wu, Fengxia; Yang, Jiajia; Takahashi, Satoshi; Ejima, Yoshimichi; Wu, Jinglong; Hirata, Koichi

    2018-01-01

    The basal ganglia, which have been shown to be a significant multisensory hub, are disordered in Parkinson's disease (PD). This study investigated the audiovisual integration of peripheral stimuli in PD patients with/without sleep disturbances. Thirty-six age-matched normal controls (NC) and 30 PD patients were recruited for an auditory/visual discrimination experiment. The mean response times for each participant were analyzed using repeated measures ANOVA and the race model. The results showed that responses to all stimuli were significantly delayed in PD compared to NC. The response to audiovisual stimuli was significantly faster than that to unimodal stimuli in both NC and PD; however, audiovisual integration was absent in PD, whereas it did occur in NC. Further analysis showed that there was no significant audiovisual integration in PD with/without cognitive impairment or in PD with/without sleep disturbances. Furthermore, audiovisual facilitation was not associated with Hoehn and Yahr stage, disease duration, or the presence of sleep disturbances (all p > 0.05). The current results showed that audiovisual multisensory integration for peripheral stimuli is absent in PD regardless of sleep disturbances, and further suggested that abnormal audiovisual integration might be a potential early manifestation of PD.

  20. Bimodal bilingualism as multisensory training?: Evidence for improved audiovisual speech perception after sign language exposure.

    Science.gov (United States)

    Williams, Joshua T; Darcy, Isabelle; Newman, Sharlene D

    2016-02-15

    The aim of the present study was to characterize effects of learning a sign language on the processing of a spoken language. Specifically, audiovisual phoneme comprehension was assessed before and after 13 weeks of sign language exposure. L2 ASL learners performed this task in the fMRI scanner. Results indicated that L2 American Sign Language (ASL) learners' behavioral classification of the speech sounds improved with time compared to hearing nonsigners. Results also indicated increased activation in the supramarginal gyrus (SMG) after sign language exposure, which suggests concomitant increased phonological processing of speech. A multiple regression analysis indicated that learners' ratings of co-sign speech use and lipreading ability were correlated with SMG activation. This pattern of results indicates that the increased use of mouthing and possibly lipreading during sign language acquisition may concurrently improve audiovisual speech processing in budding hearing bimodal bilinguals. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals

    Science.gov (United States)

    Lidestam, Björn; Rönnberg, Jerker

    2016-01-01

    The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for the EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. PMID:27317667

  2. Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals: Effects of Adding Visual Cues to Auditory Speech Stimuli.

    Science.gov (United States)

    Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

    2016-06-17

    The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for the EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. © The Author(s) 2016.

  3. Self-organizing maps for measuring similarity of audiovisual speech percepts

    DEFF Research Database (Denmark)

    Bothe, Hans-Heinrich

    The goal of this work is to find a way to measure similarity of audiovisual speech percepts. Phoneme-related self-organizing maps (SOM) with a rectangular basis are trained with data material from a (labeled) video film. For the training, a combination of auditory speech features and corresponding....... Dependent on the training data, these other units may also be contextually immediate neighboring units. The poster demonstrates the idea with text material spoken by one individual subject using a set of simple audio-visual features. The data material for the training process consists of 44 labeled...... sentences in German with a balanced phoneme repertoire. As a result it can be stated that (i) the SOM can be trained to map auditory and visual features in a topology-preserving way and (ii) they show strain due to the influence of other audio-visual units. The SOM can be used to measure similarity amongst...
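    The idea can be illustrated with a small rectangular self-organizing map trained on placeholder audio-visual feature vectors; the map size, feature dimensionality, and learning schedule below are arbitrary choices, not the values used in the work.

    ```python
    # Minimal rectangular SOM trained on hypothetical audio-visual feature vectors.
    import numpy as np

    rng = np.random.default_rng(42)
    n_features, grid = 16, (8, 8)                     # e.g. concatenated audio + lip features
    data = rng.standard_normal((1000, n_features))    # placeholder training vectors

    weights = rng.standard_normal((*grid, n_features))
    coords = np.stack(np.meshgrid(*map(np.arange, grid), indexing="ij"), axis=-1)

    for t, x in enumerate(rng.permutation(data)):
        lr = 0.5 * (1 - t / len(data))                # decaying learning rate
        sigma = 3.0 * (1 - t / len(data)) + 0.5       # decaying neighbourhood radius
        bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(-1)), grid)
        h = np.exp(-((coords - np.array(bmu)) ** 2).sum(-1) / (2 * sigma ** 2))[..., None]
        weights += lr * h * (x - weights)             # pull the neighbourhood toward the sample

    # Best-matching unit of the first sample: its coordinates on the trained map.
    print(np.unravel_index(np.argmin(((weights - data[0]) ** 2).sum(-1)), grid))
    ```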

  4. On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech

    Directory of Open Access Journals (Sweden)

    Wesley Mattheyses

    2009-01-01

    Full Text Available Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either natural or synthesized speech. However, the perception of mismatches between these two information streams requires experimental exploration since it could degrade the quality of the output. In order to increase the intermodal coherence in synthetic 2D photorealistic speech, we extended the well-known unit selection audio synthesis technique to work with multimodal segments containing original combinations of audio and video. Subjective experiments confirm that the audiovisual signals created by our multimodal synthesis strategy are indeed perceived as being more synchronous than those of systems in which both modes are not intrinsically coherent. Furthermore, it is shown that the degree of coherence between the auditory mode and the visual mode has an influence on the perceived quality of the synthetic visual speech fragment. In addition, the audio quality was found to have only a minor influence on the perceived visual signal's quality.

  5. Brief Report: Arrested Development of Audiovisual Speech Perception in Autism Spectrum Disorders

    Science.gov (United States)

    Stevenson, Ryan A.; Siemann, Justin K.; Woynaroski, Tiffany G.; Schneider, Brittany C.; Eberly, Haley E.; Camarata, Stephen M.; Wallace, Mark T.

    2014-01-01

    Atypical communicative abilities are a core marker of Autism Spectrum Disorders (ASD). A number of studies have shown that, in addition to auditory comprehension differences, individuals with autism frequently show atypical responses to audiovisual speech, suggesting a multisensory contribution to these communicative differences from their…

  6. Auditory Perceptual Learning for Speech Perception Can be Enhanced by Audiovisual Training.

    Science.gov (United States)

    Bernstein, Lynne E; Auer, Edward T; Eberhardt, Silvio P; Jiang, Jintao

    2013-01-01

    Speech perception under audiovisual (AV) conditions is well known to confer benefits to perception such as increased speed and accuracy. Here, we investigated how AV training might benefit or impede auditory perceptual learning of speech degraded by vocoding. In Experiments 1 and 3, participants learned paired associations between vocoded spoken nonsense words and nonsense pictures. In Experiment 1, paired-associates (PA) AV training of one group of participants was compared with audio-only (AO) training of another group. When tested under AO conditions, the AV-trained group was significantly more accurate than the AO-trained group. In addition, pre- and post-training AO forced-choice consonant identification with untrained nonsense words showed that AV-trained participants had learned significantly more than AO participants. The pattern of results pointed to their having learned at the level of the auditory phonetic features of the vocoded stimuli. Experiment 2, a no-training control with testing and re-testing on the AO consonant identification, showed that the controls were as accurate as the AO-trained participants in Experiment 1 but less accurate than the AV-trained participants. In Experiment 3, PA training alternated AV and AO conditions on a list-by-list basis within participants, and training was to criterion (92% correct). PA training with AO stimuli was reliably more effective than training with AV stimuli. We explain these discrepant results in terms of the so-called "reverse hierarchy theory" of perceptual learning and in terms of the diverse multisensory and unisensory processing resources available to speech perception. We propose that early AV speech integration can potentially impede auditory perceptual learning; but visual top-down access to relevant auditory features can promote auditory perceptual learning.

  7. Timing in audiovisual speech perception: A mini review and new psychophysical data.

    Science.gov (United States)

    Venezia, Jonathan H; Thurman, Steven M; Matchin, William; George, Sahara E; Hickok, Gregory

    2016-02-01

    Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (~35 % identification of /apa/ compared to ~5 % in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content.
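    In spirit, the classification analysis amounts to a reverse-correlation contrast: average the transparency masks from trials on which /apa/ was reported and subtract the average mask from the remaining trials. The sketch below uses randomly generated masks and responses as placeholders.

    ```python
    # Reverse-correlation sketch: contrast masks that accompanied /apa/ responses with
    # masks that did not. Masks and responses here are random placeholders.
    import numpy as np

    rng = np.random.default_rng(7)
    n_trials, n_frames, h, w = 500, 30, 16, 16

    masks = rng.random((n_trials, n_frames, h, w))    # per-trial spatiotemporal transparency masks
    said_apa = rng.random(n_trials) < 0.35            # hypothetical /apa/ "yes" responses

    # Positive values mark frames/pixels whose visibility pushed responses toward /apa/.
    classification_image = masks[said_apa].mean(axis=0) - masks[~said_apa].mean(axis=0)
    print(classification_image.shape)                 # (n_frames, h, w)
    ```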

  8. Timing in Audiovisual Speech Perception: A Mini Review and New Psychophysical Data

    Science.gov (United States)

    Venezia, Jonathan H.; Thurman, Steven M.; Matchin, William; George, Sahara E.; Hickok, Gregory

    2015-01-01

    Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually-relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (∼35% identification of /apa/ compared to ∼5% in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually-relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (∼130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content. PMID:26669309

  9. Audiovisual integration facilitates monkeys' short-term memory.

    Science.gov (United States)

    Bigelow, James; Poremba, Amy

    2016-07-01

    Many human behaviors are known to benefit from audiovisual integration, including language and communication, recognizing individuals, social decision making, and memory. Exceptionally little is known about the contributions of audiovisual integration to behavior in other primates. The current experiment investigated whether short-term memory in nonhuman primates is facilitated by the audiovisual presentation format. Three macaque monkeys that had previously learned an auditory delayed matching-to-sample (DMS) task were trained to perform a similar visual task, after which they were tested with a concurrent audiovisual DMS task with equal proportions of auditory, visual, and audiovisual trials. Parallel to outcomes in human studies, accuracy was higher and response times were faster on audiovisual trials than either unisensory trial type. Unexpectedly, two subjects exhibited superior unimodal performance on auditory trials, a finding that contrasts with previous studies, but likely reflects their training history. Our results provide the first demonstration of a bimodal memory advantage in nonhuman primates, lending further validation to their use as a model for understanding audiovisual integration and memory processing in humans.

  10. Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

    Directory of Open Access Journals (Sweden)

    Yue Zhao

    2012-12-01

    Full Text Available Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi-stream Dynamic Bayesian Networks and coupled HMMs are widely used for audio-visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) to perform unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and build an accurate audio-visual speech recognition model without a frame-independence assumption. The experimental results on Tibetan speech data from some real-world environments showed that the proposed DDBN outperforms the state-of-the-art methods in word recognition accuracy.

  11. Gated audiovisual speech identification in silence vs. noise: effects on time and accuracy

    Science.gov (United States)

    Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

    2013-01-01

    This study investigated the degree to which audiovisual presentation (compared to auditory-only presentation) affected isolation point (IPs, the amount of time required for the correct identification of speech stimuli using a gating paradigm) in silence and noise conditions. The study expanded on the findings of Moradi et al. (under revision), using the same stimuli, but presented in an audiovisual instead of an auditory-only manner. The results showed that noise impeded the identification of consonants and words (i.e., delayed IPs and lowered accuracy), but not the identification of final words in sentences. In comparison with the previous study by Moradi et al., it can be concluded that the provision of visual cues expedited IPs and increased the accuracy of speech stimuli identification in both silence and noise. The implication of the results is discussed in terms of models for speech understanding. PMID:23801980

  12. Audio-visual speech perception in prelingually deafened Japanese children following sequential bilateral cochlear implantation.

    Science.gov (United States)

    Yamamoto, Ryosuke; Naito, Yasushi; Tona, Risa; Moroto, Saburo; Tamaya, Rinko; Fujiwara, Keizo; Shinohara, Shogo; Takebayashi, Shinji; Kikuchi, Masahiro; Michida, Tetsuhiko

    2017-11-01

    An effect of audio-visual (AV) integration is observed when the auditory and visual stimuli are incongruent (the McGurk effect). In general, AV integration is helpful especially in subjects wearing hearing aids or cochlear implants (CIs). However, the influence of AV integration on spoken word recognition in individuals with bilateral CIs (Bi-CIs) has not been fully investigated so far. In this study, we investigated AV integration in children with Bi-CIs. The study sample included thirty-one prelingually deafened children who underwent sequential bilateral cochlear implantation. We assessed their responses to congruent and incongruent AV stimuli with three CI-listening modes: only the 1st CI, only the 2nd CI, and Bi-CIs. The responses were assessed in the whole group as well as in two sub-groups: a proficient group (syllable intelligibility ≥80% with the 1st CI) and a non-proficient group (syllable intelligibility below 80% with the 1st CI), and we examined the AV integration effect in each of the three CI-listening modes. AV integration responses were observed in a subset of incongruent AV stimuli, and the patterns observed with the 1st CI and with Bi-CIs were similar. In the proficient group, the responses with the 2nd CI were not significantly different from those with the 1st CI, whereas in the non-proficient group the responses with the 2nd CI were driven by visual stimuli more than those with the 1st CI. Our results suggested that prelingually deafened Japanese children who underwent sequential bilateral cochlear implantation exhibit AV integration abilities, both in monaural and in binaural listening. We also observed a higher influence of visual stimuli on speech perception with the 2nd CI in the non-proficient group, suggesting that Bi-CIs listeners with poorer speech recognition rely more on visual information than the proficient subjects to compensate for poorer auditory input. Nevertheless, poorer quality auditory input with the 2nd CI did not interfere with AV integration in binaural listening.

  13. Audio-Visual Speech Perception: A Developmental ERP Investigation

    Science.gov (United States)

    Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.

    2014-01-01

    Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…

  14. Classifying laughter and speech using audio-visual feature prediction

    NARCIS (Netherlands)

    Petridis, Stavros; Asghar, Ali; Pantic, Maja

    2010-01-01

    In this study, a system that discriminates laughter from speech by modelling the relationship between audio and visual features is presented. The underlying assumption is that this relationship is different between speech and laughter. Neural networks are trained which learn the audio-to-visual and
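    The underlying idea, classifying a clip by how well its audio predicts its visual features, can be sketched as follows; linear regressors stand in for the neural networks mentioned in the abstract, and all features and labels are random placeholders.

    ```python
    # Prediction-error-based laughter/speech discrimination sketch: fit one audio-to-visual
    # predictor per class and assign a clip to the class whose predictor fits it best.
    # Linear regression stands in for the neural networks; all data are placeholders.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(3)
    n_clips, n_audio, n_visual = 200, 12, 6
    audio = rng.standard_normal((n_clips, n_audio))
    visual = rng.standard_normal((n_clips, n_visual))
    labels = rng.integers(0, 2, n_clips)              # 0 = speech, 1 = laughter

    models = {c: LinearRegression().fit(audio[labels == c], visual[labels == c]) for c in (0, 1)}

    def classify(a, v):
        errors = {c: np.mean((m.predict(a[None]) - v) ** 2) for c, m in models.items()}
        return min(errors, key=errors.get)            # class with the smallest prediction error

    print(classify(audio[0], visual[0]))
    ```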

  15. Audio-Visual Speech in Noise Perception in Dyslexia

    Science.gov (United States)

    van Laarhoven, Thijs; Keetels, Mirjam; Schakel, Lemmy; Vroomen, Jean

    2018-01-01

    Individuals with developmental dyslexia (DD) may experience, besides reading problems, other speech-related processing deficits. Here, we examined the influence of visual articulatory information (lip-read speech) at various levels of background noise on auditory word recognition in children and adults with DD. We found that children with a…

  16. Multisensory integration in complete unawareness: evidence from audiovisual congruency priming.

    Science.gov (United States)

    Faivre, Nathan; Mudrik, Liad; Schwartz, Naama; Koch, Christof

    2014-11-01

    Multisensory integration is thought to require conscious perception. Although previous studies have shown that an invisible stimulus could be integrated with an audible one, none have demonstrated integration of two subliminal stimuli of different modalities. Here, pairs of identical or different audiovisual target letters (the sound /b/ with the written letter "b" or "m," respectively) were preceded by pairs of masked identical or different audiovisual prime digits (the sound /6/ with the written digit "6" or "8," respectively). In three experiments, awareness of the audiovisual digit primes was manipulated, such that participants were either unaware of the visual digit, the auditory digit, or both. Priming of the semantic relations between the auditory and visual digits was found in all experiments. Moreover, a further experiment showed that unconscious multisensory integration was not obtained when participants did not undergo prior conscious training of the task. This suggests that following conscious learning, unconscious processing suffices for multisensory integration. © The Author(s) 2014.

  17. Perception of the multisensory coherence of fluent audiovisual speech in infancy: its emergence and the role of experience.

    Science.gov (United States)

    Lewkowicz, David J; Minar, Nicholas J; Tift, Amy H; Brandon, Melissa

    2015-02-01

    To investigate the developmental emergence of the perception of the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8- to 10-, and 12- to 14-month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor 8- to 10-month-old infants exhibited audiovisual matching in that they did not look longer at the matching monologue. In contrast, the 12- to 14-month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, perceived the multisensory coherence of native-language monologues earlier in the test trials than that of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12- to 14-month-olds did not depend on audiovisual synchrony, whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audiovisual synchrony cues are more important in the perception of the multisensory coherence of non-native speech than that of native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing. Copyright © 2014 Elsevier Inc. All rights reserved.

  18. Perception of the Multisensory Coherence of Fluent Audiovisual Speech in Infancy: Its Emergence & the Role of Experience

    Science.gov (United States)

    Lewkowicz, David J.; Minar, Nicholas J.; Tift, Amy H.; Brandon, Melissa

    2014-01-01

    To investigate the developmental emergence of the ability to perceive the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8–10, and 12–14 month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor the 8–10 month-old infants exhibited audio-visual matching in that neither group exhibited greater looking at the matching monologue. In contrast, the 12–14 month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, they perceived the multisensory coherence of native-language monologues earlier in the test trials than of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12–14 month olds did not depend on audio-visual synchrony whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audio-visual synchrony cues are more important in the perception of the multisensory coherence of non-native than native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing. PMID:25462038

  19. Effect of hearing loss on semantic access by auditory and audiovisual speech in children.

    Science.gov (United States)

    Jerger, Susan; Tye-Murray, Nancy; Damian, Markus F; Abdi, Hervé

    2013-01-01

    This research studied whether the mode of input (auditory versus audiovisual) influenced semantic access by speech in children with sensorineural hearing impairment (HI). Participants, 31 children with HI and 62 children with normal hearing (NH), were tested with the authors' new multimodal picture word task. Children were instructed to name pictures displayed on a monitor and ignore auditory or audiovisual speech distractors. The semantic content of the distractors was varied to be related versus unrelated to the pictures (e.g., picture distractor of dog-bear versus dog-cheese, respectively). In children with NH, picture-naming times were slower in the presence of semantically related distractors. This slowing, called semantic interference, is attributed to the meaning-related picture-distractor entries competing for selection and control of the response (the lexical selection by competition hypothesis). Recently, a modification of the lexical selection by competition hypothesis, called the competition threshold (CT) hypothesis, proposed that (1) the competition between the picture-distractor entries is determined by a threshold, and (2) distractors with experimentally reduced fidelity cannot reach the CT. Thus, semantically related distractors with reduced fidelity do not produce the normal interference effect, but instead no effect or semantic facilitation (faster picture naming times for semantically related versus unrelated distractors). Facilitation occurs because the activation level of the semantically related distractor with reduced fidelity (1) is not sufficient to exceed the CT and produce interference but (2) is sufficient to activate its concept, which then strengthens the activation of the picture and facilitates naming. This research investigated whether the proposals of the CT hypothesis generalize to the auditory domain, to the natural degradation of speech due to HI, and to participants who are children. Our multimodal picture word task allowed us

  20. Skill dependent audiovisual integration in the fusiform induces repetition suppression.

    Science.gov (United States)

    McNorgan, Chris; Booth, James R

    2015-02-01

    Learning to read entails mapping existing phonological representations to novel orthographic representations and is thus an ideal context for investigating experience driven audiovisual integration. Because two dominant brain-based theories of reading development hinge on the sensitivity of the visual-object processing stream to phonological information, we were interested in how reading skill relates to audiovisual integration in this area. Thirty-two children between 8 and 13 years of age spanning a range of reading skill participated in a functional magnetic resonance imaging experiment. Participants completed a rhyme judgment task to word pairs presented unimodally (auditory- or visual-only) and cross-modally (auditory followed by visual). Skill-dependent sub-additive audiovisual modulation was found in left fusiform gyrus, extending into the putative visual word form area, and was correlated with behavioral orthographic priming. These results suggest learning to read promotes facilitatory audiovisual integration in the ventral visual-object processing stream and may optimize this region for orthographic processing. Copyright © 2014 Elsevier Inc. All rights reserved.

  1. Temporal Ventriloquism Reveals Intact Audiovisual Temporal Integration in Amblyopia.

    Science.gov (United States)

    Richards, Michael D; Goltz, Herbert C; Wong, Agnes M F

    2018-02-01

    We have shown previously that amblyopia involves impaired detection of asynchrony between auditory and visual events. To distinguish whether this impairment represents a defect in temporal integration or nonintegrative multisensory processing (e.g., cross-modal matching), we used the temporal ventriloquism effect in which visual temporal order judgment (TOJ) is normally enhanced by a lagging auditory click. Participants with amblyopia (n = 9) and normally sighted controls (n = 9) performed a visual TOJ task. Pairs of clicks accompanied the two lights such that the first click preceded the first light, or second click lagged the second light by 100, 200, or 450 ms. Baseline audiovisual synchrony and visual-only conditions also were tested. Within both groups, just noticeable differences for the visual TOJ task were significantly reduced compared with baseline in the 100- and 200-ms click lag conditions. Within the amblyopia group, poorer stereo acuity and poorer visual acuity in the amblyopic eye were significantly associated with greater enhancement in visual TOJ performance in the 200-ms click lag condition. Audiovisual temporal integration is intact in amblyopia, as indicated by perceptual enhancement in the temporal ventriloquism effect. Furthermore, poorer stereo acuity and poorer visual acuity in the amblyopic eye are associated with a widened temporal binding window for the effect. These findings suggest that previously reported abnormalities in audiovisual multisensory processing may result from impaired cross-modal matching rather than a diminished capacity for temporal audiovisual integration.

  2. Skill Dependent Audiovisual Integration in the Fusiform Induces Repetition Suppression

    Science.gov (United States)

    McNorgan, Chris; Booth, James R.

    2015-01-01

    Learning to read entails mapping existing phonological representations to novel orthographic representations and is thus an ideal context for investigating experience driven audiovisual integration. Because two dominant brain-based theories of reading development hinge on the sensitivity of the visual-object processing stream to phonological information, we were interested in how reading skill relates to audiovisual integration in this area. Thirty-two children between 8 and 13 years of age spanning a range of reading skill participated in a functional magnetic resonance imaging experiment. Participants completed a rhyme judgment task to word pairs presented unimodally (auditory- or visual-only) and cross-modally (auditory followed by visual). Skill-dependent sub-additive audiovisual modulation was found in left fusiform gyrus, extending into the putative visual word form area, and was correlated with behavioral orthographic priming. These results suggest learning to read promotes facilitatory audiovisual integration in the ventral visual-object processing stream and may optimize this region for orthographic processing. PMID:25585276

  3. Audio-visual speech timing sensitivity is enhanced in cluttered conditions.

    Directory of Open Access Journals (Sweden)

    Warrick Roseboom

    2011-04-01

    Full Text Available Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.

  4. Selective Attention and Audiovisual Integration: Is Attending to Both Modalities a Prerequisite for Early Integration?

    NARCIS (Netherlands)

    Talsma, D.; Doty, Tracy J.; Woldorff, Marty G.

    2007-01-01

    Interactions between multisensory integration and attention were studied using a combined audiovisual streaming design and a rapid serial visual presentation paradigm. Event-related potentials (ERPs) following audiovisual objects (AV) were compared with the sum of the ERPs following auditory (A) and

  5. Auditory cross-modal reorganization in cochlear implant users indicates audio-visual integration.

    Science.gov (United States)

    Stropahl, Maren; Debener, Stefan

    2017-01-01

    There is clear evidence for cross-modal cortical reorganization in the auditory system of post-lingually deafened cochlear implant (CI) users. A recent report suggests that moderate sensori-neural hearing loss is already sufficient to initiate corresponding cortical changes. To what extent these changes are deprivation-induced or related to sensory recovery is still debated. Moreover, the influence of cross-modal reorganization on CI benefit is also still unclear. While reorganization during deafness may impede speech recovery, reorganization also has beneficial influences on face recognition and lip-reading. As CI users were observed to show differences in multisensory integration, the question arises if cross-modal reorganization is related to audio-visual integration skills. The current electroencephalography study investigated cortical reorganization in experienced post-lingually deafened CI users (n = 18), untreated mild to moderately hearing impaired individuals (n = 18) and normal hearing controls (n = 17). Cross-modal activation of the auditory cortex by means of EEG source localization in response to human faces and audio-visual integration, quantified with the McGurk illusion, were measured. CI users revealed stronger cross-modal activations compared to age-matched normal hearing individuals. Furthermore, CI users showed a relationship between cross-modal activation and audio-visual integration strength. This may further support a beneficial relationship between cross-modal activation and daily-life communication skills that may not be fully captured by laboratory-based speech perception tests. Interestingly, hearing impaired individuals showed behavioral and neurophysiological results that were numerically between the other two groups, and they showed a moderate relationship between cross-modal activation and the degree of hearing loss. This further supports the notion that auditory deprivation evokes a reorganization of the auditory system

  6. Auditory cross-modal reorganization in cochlear implant users indicates audio-visual integration

    Directory of Open Access Journals (Sweden)

    Maren Stropahl

    2017-01-01

    Full Text Available There is clear evidence for cross-modal cortical reorganization in the auditory system of post-lingually deafened cochlear implant (CI) users. A recent report suggests that moderate sensori-neural hearing loss is already sufficient to initiate corresponding cortical changes. To what extent these changes are deprivation-induced or related to sensory recovery is still debated. Moreover, the influence of cross-modal reorganization on CI benefit is also still unclear. While reorganization during deafness may impede speech recovery, reorganization also has beneficial influences on face recognition and lip-reading. As CI users were observed to show differences in multisensory integration, the question arises if cross-modal reorganization is related to audio-visual integration skills. The current electroencephalography study investigated cortical reorganization in experienced post-lingually deafened CI users (n = 18), untreated mild to moderately hearing impaired individuals (n = 18) and normal hearing controls (n = 17). Cross-modal activation of the auditory cortex by means of EEG source localization in response to human faces and audio-visual integration, quantified with the McGurk illusion, were measured. CI users revealed stronger cross-modal activations compared to age-matched normal hearing individuals. Furthermore, CI users showed a relationship between cross-modal activation and audio-visual integration strength. This may further support a beneficial relationship between cross-modal activation and daily-life communication skills that may not be fully captured by laboratory-based speech perception tests. Interestingly, hearing impaired individuals showed behavioral and neurophysiological results that were numerically between the other two groups, and they showed a moderate relationship between cross-modal activation and the degree of hearing loss. This further supports the notion that auditory deprivation evokes a reorganization of the

  7. Context-specific effects of musical expertise on audiovisual integration

    Science.gov (United States)

    Bishop, Laura; Goebl, Werner

    2014-01-01

    Ensemble musicians exchange auditory and visual signals that can facilitate interpersonal synchronization. Musical expertise improves how precisely auditory and visual signals are perceptually integrated and increases sensitivity to asynchrony between them. Whether expertise improves sensitivity to audiovisual asynchrony in all instrumental contexts or only in those using sound-producing gestures that are within an observer's own motor repertoire is unclear. This study tested the hypothesis that musicians are more sensitive to audiovisual asynchrony in performances featuring their own instrument than in performances featuring other instruments. Short clips were extracted from audio-video recordings of clarinet, piano, and violin performances and presented to highly-skilled clarinetists, pianists, and violinists. Clips either maintained the audiovisual synchrony present in the original recording or were modified so that the video led or lagged behind the audio. Participants indicated whether the audio and video channels in each clip were synchronized. The range of asynchronies most often endorsed as synchronized was assessed as a measure of participants' sensitivities to audiovisual asynchrony. A positive relationship was observed between musical training and sensitivity, with data pooled across stimuli. While participants across expertise groups detected asynchronies most readily in piano stimuli and least readily in violin stimuli, pianists showed significantly better performance for piano stimuli than for either clarinet or violin. These findings suggest that, to an extent, the effects of expertise on audiovisual integration can be instrument-specific; however, the nature of the sound-producing gestures that are observed has a substantial effect on how readily asynchrony is detected as well. PMID:25324819

  8. Audiovisual Perception of Noise Vocoded Speech in Dyslexic and Non-Dyslexic Adults: The Role of Low-Frequency Visual Modulations

    Science.gov (United States)

    Megnin-Viggars, Odette; Goswami, Usha

    2013-01-01

    Visual speech inputs can enhance auditory speech information, particularly in noisy or degraded conditions. The natural statistics of audiovisual speech highlight the temporal correspondence between visual and auditory prosody, with lip, jaw, cheek and head movements conveying information about the speech envelope. Low-frequency spatial and…

  9. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Koji Iwano

    2007-03-01

    Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.
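    As a rough illustration of the multistream HMM technique described above, the sketch below combines per-state log-likelihoods from an audio stream and a visual stream using exponent (stream) weights. The weight value, state count, and feature labels are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def multistream_loglik(log_lik_audio, log_lik_video, audio_weight=0.7):
    """Fuse per-state log-likelihoods from audio and visual streams.

    In a multistream HMM the streams are treated as conditionally
    independent given the state, and each stream's log-likelihood is
    scaled by a stream weight before summation.
    """
    video_weight = 1.0 - audio_weight
    return audio_weight * log_lik_audio + video_weight * log_lik_video

# Hypothetical log-likelihoods of one observation under three HMM states.
log_a = np.array([-12.3, -9.8, -15.1])   # audio stream (e.g., MFCC-based)
log_v = np.array([-8.7, -11.2, -9.9])    # visual stream (e.g., lip features)
print(multistream_loglik(log_a, log_v))  # fused scores fed to Viterbi decoding
```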

  10. The effect of varying talker identity and listening conditions on gaze behavior during audiovisual speech perception.

    Science.gov (United States)

    Buchan, Julie N; Paré, Martin; Munhall, Kevin G

    2008-11-25

    During face-to-face conversation the face provides auditory and visual linguistic information, and also conveys information about the identity of the speaker. This study investigated behavioral strategies involved in gathering visual information while watching talking faces. The effects of varying talker identity and varying the intelligibility of speech (by adding acoustic noise) on gaze behavior were measured with an eyetracker. Varying the intelligibility of the speech by adding noise had a noticeable effect on the location and duration of fixations. When noise was present subjects adopted a vantage point that was more centralized on the face by reducing the frequency of the fixations on the eyes and mouth and lengthening the duration of their gaze fixations on the nose and mouth. Varying talker identity resulted in a more modest change in gaze behavior that was modulated by the intelligibility of the speech. Although subjects generally used similar strategies to extract visual information in both talker variability conditions, when noise was absent there were more fixations on the mouth when viewing a different talker every trial as opposed to the same talker every trial. These findings provide a useful baseline for studies examining gaze behavior during audiovisual speech perception and perception of dynamic faces.

  11. Conditioning Influences Audio-Visual Integration by Increasing Sound Saliency

    Directory of Open Access Journals (Sweden)

    Fabrizio Leo

    2011-10-01

    Full Text Available We investigated the effect of prior conditioning of an auditory stimulus on audiovisual integration in a series of four psychophysical experiments. The experiments factorially manipulated the conditioning procedure (picture vs monetary conditioning) and multisensory paradigm (2AFC visual detection vs redundant target paradigm). In the conditioning sessions, subjects were presented with three pure tones (= conditioned stimulus, CS) that were paired with neutral, positive, or negative unconditioned stimuli (US; monetary: +50 euro cents, -50 cents, 0 cents; pictures: highly pleasant, unpleasant, and neutral IAPS). In a 2AFC visual selective attention paradigm, detection of near-threshold Gabors was improved by concurrent sounds that had previously been paired with a positive (monetary) or negative (picture) outcome relative to neutral sounds. In the redundant target paradigm, sounds previously paired with positive (monetary) or negative (picture) outcomes increased response speed to both auditory and audiovisual targets similarly. Importantly, prior conditioning did not increase the multisensory response facilitation (i.e., (A + V)/2 - AV) or the race model violation. Collectively, our results suggest that prior conditioning primarily increases the saliency of the auditory stimulus per se rather than influencing audiovisual integration directly. In turn, conditioned sounds are rendered more potent for increasing response accuracy or speed in detection of visual targets.
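    The facilitation index and race model test mentioned above can be made concrete with a short sketch. The reaction times below are simulated purely for illustration; the functions implement the mean-RT facilitation measure (A + V)/2 - AV and one standard formulation of Miller's race model inequality, not necessarily the exact analysis pipeline used in the study.

```python
import numpy as np

def facilitation(rt_a, rt_v, rt_av):
    """Mean multisensory response facilitation: (A + V)/2 - AV, in ms."""
    return (np.mean(rt_a) + np.mean(rt_v)) / 2.0 - np.mean(rt_av)

def race_model_violation(rt_a, rt_v, rt_av, t_grid):
    """Miller's race model inequality: positive values mean F_AV(t)
    exceeds F_A(t) + F_V(t) at latency t, i.e., a violation."""
    cdf = lambda rts, t: np.mean(np.asarray(rts)[:, None] <= t, axis=0)
    bound = np.minimum(cdf(rt_a, t_grid) + cdf(rt_v, t_grid), 1.0)
    return np.maximum(cdf(rt_av, t_grid) - bound, 0.0)

# Simulated reaction times (ms); not data from the experiments.
rng = np.random.default_rng(0)
rt_a, rt_v = rng.normal(420, 50, 200), rng.normal(440, 55, 200)
rt_av = rng.normal(390, 45, 200)
t = np.arange(250, 700, 10)
print(facilitation(rt_a, rt_v, rt_av))            # positive = faster bimodal RTs
print(race_model_violation(rt_a, rt_v, rt_av, t).max())
```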

  12. Dissociating Cortical Activity during Processing of Native and Non-Native Audiovisual Speech from Early to Late Infancy

    Directory of Open Access Journals (Sweden)

    Eswen Fava

    2014-08-01

    Full Text Available Initially, infants are capable of discriminating phonetic contrasts across the world’s languages. Starting between seven and ten months of age, they gradually lose this ability through a process of perceptual narrowing. Although traditionally investigated with isolated speech sounds, such narrowing occurs in a variety of perceptual domains (e.g., faces, visual speech). Thus far, tracking the developmental trajectory of this tuning process has been focused primarily on auditory speech alone, and generally using isolated sounds. But infants learn from speech produced by people talking to them, meaning they learn from a complex audiovisual signal. Here, we use near-infrared spectroscopy to measure blood concentration changes in the bilateral temporal cortices of infants in three different age groups: 3-to-6 months, 7-to-10 months, and 11-to-14 months. Critically, all three groups of infants were tested with continuous audiovisual speech in both their native and another, unfamiliar language. We found that at each age range, infants showed different patterns of cortical activity in response to the native and non-native stimuli. Infants in the youngest group showed bilateral cortical activity that was greater overall in response to non-native relative to native speech; the oldest group showed left lateralized activity in response to native relative to non-native speech. These results highlight perceptual tuning as a dynamic process that happens across modalities and at different levels of stimulus complexity.

  13. How can audiovisual pathways enhance the temporal resolution of time-compressed speech in blind subjects?

    Directory of Open Access Journals (Sweden)

    Ingo eHertrich

    2013-08-01

    Full Text Available In blind people, the visual channel cannot assist face-to-face communication via lipreading or visual prosody. Nevertheless, the visual system may enhance the evaluation of auditory information due to its cross-links to (1) the auditory system, (2) supramodal representations, and (3) frontal action-related areas. Apart from feedback or top-down support of, for example, the processing of spatial or phonological representations, experimental data have shown that the visual system can impact auditory perception at more basic computational stages such as temporal resolution. For example, blind as compared to sighted subjects are more resistant against backward masking, and this ability appears to be associated with activity in visual cortex. Regarding the comprehension of continuous speech, blind subjects can learn to use accelerated text-to-speech systems for "reading" texts at ultra-fast speaking rates (> 16 syllables/s), exceeding by far the normal range of 6 syllables/s. An fMRI study has shown that this ability, among other brain regions, significantly covaries with BOLD responses in bilateral pulvinar, right visual cortex, and left supplementary motor area. Furthermore, magnetoencephalographic (MEG) measurements revealed a particular component in right occipital cortex phase-locked to the syllable onsets of accelerated speech. In sighted people, the "bottleneck" for understanding time-compressed speech seems related to a demand for buffering phonological material and is, presumably, linked to frontal brain structures. On the other hand, the neurophysiological correlates of functions overcoming this bottleneck seem to depend upon early visual cortex activity. The present Hypothesis and Theory paper outlines a model that aims at binding these data together, based on early cross-modal pathways that are already known from various audiovisual experiments considering cross-modal adjustments in space, time, and object recognition.

  14. The role of emotion in dynamic audiovisual integration of faces and voices.

    Science.gov (United States)

    Kokinous, Jenny; Kotz, Sonja A; Tavano, Alessandro; Schröger, Erich

    2015-05-01

    We used human electroencephalogram to study early audiovisual integration of dynamic angry and neutral expressions. An auditory-only condition served as a baseline for the interpretation of integration effects. In the audiovisual conditions, the validity of visual information was manipulated using facial expressions that were either emotionally congruent or incongruent with the vocal expressions. First, we report an N1 suppression effect for angry compared with neutral vocalizations in the auditory-only condition. Second, we confirm early integration of congruent visual and auditory information as indexed by a suppression of the auditory N1 and P2 components in the audiovisual compared with the auditory-only condition. Third, audiovisual N1 suppression was modulated by audiovisual congruency in interaction with emotion: for neutral vocalizations, there was N1 suppression in both the congruent and the incongruent audiovisual conditions. For angry vocalizations, there was N1 suppression only in the congruent but not in the incongruent condition. Extending previous findings of dynamic audiovisual integration, the current results suggest that audiovisual N1 suppression is congruency- and emotion-specific and indicate that dynamic emotional expressions compared with non-emotional expressions are preferentially processed in early audiovisual integration. © The Author (2014). Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  15. Read My Lips: Brain Dynamics Associated with Audiovisual Integration and Deviance Detection.

    Science.gov (United States)

    Tse, Chun-Yu; Gratton, Gabriele; Garnsey, Susan M; Novak, Michael A; Fabiani, Monica

    2015-09-01

    Information from different modalities is initially processed in different brain areas, yet real-world perception often requires the integration of multisensory signals into a single percept. An example is the McGurk effect, in which people viewing a speaker whose lip movements do not match the utterance perceive the spoken sounds incorrectly, hearing them as more similar to those signaled by the visual rather than the auditory input. This indicates that audiovisual integration is important for generating the phoneme percept. Here we asked when and where the audiovisual integration process occurs, providing spatial and temporal boundaries for the processes generating phoneme perception. Specifically, we wanted to separate audiovisual integration from other processes, such as simple deviance detection. Building on previous work employing ERPs, we used an oddball paradigm in which task-irrelevant audiovisually deviant stimuli were embedded in strings of non-deviant stimuli. We also recorded the event-related optical signal, an imaging method combining spatial and temporal resolution, to investigate the time course and neuroanatomical substrate of audiovisual integration. We found that audiovisual deviants elicit a short duration response in the middle/superior temporal gyrus, whereas audiovisual integration elicits a more extended response involving also inferior frontal and occipital regions. Interactions between audiovisual integration and deviance detection processes were observed in the posterior/superior temporal gyrus. These data suggest that dynamic interactions between inferior frontal cortex and sensory regions play a significant role in multimodal integration.

  16. Greater BOLD variability in older compared with younger adults during audiovisual speech perception.

    Directory of Open Access Journals (Sweden)

    Sarah H Baum

    Full Text Available Older adults exhibit decreased performance and increased trial-to-trial variability on a range of cognitive tasks, including speech perception. We used blood oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) to search for neural correlates of these behavioral phenomena. We compared brain responses to simple speech stimuli (audiovisual syllables) in 24 healthy older adults (53 to 70 years old) and 14 younger adults (23 to 39 years old) using two independent analysis strategies: region-of-interest (ROI) and voxel-wise whole-brain analysis. While mean response amplitudes were moderately greater in younger adults, older adults had much greater within-subject variability. The greatly increased variability in older adults was observed for both individual voxels in the whole-brain analysis and for ROIs in the left superior temporal sulcus, the left auditory cortex, and the left visual cortex. Increased variability in older adults could not be attributed to differences in head movements between the groups. Increased neural variability may be related to the performance declines and increased behavioral variability that occur with aging.

  17. Effects of noise and audiovisual cues on speech processing in adults with and without ADHD.

    Science.gov (United States)

    Michalek, Anne M P; Watson, Silvana M; Ash, Ivan; Ringleb, Stacie; Raymer, Anastasia

    2014-03-01

    This study examined the interplay among internal (e.g., attention, working memory abilities) and external (e.g., background noise, visual information) factors in individuals with and without ADHD. A 2 × 2 × 6 mixed design with correlational analyses was used to compare participant results on a standardized listening-in-noise sentence repetition task (QuickSIN; Killion et al., 2004), presented in an auditory and an audiovisual condition as the signal-to-noise ratio (SNR) varied from 25 to 0 dB, and to determine individual differences in working memory capacity and short-term recall. Participants were thirty-eight young adults without ADHD and twenty-five young adults with ADHD. Diagnosis, modality, and signal-to-noise ratio all affected the ability to process speech in noise. The interaction between the diagnosis of ADHD, the presence of visual cues, and the level of noise had an effect on a person's ability to process speech in noise. Conclusion: Young adults with ADHD benefited less from visual information during noise than young adults without ADHD, an effect influenced by working memory abilities.

  18. Contribution of Prosody in Audio-Visual Integration to Emotional Perception of Virtual Characters

    Directory of Open Access Journals (Sweden)

    Ekaterina Volkova

    2011-10-01

    Full Text Available Recent technology provides us with realistic looking virtual characters. Motion capture and elaborate mathematical models supply data for natural looking, controllable facial and bodily animations. With the help of computational linguistics and artificial intelligence, we can automatically assign emotional categories to appropriate stretches of text for a simulation of those social scenarios where verbal communication is important. All this makes virtual characters a valuable tool for creation of versatile stimuli for research on the integration of emotion information from different modalities. We conducted an audio-visual experiment to investigate the differential contributions of emotional speech and facial expressions on emotion identification. We used recorded and synthesized speech as well as dynamic virtual faces, all enhanced for seven emotional categories. The participants were asked to recognize the prevalent emotion of paired faces and audio. Results showed that when the voice was recorded, the vocalized emotion influenced participants' emotion identification more than the facial expression. However, when the voice was synthesized, facial expression influenced participants' emotion identification more than vocalized emotion. Additionally, individuals did worse on identifying either the facial expression or vocalized emotion when the voice was synthesized. Our experimental method can help to determine how to improve synthesized emotional speech.

  19. Audiovisual Integration Delayed by Stimulus Onset Asynchrony Between Auditory and Visual Stimuli in Older Adults.

    Science.gov (United States)

    Ren, Yanna; Yang, Weiping; Nakahashi, Kohei; Takahashi, Satoshi; Wu, Jinglong

    2017-02-01

    Although neuronal studies have shown that audiovisual integration is regulated by temporal factors, there is still little knowledge about the impact of temporal factors on audiovisual integration in older adults. To clarify how stimulus onset asynchrony (SOA) between auditory and visual stimuli modulates age-related audiovisual integration, 20 younger adults (21-24 years) and 20 older adults (61-80 years) were instructed to perform an auditory or visual stimuli discrimination experiment. The results showed that in younger adults, audiovisual integration was altered from an enhancement (AV, A ± 50 V) to a depression (A ± 150 V). In older adults, the alteration pattern was similar to that for younger adults with the expansion of SOA; however, older adults showed significantly delayed onset for the time-window-of-integration and peak latency in all conditions, which further demonstrated that audiovisual integration was delayed more severely with the expansion of SOA, especially in the peak latency for V-preceded-A conditions in older adults. Our study suggested that audiovisual facilitative integration occurs only within a certain SOA range (e.g., -50 to 50 ms) in both younger and older adults. Moreover, our results confirm that the response for older adults was slowed and provided empirical evidence that integration ability is much more sensitive to the temporal alignment of audiovisual stimuli in older adults.

  20. Common variation in the autism risk gene CNTNAP2, brain structural connectivity and multisensory speech integration.

    Science.gov (United States)

    Ross, Lars A; Del Bene, Victor A; Molholm, Sophie; Jae Woo, Young; Andrade, Gizely N; Abrahams, Brett S; Foxe, John J

    2017-11-01

    Three lines of evidence motivated this study. 1) CNTNAP2 variation is associated with autism risk and speech-language development. 2) CNTNAP2 variations are associated with differences in white matter (WM) tracts comprising the speech-language circuitry. 3) Children with autism show impairment in multisensory speech perception. Here, we asked whether an autism risk-associated CNTNAP2 single nucleotide polymorphism in neurotypical adults was associated with multisensory speech perception performance, and whether such a genotype-phenotype association was mediated through white matter tract integrity in speech-language circuitry. Risk genotype at rs7794745 was associated with decreased benefit from visual speech and lower fractional anisotropy (FA) in several WM tracts (right precentral gyrus, left anterior corona radiata, right retrolenticular internal capsule). These structural connectivity differences were found to mediate the effect of genotype on audiovisual speech perception, shedding light on possible pathogenic pathways in autism and biological sources of inter-individual variation in audiovisual speech processing in neurotypicals. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. Audiovisual speech perception at various presentation levels in Mandarin-speaking adults with cochlear implants.

    Directory of Open Access Journals (Sweden)

    Shu-Yu Liu

    Full Text Available (1) To evaluate the recognition of words, phonemes and lexical tones in audiovisual (AV) and auditory-only (AO) modes in Mandarin-speaking adults with cochlear implants (CIs); (2) to understand the effect of presentation levels on AV speech perception; (3) to learn the effect of hearing experience on AV speech perception. Thirteen deaf adults (age = 29.1±13.5 years; 8 male, 5 female) who had used CIs for >6 months and 10 normal-hearing (NH) adults participated in this study. Seven of them were prelingually deaf, and 6 postlingually deaf. The Mandarin Monosyllablic Word Recognition Test was used to assess recognition of words, phonemes and lexical tones in AV and AO conditions at 3 presentation levels: speech detection threshold (SDT), speech recognition threshold (SRT) and 10 dB SL (re: SRT). The prelingual group had better phoneme recognition in the AV mode than in the AO mode at SDT and SRT (both p = 0.016), and so did the NH group at SDT (p = 0.004). Mode difference was not noted in the postlingual group. None of the groups had significantly different tone recognition in the 2 modes. The prelingual and postlingual groups had significantly better phoneme and tone recognition than the NH one at SDT in the AO mode (p = 0.016 and p = 0.002 for phonemes; p = 0.001 and p < 0.001 for tones) but were outperformed by the NH group at 10 dB SL (re: SRT) in both modes (both p < 0.001 for phonemes; p < 0.001 and p = 0.002 for tones). The recognition scores had a significant correlation with group with age and sex controlled (p < 0.001). Visual input may help prelingually deaf implantees to recognize phonemes but may not augment Mandarin tone recognition. The effect of presentation level seems minimal on CI users' AV perception. This indicates special considerations in developing audiological assessment protocols and rehabilitation strategies for implantees who speak tonal languages.

  2. Audiovisual Temporal Perception in Aging: The Role of Multisensory Integration and Age-Related Sensory Loss.

    Science.gov (United States)

    Brooks, Cassandra J; Chan, Yu Man; Anderson, Andrew J; McKendrick, Allison M

    2018-01-01

    Within each sensory modality, age-related deficits in temporal perception contribute to the difficulties older adults experience when performing everyday tasks. Since perceptual experience is inherently multisensory, older adults also face the added challenge of appropriately integrating or segregating the auditory and visual cues present in our dynamic environment into coherent representations of distinct objects. As such, many studies have investigated how older adults perform when integrating temporal information across audition and vision. This review covers both direct judgments about temporal information (the sound-induced flash illusion, temporal order, perceived synchrony, and temporal rate discrimination) and judgments regarding stimuli containing temporal information (the audiovisual bounce effect and speech perception). Although an age-related increase in integration has been demonstrated on a variety of tasks, research specifically investigating the ability of older adults to integrate temporal auditory and visual cues has produced disparate results. In this short review, we explore what factors could underlie these divergent findings. We conclude that both task-specific differences and age-related sensory loss play a role in the reported disparity in age-related effects on the integration of auditory and visual temporal information.

  3. Audiovisual Temporal Perception in Aging: The Role of Multisensory Integration and Age-Related Sensory Loss

    Science.gov (United States)

    Brooks, Cassandra J.; Chan, Yu Man; Anderson, Andrew J.; McKendrick, Allison M.

    2018-01-01

    Within each sensory modality, age-related deficits in temporal perception contribute to the difficulties older adults experience when performing everyday tasks. Since perceptual experience is inherently multisensory, older adults also face the added challenge of appropriately integrating or segregating the auditory and visual cues present in our dynamic environment into coherent representations of distinct objects. As such, many studies have investigated how older adults perform when integrating temporal information across audition and vision. This review covers both direct judgments about temporal information (the sound-induced flash illusion, temporal order, perceived synchrony, and temporal rate discrimination) and judgments regarding stimuli containing temporal information (the audiovisual bounce effect and speech perception). Although an age-related increase in integration has been demonstrated on a variety of tasks, research specifically investigating the ability of older adults to integrate temporal auditory and visual cues has produced disparate results. In this short review, we explore what factors could underlie these divergent findings. We conclude that both task-specific differences and age-related sensory loss play a role in the reported disparity in age-related effects on the integration of auditory and visual temporal information. PMID:29867415

  4. Aging Effect on Audiovisual Integrative Processing in Spatial Discrimination Task

    Directory of Open Access Journals (Sweden)

    Zhi Zou

    2017-11-01

    Full Text Available Multisensory integration is an essential process that people employ daily, from conversing in social gatherings to navigating the nearby environment. The aim of this study was to investigate the impact of aging on modulating multisensory integrative processes using event-related potentials (ERPs), and the validity of the study was improved by including “noise” in the contrast conditions. Older and younger participants were involved in perceiving visual and/or auditory stimuli that contained spatial information. The participants responded by indicating the spatial direction (far vs. near and left vs. right) conveyed in the stimuli using different wrist movements. Electroencephalograms (EEGs) were captured in each task trial, along with the accuracy and reaction time of the participants’ motor responses. Older participants showed a greater extent of behavioral improvements in the multisensory (as opposed to unisensory) condition compared to their younger counterparts. Older participants were found to have fronto-centrally distributed super-additive P2, which was not the case for the younger participants. The P2 amplitude difference between the multisensory condition and the sum of the unisensory conditions was found to correlate significantly with performance on spatial discrimination. The results indicated that the age-related effect modulated the integrative process in the perceptual and feedback stages, particularly the evaluation of auditory stimuli. Audiovisual (AV) integration may also serve a functional role during spatial-discrimination processes to compensate for the compromised attention function caused by aging.

  5. Neural Correlates of Audiovisual Integration of Semantic Category Information

    Science.gov (United States)

    Hu, Zhonghua; Zhang, Ruiling; Zhang, Qinglin; Liu, Qiang; Li, Hong

    2012-01-01

    Previous studies have found a late frontal-central audiovisual interaction during the time period about 150-220 ms post-stimulus. However, it is unclear to which process this audiovisual interaction is related: to the processing of acoustical features or to the classification of stimuli? To investigate this question, event-related potentials were recorded…

  6. Audiovisual alignment of co-speech gestures to speech supports word learning in 2-year-olds.

    Science.gov (United States)

    Jesse, Alexandra; Johnson, Elizabeth K

    2016-05-01

    Analyses of caregiver-child communication suggest that an adult tends to highlight objects in a child's visual scene by moving them in a manner that is temporally aligned with the adult's speech productions. Here, we used the looking-while-listening paradigm to examine whether 25-month-olds use audiovisual temporal alignment to disambiguate and learn novel word-referent mappings in a difficult word-learning task. Videos of two equally interesting and animated novel objects were simultaneously presented to children, but the movement of only one of the objects was aligned with an accompanying object-labeling audio track. No social cues (e.g., pointing, eye gaze, touch) were available to the children because the speaker was edited out of the videos. Immediately afterward, toddlers were presented with still images of the two objects and asked to look at one or the other. Toddlers looked reliably longer to the labeled object, demonstrating their acquisition of the novel word-referent mapping. A control condition showed that children's performance was not solely due to the single unambiguous labeling that had occurred at experiment onset. We conclude that the temporal link between a speaker's utterances and the motion they imposed on the referent object helps toddlers to deduce a speaker's intended reference in a difficult word-learning scenario. In combination with our previous work, these findings suggest that intersensory redundancy is a source of information used by language users of all ages. That is, intersensory redundancy is not just a word-learning tool used by young infants. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. Integration of auditory and visual speech information

    NARCIS (Netherlands)

    Hall, M.; Smeele, P.M.T.; Kuhl, P.K.

    1998-01-01

    The integration of auditory and visual speech is observed when modes specify different places of articulation. Influences of auditory variation on integration were examined using consonant identification, plus quality and similarity ratings. Auditory identification predicted auditory-visual

  8. Causal inference and temporal predictions in audiovisual perception of speech and music.

    Science.gov (United States)

    Noppeney, Uta; Lee, Hwee Ling

    2018-03-31

    To form a coherent percept of the environment, the brain must integrate sensory signals emanating from a common source but segregate those from different sources. Temporal regularities are prominent cues for multisensory integration, particularly for speech and music perception. In line with models of predictive coding, we suggest that the brain adapts an internal model to the statistical regularities in its environment. This internal model enables cross-sensory and sensorimotor temporal predictions as a mechanism to arbitrate between integration and segregation of signals from different senses. © 2018 New York Academy of Sciences.

  9. The Influence of Selective and Divided Attention on Audiovisual Integration in Children.

    Science.gov (United States)

    Yang, Weiping; Ren, Yanna; Yang, Dan Ou; Yuan, Xue; Wu, Jinglong

    2016-01-24

    This article aims to investigate whether there is a difference in audiovisual integration in school-aged children (aged 6 to 13 years; mean age = 9.9 years) between the selective attention condition and divided attention condition. We designed a visual and/or auditory detection task that included three blocks (divided attention, visual-selective attention, and auditory-selective attention). The results showed that the response to bimodal audiovisual stimuli was faster than to unimodal auditory or visual stimuli under both divided attention and auditory-selective attention conditions. However, in the visual-selective attention condition, no significant difference was found between the unimodal visual and bimodal audiovisual stimuli in response speed. Moreover, audiovisual behavioral facilitation effects were compared between divided attention and selective attention (auditory or visual attention). In doing so, we found that audiovisual behavioral facilitation was significantly different between divided attention and selective attention. The results indicated that audiovisual integration was stronger in the divided attention condition than in the selective attention condition in children. Our findings objectively support the notion that attention can modulate audiovisual integration in school-aged children. Our study might offer a new perspective for identifying children with conditions that are associated with sustained attention deficit, such as attention-deficit hyperactivity disorder. © The Author(s) 2016.

  10. Modelling audiovisual integration of affect from videos and music.

    Science.gov (United States)

    Gao, Chuanji; Wedell, Douglas H; Kim, Jongwan; Weber, Christine E; Shinkareva, Svetlana V

    2018-05-01

    Two experiments examined how affective values from visual and auditory modalities are integrated. Experiment 1 paired music and videos drawn from three levels of valence while holding arousal constant. Experiment 2 included a parallel combination of three levels of arousal while holding valence constant. In each experiment, participants rated their affective states after unimodal and multimodal presentations. Experiment 1 revealed a congruency effect in which stimulus combinations of the same extreme valence resulted in more extreme state ratings than component stimuli presented in isolation. An interaction between music and video valence reflected the greater influence of negative affect. Video valence was found to have a significantly greater effect on combined ratings than music valence. The pattern of data was explained by a five parameter differential weight averaging model that attributed greater weight to the visual modality and increased weight with decreasing values of valence. Experiment 2 revealed a congruency effect only for high arousal combinations and no interaction effects. This pattern was explained by a three parameter constant weight averaging model with greater weight for the auditory modality and a very low arousal value for the initial state. These results demonstrate key differences in audiovisual integration between valence and arousal.
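    The averaging models described above follow the general information-integration form in which the rated affect is a weighted average of an initial (resting) state and the two modality values. A minimal sketch of that form is given below; the parameter names and numbers are illustrative assumptions, not the fitted values reported for either experiment.

```python
def averaging_model(video_value, music_value, w_video, w_music,
                    initial_value=0.0, w_initial=1.0):
    """Weighted-averaging integration of affective values across modalities.

    The rated affect is a weighted average of an initial state and the
    video and music values; differential-weight variants let the weights
    depend on the stimulus values themselves.
    """
    numerator = (w_initial * initial_value
                 + w_video * video_value
                 + w_music * music_value)
    denominator = w_initial + w_video + w_music
    return numerator / denominator

# Example: a strongly negative video paired with mildly positive music,
# with the visual modality weighted more heavily (hypothetical weights).
print(averaging_model(video_value=-2.0, music_value=1.0,
                      w_video=2.5, w_music=1.0))
```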

  11. Spectral integration in speech and non-speech sounds

    Science.gov (United States)

    Jacewicz, Ewa

    2005-04-01

    Spectral integration (or formant averaging) was proposed in vowel perception research to account for the observation that a reduction of the intensity of one of two closely spaced formants (as in /u/) produced a predictable shift in vowel quality [Delattre et al., Word 8, 195-210 (1952)]. A related observation was reported in psychoacoustics, indicating that when the components of a two-tone periodic complex differ in amplitude and frequency, its perceived pitch is shifted toward that of the more intense tone [Helmholtz, App. XIV (1875/1948)]. Subsequent research in both fields focused on the frequency interval that separates these two spectral components, in an attempt to determine the size of the bandwidth for spectral integration to occur. This talk will review the accumulated evidence for and against spectral integration within the hypothesized limit of 3.5 Bark for static and dynamic signals in speech perception and psychoacoustics. Based on similarities in the processing of speech and non-speech sounds, it is suggested that spectral integration may reflect a general property of the auditory system. A larger frequency bandwidth, possibly close to 3.5 Bark, may be utilized in integrating acoustic information, including speech, complex signals, or sound quality of a violin.
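    The 3.5-Bark criterion discussed above can be checked with a standard Hz-to-Bark conversion. The sketch below uses the Zwicker and Terhardt (1980) approximation, which may differ slightly from the conversion assumed in the original studies; the example frequencies are illustrative rather than taken from the talk.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker & Terhardt (1980) approximation of the Bark scale."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def within_integration_band(f1_hz, f2_hz, limit_bark=3.5):
    """Do two spectral components fall within the hypothesized 3.5-Bark
    limit for spectral integration (formant averaging)?"""
    return abs(hz_to_bark(f1_hz) - hz_to_bark(f2_hz)) <= limit_bark

# Closely spaced components in the F2-F3 region versus widely spaced ones.
print(within_integration_band(2000.0, 2500.0))  # True: candidates for integration
print(within_integration_band(500.0, 2500.0))   # False: well outside 3.5 Bark
```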

  12. Attenuated audiovisual integration in middle-aged adults in a discrimination task.

    Science.gov (United States)

    Yang, Weiping; Ren, Yanna

    2018-02-01

    Numerous studies have focused on the diversity of audiovisual integration between younger and older adults. However, consecutive trends in audiovisual integration throughout life are still unclear. In the present study, to clarify audiovisual integration characteristics in middle-aged adults, we instructed younger and middle-aged adults to conduct an auditory/visual stimuli discrimination experiment. Randomized streams of unimodal auditory (A), unimodal visual (V) or audiovisual stimuli were presented on the left or right hemispace of the central fixation point, and subjects were instructed to respond to the target stimuli rapidly and accurately. Our results demonstrated that the responses of middle-aged adults to all unimodal and bimodal stimuli were significantly slower than those of younger adults. Audiovisual integration was markedly delayed (onset time 360 ms) and weaker (peak 3.97%) in middle-aged adults than in younger adults (onset time 260 ms, peak 11.86%). The results suggested that audiovisual integration was attenuated in middle-aged adults and further confirmed age-related decline in information processing.

  13. Detecting Functional Connectivity During Audiovisual Integration with MEG: A Comparison of Connectivity Metrics.

    Science.gov (United States)

    Ard, Tyler; Carver, Frederick W; Holroyd, Tom; Horwitz, Barry; Coppola, Richard

    2015-08-01

    In typical magnetoencephalography and/or electroencephalography functional connectivity analysis, researchers select one of several methods that measure a relationship between regions to determine connectivity, such as coherence, power correlations, and others. However, it is largely unknown if some are more suited than others for various types of investigations. In this study, the authors investigate seven connectivity metrics to evaluate which, if any, are sensitive to audiovisual integration by contrasting connectivity when tracking an audiovisual object versus connectivity when tracking a visual object uncorrelated with the auditory stimulus. The authors are able to assess the metrics' performances at detecting audiovisual integration by investigating connectivity between auditory and visual areas. Critically, the authors perform their investigation on a whole-cortex all-to-all mapping, avoiding confounds introduced in seed selection. The authors find that amplitude-based connectivity measures in the beta band detect strong connections between visual and auditory areas during audiovisual integration, specifically between V4/V5 and auditory cortices in the right hemisphere. Conversely, phase-based connectivity measures in the beta band as well as phase and power measures in alpha, gamma, and theta do not show connectivity between audiovisual areas. The authors postulate that while beta power correlations detect audiovisual integration in the current experimental context, it may not always be the best measure to detect connectivity. Instead, it is likely that the brain utilizes a variety of mechanisms in neuronal communication that may produce differential types of temporal relationships.
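    Two of the metric families contrasted above, phase-sensitive coherence and amplitude (power-envelope) correlation, can be sketched on simulated sensor time series. The sampling rate, filter settings, and "auditory"/"visual" labels below are illustrative assumptions, not the authors' MEG pipeline.

```python
import numpy as np
from scipy.signal import butter, coherence, filtfilt, hilbert

fs = 600.0                                       # illustrative sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(1)
shared = np.sin(2 * np.pi * 20 * t)              # common 20 Hz (beta) drive
x = shared + 0.5 * rng.standard_normal(t.size)   # "auditory" sensor
y = shared + 0.5 * rng.standard_normal(t.size)   # "visual" sensor

# Phase-sensitive metric: magnitude-squared coherence in the beta band.
f, cxy = coherence(x, y, fs=fs, nperseg=1024)
beta = (f >= 13) & (f <= 30)
print("mean beta coherence:", cxy[beta].mean())

# Amplitude-based metric: correlation of beta-band power envelopes.
b, a = butter(4, [13 / (fs / 2), 30 / (fs / 2)], btype="band")
env_x = np.abs(hilbert(filtfilt(b, a, x)))
env_y = np.abs(hilbert(filtfilt(b, a, y)))
print("beta envelope correlation:", np.corrcoef(env_x, env_y)[0, 1])
```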

  14. Optimal Audiovisual Integration in the Ventriloquism Effect But Pervasive Deficits in Unisensory Spatial Localization in Amblyopia.

    Science.gov (United States)

    Richards, Michael D; Goltz, Herbert C; Wong, Agnes M F

    2018-01-01

    Classically understood as a deficit in spatial vision, amblyopia is increasingly recognized to also impair audiovisual multisensory processing. Studies to date, however, have not determined whether the audiovisual abnormalities reflect a failure of multisensory integration, or an optimal strategy in the face of unisensory impairment. We use the ventriloquism effect and the maximum-likelihood estimation (MLE) model of optimal integration to investigate integration of audiovisual spatial information in amblyopia. Participants with unilateral amblyopia (n = 14; mean age 28.8 years; 7 anisometropic, 3 strabismic, 4 mixed mechanism) and visually normal controls (n = 16, mean age 29.2 years) localized brief unimodal auditory, unimodal visual, and bimodal (audiovisual) stimuli during binocular viewing using a location discrimination task. A subset of bimodal trials involved the ventriloquism effect, an illusion in which auditory and visual stimuli originating from different locations are perceived as originating from a single location. Localization precision and bias were determined by psychometric curve fitting, and the observed parameters were compared with predictions from the MLE model. Spatial localization precision was significantly reduced in the amblyopia group compared with the control group for unimodal visual, unimodal auditory, and bimodal stimuli. Analyses of localization precision and bias for bimodal stimuli showed no significant deviations from the MLE model in either the amblyopia group or the control group. Despite pervasive deficits in localization precision for visual, auditory, and audiovisual stimuli, audiovisual integration remains intact and optimal in unilateral amblyopia.
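    The MLE predictions referred to above follow the standard reliability-weighted integration formulas: the bimodal variance is the product of the unimodal variances divided by their sum, and each cue is weighted by its relative reliability. The sketch below computes these predictions; the numerical values are illustrative, not the study's fitted estimates.

```python
def mle_prediction(sigma_a, sigma_v, loc_a=None, loc_v=None):
    """Maximum-likelihood (reliability-weighted) integration predictions.

    sigma_a, sigma_v: unimodal localization SDs (e.g., from psychometric fits).
    Returns the predicted bimodal SD, the weight on vision, and (if unimodal
    location estimates are given) the predicted bimodal location, which is
    what the ventriloquism effect probes when the cues conflict.
    """
    var_a, var_v = sigma_a ** 2, sigma_v ** 2
    w_v = var_a / (var_a + var_v)                     # weight on vision
    sigma_av = (var_a * var_v / (var_a + var_v)) ** 0.5
    loc_av = None
    if loc_a is not None and loc_v is not None:
        loc_av = w_v * loc_v + (1.0 - w_v) * loc_a
    return sigma_av, w_v, loc_av

# Illustrative values in degrees of visual angle; not the study's data.
print(mle_prediction(sigma_a=6.0, sigma_v=2.0, loc_a=4.0, loc_v=0.0))
```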

  15. Crossmodal deficit in dyslexic children: practice affects the neural timing of letter-speech sound integration

    Directory of Open Access Journals (Sweden)

    Gojko eŽarić

    2015-06-01

    Full Text Available A failure to build solid letter-speech sound associations may contribute to reading impairments in developmental dyslexia. Whether this reduced neural integration of letters and speech sounds changes over time within individual children and how this relates to behavioral gains in reading skills remains unknown. In this research, we examined changes in event-related potential (ERP) measures of letter-speech sound integration over a 6-month period during which 9-year-old dyslexic readers (n=17) followed a training in letter-speech sound coupling next to their regular reading curriculum. We presented the Dutch spoken vowels /a/ and /o/ as standard and deviant stimuli in one auditory and two audiovisual oddball conditions. In one audiovisual condition (AV0), the letter ‘a’ was presented simultaneously with the vowels, while in the other (AV200) it preceded vowel onset by 200 ms. Prior to the training (T1), dyslexic readers showed the expected pattern of typical auditory mismatch responses, together with the absence of letter-speech sound effects in a late negativity (LN) window. After the training (T2), our results showed earlier (and enhanced) crossmodal effects in the LN window. Most interestingly, earlier LN latency at T2 was significantly related to higher behavioral accuracy in letter-speech sound coupling. On a more general level, the timing of the earlier mismatch negativity (MMN) in the simultaneous condition (AV0) measured at T1 significantly related to reading fluency at both T1 and T2 as well as to reading gains. Our findings suggest that the reduced neural integration of letters and speech sounds in dyslexic children may show moderate improvement with reading instruction and training and that behavioral improvements relate especially to individual differences in the timing of this neural integration.

  16. Does hearing aid use affect audiovisual integration in mild hearing impairment?

    Science.gov (United States)

    Gieseler, Anja; Tahden, Maike A S; Thiel, Christiane M; Colonius, Hans

    2018-04-01

    There is converging evidence for altered audiovisual integration abilities in hearing-impaired individuals and those with profound hearing loss who are provided with cochlear implants, compared to normal-hearing adults. Still, little is known on the effects of hearing aid use on audiovisual integration in mild hearing loss, although this constitutes one of the most prevalent conditions in the elderly and, yet, often remains untreated in its early stages. This study investigated differences in the strength of audiovisual integration between elderly hearing aid users and those with the same degree of mild hearing loss who were not using hearing aids, the non-users, by measuring their susceptibility to the sound-induced flash illusion. We also explored the corresponding window of integration by varying the stimulus onset asynchronies. To examine general group differences that are not attributable to specific hearing aid settings but rather reflect overall changes associated with habitual hearing aid use, the group of hearing aid users was tested unaided while individually controlling for audibility. We found greater audiovisual integration together with a wider window of integration in hearing aid users compared to their age-matched untreated peers. Signal detection analyses indicate that a change in perceptual sensitivity as well as in bias may underlie the observed effects. Our results and comparisons with other studies in normal-hearing older adults suggest that both mild hearing impairment and hearing aid use seem to affect audiovisual integration, possibly in the sense that hearing aid use may reverse the effects of hearing loss on audiovisual integration. We suggest that these findings may be particularly important for auditory rehabilitation and call for a longitudinal study.
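    The signal detection analysis mentioned above separates perceptual sensitivity from response bias. A minimal sketch is given below, assuming a framing of the sound-induced flash illusion in which "two flashes" reports on real two-flash trials count as hits and "two flashes" reports on one-flash illusion trials count as false alarms; the rates are made up for illustration.

```python
from scipy.stats import norm

def dprime_criterion(hit_rate, fa_rate):
    """Sensitivity (d') and criterion (c) from hit and false-alarm rates."""
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)

# Hypothetical rates: hearing aid users and non-users would each yield a
# (d', c) pair, and group differences would indicate whether sensitivity,
# bias, or both drive differences in illusion susceptibility.
print(dprime_criterion(hit_rate=0.85, fa_rate=0.40))
```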

  17. Absent Audiovisual Integration Elicited by Peripheral Stimuli in Parkinson’s Disease

    Directory of Open Access Journals (Sweden)

    Yanna Ren

    2018-01-01

    Full Text Available The basal ganglia, which have been shown to be a significant multisensory hub, are disordered in Parkinson’s disease (PD). This study investigated the audiovisual integration of peripheral stimuli in PD patients with/without sleep disturbances. Thirty-six age-matched normal controls (NC) and 30 PD patients were recruited for an auditory/visual discrimination experiment. The mean response times for each participant were analyzed using repeated measures ANOVA and the race model. The results showed that the response to all stimuli was significantly delayed for PD compared to NC. The current results showed that audiovisual multisensory integration for peripheral stimuli is absent in PD regardless of sleep disturbances and further suggested that the abnormal audiovisual integration might be a potential early manifestation of PD.

  18. The effect of a concurrent working memory task and temporal offsets on the integration of auditory and visual speech information.

    Science.gov (United States)

    Buchan, Julie N; Munhall, Kevin G

    2012-01-01

    Audiovisual speech perception is an everyday occurrence of multisensory integration. Conflicting visual speech information can influence the perception of acoustic speech (namely the McGurk effect), and auditory and visual speech are integrated over a rather wide range of temporal offsets. This research examined whether the addition of a concurrent cognitive load task would affect audiovisual integration in a McGurk speech task and whether the cognitive load task would cause more interference at increasing offsets. The amount of integration was measured by the proportion of responses in incongruent trials that did not correspond to the audio (McGurk response). An eye-tracker was also used to examine whether the amount of temporal offset and the presence of a concurrent cognitive load task would influence gaze behavior. Results from this experiment show a very modest but statistically significant decrease in the number of McGurk responses when subjects also perform a cognitive load task; this effect is relatively constant across the various temporal offsets. Participants' gaze behavior was also influenced by the addition of a cognitive load task. Gaze was less centralized on the face, less time was spent looking at the mouth, and more time was spent looking at the eyes when a concurrent cognitive load task was added to the speech task.

  19. Integration of speech and gesture in aphasia.

    Science.gov (United States)

    Cocks, Naomi; Byrne, Suzanne; Pritchard, Madeleine; Morgan, Gary; Dipper, Lucy

    2018-02-07

    Information from speech and gesture is often integrated to comprehend a message. This integration process requires the appropriate allocation of cognitive resources to both the gesture and speech modalities. People with aphasia are likely to find integration of gesture and speech difficult. This is due to a reduction in cognitive resources, a difficulty with resource allocation, or a combination of the two. Despite it being likely that people who have aphasia will have difficulty with integration, empirical evidence describing this difficulty is limited. Such a difficulty was found in a single case study by Cocks et al. in 2009, and is replicated here with a greater number of participants. The aim of this study was to determine whether individuals with aphasia have difficulties understanding messages in which they have to integrate speech and gesture. Thirty-one participants with aphasia (PWA) and 30 control participants watched videos of an actor communicating a message in three different conditions: verbal only, gesture only, and verbal and gesture message combined. The message related to an action in which the name of the action (e.g., 'eat') was provided verbally and the manner of the action (e.g., hands in a position as though eating a burger) was provided gesturally. Participants then selected a picture that 'best matched' the message conveyed from a choice of four pictures which represented a gesture match only (G match), a verbal match only (V match), an integrated verbal-gesture match (Target) and an unrelated foil (UR). To determine the gain that participants obtained from integrating gesture and speech, a measure of multimodal gain (MMG) was calculated. The PWA were less able to integrate gesture and speech than the control participants and had significantly lower MMG scores. When the PWA had difficulty integrating, they more frequently selected the verbal match. The findings suggest that people with aphasia can have difficulty integrating speech and gesture in order to obtain
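
    The abstract does not spell out how the multimodal gain (MMG) measure is computed, so the sketch below is only a hypothetical illustration, assuming gain is defined as the accuracy advantage of the combined verbal-plus-gesture condition over the better of the two unimodal conditions; the published measure may be defined differently.

```python
def multimodal_gain(acc_combined, acc_verbal, acc_gesture):
    """Hypothetical multimodal gain: accuracy benefit of the combined
    verbal + gesture condition over the better single modality.
    (Assumed definition; the published MMG formula may differ.)"""
    return acc_combined - max(acc_verbal, acc_gesture)

# Example: a control participant benefits from integration,
# a participant with aphasia does not.
print(round(multimodal_gain(0.90, 0.70, 0.65), 2))  # 0.2
print(round(multimodal_gain(0.68, 0.70, 0.60), 2))  # -0.02
```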

  20. A Comparison of the Development of Audiovisual Integration in Children with Autism Spectrum Disorders and Typically Developing Children

    Science.gov (United States)

    Taylor, Natalie; Isaac, Claire; Milne, Elizabeth

    2010-01-01

    This study aimed to investigate the development of audiovisual integration in children with Autism Spectrum Disorder (ASD). Audiovisual integration was measured using the McGurk effect in children with ASD aged 7-16 years and typically developing children (control group) matched approximately for age, sex, nonverbal ability and verbal ability.…

  1. Sequential literacy: teaching cinema in the audiovisual era (Alfasecuencialización: la enseñanza del cine en la era del audiovisual)

    Directory of Open Access Journals (Sweden)

    José Antonio Palao Errando

    2007-10-01

    Full Text Available In the so-called «information society», film studies have been diluted into a pragmatic and technological approach to audiovisual discourse, just as the enjoyment of cinema itself has been caught in the net of the DVD and hypertext. Cinema itself reacts to this through complex narrative structures that distance it from standard audiovisual discourse. The role of film studies, and of their university teaching, should be to reintroduce the subject excluded from informational knowledge by means of the interpretation of the filmic text.

  2. Effects of Sound Frequency on Audiovisual Integration: An Event-Related Potential Study.

    Science.gov (United States)

    Yang, Weiping; Yang, Jingjing; Gao, Yulin; Tang, Xiaoyu; Ren, Yanna; Takahashi, Satoshi; Wu, Jinglong

    2015-01-01

    A combination of signals across modalities can facilitate sensory perception. The audiovisual facilitative effect strongly depends on the features of the stimulus. Here, we investigated how sound frequency, one of the basic features of an auditory signal, modulates audiovisual integration. In this study, the task of the participant was to respond to a visual target stimulus by pressing a key while ignoring auditory stimuli, comprising tones of different frequencies (0.5, 1, 2.5 and 5 kHz). A significant facilitation of reaction times was obtained following audiovisual stimulation, irrespective of whether the task-irrelevant sounds were low or high frequency. Using event-related potentials (ERPs), audiovisual integration was found over the occipital area for 0.5 kHz auditory stimuli from 190-210 ms, for 1 kHz stimuli from 170-200 ms, for 2.5 kHz stimuli from 140-200 ms, and for 5 kHz stimuli from 100-200 ms. These findings suggest that a higher frequency sound signal paired with visual stimuli might be processed or integrated earlier, even though the auditory stimuli were task-irrelevant information. Furthermore, audiovisual integration in late latency (300-340 ms) ERPs with fronto-central topography was found for auditory stimuli of lower frequencies (0.5, 1 and 2.5 kHz). Our results confirmed that audiovisual integration is affected by the frequency of an auditory stimulus. Taken together, the neurophysiological results provide unique insight into how the brain processes a visual signal combined with auditory stimuli of different frequencies.
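
    ERP indices of audiovisual integration in time windows like those reported above are commonly derived from an additive-model comparison of the audiovisual response against the sum of the unisensory responses. The sketch below is a minimal, generic illustration of that comparison on simulated arrays (electrode selection, baseline correction and statistics omitted); it is not the authors' exact pipeline.

```python
import numpy as np

def av_integration_effect(erp_av, erp_a, erp_v, times, window):
    """Mean amplitude of the AV - (A + V) difference wave in a time window.

    erp_* : arrays of shape (n_channels, n_times), trial-averaged ERPs.
    window: (start, end) in seconds, e.g. (0.19, 0.21) for 190-210 ms.
    """
    diff = erp_av - (erp_a + erp_v)
    mask = (times >= window[0]) & (times <= window[1])
    return diff[:, mask].mean(axis=1)  # one value per channel

# Illustrative data: 64 channels, 700 samples spanning -0.2 to 0.5 s
times = np.linspace(-0.2, 0.5, 700)
rng = np.random.default_rng(1)
erp_a, erp_v, erp_av = (rng.normal(0, 1e-6, (64, 700)) for _ in range(3))
print(av_integration_effect(erp_av, erp_a, erp_v, times, (0.19, 0.21)).shape)
```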

  3. Crossmodal integration enhances neural representation of task-relevant features in audiovisual face perception.

    Science.gov (United States)

    Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Liu, Yongjian; Liang, Changhong; Sun, Pei

    2015-02-01

    Previous studies have shown that audiovisual integration improves identification performance and enhances neural activity in heteromodal brain areas, for example, the posterior superior temporal sulcus/middle temporal gyrus (pSTS/MTG). Furthermore, it has also been demonstrated that attention plays an important role in crossmodal integration. In this study, we considered crossmodal integration in audiovisual facial perception and explored its effect on the neural representation of features. The audiovisual stimuli in the experiment consisted of facial movie clips that could be classified into 2 gender categories (male vs. female) or 2 emotion categories (crying vs. laughing). The visual/auditory-only stimuli were created from these movie clips by removing the auditory/visual contents. The subjects needed to make a judgment about the gender/emotion category for each movie clip in the audiovisual, visual-only, or auditory-only stimulus condition as functional magnetic resonance imaging (fMRI) signals were recorded. The neural representation of the gender/emotion feature was assessed using the decoding accuracy and the brain pattern-related reproducibility indices, obtained by a multivariate pattern analysis method from the fMRI data. In comparison to the visual-only and auditory-only stimulus conditions, we found that audiovisual integration enhanced the neural representation of task-relevant features and that feature-selective attention might play a modulatory role in audiovisual integration.
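
    The decoding-accuracy part of the multivariate pattern analysis described above can be illustrated with a generic cross-validated classifier: decode the gender/emotion label from voxel patterns and compare accuracy across stimulus conditions. The sketch below uses scikit-learn on simulated data and is only a schematic stand-in for the authors' method (the reproducibility indices are not covered).

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def decoding_accuracy(X, y, folds=5):
    """Cross-validated accuracy of decoding a binary feature (e.g. gender)
    from fMRI activation patterns (trials x voxels)."""
    clf = make_pipeline(StandardScaler(), LinearSVC())
    return cross_val_score(clf, X, y, cv=folds).mean()

# Simulated pSTS/MTG-like patterns: 80 trials x 500 voxels per condition,
# with a stronger label-related signal in the audiovisual condition.
rng = np.random.default_rng(2)
y = np.repeat([0, 1], 40)                                # male vs. female clips
signal = np.outer(y - 0.5, rng.normal(0, 1, 500))
X_audiovisual = signal * 0.8 + rng.normal(0, 1, (80, 500))
X_visual_only = signal * 0.4 + rng.normal(0, 1, (80, 500))
print(decoding_accuracy(X_audiovisual, y), decoding_accuracy(X_visual_only, y))
```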

  4. Looking Behavior and Audiovisual Speech Understanding in Children With Normal Hearing and Children With Mild Bilateral or Unilateral Hearing Loss.

    Science.gov (United States)

    Lewis, Dawna E; Smith, Nicholas A; Spalding, Jody L; Valente, Daniel L

    Visual information from talkers facilitates speech intelligibility for listeners when audibility is challenged by environmental noise and hearing loss. Less is known about how listeners actively process and attend to visual information from different talkers in complex multi-talker environments. This study tracked looking behavior in children with normal hearing (NH), mild bilateral hearing loss (MBHL), and unilateral hearing loss (UHL) in a complex multi-talker environment to examine the extent to which children look at talkers and whether looking patterns relate to performance on a speech-understanding task. It was hypothesized that performance would decrease as perceptual complexity increased and that children with hearing loss would perform more poorly than their peers with NH. Children with MBHL or UHL were expected to demonstrate greater attention to individual talkers during multi-talker exchanges, indicating that they were more likely to attempt to use visual information from talkers to assist in speech understanding in adverse acoustics. It also was of interest to examine whether MBHL, versus UHL, would differentially affect performance and looking behavior. Eighteen children with NH, eight children with MBHL, and 10 children with UHL participated (8-12 years). They followed audiovisual instructions for placing objects on a mat under three conditions: a single talker providing instructions via a video monitor, four possible talkers alternately providing instructions on separate monitors in front of the listener, and the same four talkers providing both target and nontarget information. Multi-talker background noise was presented at a 5 dB signal-to-noise ratio during testing. An eye tracker monitored looking behavior while children performed the experimental task. Behavioral task performance was higher for children with NH than for either group of children with hearing loss. There were no differences in performance between children with UHL and children

  5. "Audio-visuel Integre" et Communication(s) ("Integrated Audiovisual" and Communication)

    Science.gov (United States)

    Moirand, Sophie

    1974-01-01

    This article examines the usefulness of the audiovisual method in teaching communication competence, and calls for research in audiovisual methods as well as in communication theory for improvement in these areas. (Text is in French.) (AM)

  6. Audiovisual integration increases the intentional step synchronization of side-by-side walkers.

    Science.gov (United States)

    Noy, Dominic; Mouta, Sandra; Lamas, Joao; Basso, Daniel; Silva, Carlos; Santos, Jorge A

    2017-12-01

    When people walk side-by-side, they often synchronize their steps. To achieve this, individuals might cross-modally match audiovisual signals from the movements of the partner and kinesthetic, cutaneous, visual and auditory signals from their own movements. Because signals from different sensory systems are processed with noise and asynchronously, the challenge of the CNS is to derive the best estimate based on this conflicting information. This is currently thought to be done by a mechanism operating as a Maximum Likelihood Estimator (MLE). The present work investigated whether audiovisual signals from the partner are integrated according to MLE in order to synchronize steps during walking. Three experiments were conducted in which the sensory cues from a walking partner were virtually simulated. In Experiment 1, seven participants were instructed to synchronize with human-sized Point Light Walkers and/or footstep sounds. Results revealed the highest synchronization performance with auditory and audiovisual cues. This was quantified by the time to achieve synchronization and by synchronization variability. However, this auditory dominance effect might have been due to artifacts of the setup. Therefore, in Experiment 2, human-sized virtual mannequins were implemented. Also, audiovisual stimuli were rendered in real-time and thus were synchronous and co-localized. All four participants synchronized best with audiovisual cues. For three of the four participants, results point toward optimal integration consistent with the MLE model. Experiment 3 yielded performance decrements for all three participants when the cues were incongruent. Overall, these findings suggest that individuals might optimally integrate audiovisual cues to synchronize steps during side-by-side walking.
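
    The MLE account referenced above makes quantitative predictions: the integrated estimate weights each cue by its reliability (inverse variance), and the integrated variance falls below either unimodal variance. The sketch below states those predictions directly; the numbers are illustrative, not measurements from the study.

```python
def mle_combination(est_a, var_a, est_v, var_v):
    """Reliability-weighted (maximum likelihood) combination of two cues.

        w_a = (1/var_a) / (1/var_a + 1/var_v)
        est = w_a * est_a + (1 - w_a) * est_v
        var = (var_a * var_v) / (var_a + var_v)   # < min(var_a, var_v)
    """
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_v)
    est = w_a * est_a + (1.0 - w_a) * est_v
    var = (var_a * var_v) / (var_a + var_v)
    return est, var, w_a

# Illustrative step-timing cues: the auditory footstep signal is more reliable
# than the visual one, so it receives the larger weight.
print(mle_combination(est_a=0.0, var_a=40.0, est_v=20.0, var_v=90.0))
```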

  7. Audiovisual laughter detection based on temporal features

    NARCIS (Netherlands)

    Petridis, Stavros; Nijholt, Antinus; Nijholt, A.; Pantic, M.; Pantic, Maja; Poel, Mannes; Poel, M.; Hondorp, G.H.W.

    2008-01-01

    Previous research on automatic laughter detection has mainly been focused on audio-based detection. In this study we present an audiovisual approach to distinguishing laughter from speech based on temporal features and we show that the integration of audio and visual information leads to improved

  8. The contribution of perceptual factors and training on varying audiovisual integration capacity.

    Science.gov (United States)

    Wilbiks, Jonathan M P; Dyson, Benjamin J

    2018-06-01

    The suggestion that the capacity of audiovisual integration has an upper limit of 1 was challenged in 4 experiments using perceptual factors and training to enhance the binding of auditory and visual information. Participants were required to note a number of specific visual dot locations that changed in polarity when a critical auditory stimulus was presented, under relatively fast (200-ms stimulus onset asynchrony [SOA]) and slow (700-ms SOA) rates of presentation. In Experiment 1, transient cross-modal congruency between the brightness of polarity change and pitch of the auditory tone was manipulated. In Experiment 2, sustained chunking was enabled on certain trials by connecting varying dot locations with vertices. In Experiment 3, training was employed to determine if capacity would increase through repeated experience with an intermediate presentation rate (450 ms). Estimates of audiovisual integration capacity (K) were larger than 1 during cross-modal congruency at slow presentation rates (Experiment 1), during perceptual chunking at slow and fast presentation rates (Experiment 2), and during an intermediate presentation rate after training (Experiment 3). Finally, Experiment 4 showed a linear increase in K using SOAs ranging from 100 to 600 ms, suggestive of quantitative rather than qualitative changes in the mechanisms of audiovisual integration as a function of presentation rate. The data compromise the suggestion that the capacity of audiovisual integration is limited to 1 and suggest that the ability to bind sounds to sights is contingent on individual and environmental factors.
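
    The abstract reports capacity estimates (K) without giving the estimator, so the sketch below is only an assumption-laden illustration using a Cowan-style formula, K = set size x (hit rate - false alarm rate), with hypothetical trial coding; the published estimator may differ.

```python
def capacity_k(set_size, hits, misses, false_alarms, correct_rejections):
    """Cowan-style capacity estimate K = N * (H - FA).

    Assumed coding: a 'hit' is correctly reporting that a probed dot location
    changed polarity at the critical tone; a 'false alarm' is reporting a
    change for an unchanged location."""
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    return set_size * (hit_rate - fa_rate)

# Example: set size 4 at the slow (700-ms SOA) presentation rate
print(capacity_k(4, hits=70, misses=30, false_alarms=20, correct_rejections=80))  # 2.0
```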

  9. Visemic Processing in Audiovisual Discrimination of Natural Speech: A Simultaneous fMRI-EEG Study

    Science.gov (United States)

    Dubois, Cyril; Otzenberger, Helene; Gounot, Daniel; Sock, Rudolph; Metz-Lutz, Marie-Noelle

    2012-01-01

    In a noisy environment, visual perception of articulatory movements improves natural speech intelligibility. Parallel to phonemic processing based on auditory signal, visemic processing constitutes a counterpart based on "visemes", the distinctive visual units of speech. Aiming at investigating the neural substrates of visemic processing in a…

  10. INTEGRATING MACHINE TRANSLATION AND SPEECH SYNTHESIS COMPONENT FOR ENGLISH TO DRAVIDIAN LANGUAGE SPEECH TO SPEECH TRANSLATION SYSTEM

    Directory of Open Access Journals (Sweden)

    J. SANGEETHA

    2015-02-01

    Full Text Available This paper describes an interface between the machine translation and speech synthesis components of an English-to-Tamil speech-to-speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation, and text-to-speech synthesis. Many approaches to combining speech recognition and machine translation have been proposed, but the speech synthesis component has not yet received the same attention. In this paper, we focus on the integration of machine translation and speech synthesis, and report a subjective evaluation investigating the impact of the speech synthesis, the machine translation, and the integration of the two components. We implement a hybrid machine translation approach (a combination of rule-based and statistical machine translation) and a concatenative, syllable-based speech synthesis technique. To retain the naturalness and intelligibility of the synthesized speech, Auto Associative Neural Network (AANN) prosody prediction is used in this work. The results of this system investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.
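
    The three-module pipeline described above (recognition, translation, synthesis) amounts to a composition of stages. The sketch below is purely illustrative: the stage callables are hypothetical stand-ins, not the authors' ASR, hybrid MT, or AANN-based synthesis components.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SpeechToSpeechTranslator:
    """English speech -> Tamil speech, as a composition of three stages.

    Each stage is injected as a callable so any concrete ASR engine, hybrid
    (rule-based + statistical) MT system, or syllable-based TTS could be
    plugged in without changing the pipeline itself."""
    recognize: Callable[[bytes], str]        # English audio -> English text
    translate: Callable[[str], str]          # English text  -> Tamil text
    synthesize: Callable[[str], bytes]       # Tamil text    -> Tamil audio

    def run(self, english_audio: bytes) -> bytes:
        english_text = self.recognize(english_audio)
        tamil_text = self.translate(english_text)
        return self.synthesize(tamil_text)

# Toy stand-ins so the pipeline is runnable end to end.
toy = SpeechToSpeechTranslator(
    recognize=lambda audio: "hello world",
    translate=lambda text: "வணக்கம் உலகம்",        # placeholder translation
    synthesize=lambda text: text.encode("utf-8"),   # placeholder "waveform"
)
print(toy.run(b"\x00\x01"))
```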

  11. Audio-visual speech perception in infants and toddlers with Down syndrome, fragile X syndrome, and Williams syndrome.

    Science.gov (United States)

    D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette

    2016-08-01

    Typically-developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on Chronological and Mental Age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development.

  12. Comparison for younger and older adults: Stimulus temporal asynchrony modulates audiovisual integration.

    Science.gov (United States)

    Ren, Yanna; Ren, Yanling; Yang, Weiping; Tang, Xiaoyu; Wu, Fengxia; Wu, Qiong; Takahashi, Satoshi; Ejima, Yoshimichi; Wu, Jinglong

    2018-02-01

    Recent research has shown that the magnitudes of responses to multisensory information are highly dependent on the stimulus structure. The temporal proximity of multiple signal inputs is a critical determinant for cross-modal integration. Here, we investigated the influence that temporal asynchrony has on audiovisual integration in both younger and older adults using event-related potentials (ERPs). Our results showed that in the simultaneous audiovisual condition, early integration was similar for the younger and older groups, except for the earliest integration (80-110 ms), which occurred in the occipital region for older adults but was absent in younger adults. Additionally, late integration was delayed in older adults (280-300 ms) compared to younger adults (210-240 ms). In audition-leading vision conditions, the earliest integration (80-110 ms) was absent in younger adults but did occur in older adults. Additionally, after increasing the temporal disparity from 50 ms to 100 ms, late integration was delayed in both younger (from 230-290 ms to 280-300 ms) and older (from 210-240 ms to 280-300 ms) adults. In the audition-lagging vision conditions, integration only occurred in the A100V condition for younger adults and in the A50V condition for older adults. The current results suggested that the audiovisual temporal integration pattern differed between the audition-leading and audition-lagging vision conditions and further revealed the varying effect of temporal asynchrony on audiovisual integration in younger and older adults.

  13. Neural entrainment to rhythmically-presented auditory, visual and audio-visual speech in children

    Directory of Open Access Journals (Sweden)

    Alan James Power

    2012-07-01

    Full Text Available Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal ‘samples’ of information from the speech stream at different rates, phase-resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (‘phase locking’). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase-locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically-developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable ba, presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a talking head). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the ba stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a ba in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling
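
    Entrainment of the kind described above is commonly quantified as inter-trial phase coherence at the stimulation rate (here, the 2 Hz delta rate). The sketch below is a generic illustration on simulated single-channel epochs; it is not the authors' analysis pipeline.

```python
import numpy as np

def itpc_at_frequency(trials, sfreq, freq):
    """Inter-trial phase coherence at one frequency.

    trials: array (n_trials, n_samples) of single-channel EEG epochs.
    Returns a value in [0, 1]; 1 means perfectly consistent phase across trials.
    """
    n_samples = trials.shape[1]
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sfreq)
    bin_idx = np.argmin(np.abs(freqs - freq))
    spectrum = np.fft.rfft(trials, axis=1)[:, bin_idx]
    phases = spectrum / np.abs(spectrum)          # unit-length phase vectors
    return np.abs(phases.mean())

# Simulated 2 Hz entrainment: phase-consistent oscillation plus noise
sfreq = 250
t = np.arange(0, 4, 1 / sfreq)
rng = np.random.default_rng(3)
trials = np.sin(2 * np.pi * 2 * t) + rng.normal(0, 1.5, (60, t.size))
print(round(itpc_at_frequency(trials, sfreq, freq=2.0), 3))
```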

  14. The Efficacy of Short-term Gated Audiovisual Speech Training for Improving Auditory Sentence Identification in Noise in Elderly Hearing Aid Users

    Science.gov (United States)

    Moradi, Shahram; Wahlin, Anna; Hällgren, Mathias; Rönnberg, Jerker; Lidestam, Björn

    2017-01-01

    This study aimed to examine the efficacy and maintenance of short-term (one-session) gated audiovisual speech training for improving auditory sentence identification in noise in experienced elderly hearing-aid users. Twenty-five hearing aid users (16 men and 9 women), with an average age of 70.8 years, were randomly divided into an experimental (audiovisual training, n = 14) and a control (auditory training, n = 11) group. Participants underwent gated speech identification tasks comprising Swedish consonants and words presented at 65 dB sound pressure level with a 0 dB signal-to-noise ratio (steady-state broadband noise), in audiovisual or auditory-only training conditions. The Hearing-in-Noise Test was employed to measure participants’ auditory sentence identification in noise before the training (pre-test), promptly after training (post-test), and 1 month after training (one-month follow-up). The results showed that audiovisual training improved auditory sentence identification in noise promptly after the training (post-test vs. pre-test scores); furthermore, this improvement was maintained 1 month after the training (one-month follow-up vs. pre-test scores). Such improvement was not observed in the control group, either promptly after the training or at the one-month follow-up. However, neither a significant between-groups difference nor a group-by-session interaction was observed. Conclusion: Audiovisual training may be considered in aural rehabilitation of hearing aid users to improve listening capabilities in noisy conditions. However, the lack of a significant between-groups effect (audiovisual vs. auditory) or an interaction between group and session calls for further research.

  15. Reduced neural integration of letters and speech sounds in dyslexic children scales with individual differences in reading fluency.

    Directory of Open Access Journals (Sweden)

    Gojko Žarić

    Full Text Available The acquisition of letter-speech sound associations is one of the basic requirements for fluent reading acquisition and its failure may contribute to reading difficulties in developmental dyslexia. Here we investigated event-related potential (ERP) measures of letter-speech sound integration in 9-year-old typical and dyslexic readers and specifically test their relation to individual differences in reading fluency. We employed an audiovisual oddball paradigm in typical readers (n = 20), dysfluent (n = 18) and severely dysfluent (n = 18) dyslexic children. In one auditory and two audiovisual conditions the Dutch spoken vowels /a/ and /o/ were presented as standard and deviant stimuli. In audiovisual blocks, the letter 'a' was presented either simultaneously (AV0), or 200 ms before (AV200) vowel sound onset. Across the three children groups, vowel deviancy in auditory blocks elicited comparable mismatch negativity (MMN) and late negativity (LN) responses. In typical readers, both audiovisual conditions (AV0 and AV200) led to enhanced MMN and LN amplitudes. In both dyslexic groups, the audiovisual LN effects were mildly reduced. Most interestingly, individual differences in reading fluency were correlated with MMN latency in the AV0 condition. A further analysis revealed that this effect was driven by a short-lived MMN effect encompassing only the N1 window in severely dysfluent dyslexics versus a longer MMN effect encompassing both the N1 and P2 windows in the other two groups. Our results confirm and extend previous findings in dyslexic children by demonstrating a deficient pattern of letter-speech sound integration depending on the level of reading dysfluency. These findings underscore the importance of considering individual differences across the entire spectrum of reading skills in addition to group differences between typical and dyslexic readers.

  16. Audiovisual integration in hemianopia: A neurocomputational account based on cortico-collicular interaction.

    Science.gov (United States)

    Magosso, Elisa; Bertini, Caterina; Cuppini, Cristiano; Ursino, Mauro

    2016-10-01

    Hemianopic patients retain some abilities to integrate audiovisual stimuli in the blind hemifield, showing both modulation of visual perception by auditory stimuli and modulation of auditory perception by visual stimuli. Indeed, conscious detection of a visual target in the blind hemifield can be improved by a spatially coincident auditory stimulus (auditory enhancement of visual detection), while a visual stimulus in the blind hemifield can improve localization of a spatially coincident auditory stimulus (visual enhancement of auditory localization). To gain more insight into the neural mechanisms underlying these two perceptual phenomena, we propose a neural network model including areas of neurons representing the retina, primary visual cortex (V1), extrastriate visual cortex, auditory cortex and the Superior Colliculus (SC). The visual and auditory modalities in the network interact via both direct cortical-cortical connections and subcortical-cortical connections involving the SC; the latter, in particular, integrates visual and auditory information and projects back to the cortices. Hemianopic patients were simulated by unilaterally lesioning V1, and preserving spared islands of V1 tissue within the lesion, to analyze the role of residual V1 neurons in mediating audiovisual integration. The network is able to reproduce the audiovisual phenomena in hemianopic patients, linking perceptions to neural activations, and disentangles the individual contribution of specific neural circuits and areas via sensitivity analyses. The study suggests i) a common key role of SC-cortical connections in mediating the two audiovisual phenomena; ii) a different role of visual cortices in the two phenomena: auditory enhancement of conscious visual detection being conditional on surviving V1 islands, while visual enhancement of auditory localization persisting even after complete V1 damage. The present study may contribute to advance understanding of the audiovisual dialogue
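
    As a toy flavor of the kind of mechanism modeled above (far simpler than the published cortico-collicular network), the sketch below passes summed visual and auditory inputs through a sigmoid unit, which already yields superadditive responses to weak cross-modal pairs. All parameters are invented for illustration and do not come from the study.

```python
import numpy as np

def sigmoid(x, slope=1.5, threshold=5.0):
    """Static nonlinearity of the toy unit (illustrative parameters)."""
    return 1.0 / (1.0 + np.exp(-slope * (x - threshold)))

def sc_unit_response(visual_input, auditory_input, w_v=1.0, w_a=1.0):
    """Steady-state response of a toy multisensory (SC-like) unit."""
    return sigmoid(w_v * visual_input + w_a * auditory_input)

# Weak unisensory inputs barely drive the unit, but their combination crosses
# the sigmoid threshold: a superadditive (multisensory enhancement) response.
v, a = 3.0, 3.0
print(sc_unit_response(v, 0.0), sc_unit_response(0.0, a), sc_unit_response(v, a))
```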

  17. Immediate integration of prosodic information from speech and visual information from pictures in the absence of focused attention: a mismatch negativity study.

    Science.gov (United States)

    Li, X; Yang, Y; Ren, G

    2009-06-16

    Language is often perceived together with visual information. Recent experimental evidence indicates that, during spoken language comprehension, the brain can immediately integrate visual information with semantic or syntactic information from speech. Here we used the mismatch negativity to further investigate whether prosodic information from speech can be immediately integrated into a visual scene context, and especially the time course and automaticity of this integration process. Sixteen Chinese native speakers participated in the study. The materials included Chinese spoken sentences and picture pairs. In the audiovisual situation, relative to the concomitant pictures, the spoken sentence was appropriately accented in the standard stimuli, but inappropriately accented in the two kinds of deviant stimuli. In the purely auditory situation, the speech sentences were presented without pictures. It was found that the deviants evoked mismatch responses in both audiovisual and purely auditory situations; the mismatch negativity in the purely auditory situation peaked at the same time as, but was weaker than, that evoked by the same deviant speech sounds in the audiovisual situation. This pattern of results suggested immediate integration of prosodic information from speech and visual information from pictures in the absence of focused attention.

  18. Audio-Visual Integration Modifies Emotional Judgment in Music

    Directory of Open Access Journals (Sweden)

    Shen-Yuan Su

    2011-10-01

    Full Text Available The conventional view that perceived emotion in music is derived mainly from auditory signals has led to neglect of the contribution of visual image. In this study, we manipulated mode (major vs. minor) and examined the influence of a video image on emotional judgment in music. Melodies in either major or minor mode were controlled for tempo and rhythm and played to the participants. We found that Taiwanese participants, like Westerners, judged major melodies as expressing positive, and minor melodies negative, emotions. The major or minor melodies were then paired with video images of the singers, which were either emotionally congruent or incongruent with their modes. Results showed that participants perceived stronger positive or negative emotions with congruent audio-visual stimuli. Compared to listening to music alone, stronger emotions were perceived when an emotionally congruent video image was added and weaker emotions were perceived when an incongruent image was added. We therefore demonstrate that mode is important to perceive the emotional valence in music and that treating musical art as a purely auditory event might lose the enhanced emotional strength perceived in music, since going to a concert may lead to stronger perceived emotion than listening to the CD at home.

  19. Audiovisual integration in depth: multisensory binding and gain as a function of distance.

    Science.gov (United States)

    Noel, Jean-Paul; Modi, Kahan; Wallace, Mark T; Van der Stoep, Nathan

    2018-07-01

    The integration of information across sensory modalities is dependent on the spatiotemporal characteristics of the stimuli that are paired. Despite large variation in the distance over which events occur in our environment, relatively little is known regarding how stimulus-observer distance affects multisensory integration. Prior work has suggested that exteroceptive stimuli are integrated over larger temporal intervals in near relative to far space, and that larger multisensory facilitations are evident in far relative to near space. Here, we sought to examine the interrelationship between these previously established distance-related features of multisensory processing. Participants performed an audiovisual simultaneity judgment and redundant target task in near and far space, while audiovisual stimuli were presented at a range of temporal delays (i.e., stimulus onset asynchronies). In line with the previous findings, temporal acuity was poorer in near relative to far space. Furthermore, reaction time to asynchronously presented audiovisual targets suggested a temporal window for fast detection-a range of stimuli asynchronies that was also larger in near as compared to far space. However, the range of reaction times over which multisensory response enhancement was observed was limited to a restricted range of relatively small (i.e., 150 ms) asynchronies, and did not differ significantly between near and far space. Furthermore, for synchronous presentations, these distance-related (i.e., near vs. far) modulations in temporal acuity and multisensory gain correlated negatively at an individual subject level. Thus, the findings support the conclusion that multisensory temporal binding and gain are asymmetrically modulated as a function of distance from the observer, and specifies that this relationship is specific for temporally synchronous audiovisual stimulus presentations.

  20. Audiovisual perception in amblyopia: A review and synthesis.

    Science.gov (United States)

    Richards, Michael D; Goltz, Herbert C; Wong, Agnes M F

    2018-05-17

    Amblyopia is a common developmental sensory disorder that has been extensively and systematically investigated as a unisensory visual impairment. However, its effects are increasingly recognized to extend beyond vision to the multisensory domain. Indeed, amblyopia is associated with altered cross-modal interactions in audiovisual temporal perception, audiovisual spatial perception, and audiovisual speech perception. Furthermore, although the visual impairment in amblyopia is typically unilateral, the multisensory abnormalities tend to persist even when viewing with both eyes. Knowledge of the extent and mechanisms of the audiovisual impairments in amblyopia, however, remains in its infancy. This work aims to review our current understanding of audiovisual processing and integration deficits in amblyopia, and considers the possible mechanisms underlying these abnormalities.

  1. Reliability-Weighted Integration of Audiovisual Signals Can Be Modulated by Top-down Attention

    Science.gov (United States)

    Noppeney, Uta

    2018-01-01

    Behaviorally, it is well established that human observers integrate signals near-optimally weighted in proportion to their reliabilities as predicted by maximum likelihood estimation. Yet, despite abundant behavioral evidence, it is unclear how the human brain accomplishes this feat. In a spatial ventriloquist paradigm, participants were presented with auditory, visual, and audiovisual signals and reported the location of the auditory or the visual signal. Combining psychophysics, multivariate functional MRI (fMRI) decoding, and models of maximum likelihood estimation (MLE), we characterized the computational operations underlying audiovisual integration at distinct cortical levels. We estimated observers’ behavioral weights by fitting psychometric functions to participants’ localization responses. Likewise, we estimated the neural weights by fitting neurometric functions to spatial locations decoded from regional fMRI activation patterns. Our results demonstrate that low-level auditory and visual areas encode predominantly the spatial location of the signal component of a region’s preferred auditory (or visual) modality. By contrast, intraparietal sulcus forms spatial representations by integrating auditory and visual signals weighted by their reliabilities. Critically, the neural and behavioral weights and the variance of the spatial representations depended not only on the sensory reliabilities as predicted by the MLE model but also on participants’ modality-specific attention and report (i.e., visual vs. auditory). These results suggest that audiovisual integration is not exclusively determined by bottom-up sensory reliabilities. Instead, modality-specific attention and report can flexibly modulate how intraparietal sulcus integrates sensory signals into spatial representations to guide behavioral responses (e.g., localization and orienting). PMID:29527567
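
    A common way to obtain behavioral weights of the kind described above is to regress reported locations in audiovisual trials onto the true auditory and visual locations and compare the recovered auditory weight with the MLE prediction from unisensory reliabilities. The sketch below is a schematic illustration on simulated ventriloquist data, not the authors' psychometric/neurometric fitting procedure.

```python
import numpy as np

def estimated_auditory_weight(loc_a, loc_v, reported):
    """Least-squares estimate of w_A in: reported = w_A*loc_A + (1 - w_A)*loc_V."""
    x = loc_a - loc_v
    y = reported - loc_v
    return float(np.dot(x, y) / np.dot(x, x))

def mle_auditory_weight(sigma_a, sigma_v):
    """Auditory weight predicted by MLE from unisensory localization noise."""
    return (1 / sigma_a**2) / (1 / sigma_a**2 + 1 / sigma_v**2)

# Simulated trials: vision is the more reliable cue here, so the recovered
# auditory weight should be small and close to the MLE prediction.
rng = np.random.default_rng(4)
sigma_a, sigma_v = 8.0, 2.0
loc_a = rng.choice([-10.0, -5.0, 5.0, 10.0], 500)
loc_v = rng.choice([-10.0, -5.0, 5.0, 10.0], 500)
w_pred = mle_auditory_weight(sigma_a, sigma_v)
reported = w_pred * loc_a + (1 - w_pred) * loc_v + rng.normal(0, 2.0, 500)
print(round(estimated_auditory_weight(loc_a, loc_v, reported), 2), round(w_pred, 2))
```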

  2. Multisensory integration of speech sounds with letters vs. visual speech : only visual speech induces the mismatch negativity

    NARCIS (Netherlands)

    Stekelenburg, J.J.; Keetels, M.N.; Vroomen, J.H.M.

    2018-01-01

    Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect.

  3. An audiovisual emotion recognition system

    Science.gov (United States)

    Han, Yi; Wang, Guoyin; Yang, Yong; He, Kun

    2007-12-01

    Human emotions can be expressed through many biological signals; speech and facial expression are two of them. Both are regarded as emotional information that plays an important role in human-computer interaction. Based on our previous studies on emotion recognition, an audiovisual emotion recognition system is developed and presented in this paper. The system is designed for real-time use and is supported by several integrated modules: speech enhancement for eliminating noise, rapid face detection for locating the face in the background image, example-based shape learning for facial feature alignment, and an optical-flow-based tracking algorithm for facial feature tracking. It is known that irrelevant features and high dimensionality of the data can hurt the performance of a classifier, and rough set-based feature selection is a good method for dimension reduction. Accordingly, 13 of 37 speech features and 10 of 33 facial features are selected to represent emotional information, and 52 audiovisual features are selected to account for the synchronization when speech and video are fused together. The experimental results demonstrate that this system performs well in real-time use and has a high recognition rate. Our results also suggest that fused multimodal recognition will become the trend in emotion recognition in the future.

  4. Spatio-temporal distribution of brain activity associated with audio-visually congruent and incongruent speech and the McGurk Effect.

    Science.gov (United States)

    Pratt, Hillel; Bleich, Naomi; Mittelman, Nomi

    2015-11-01

    Spatio-temporal distributions of cortical activity to audio-visual presentations of meaningless vowel-consonant-vowels and the effects of audio-visual congruence/incongruence, with emphasis on the McGurk effect, were studied. The McGurk effect occurs when a clearly audible syllable with one consonant is presented simultaneously with a visual presentation of a face articulating a syllable with a different consonant, and the resulting percept is a syllable with a consonant other than the auditorily presented one. Twenty subjects listened to pairs of audio-visually congruent or incongruent utterances and indicated whether pair members were the same or not. Source current densities of event-related potentials to the first utterance in the pair were estimated and effects of stimulus-response combinations, brain area, hemisphere, and clarity of visual articulation were assessed. Auditory cortex, superior parietal cortex, and middle temporal cortex were the most consistently involved areas across experimental conditions. Early effects also involved visual cortex. Clarity of visual articulation impacted activity in secondary visual cortex and Wernicke's area. McGurk perception was associated with decreased activity in primary and secondary auditory cortices and Wernicke's area before 100 msec, increased activity around 100 msec, which decreased again around 180 msec. Activity in Broca's area was unaffected by McGurk perception and was only increased to congruent audio-visual stimuli 30-70 msec following consonant onset. The results suggest left hemisphere prominence in the effects of stimulus and response conditions on eight brain areas involved in dynamically distributed parallel processing of audio-visual integration. Initially (30-70 msec) subcortical contributions to auditory cortex, superior parietal cortex, and middle temporal cortex occur. During 100-140 msec, peristriate visual influences and Wernicke's area join in the processing. Resolution of incongruent audio-visual inputs is then

  5. Gone in a Flash: Manipulation of Audiovisual Temporal Integration Using Transcranial Magnetic Stimulation

    Directory of Open Access Journals (Sweden)

    Roy eHamilton

    2013-09-01

    Full Text Available While converging evidence implicates the right inferior parietal lobule in audiovisual integration, its role has not been fully elucidated by direct manipulation of cortical activity. Replicating and extending an experiment initially reported by Kamke, Vieth, Cottrell, and Mattingley (2012), we employed the sound-induced flash illusion, in which a single visual flash, when accompanied by two auditory tones, is misperceived as multiple flashes (Wilson, 1987; Shams et al., 2000). Slow repetitive (1 Hz) TMS administered to the right angular gyrus, but not the right supramarginal gyrus, induced a transient decrease in the Peak Perceived Flashes (PPF), reflecting reduced susceptibility to the illusion. This finding independently confirms that perturbation of networks involved in multisensory integration can result in a more veridical representation of asynchronous auditory and visual events and that cross-modal integration is an active process in which the objective is the identification of a meaningful constellation of inputs, at times at the expense of accuracy.

  6. PRACTICING SPEECH THERAPY INTERVENTION FOR SOCIAL INTEGRATION OF CHILDREN WITH SPEECH DISORDERS

    Directory of Open Access Journals (Sweden)

    Martin Ofelia POPESCU

    2016-11-01

    Full Text Available The article presents a concise speech-correction intervention program for dyslalia, combined with the development of intrapersonal, interpersonal, and social integration capacities in children with speech disorders. The program's main objectives are: increasing the potential for individual social integration by correcting speech disorders in conjunction with intra- and interpersonal capacities, and increasing the potential of children and community groups for social integration by optimizing the socio-relational context of children with speech disorders. The program included 60 children/students with dyslalia speech disorders (monomorphic and polymorphic dyslalia) from 11 educational institutions - 6 kindergartens and 5 schools/secondary schools - affiliated with the inter-school logopedic centre (CLI) of Targu Jiu city and areas of Gorj district. The program was implemented under the assumption that a therapeutic-formative intervention to correct speech disorders and facilitate social integration would, in combination with the correction of pronunciation disorders, lead to better social integration of children with speech disorders. The results confirm the hypothesis and provide evidence of the intervention program's efficiency.

  7. Effects of auditory stimuli in the horizontal plane on audiovisual integration: an event-related potential study.

    Science.gov (United States)

    Yang, Weiping; Li, Qi; Ochi, Tatsuya; Yang, Jingjing; Gao, Yulin; Tang, Xiaoyu; Takahashi, Satoshi; Wu, Jinglong

    2013-01-01

    This article aims to investigate whether auditory stimuli in the horizontal plane, particularly those originating from behind the participant, affect audiovisual integration by using behavioral and event-related potential (ERP) measurements. In this study, visual stimuli were presented directly in front of the participants, auditory stimuli were presented at one location in an equidistant horizontal plane at the front (0°, the fixation point), right (90°), back (180°), or left (270°) of the participants, and audiovisual stimuli that include both visual stimuli and auditory stimuli originating from one of the four locations were simultaneously presented. These stimuli were presented randomly with equal probability; during this time, participants were asked to attend to the visual stimulus and respond promptly only to visual target stimuli (a unimodal visual target stimulus and the visual target of the audiovisual stimulus). A significant facilitation of reaction times and hit rates was obtained following audiovisual stimulation, irrespective of whether the auditory stimuli were presented in the front or back of the participant. However, no significant interactions were found between visual stimuli and auditory stimuli from the right or left. Two main ERP components related to audiovisual integration were found: first, auditory stimuli from the front location produced an ERP reaction over the right temporal area and right occipital area at approximately 160-200 milliseconds; second, auditory stimuli from the back produced a reaction over the parietal and occipital areas at approximately 360-400 milliseconds. Our results confirmed that audiovisual integration was also elicited even when auditory stimuli were presented behind the participant, but no integration occurred when auditory stimuli were presented in the right or left spaces, suggesting that the human brain might be more sensitive to information received from behind than from either side.

  8. 'When birds of a feather flock together': synesthetic correspondences modulate audiovisual integration in non-synesthetes.

    Directory of Open Access Journals (Sweden)

    Cesare Valerio Parise

    Full Text Available BACKGROUND: Synesthesia is a condition in which the stimulation of one sense elicits an additional experience, often in a different (i.e., unstimulated) sense. Although only a small proportion of the population is synesthetic, there is growing evidence to suggest that neurocognitively-normal individuals also experience some form of synesthetic association between the stimuli presented to different sensory modalities (i.e., between auditory pitch and visual size, where lower frequency tones are associated with large objects and higher frequency tones with small objects). While previous research has highlighted crossmodal interactions between synesthetically corresponding dimensions, the possible role of synesthetic associations in multisensory integration has not been considered previously. METHODOLOGY: Here we investigate the effects of synesthetic associations by presenting pairs of asynchronous or spatially discrepant visual and auditory stimuli that were either synesthetically matched or mismatched. In a series of three psychophysical experiments, participants reported the relative temporal order of presentation or the relative spatial locations of the two stimuli. PRINCIPAL FINDINGS: The reliability of non-synesthetic participants' estimates of both audiovisual temporal asynchrony and spatial discrepancy was lower for pairs of synesthetically matched as compared to synesthetically mismatched audiovisual stimuli. CONCLUSIONS: Recent studies of multisensory integration have shown that the reduced reliability of perceptual estimates regarding intersensory conflicts constitutes the marker of a stronger coupling between the unisensory signals. Our results therefore indicate a stronger coupling of synesthetically matched vs. mismatched stimuli and provide the first psychophysical evidence that synesthetic congruency can promote multisensory integration. Synesthetic crossmodal correspondences therefore appear to play a crucial (if unacknowledged

  9. The Dynamics and Neural Correlates of Audio-Visual Integration Capacity as Determined by Temporal Unpredictability, Proactive Interference, and SOA.

    Directory of Open Access Journals (Sweden)

    Jonathan M P Wilbiks

    Full Text Available Over 5 experiments, we challenge the idea that the capacity of audio-visual integration need be fixed at 1 item. We observe that the conditions under which audio-visual integration is most likely to exceed 1 occur when stimulus change operates at a slow rather than fast rate of presentation and when the task is of intermediate difficulty such as when low levels of proactive interference (3 rather than 8 interfering visual presentations) are combined with the temporal unpredictability of the critical frame (Experiment 2), or, high levels of proactive interference are combined with the temporal predictability of the critical frame (Experiment 4). Neural data suggest that capacity might also be determined by the quality of perceptual information entering working memory. Experiment 5 supported the proposition that audio-visual integration was at play during the previous experiments. The data are consistent with the dynamic nature usually associated with cross-modal binding, and while audio-visual integration capacity likely cannot exceed uni-modal capacity estimates, performance may be better than being able to associate only one visual stimulus with one auditory stimulus.

  10. The Dynamics and Neural Correlates of Audio-Visual Integration Capacity as Determined by Temporal Unpredictability, Proactive Interference, and SOA.

    Science.gov (United States)

    Wilbiks, Jonathan M P; Dyson, Benjamin J

    2016-01-01

    Over 5 experiments, we challenge the idea that the capacity of audio-visual integration need be fixed at 1 item. We observe that the conditions under which audio-visual integration is most likely to exceed 1 occur when stimulus change operates at a slow rather than fast rate of presentation and when the task is of intermediate difficulty such as when low levels of proactive interference (3 rather than 8 interfering visual presentations) are combined with the temporal unpredictability of the critical frame (Experiment 2), or, high levels of proactive interference are combined with the temporal predictability of the critical frame (Experiment 4). Neural data suggest that capacity might also be determined by the quality of perceptual information entering working memory. Experiment 5 supported the proposition that audio-visual integration was at play during the previous experiments. The data are consistent with the dynamic nature usually associated with cross-modal binding, and while audio-visual integration capacity likely cannot exceed uni-modal capacity estimates, performance may be better than being able to associate only one visual stimulus with one auditory stimulus.

  11. The duration of uncertain times: audiovisual information about intervals is integrated in a statistically optimal fashion.

    Directory of Open Access Journals (Sweden)

    Jess Hartcher-O'Brien

    Full Text Available Often multisensory information is integrated in a statistically optimal fashion where each sensory source is weighted according to its precision. This integration scheme is statistically optimal because it theoretically results in unbiased perceptual estimates with the highest precision possible. There is a current lack of consensus about how the nervous system processes multiple sensory cues to elapsed time. In order to shed light upon this, we adopt a computational approach to pinpoint the integration strategy underlying duration estimation of audio/visual stimuli. One of the assumptions of our computational approach is that the multisensory signals redundantly specify the same stimulus property. Our results clearly show that despite claims to the contrary, perceived duration is the result of an optimal weighting process, similar to that adopted for estimates of space. That is, participants weight the audio and visual information to arrive at the most precise, single duration estimate possible. The work also disentangles how different integration strategies - i.e. considering the time of onset/offset of signals - might alter the final estimate. As such we provide the first concrete evidence of an optimal integration strategy in human duration estimates.

  12. Aging and Spectro-Temporal Integration of Speech

    Directory of Open Access Journals (Sweden)

    John H. Grose

    2016-10-01

    Full Text Available The purpose of this study was to determine the effects of age on the spectro-temporal integration of speech. The hypothesis was that the integration of speech fragments distributed over frequency, time, and ear of presentation is reduced in older listeners—even for those with good audiometric hearing. Younger, middle-aged, and older listeners (10 per group) with good audiometric hearing participated. They were each tested under seven conditions that encompassed combinations of spectral, temporal, and binaural integration. Sentences were filtered into two bands centered at 500 Hz and 2500 Hz, with criterion bandwidth tailored for each participant. In some conditions, the speech bands were individually square wave interrupted at a rate of 10 Hz. Configurations of uninterrupted, synchronously interrupted, and asynchronously interrupted frequency bands were constructed that constituted speech fragments distributed across frequency, time, and ear of presentation. The over-arching finding was that, for most configurations, performance was not differentially affected by listener age. Although speech intelligibility varied across condition, there was no evidence of performance deficits in older listeners in any condition. This study indicates that age, per se, does not necessarily undermine the ability to integrate fragments of speech dispersed across frequency and time.
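
    Stimulus construction of the kind described above (two speech bands, optionally gated by a 10 Hz square wave) can be sketched with standard signal-processing tools. The sketch below uses scipy and is a simplified illustration assuming fixed band edges rather than the individually tailored criterion bandwidths; the waveform is a noise stand-in for a sentence recording.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, square

def band(signal, sfreq, low, high, order=4):
    """Band-pass filter one speech band (zero-phase Butterworth)."""
    sos = butter(order, [low, high], btype="bandpass", fs=sfreq, output="sos")
    return sosfiltfilt(sos, signal)

def interrupt(signal, sfreq, rate=10.0, duty=0.5):
    """Square-wave interruption: gate the band on/off at `rate` Hz."""
    t = np.arange(signal.size) / sfreq
    gate = (square(2 * np.pi * rate * t, duty=duty) + 1) / 2   # 0/1 gate
    return signal * gate

# Illustrative 2-second "sentence" at 16 kHz
sfreq = 16000
speech = np.random.default_rng(5).normal(0, 0.1, 2 * sfreq)
low_band = band(speech, sfreq, 400, 600)                        # band near 500 Hz
high_band = interrupt(band(speech, sfreq, 2000, 3000), sfreq)   # interrupted band near 2500 Hz
print(low_band.shape, high_band.shape)
```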

  13. Audiovisual Capture with Ambiguous Audiovisual Stimuli

    Directory of Open Access Journals (Sweden)

    Jean-Michel Hupé

    2011-10-01

    Full Text Available Audiovisual capture happens when information across modalities gets fused into a coherent percept. Ambiguous multi-modal stimuli have the potential to be powerful tools to observe such effects. We used such stimuli made of temporally synchronized and spatially co-localized visual flashes and auditory tones. The flashes produced bistable apparent motion and the tones produced ambiguous streaming. We measured strong interferences between perceptual decisions in each modality, a case of audiovisual capture. However, does this mean that audiovisual capture occurs before bistable decision? We argue that this is not the case, as the interference had a slow temporal dynamics and was modulated by audiovisual congruence, suggestive of high-level factors such as attention or intention. We propose a framework to integrate bistability and audiovisual capture, which distinguishes between “what” competes and “how” it competes (Hupé et al., 2008). The audiovisual interactions may be the result of contextual influences on neural representations (“what” competes), quite independent from the causal mechanisms of perceptual switches (“how” it competes). This framework predicts that audiovisual capture can bias bistability especially if modalities are congruent (Sato et al., 2007), but that it is fundamentally distinct in nature from the bistable competition mechanism.

  14. Cholinergic Potentiation and Audiovisual Repetition-Imitation Therapy Improve Speech Production and Communication Deficits in a Person with Crossed Aphasia by Inducing Structural Plasticity in White Matter Tracts.

    Science.gov (United States)

    Berthier, Marcelo L; De-Torres, Irene; Paredes-Pacheco, José; Roé-Vellvé, Núria; Thurnhofer-Hemsi, Karl; Torres-Prioris, María J; Alfaro, Francisco; Moreno-Torres, Ignacio; López-Barroso, Diana; Dávila, Guadalupe

    2017-01-01

    Donepezil (DP), a cognitive-enhancing drug targeting the cholinergic system, combined with massed sentence repetition training augmented and speeded up recovery of speech production deficits in patients with chronic conduction aphasia and extensive left hemisphere infarctions (Berthier et al., 2014). Nevertheless, a still unsettled question is whether such improvements correlate with restorative structural changes in gray matter and white matter pathways mediating speech production. In the present study, we used pharmacological magnetic resonance imaging to study treatment-induced brain changes in gray matter and white matter tracts in a right-handed male with chronic conduction aphasia and a right subcortical lesion (crossed aphasia). A single-patient, open-label multiple-baseline design incorporating two different treatments and two post-treatment evaluations was used. The patient received an initial dose of DP (5 mg/day) which was maintained during 4 weeks and then titrated up to 10 mg/day and administered alone (without aphasia therapy) during 8 weeks (Endpoint 1). Thereafter, the drug was combined with an audiovisual repetition-imitation therapy (Look-Listen-Repeat, LLR) during 3 months (Endpoint 2). Language evaluations, diffusion weighted imaging (DWI), and voxel-based morphometry (VBM) were performed at baseline and at both endpoints in JAM and once in 21 healthy control males. Treatment with DP alone and combined with LLR therapy induced marked improvement in aphasia and communication deficits as well as in selected measures of connected speech production, and phrase repetition. The obtained gains in speech production remained well-above baseline scores even 4 months after ending combined therapy. Longitudinal DWI showed structural plasticity in the right frontal aslant tract and direct segment of the arcuate fasciculus with both interventions. VBM revealed no structural changes in other white matter tracts nor in cortical areas linked by these tracts. In

  15. Audiovisual integration of emotional signals from music improvisation does not depend on temporal correspondence.

    Science.gov (United States)

    Petrini, Karin; McAleer, Phil; Pollick, Frank

    2010-04-06

    In the present study we applied a paradigm often used in face-voice affect perception to solo music improvisation to examine how the emotional valence of sound and gesture are integrated when perceiving an emotion. Three brief excerpts expressing emotion produced by a drummer and three by a saxophonist were selected. From these bimodal congruent displays the audio-only, visual-only, and audiovisually incongruent conditions (obtained by combining the two signals both within and between instruments) were derived. In Experiment 1 twenty musical novices judged the perceived emotion and rated the strength of each emotion. The results indicate that sound dominated the visual signal in the perception of affective expression, though this was more evident for the saxophone. In Experiment 2 a further sixteen musical novices were asked to either pay attention to the musicians' movements or to the sound when judging the perceived emotions. The results showed no effect of visual information when judging the sound. On the contrary, when judging the emotional content of the visual information, a worsening in performance was obtained for the incongruent condition that combined different emotional auditory and visual information for the same instrument. The effect of emotionally discordant information thus became evident only when the auditory and visual signals belonged to the same categorical event despite their temporal mismatch. This suggests that the integration of emotional information may be reinforced by its semantic attributes but might be independent from temporal features. Copyright 2010 Elsevier B.V. All rights reserved.

  16. Semantics and the multisensory brain: how meaning modulates processes of audio-visual integration.

    Science.gov (United States)

    Doehrmann, Oliver; Naumer, Marcus J

    2008-11-25

    By using meaningful stimuli, multisensory research has recently started to investigate the impact of stimulus content on crossmodal integration. Variations in this respect have often been termed "semantic". In this paper we review work addressing the questions of which tasks show an influence of semantic factors and which cortical networks are most likely to mediate these effects. More specifically, the focus of this paper will be on the processing of object stimuli presented in the auditory and visual sensory modalities. Furthermore, we will investigate which cortical regions are particularly responsive to experimental variations of content by comparing semantically matching ("congruent") and mismatching ("incongruent") experimental conditions. In this context, recent neuroimaging studies point toward a possible functional differentiation of temporal and frontal cortical regions, with the former being more responsive to semantically congruent and the latter to semantically incongruent audio-visual (AV) stimulation. To account for these differential effects, we suggest in the final section of this paper a possible synthesis of these data on semantic modulation of AV integration with findings from neuroimaging studies and theoretical accounts of semantic memory.

  17. Integrated Phoneme Subspace Method for Speech Feature Extraction

    Directory of Open Access Journals (Sweden)

    Park Hyunsin

    2009-01-01

    Full Text Available Speech feature extraction has been a key focus in robust speech recognition research. In this work, we discuss data-driven linear feature transformations applied to feature vectors in the logarithmic mel-frequency filter bank domain. Transformations are based on principal component analysis (PCA), independent component analysis (ICA), and linear discriminant analysis (LDA). Furthermore, this paper introduces a new feature extraction technique that collects the correlation information among phoneme subspaces and reconstructs feature space for representing phonemic information efficiently. The proposed speech feature vector is generated by projecting an observed vector onto an integrated phoneme subspace (IPS) based on PCA or ICA. The performance of the new feature was evaluated for isolated word speech recognition. The proposed method provided higher recognition accuracy than conventional methods in clean and reverberant environments.
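
    A minimal sketch of the PCA variant of the integrated phoneme subspace idea is given below. Function names and the choice of four components per phoneme are assumptions; the method described in the paper also exploits ICA and correlations among the phoneme subspaces.

```python
import numpy as np
from sklearn.decomposition import PCA

def build_ips_basis(logmel_by_phoneme, n_components=4):
    """Learn one PCA basis per phoneme class from log-mel filterbank frames and
    stack them into an integrated phoneme subspace (IPS) basis.

    logmel_by_phoneme maps a phoneme label to an (n_frames, n_mel) array; the
    number of components per phoneme is an assumed choice.
    """
    bases = [PCA(n_components=n_components).fit(frames).components_
             for frames in logmel_by_phoneme.values()]
    return np.vstack(bases)          # shape: (n_phonemes * n_components, n_mel)

def ips_features(logmel_frames, ips_basis):
    """Project log-mel frames onto the IPS basis to obtain the new feature vectors."""
    return logmel_frames @ ips_basis.T
```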

  18. Content congruency and its interplay with temporal synchrony modulate integration between rhythmic audiovisual streams

    Directory of Open Access Journals (Sweden)

    Yi-Huang eSu

    2014-12-01

    Full Text Available Both lower-level stimulus factors (e.g., temporal proximity) and higher-level cognitive factors (e.g., content congruency) are known to influence multisensory integration. The former can direct attention in a converging manner, and the latter can indicate whether information from the two modalities belongs together. The present research investigated whether and how these two factors interacted in the perception of rhythmic, audiovisual streams derived from a human movement scenario. Congruency here was based on sensorimotor correspondence pertaining to rhythm perception. Participants attended to bimodal stimuli consisting of a humanlike figure moving regularly to a sequence of auditory beat, and detected a possible auditory temporal deviant. The figure moved either downwards (congruently) or upwards (incongruently) to the downbeat, while in both situations the movement was either synchronous with the beat, or lagging behind it. Greater cross-modal binding was expected to hinder deviant detection. Results revealed poorer detection for congruent than for incongruent streams, suggesting stronger integration in the former. False alarms increased in asynchronous stimuli only for congruent streams, indicating greater tendency for deviant report due to visual capture of asynchronous auditory events. In addition, a greater increase in perceived synchrony was associated with a greater reduction in false alarms for congruent streams, while the pattern was reversed for incongruent ones. These results demonstrate that content congruency as a top-down factor not only promotes integration, but also modulates bottom-up effects of synchrony. Results are also discussed regarding how theories of integration and attentional entrainment may be combined in the context of rhythmic multisensory stimuli.

  19. Content congruency and its interplay with temporal synchrony modulate integration between rhythmic audiovisual streams.

    Science.gov (United States)

    Su, Yi-Huang

    2014-01-01

    Both lower-level stimulus factors (e.g., temporal proximity) and higher-level cognitive factors (e.g., content congruency) are known to influence multisensory integration. The former can direct attention in a converging manner, and the latter can indicate whether information from the two modalities belongs together. The present research investigated whether and how these two factors interacted in the perception of rhythmic, audiovisual (AV) streams derived from a human movement scenario. Congruency here was based on sensorimotor correspondence pertaining to rhythm perception. Participants attended to bimodal stimuli consisting of a humanlike figure moving regularly to a sequence of auditory beat, and detected a possible auditory temporal deviant. The figure moved either downwards (congruently) or upwards (incongruently) to the downbeat, while in both situations the movement was either synchronous with the beat, or lagging behind it. Greater cross-modal binding was expected to hinder deviant detection. Results revealed poorer detection for congruent than for incongruent streams, suggesting stronger integration in the former. False alarms increased in asynchronous stimuli only for congruent streams, indicating greater tendency for deviant report due to visual capture of asynchronous auditory events. In addition, a greater increase in perceived synchrony was associated with a greater reduction in false alarms for congruent streams, while the pattern was reversed for incongruent ones. These results demonstrate that content congruency as a top-down factor not only promotes integration, but also modulates bottom-up effects of synchrony. Results are also discussed regarding how theories of integration and attentional entrainment may be combined in the context of rhythmic multisensory stimuli.

  20. Patients with hippocampal amnesia successfully integrate gesture and speech.

    Science.gov (United States)

    Hilverman, Caitlin; Clough, Sharice; Duff, Melissa C; Cook, Susan Wagner

    2018-06-19

    During conversation, people integrate information from co-speech hand gestures with information in spoken language. For example, after hearing the sentence, "A piece of the log flew up and hit Carl in the face" while viewing a gesture directed at the nose, people tend to later report that the log hit Carl in the nose (information only in gesture) rather than in the face (information in speech). The cognitive and neural mechanisms that support the integration of gesture with speech are unclear. One possibility is that the hippocampus - known for its role in relational memory and information integration - is necessary for integrating gesture and speech. To test this possibility, we examined how patients with hippocampal amnesia and healthy and brain-damaged comparison participants express information from gesture in a narrative retelling task. Participants watched videos of an experimenter telling narratives that included hand gestures that contained supplementary information. Participants were asked to retell the narratives and their spoken retellings were assessed for the presence of information from gesture. For features that had been accompanied by supplementary gesture, patients with amnesia retold fewer of these features overall and produced fewer retellings that matched the speech from the narrative. Yet their retellings included features that contained information that had been present uniquely in gesture in amounts that were not reliably different from comparison groups. Thus, a functioning hippocampus is not necessary for gesture-speech integration over short timescales. Providing unique information in gesture may enhance communication for individuals with declarative memory impairment, possibly via non-declarative memory mechanisms. Copyright © 2018. Published by Elsevier Ltd.

  1. Toward Speech and Nonverbal Behaviors Integration for Humanoid Robot

    Directory of Open Access Journals (Sweden)

    Wei Wang

    2012-09-01

    Full Text Available It is essential to integrate speech and nonverbal behaviors for a humanoid robot in human-robot interaction. This paper presents an approach that uses a multi-objective genetic algorithm to match speech and behaviors automatically. Firstly, based on the humanoid robot's emotion status, we construct a hierarchical structure linking voice characteristics and nonverbal behaviors. Secondly, the behaviors corresponding to the speech are matched and integrated into an action sequence by the genetic algorithm, so the robot can speak and perform emotional behaviors consistently. Our approach draws on relevant knowledge from psychology and research on nonverbal communication. Experimental results indicate that our ultimate goal, an affective robot that acts and speaks with its partners vividly and fluently, can be achieved.
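
    To make the matching step concrete, the following toy sketch evolves one behavior label per utterance with a plain genetic algorithm. The affinity matrix, the single-objective fitness, and all hyperparameters are assumptions made for illustration; the approach described above uses a multi-objective formulation.

```python
import numpy as np

def evolve_behavior_sequence(affinity, n_generations=200, pop_size=40,
                             mutation_rate=0.1, seed=0):
    """Toy genetic algorithm assigning one nonverbal behavior to each utterance.

    affinity[i, j] is an assumed score for pairing utterance i with behavior j
    (for instance derived from shared emotion labels); the single-objective
    fitness here collapses the multi-objective formulation of the paper.
    """
    rng = np.random.default_rng(seed)
    n_utt, n_beh = affinity.shape
    pop = rng.integers(n_beh, size=(pop_size, n_utt))
    for _ in range(n_generations):
        fitness = affinity[np.arange(n_utt), pop].sum(axis=1)
        # Tournament selection: keep the better individual of random pairs.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((fitness[a] >= fitness[b])[:, None], pop[a], pop[b])
        # One-point crossover between each parent and its neighbour.
        cut = rng.integers(1, n_utt, size=(pop_size, 1))
        mask = np.arange(n_utt)[None, :] < cut
        children = np.where(mask, parents, np.roll(parents, 1, axis=0))
        # Random mutation of individual behavior assignments.
        mutate = rng.random(children.shape) < mutation_rate
        children[mutate] = rng.integers(n_beh, size=int(mutate.sum()))
        pop = children
    fitness = affinity[np.arange(n_utt), pop].sum(axis=1)
    return pop[np.argmax(fitness)]
```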

  2. Non-fluent speech following stroke is caused by impaired efference copy.

    Science.gov (United States)

    Feenaughty, Lynda; Basilakos, Alexandra; Bonilha, Leonardo; den Ouden, Dirk-Bart; Rorden, Chris; Stark, Brielle; Fridriksson, Julius

    2017-09-01

    Efference copy is a cognitive mechanism argued to be critical for initiating and monitoring speech; however, the extent to which breakdown of efference copy mechanisms impacts speech production is unclear. This study examined the best mechanistic predictors of non-fluent speech among 88 stroke survivors. Objective speech fluency measures were subjected to a principal component analysis (PCA). The primary PCA factor was then entered into a multiple stepwise linear regression analysis as the dependent variable, with a set of independent mechanistic variables. Participants' ability to mimic audio-visual speech ("speech entrainment response") was the best independent predictor of non-fluent speech. We suggest that this "speech entrainment" factor reflects integrity of internal monitoring (i.e., efference copy) of speech production, which affects speech initiation and maintenance. Results support models of normal speech production and suggest that therapy focused on speech initiation and maintenance may improve speech fluency for individuals with chronic non-fluent aphasia post stroke.
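
    The analysis pipeline (a PCA factor of fluency measures regressed on mechanistic predictors) can be illustrated as follows. This is a simplified stand-in that ranks predictors by single-predictor R² rather than performing true stepwise selection; the function and its inputs are assumptions, not the authors' code or data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def rank_mechanistic_predictors(fluency_measures, predictors):
    """Reduce objective fluency measures to a single PCA factor, then rank
    candidate predictors (e.g., a speech-entrainment score) by single-predictor R^2.

    Inputs are (n_participants, n_variables) arrays; this ranking is a
    simplified stand-in for the stepwise regression reported in the study.
    """
    fluency_factor = PCA(n_components=1).fit_transform(fluency_measures).ravel()
    r2 = np.array([
        LinearRegression()
        .fit(predictors[:, [j]], fluency_factor)
        .score(predictors[:, [j]], fluency_factor)
        for j in range(predictors.shape[1])
    ])
    order = np.argsort(r2)[::-1]
    return order, r2[order]
```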

  3. Audiovisual functional magnetic resonance imaging adaptation reveals multisensory integration effects in object-related sensory cortices.

    Science.gov (United States)

    Doehrmann, Oliver; Weigelt, Sarah; Altmann, Christian F; Kaiser, Jochen; Naumer, Marcus J

    2010-03-03

    Information integration across different sensory modalities contributes to object recognition, the generation of associations and long-term memory representations. Here, we used functional magnetic resonance imaging adaptation to investigate the presence of sensory integrative effects at cortical levels as early as nonprimary auditory and extrastriate visual cortices, which are implicated in intermediate stages of object processing. Stimulation consisted of an adapting audiovisual stimulus S(1) and a subsequent stimulus S(2) from the same basic-level category (e.g., cat). The stimuli were carefully balanced with respect to stimulus complexity and semantic congruency and presented in four experimental conditions: (1) the same image and vocalization for S(1) and S(2), (2) the same image and a different vocalization, (3) different images and the same vocalization, or (4) different images and vocalizations. This two-by-two factorial design allowed us to assess the contributions of auditory and visual stimulus repetitions and changes in a statistically orthogonal manner. Responses in visual regions of right fusiform gyrus and right lateral occipital cortex were reduced for repeated visual stimuli (repetition suppression). Surprisingly, left lateral occipital cortex showed stronger responses to repeated auditory stimuli (repetition enhancement). Similarly, auditory regions of interest of the right middle superior temporal gyrus and sulcus exhibited repetition suppression to auditory repetitions and repetition enhancement to visual repetitions. Our findings of crossmodal repetition-related effects in cortices of the respective other sensory modality add to the emerging view that in human subjects sensory integrative mechanisms operate on earlier cortical processing levels than previously assumed.

  4. Gesture-speech integration in children with specific language impairment.

    Science.gov (United States)

    Mainela-Arnold, Elina; Alibali, Martha W; Hostetter, Autumn B; Evans, Julia L

    2014-11-01

    Previous research suggests that speakers are especially likely to produce manual communicative gestures when they have relative ease in thinking about the spatial elements of what they are describing, paired with relative difficulty organizing those elements into appropriate spoken language. Children with specific language impairment (SLI) exhibit poor expressive language abilities together with within-normal-range nonverbal IQs. This study investigated whether weak spoken language abilities in children with SLI influence their reliance on gestures to express information. We hypothesized that these children would rely on communicative gestures to express information more often than their age-matched typically developing (TD) peers, and that they would sometimes express information in gestures that they do not express in the accompanying speech. Participants were 15 children with SLI (aged 5;6-10;0) and 18 age-matched TD controls. Children viewed a wordless cartoon and retold the story to a listener unfamiliar with the story. Children's gestures were identified and coded for meaning using a previously established system. Speech-gesture combinations were coded as redundant if the information conveyed in speech and gesture was the same, and non-redundant if the information conveyed in speech was different from the information conveyed in gesture. Children with SLI produced more gestures than children in the TD group; however, the likelihood that speech-gesture combinations were non-redundant did not differ significantly across the SLI and TD groups. In both groups, younger children were significantly more likely to produce non-redundant speech-gesture combinations than older children. The gesture-speech integration system functions similarly in children with SLI and TD, but children with SLI rely more on gesture to help formulate, conceptualize or express the messages they want to convey. This provides motivation for future research examining whether interventions

  5. Severe Speech Sound Disorders: An Integrated Multimodal Intervention

    Science.gov (United States)

    King, Amie M.; Hengst, Julie A.; DeThorne, Laura S.

    2013-01-01

    Purpose: This study introduces an integrated multimodal intervention (IMI) and examines its effectiveness for the treatment of persistent and severe speech sound disorders (SSD) in young children. The IMI is an activity-based intervention that focuses simultaneously on increasing the "quantity" of a child's meaningful productions of target words…

  6. Integrating speech in time depends on temporal expectancies and attention.

    Science.gov (United States)

    Scharinger, Mathias; Steinberg, Johanna; Tavano, Alessandro

    2017-08-01

    Sensory information that unfolds in time, such as in speech perception, relies on efficient chunking mechanisms in order to yield optimally-sized units for further processing. Whether or not two successive acoustic events receive a one-unit or a two-unit interpretation seems to depend on the fit between their temporal extent and a stipulated temporal window of integration. However, there is ongoing debate on how flexible this temporal window of integration should be, especially for the processing of speech sounds. Furthermore, there is no direct evidence of whether attention may modulate the temporal constraints on the integration window. For this reason, we here examine how different word durations, which lead to different temporal separations of sound onsets, interact with attention. In an Electroencephalography (EEG) study, participants actively and passively listened to words where word-final consonants were occasionally omitted. Words had either a natural duration or were artificially prolonged in order to increase the separation of speech sound onsets. Omission responses to incomplete speech input, originating in left temporal cortex, decreased when the critical speech sound was separated from previous sounds by more than 250 msec, i.e., when the separation was larger than the stipulated temporal window of integration (125-150 msec). Attention, on the other hand, only increased omission responses for stimuli with natural durations. We complemented the event-related potential (ERP) analyses by a frequency-domain analysis on the stimulus presentation rate. Notably, the power at the stimulation frequency showed the same duration and attention effects as the omission responses. We interpret these findings against the background of existing research on temporal integration windows and further suggest that our findings may be accounted for within the framework of predictive coding. Copyright © 2017 Elsevier Ltd. All rights reserved.
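
    The frequency-domain analysis mentioned above amounts to reading out spectral power at the stimulus presentation rate. A minimal sketch for a single epoch, under assumed input conventions:

```python
import numpy as np

def power_at_stimulation_rate(eeg_epoch, fs, stim_rate_hz):
    """Spectral power of a single EEG epoch at the stimulus presentation rate.

    Real pipelines average over trials and channels and normalise by
    neighbouring frequency bins; this keeps only the core readout step.
    """
    spectrum = np.abs(np.fft.rfft(eeg_epoch)) ** 2
    freqs = np.fft.rfftfreq(len(eeg_epoch), d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - stim_rate_hz))]
```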

  7. Cholinergic Potentiation and Audiovisual Repetition-Imitation Therapy Improve Speech Production and Communication Deficits in a Person with Crossed Aphasia by Inducing Structural Plasticity in White Matter Tracts

    Directory of Open Access Journals (Sweden)

    Marcelo L. Berthier

    2017-06-01

    Full Text Available Donepezil (DP), a cognitive-enhancing drug targeting the cholinergic system, combined with massed sentence repetition training augmented and speeded up recovery of speech production deficits in patients with chronic conduction aphasia and extensive left hemisphere infarctions (Berthier et al., 2014). Nevertheless, a still unsettled question is whether such improvements correlate with restorative structural changes in gray matter and white matter pathways mediating speech production. In the present study, we used pharmacological magnetic resonance imaging to study treatment-induced brain changes in gray matter and white matter tracts in a right-handed male with chronic conduction aphasia and a right subcortical lesion (crossed aphasia). A single-patient, open-label multiple-baseline design incorporating two different treatments and two post-treatment evaluations was used. The patient received an initial dose of DP (5 mg/day) which was maintained during 4 weeks and then titrated up to 10 mg/day and administered alone (without aphasia therapy) during 8 weeks (Endpoint 1). Thereafter, the drug was combined with an audiovisual repetition-imitation therapy (Look-Listen-Repeat, LLR) during 3 months (Endpoint 2). Language evaluations, diffusion weighted imaging (DWI), and voxel-based morphometry (VBM) were performed at baseline and at both endpoints in JAM and once in 21 healthy control males. Treatment with DP alone and combined with LLR therapy induced marked improvement in aphasia and communication deficits as well as in selected measures of connected speech production, and phrase repetition. The obtained gains in speech production remained well-above baseline scores even 4 months after ending combined therapy. Longitudinal DWI showed structural plasticity in the right frontal aslant tract and direct segment of the arcuate fasciculus with both interventions. VBM revealed no structural changes in other white matter tracts nor in cortical areas linked by these

  8. Basic to Applied Research: The Benefits of Audio-Visual Speech Perception Research in Teaching Foreign Languages

    Science.gov (United States)

    Erdener, Dogu

    2016-01-01

    Traditionally, second language (L2) instruction has emphasised auditory-based instruction methods. However, this approach is restrictive in the sense that speech perception by humans is not just an auditory phenomenon but a multimodal one, and specifically, a visual one as well. In the past decade, experimental studies have shown that the…

  9. Gesture and Speech Integration: An Exploratory Study of a Man with Aphasia

    Science.gov (United States)

    Cocks, Naomi; Sautin, Laetitia; Kita, Sotaro; Morgan, Gary; Zlotowitz, Sally

    2009-01-01

    Background: In order to comprehend fully a speaker's intention in everyday communication, information is integrated from multiple sources, including gesture and speech. There are no published studies that have explored the impact of aphasia on iconic co-speech gesture and speech integration. Aims: To explore the impact of aphasia on co-speech…

  10. Integration of asynchronous knowledge sources in a novel speech recognition framework

    OpenAIRE

    Van hamme, Hugo

    2008-01-01

    Van hamme H., ''Integration of asynchronous knowledge sources in a novel speech recognition framework'', Proceedings ITRW on speech analysis and processing for knowledge discovery, 4 pp., June 2008, Aalborg, Denmark.

  11. Early and late beta-band power reflect audiovisual perception in the McGurk illusion.

    Science.gov (United States)

    Roa Romero, Yadira; Senkowski, Daniel; Keil, Julian

    2015-04-01

    The McGurk illusion is a prominent example of audiovisual speech perception and the influence that visual stimuli can have on auditory perception. In this illusion, a visual speech stimulus influences the perception of an incongruent auditory stimulus, resulting in a fused novel percept. In this high-density electroencephalography (EEG) study, we were interested in the neural signatures of the subjective percept of the McGurk illusion as a phenomenon of speech-specific multisensory integration. Therefore, we examined the role of cortical oscillations and event-related responses in the perception of congruent and incongruent audiovisual speech. We compared the cortical activity elicited by objectively congruent syllables with incongruent audiovisual stimuli. Importantly, the latter elicited a subjectively congruent percept: the McGurk illusion. We found that early event-related responses (N1) to audiovisual stimuli were reduced during the perception of the McGurk illusion compared with congruent stimuli. Most interestingly, our study showed a stronger poststimulus suppression of beta-band power (13-30 Hz) at short (0-500 ms) and long (500-800 ms) latencies during the perception of the McGurk illusion compared with congruent stimuli. Our study demonstrates that auditory perception is influenced by visual context and that the subsequent formation of a McGurk illusion requires stronger audiovisual integration even at early processing stages. Our results provide evidence that beta-band suppression at early stages reflects stronger stimulus processing in the McGurk illusion. Moreover, stronger late beta-band suppression in the McGurk illusion indicates the resolution of incongruent physical audiovisual input and the formation of a coherent, illusory multisensory percept. Copyright © 2015 the American Physiological Society.

  12. Audiovisual sentence recognition not predicted by susceptibility to the McGurk effect.

    Science.gov (United States)

    Van Engen, Kristin J; Xie, Zilong; Chandrasekaran, Bharath

    2017-02-01

    In noisy situations, visual information plays a critical role in the success of speech communication: listeners are better able to understand speech when they can see the speaker. Visual influence on auditory speech perception is also observed in the McGurk effect, in which discrepant visual information alters listeners' auditory perception of a spoken syllable. When hearing /ba/ while seeing a person saying /ga/, for example, listeners may report hearing /da/. Because these two phenomena have been assumed to arise from a common integration mechanism, the McGurk effect has often been used as a measure of audiovisual integration in speech perception. In this study, we test whether this assumed relationship exists within individual listeners. We measured participants' susceptibility to the McGurk illusion as well as their ability to identify sentences in noise across a range of signal-to-noise ratios in audio-only and audiovisual modalities. Our results do not show a relationship between listeners' McGurk susceptibility and their ability to use visual cues to understand spoken sentences in noise, suggesting that McGurk susceptibility may not be a valid measure of audiovisual integration in everyday speech processing.

  13. Audiovisual segregation in cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Simon Landry

    Full Text Available It has traditionally been assumed that cochlear implant users de facto perform atypically in audiovisual tasks. However, a recent study that combined an auditory task with visual distractors suggests that only those cochlear implant users that are not proficient at recognizing speech sounds might show abnormal audiovisual interactions. The present study aims at reinforcing this notion by investigating the audiovisual segregation abilities of cochlear implant users in a visual task with auditory distractors. Speechreading was assessed in two groups of cochlear implant users (proficient and non-proficient at sound recognition), as well as in normal controls. A visual speech recognition task (i.e. speechreading) was administered either in silence or in combination with three types of auditory distractors: (i) noise, (ii) reverse speech sound, and (iii) non-altered speech sound. Cochlear implant users proficient at speech recognition performed like normal controls in all conditions, whereas non-proficient users showed significantly different audiovisual segregation patterns in both speech conditions. These results confirm that normal-like audiovisual segregation is possible in highly skilled cochlear implant users and, consequently, that proficient and non-proficient CI users cannot be lumped into a single group. This important feature must be taken into account in further studies of audiovisual interactions in cochlear implant users.

  14. Iconic Gestures for Robot Avatars, Recognition and Integration with Speech

    Science.gov (United States)

    Bremner, Paul; Leonards, Ute

    2016-01-01

    Co-verbal gestures are an important part of human communication, improving its efficiency and efficacy for information conveyance. One possible means by which such multi-modal communication might be realized remotely is through the use of a tele-operated humanoid robot avatar. Such avatars have been previously shown to enhance social presence and operator salience. We present a motion tracking based tele-operation system for the NAO robot platform that allows direct transmission of speech and gestures produced by the operator. To assess the capabilities of this system for transmitting multi-modal communication, we have conducted a user study that investigated if robot-produced iconic gestures are comprehensible, and are integrated with speech. Robot performed gesture outcomes were compared directly to those for gestures produced by a human actor, using a within participant experimental design. We show that iconic gestures produced by a tele-operated robot are understood by participants when presented alone, almost as well as when produced by a human. More importantly, we show that gestures are integrated with speech when presented as part of a multi-modal communication equally well for human and robot performances. PMID:26925010

  15. Iconic Gestures for Robot Avatars, Recognition and Integration with Speech

    Directory of Open Access Journals (Sweden)

    Paul Adam Bremner

    2016-02-01

    Full Text Available Co-verbal gestures are an important part of human communication, improving its efficiency and efficacy for information conveyance. One possible means by which such multi-modal communication might be realised remotely is through the use of a tele-operated humanoid robot avatar. Such avatars have been previously shown to enhance social presence and operator salience. We present a motion tracking based tele-operation system for the NAO robot platform that allows direct transmission of speech and gestures produced by the operator. To assess the capabilities of this system for transmitting multi-modal communication, we have conducted a user study that investigated if robot-produced iconic gestures are comprehensible, and are integrated with speech. Robot performed gesture outcomes were compared directly to those for gestures produced by a human actor, using a within participant experimental design. We show that iconic gestures produced by a tele-operated robot are understood by participants when presented alone, almost as well as when produced by a human. More importantly, we show that gestures are integrated with speech when presented as part of a multi-modal communication equally well for human and robot performances.

  16. The Role of Temporal Disparity on Audiovisual Integration in Low-Vision Individuals.

    Science.gov (United States)

    Targher, Stefano; Micciolo, Rocco; Occelli, Valeria; Zampini, Massimiliano

    2017-12-01

    Recent findings have shown that sounds improve visual detection in low vision individuals when the audiovisual pairs of stimuli are presented simultaneously and from the same spatial position. The present study aims to investigate the temporal aspects of the audiovisual enhancement effect previously reported. Low vision participants were asked to detect the presence of a visual stimulus (yes/no task) presented either alone or together with an auditory stimulus at different stimulus onset asynchronies (SOAs). In the first experiment, the sound was presented either simultaneously or before the visual stimulus (i.e., SOAs 0, 100, 250, 400 ms). The results show that the presence of a task-irrelevant auditory stimulus produced a significant visual detection enhancement in all the conditions. In the second experiment, the sound was either synchronized with, or randomly preceded/lagged behind the visual stimulus (i.e., SOAs 0, ± 250, ± 400 ms). The visual detection enhancement was reduced in magnitude and limited only to the synchronous condition and to the condition in which the sound stimulus was presented 250 ms before the visual stimulus. Taken together, the evidence of the present study seems to suggest that audiovisual interaction in low vision individuals is highly modulated by top-down mechanisms.

  17. Perceived synchrony for realistic and dynamic audiovisual events.

    Science.gov (United States)

    Eg, Ragnhild; Behne, Dawn M

    2015-01-01

    In well-controlled laboratory experiments, researchers have found that humans can perceive delays between auditory and visual signals as short as 20 ms. Conversely, other experiments have shown that humans can tolerate audiovisual asynchrony that exceeds 200 ms. This seeming contradiction in human temporal sensitivity can be attributed to a number of factors such as experimental approaches and precedence of the asynchronous signals, along with the nature, duration, location, complexity and repetitiveness of the audiovisual stimuli, and even individual differences. In order to better understand how temporal integration of audiovisual events occurs in the real world, we need to close the gap between the experimental setting and the complex setting of everyday life. With this work, we aimed to contribute one brick to the bridge that will close this gap. We compared perceived synchrony for long-running and eventful audiovisual sequences to shorter sequences that contain a single audiovisual event, for three types of content: action, music, and speech. The resulting windows of temporal integration showed that participants were better at detecting asynchrony for the longer stimuli, possibly because the long-running sequences contain multiple corresponding events that offer audiovisual timing cues. Moreover, the points of subjective simultaneity differ between content types, suggesting that the nature of a visual scene could influence the temporal perception of events. An expected outcome from this type of experiment was the rich variation among participants' distributions and the derived points of subjective simultaneity. Hence, the designs of similar experiments call for more participants than traditional psychophysical studies. Heeding this caution, we conclude that existing theories on multisensory perception are ready to be tested on more natural and representative stimuli.
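
    The windows of temporal integration and points of subjective simultaneity reported above are typically derived by fitting a curve to synchrony judgments across audiovisual offsets. The Gaussian fit below is one common choice, offered as an assumed illustration rather than the authors' exact procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_synchrony_window(soas_ms, prop_synchronous):
    """Fit a Gaussian to the proportion of 'synchronous' responses across SOAs.

    The fitted centre is taken as the point of subjective simultaneity (PSS) and
    the width as a summary of the temporal integration window.
    """
    def gauss(soa, amp, pss, width):
        return amp * np.exp(-((soa - pss) ** 2) / (2.0 * width ** 2))

    (amp, pss, width), _ = curve_fit(gauss, soas_ms, prop_synchronous,
                                     p0=[1.0, 0.0, 100.0])
    return pss, abs(width)
```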

  18. Assessing the effect of physical differences in the articulation of consonants and vowels on audiovisual temporal perception

    Science.gov (United States)

    Vatakis, Argiro; Maragos, Petros; Rodomagoulakis, Isidoros; Spence, Charles

    2012-01-01

    We investigated how the physical differences associated with the articulation of speech affect the temporal aspects of audiovisual speech perception. Video clips of consonants and vowels uttered by three different speakers were presented. The video clips were analyzed using an auditory-visual signal saliency model in order to compare signal saliency and behavioral data. Participants made temporal order judgments (TOJs) regarding which speech-stream (auditory or visual) had been presented first. The sensitivity of participants' TOJs and the point of subjective simultaneity (PSS) were analyzed as a function of the place, manner of articulation, and voicing for consonants, and the height/backness of the tongue and lip-roundedness for vowels. We expected that in the case of the place of articulation and roundedness, where the visual-speech signal is more salient, temporal perception of speech would be modulated by the visual-speech signal. No such effect was expected for the manner of articulation or height. The results demonstrate that for place and manner of articulation, participants' temporal percept was affected (although not always significantly) by highly-salient speech-signals with the visual-signals requiring smaller visual-leads at the PSS. This was not the case when height was evaluated. These findings suggest that in the case of audiovisual speech perception, a highly salient visual-speech signal may lead to higher probabilities regarding the identity of the auditory-signal that modulate the temporal window of multisensory integration of the speech-stimulus. PMID:23060756

  19. Audiovisual Script Writing.

    Science.gov (United States)

    Parker, Norton S.

    In audiovisual writing the writer must first learn to think in terms of moving visual presentation. The writer must research his script, organize it, and adapt it to a limited running time. By use of a pleasant-sounding narrator and well-written narration, the visual and narrative can be successfully integrated. There are two types of script…

  20. Neural Correlates of Temporal Complexity and Synchrony during Audiovisual Correspondence Detection.

    Science.gov (United States)

    Baumann, Oliver; Vromen, Joyce M G; Cheung, Allen; McFadyen, Jessica; Ren, Yudan; Guo, Christine C

    2018-01-01

    We often perceive real-life objects as multisensory cues through space and time. A key challenge for audiovisual integration is to match neural signals that not only originate from different sensory modalities but also that typically reach the observer at slightly different times. In humans, complex, unpredictable audiovisual streams lead to higher levels of perceptual coherence than predictable, rhythmic streams. In addition, perceptual coherence for complex signals seems less affected by increased asynchrony between visual and auditory modalities than for simple signals. Here, we used functional magnetic resonance imaging to determine the human neural correlates of audiovisual signals with different levels of temporal complexity and synchrony. Our study demonstrated that greater perceptual asynchrony and lower signal complexity impaired performance in an audiovisual coherence-matching task. Differences in asynchrony and complexity were also underpinned by a partially different set of brain regions. In particular, our results suggest that, while regions in the dorsolateral prefrontal cortex (DLPFC) were modulated by differences in memory load due to stimulus asynchrony, areas traditionally thought to be involved in speech production and recognition, such as the inferior frontal and superior temporal cortex, were modulated by the temporal complexity of the audiovisual signals. Our results, therefore, indicate specific processing roles for different subregions of the fronto-temporal cortex during audiovisual coherence detection.

  1. Multisensory integration: the case of a time window of gesture-speech integration.

    Science.gov (United States)

    Obermeier, Christian; Gunter, Thomas C

    2015-02-01

    This experiment investigates the integration of gesture and speech from a multisensory perspective. In a disambiguation paradigm, participants were presented with short videos of an actress uttering sentences like "She was impressed by the BALL, because the GAME/DANCE...." The ambiguous noun (BALL) was accompanied by an iconic gesture fragment containing information to disambiguate the noun toward its dominant or subordinate meaning. We used four different temporal alignments between noun and gesture fragment: the identification point (IP) of the noun was either prior to (+120 msec), synchronous with (0 msec), or lagging behind the end of the gesture fragment (-200 and -600 msec). ERPs triggered to the IP of the noun showed significant differences for the integration of dominant and subordinate gesture fragments in the -200, 0, and +120 msec conditions. The outcome of this integration was revealed at the target words. These data suggest a time window for direct semantic gesture-speech integration ranging from at least -200 up to +120 msec. Although the -600 msec condition did not show any signs of direct integration at the homonym, significant disambiguation was found at the target word. An explorative analysis suggested that gesture information was directly integrated at the verb, indicating that there are multiple positions in a sentence where direct gesture-speech integration takes place. Ultimately, this would implicate that in natural communication, where a gesture lasts for some time, several aspects of that gesture will have their specific and possibly distinct impact on different positions in an utterance.

  2. Orthographic dependency in the neural correlates of reading: Evidence from audiovisual integration in English readers

    NARCIS (Netherlands)

    Holloway, I.; van Atteveldt, N.M.; Blomert, L.; Ansari, D.

    2015-01-01

    Reading skills are indispensible in modern technological societies. In transparent alphabetic orthographies, such as Dutch, reading skills build on associations between letters and speech sounds (LS pairs). Previously, we showed that the superior temporal cortex (STC) of Dutch readers is sensitive

  3. The development of audiovisual multisensory integration across childhood and early adolescence: a high-density electrical mapping study.

    Science.gov (United States)

    Brandwein, Alice B; Foxe, John J; Russo, Natalie N; Altschuler, Ted S; Gomes, Hilary; Molholm, Sophie

    2011-05-01

    The integration of multisensory information is essential to forming meaningful representations of the environment. Adults benefit from related multisensory stimuli but the extent to which the ability to optimally integrate multisensory inputs for functional purposes is present in children has not been extensively examined. Using a cross-sectional approach, high-density electrical mapping of event-related potentials (ERPs) was combined with behavioral measures to characterize neurodevelopmental changes in basic audiovisual (AV) integration from middle childhood through early adulthood. The data indicated a gradual fine-tuning of multisensory facilitation of performance on an AV simple reaction time task (as indexed by race model violation), which reaches mature levels by about 14 years of age. They also revealed a systematic relationship between age and the brain processes underlying multisensory integration (MSI) in the time frame of the auditory N1 ERP component (∼ 120 ms). A significant positive correlation between behavioral and neurophysiological measures of MSI suggested that the underlying brain processes contributed to the fine-tuning of multisensory facilitation of behavior that was observed over middle childhood. These findings are consistent with protracted plasticity in a dynamic system and provide a starting point from which future studies can begin to examine the developmental course of multisensory processing in clinical populations.
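
    Race-model violation, the behavioral index of multisensory facilitation used in this study, can be computed from reaction-time distributions as sketched below (the quantile grid and function names are assumptions; the authors' exact implementation may differ).

```python
import numpy as np

def race_model_violation(rt_audio, rt_visual, rt_av,
                         quantiles=np.arange(0.05, 1.0, 0.05)):
    """Compare the audiovisual RT distribution with Miller's race-model bound.

    The race model predicts F_AV(t) <= F_A(t) + F_V(t); positive return values
    indicate violation of that bound, i.e., multisensory facilitation beyond
    statistical summation of the unimodal responses.
    """
    t = np.quantile(rt_av, quantiles)      # evaluation time points
    def cdf(rts):
        return np.searchsorted(np.sort(rts), t, side="right") / len(rts)
    bound = np.minimum(cdf(rt_audio) + cdf(rt_visual), 1.0)
    return cdf(rt_av) - bound              # > 0 means violation
```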

  4. Speech recognition by means of a three-integrated-circuit set

    Energy Technology Data Exchange (ETDEWEB)

    Zoicas, A.

    1983-11-03

    The author uses pattern recognition methods for detecting word boundaries, and monitors incoming speech at 12 millisecond intervals. Frequency is divided into eight bands and analysis is achieved in an analogue interface integrated circuit, a pipeline digital processor and a control integrated circuit. Applications are suggested, including speech input to personal computers. 3 references.

  5. Audiovisual Interaction

    DEFF Research Database (Denmark)

    Karandreas, Theodoros-Alexandros

    in a manner that allowed the subjective audiovisual evaluation of loudspeakers under controlled conditions. Additionally, unimodal audio and visual evaluations were used as a baseline for comparison. The same procedure was applied in the investigation of the validity of less than optimal stimuli presentations...

  6. Audiovisual Review

    Science.gov (United States)

    Physiology Teacher, 1976

    1976-01-01

    Lists and reviews recent audiovisual materials in areas of medical, dental, nursing and allied health, and veterinary medicine; undergraduate, and high school studies. Each is classified as to level, type of instruction, usefulness, and source of availability. Topics include respiration, renal physiology, muscle mechanics, anatomy, evolution,…

  7. fMR-adaptation indicates selectivity to audiovisual content congruency in distributed clusters in human superior temporal cortex.

    Science.gov (United States)

    van Atteveldt, Nienke M; Blau, Vera C; Blomert, Leo; Goebel, Rainer

    2010-02-02

    Efficient multisensory integration is of vital importance for adequate interaction with the environment. In addition to basic binding cues like temporal and spatial coherence, meaningful multisensory information is also bound together by content-based associations. Many functional Magnetic Resonance Imaging (fMRI) studies propose the (posterior) superior temporal cortex (STC) as the key structure for integrating meaningful multisensory information. However, a still unanswered question is how superior temporal cortex encodes content-based associations, especially in light of inconsistent results from studies comparing brain activation to semantically matching (congruent) versus nonmatching (incongruent) multisensory inputs. Here, we used fMR-adaptation (fMR-A) in order to circumvent potential problems with standard fMRI approaches, including spatial averaging and amplitude saturation confounds. We presented repetitions of audiovisual stimuli (letter-speech sound pairs) and manipulated the associative relation between the auditory and visual inputs (congruent/incongruent pairs). We predicted that if multisensory neuronal populations exist in STC and encode audiovisual content relatedness, adaptation should be affected by the manipulated audiovisual relation. The results revealed an occipital-temporal network that adapted independently of the audiovisual relation. Interestingly, several smaller clusters distributed over superior temporal cortex within that network, adapted stronger to congruent than to incongruent audiovisual repetitions, indicating sensitivity to content congruency. These results suggest that the revealed clusters contain multisensory neuronal populations that encode content relatedness by selectively responding to congruent audiovisual inputs, since unisensory neuronal populations are assumed to be insensitive to the audiovisual relation. These findings extend our previously revealed mechanism for the integration of letters and speech sounds and

  8. fMR-adaptation indicates selectivity to audiovisual content congruency in distributed clusters in human superior temporal cortex

    Directory of Open Access Journals (Sweden)

    Blomert Leo

    2010-02-01

    Full Text Available Abstract Background Efficient multisensory integration is of vital importance for adequate interaction with the environment. In addition to basic binding cues like temporal and spatial coherence, meaningful multisensory information is also bound together by content-based associations. Many functional Magnetic Resonance Imaging (fMRI) studies propose the (posterior) superior temporal cortex (STC) as the key structure for integrating meaningful multisensory information. However, a still unanswered question is how superior temporal cortex encodes content-based associations, especially in light of inconsistent results from studies comparing brain activation to semantically matching (congruent) versus nonmatching (incongruent) multisensory inputs. Here, we used fMR-adaptation (fMR-A) in order to circumvent potential problems with standard fMRI approaches, including spatial averaging and amplitude saturation confounds. We presented repetitions of audiovisual stimuli (letter-speech sound pairs) and manipulated the associative relation between the auditory and visual inputs (congruent/incongruent pairs). We predicted that if multisensory neuronal populations exist in STC and encode audiovisual content relatedness, adaptation should be affected by the manipulated audiovisual relation. Results The results revealed an occipital-temporal network that adapted independently of the audiovisual relation. Interestingly, several smaller clusters distributed over superior temporal cortex within that network, adapted stronger to congruent than to incongruent audiovisual repetitions, indicating sensitivity to content congruency. Conclusions These results suggest that the revealed clusters contain multisensory neuronal populations that encode content relatedness by selectively responding to congruent audiovisual inputs, since unisensory neuronal populations are assumed to be insensitive to the audiovisual relation. These findings extend our previously revealed mechanism for

  9. Integrating speech technology to meet crew station design requirements

    Science.gov (United States)

    Simpson, Carol A.; Ruth, John C.; Moore, Carolyn A.

    The last two years have seen improvements in speech generation and speech recognition technology that make speech I/O for crew station controls and displays viable for operational systems. These improvements include increased robustness of algorithm performance in high levels of background noise, increased vocabulary size, improved performance in the connected speech mode, and less speaker dependence. This improved capability makes possible far more sophisticated user interface design than was possible with earlier technology. Engineering, linguistic, and human factors design issues are discussed in the context of current voice I/O technology performance.

  10. Effectiveness of an Integrated Phonological Awareness Approach for Children with Childhood Apraxia of Speech (CAS)

    Science.gov (United States)

    McNeill, Brigid C.; Gillon, Gail T.; Dodd, Barbara

    2009-01-01

    This study investigated the effectiveness of an integrated phonological awareness approach for children with childhood apraxia of speech (CAS). Change in speech, phonological awareness, letter knowledge, word decoding, and spelling skills were examined. A controlled multiple single-subject design was employed. Twelve children aged 4-7 years with…

  11. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    Science.gov (United States)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It can enjoy a fast rate of data/text entry, small overall size, and can be lightweight. In addition, this design will free the hands and eyes of a suited crewmember. The system components and steps include beam forming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks when faced with constraints in computational resources.
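
    The multichannel front end described above can be illustrated with the simplest beamformer, a frequency-domain delay-and-sum. The sketch below assumes known microphone positions and talker direction; the actual spacesuit system adds adaptive noise reduction and model adaptation on top of this kind of channel alignment.

```python
import numpy as np

def delay_and_sum(mic_signals, fs, mic_positions, source_direction, c=343.0):
    """Frequency-domain delay-and-sum beamformer (a minimal sketch).

    mic_signals: (n_mics, n_samples) array; mic_positions: (n_mics, 3) in metres;
    source_direction: unit vector from the array toward the talker (assumed known).
    FFT-based shifts are circular, which is acceptable for short frames.
    """
    n_mics, n_samples = mic_signals.shape
    lags = -(mic_positions @ source_direction) / c   # arrival lag per microphone (s)
    lags -= lags.min()                               # advance each channel by its lag
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(mic_signals, axis=1)
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * lags[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)
```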

  12. [Virtual audiovisual talking heads: articulatory data and models--applications].

    Science.gov (United States)

    Badin, P; Elisei, F; Bailly, G; Savariaux, C; Serrurier, A; Tarabalka, Y

    2007-01-01

    In the framework of experimental phonetics, our approach to the study of speech production is based on the measurement, the analysis and the modeling of orofacial articulators such as the jaw, the face and the lips, the tongue or the velum. Therefore, we present in this article experimental techniques that allow characterising the shape and movement of speech articulators (static and dynamic MRI, computed tomodensitometry, electromagnetic articulography, video recording). We then describe the linear models of the various organs that we can elaborate from speaker-specific articulatory data. We show that these models, that exhibit a good geometrical resolution, can be controlled from articulatory data with a good temporal resolution and can thus permit the reconstruction of high quality animation of the articulators. These models, that we have integrated in a virtual talking head, can produce augmented audiovisual speech. In this framework, we have assessed the natural tongue reading capabilities of human subjects by means of audiovisual perception tests. We conclude by suggesting a number of other applications of talking heads.

  13. A Longitudinal Assessment of Early Childhood Education with Integrated Speech Therapy for Children with Significant Language Impairment in Germany

    Science.gov (United States)

    Ullrich, Dieter; Ullrich, Katja; Marten, Magret

    2014-01-01

    Background: In Lower Saxony, Germany, pre-school children with language- and speech-deficits have the opportunity to access kindergartens with integrated language-/speech therapy prior to attending primary school, both regular or with integrated speech therapy. It is unknown whether these early childhood education treatments are helpful and…

  14. Summarizing Audiovisual Contents of a Video Program

    Science.gov (United States)

    Gong, Yihong

    2003-12-01

    In this paper, we focus on video programs that are intended to disseminate information and knowledge, such as news, documentaries, and seminars, and present an audiovisual summarization system that summarizes the audio and visual contents of the given video separately and then integrates the two summaries with a partial alignment. The audio summary is created by selecting spoken sentences that best present the main content of the audio speech, while the visual summary is created by eliminating duplicates/redundancies and preserving visually rich contents in the image stream. The alignment operation aims to synchronize each spoken sentence in the audio summary with its corresponding speaker's face and to preserve the rich content in the visual summary. A bipartite-graph-based audiovisual alignment algorithm is developed to efficiently find the best alignment solution that satisfies these alignment requirements. With the proposed system, we strive to produce a video summary that: (1) provides a natural visual and audio content overview, and (2) maximizes the coverage for both audio and visual contents of the original video without having to sacrifice either of them.
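
    The bipartite alignment step can be illustrated with a standard assignment solver: audio-summary sentences are paired with visual-summary shots by maximizing a face-match score penalized by temporal distance. The scoring terms and the use of the Hungarian algorithm below are illustrative assumptions rather than the paper's exact formulation.

        # Toy bipartite audiovisual alignment using the Hungarian algorithm.
        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def align(sentence_times, shot_times, face_match):
            """
            sentence_times: (S, 2) start/end of each sentence kept in the audio summary
            shot_times:     (V, 2) start/end of each shot kept in the visual summary
            face_match:     (S, V) score that shot v shows the speaker of sentence s
            Returns (sentence, shot) pairs maximizing the total score.
            """
            t_s = sentence_times.mean(axis=1)[:, None]
            t_v = shot_times.mean(axis=1)[None, :]
            # Negative score becomes a cost; temporally distant pairs are penalized to keep sync.
            cost = -(face_match - 0.01 * np.abs(t_s - t_v))
            rows, cols = linear_sum_assignment(cost)
            return list(zip(rows, cols))

        pairs = align(np.array([[0, 4], [10, 15]]),
                      np.array([[1, 5], [11, 14], [20, 25]]),
                      np.random.rand(2, 3))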

  15. Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs.

    Science.gov (United States)

    Ten Oever, Sanne; Sack, Alexander T; Wheat, Katherine L; Bien, Nina; van Atteveldt, Nienke

    2013-01-01

    Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We revealed that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, that seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception.

  16. Functional neuroanatomy of gesture-speech integration in children varies with individual differences in gesture processing.

    Science.gov (United States)

    Demir-Lira, Özlem Ece; Asaridou, Salomi S; Raja Beharelle, Anjali; Holt, Anna E; Goldin-Meadow, Susan; Small, Steven L

    2018-03-08

    Gesture is an integral part of children's communicative repertoire. However, little is known about the neurobiology of speech and gesture integration in the developing brain. We investigated how 8- to 10-year-old children processed gesture that was essential to understanding a set of narratives. We asked whether the functional neuroanatomy of gesture-speech integration varies as a function of (1) the content of speech, and/or (2) individual differences in how gesture is processed. When gestures provided missing information not present in the speech (i.e., disambiguating gesture; e.g., "pet" + flapping palms = bird), the presence of gesture led to increased activity in inferior frontal gyri, the right middle temporal gyrus, and the left superior temporal gyrus, compared to when gesture provided redundant information (i.e., reinforcing gesture; e.g., "bird" + flapping palms = bird). This pattern of activation was found only in children who were able to successfully integrate gesture and speech behaviorally, as indicated by their performance on post-test story comprehension questions. Children who did not glean meaning from gesture did not show differential activation across the two conditions. Our results suggest that the brain activation pattern for gesture-speech integration in children overlaps with-but is broader than-the pattern in adults performing the same task. Overall, our results provide a possible neurobiological mechanism that could underlie children's increasing ability to integrate gesture and speech over childhood, and account for individual differences in that integration. © 2018 John Wiley & Sons Ltd.

  17. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    Energy Technology Data Exchange (ETDEWEB)

    Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C., E-mail: calexandre@ien.gov.b, E-mail: mol@ien.gov.b, E-mail: cmnap@ien.gov.b, E-mail: mag@ien.gov.b [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil); Nomiya, Diogo V., E-mail: diogonomiya@gmail.co [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil)

    2009-07-01

    This work reports the implementation of a man-system interface based on automatic speech recognition and its integration into a virtual nuclear power plant control desk. The latter is intended to reproduce a real control desk using virtual reality technology, for operator training and ergonomic evaluation purposes. An automatic speech recognition system was developed to serve as a new interface with users, replacing the computer keyboard and mouse. Users can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface developed is based on a well-known signal processing technique named cepstral analysis and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)
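
    As a rough sketch of the two ingredients named in the record, the code below computes real-cepstrum features for one speech frame and passes them through a small one-hidden-layer network that scores a fixed set of spoken commands. Frame length, number of coefficients and network shape are illustrative assumptions, not the system's actual configuration.

        # Cepstral features plus a tiny neural-network classifier (illustrative).
        import numpy as np

        def cepstrum(frame, n_coef=13):
            """Real cepstrum of one windowed speech frame."""
            spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
            log_mag = np.log(spectrum + 1e-10)
            return np.fft.irfft(log_mag)[:n_coef]       # keep low-quefrency coefficients

        def mlp_forward(x, W1, b1, W2, b2):
            """One hidden layer mapping a feature vector to command probabilities."""
            h = np.tanh(x @ W1 + b1)
            scores = h @ W2 + b2
            return np.exp(scores) / np.exp(scores).sum()  # softmax over commands

        rng = np.random.default_rng(0)
        frame = rng.standard_normal(400)                 # 25 ms frame at 16 kHz
        feat = cepstrum(frame)
        W1, b1 = rng.standard_normal((13, 32)), np.zeros(32)
        W2, b2 = rng.standard_normal((32, 8)), np.zeros(8)
        print(mlp_forward(feat, W1, b1, W2, b2))         # probabilities for 8 spoken commands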

  18. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    International Nuclear Information System (INIS)

    Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C.; Nomiya, Diogo V.

    2009-01-01

    This work reports the implementation of a man-system interface based on automatic speech recognition and its integration into a virtual nuclear power plant control desk. The latter is intended to reproduce a real control desk using virtual reality technology, for operator training and ergonomic evaluation purposes. An automatic speech recognition system was developed to serve as a new interface with users, replacing the computer keyboard and mouse. Users can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface developed is based on a well-known signal processing technique named cepstral analysis and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)

  19. An integrated audio-visual impact tool for wind turbine installations

    International Nuclear Information System (INIS)

    Lymberopoulos, N.; Belessis, M.; Wood, M.; Voutsinas, S.

    1996-01-01

    An integrated software tool was developed for the design of wind parks that takes into account their visual and audio impact. The application is built on a powerful hardware platform and is fully operated through a graphical user interface. The topography, the wind turbines and the daylight conditions are rendered digitally. The wind park can be animated in real time and the user can take virtual walks through it while the set-up of the park is altered interactively. In parallel, the wind speed levels on the terrain, the emitted noise intensity, the annual energy output and the cash flow can be estimated at any stage of the session and prompt the user to rearrange the layout. The tool has been used to visually simulate existing wind parks at St. Breok, UK, and on Andros Island, Greece. The results lead to the conclusion that such a tool can assist in the public acceptance and licensing procedures for wind parks. (author)
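
    For illustration only, the kind of noise estimate such a tool produces can be approximated with a simple hemispherical-spreading model that sums the contributions of all turbines at a receptor point; the tool's actual propagation model and source levels are not given in the record, so the numbers below are assumptions.

        # Total sound pressure level at a receptor from several turbines.
        import numpy as np

        def noise_level(turbine_xy, receptor_xy, sound_power_db=104.0, absorption=0.005):
            """Approximate A-weighted level (dB) at receptor_xy; coordinates in metres."""
            d = np.linalg.norm(np.asarray(turbine_xy) - np.asarray(receptor_xy), axis=1)
            # Hemispherical spreading plus a small atmospheric-absorption term (dB/m).
            lp = sound_power_db - 10 * np.log10(2 * np.pi * d**2) - absorption * d
            return 10 * np.log10(np.sum(10 ** (lp / 10)))   # energetic sum over turbines

        turbines = [(0, 0), (300, 0), (600, 0)]
        print(round(noise_level(turbines, (300, 500)), 1), "dB(A) at the receptor")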

  20. Multisensory speech perception without the left superior temporal sulcus.

    Science.gov (United States)

    Baum, Sarah H; Martin, Randi C; Hamilton, A Cris; Beauchamp, Michael S

    2012-09-01

    Converging evidence suggests that the left superior temporal sulcus (STS) is a critical site for multisensory integration of auditory and visual information during speech perception. We report a patient, SJ, who suffered a stroke that damaged the left temporo-parietal area, resulting in mild anomic aphasia. Structural MRI showed complete destruction of the left middle and posterior STS, as well as damage to adjacent areas in the temporal and parietal lobes. Surprisingly, SJ demonstrated preserved multisensory integration measured with two independent tests. First, she perceived the McGurk effect, an illusion that requires integration of auditory and visual speech. Second, her perception of morphed audiovisual speech with ambiguous auditory or visual information was significantly influenced by the opposing modality. To understand the neural basis for this preserved multisensory integration, blood-oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) was used to examine brain responses to audiovisual speech in SJ and 23 healthy age-matched controls. In controls, bilateral STS activity was observed. In SJ, no activity was observed in the damaged left STS, but in the right STS more cortex was active in SJ than in any of the normal controls. Further, the amplitude of the BOLD response in the right STS to McGurk stimuli was significantly greater in SJ than in controls. The simplest explanation of these results is a reorganization of SJ's cortical language networks such that the right STS now subserves multisensory integration of speech. Copyright © 2012 Elsevier Inc. All rights reserved.

  1. Interaction and Representational Integration: Evidence from Speech Errors

    Science.gov (United States)

    Goldrick, Matthew; Baker, H. Ross; Murphy, Amanda; Baese-Berk, Melissa

    2011-01-01

    We examine the mechanisms that support interaction between lexical, phonological and phonetic processes during language production. Studies of the phonetics of speech errors have provided evidence that partially activated lexical and phonological representations influence phonetic processing. We examine how these interactive effects are modulated…

  2. Semantic Congruency in Audiovisual Integration as Revealed by the Continuous Flash Suppression Paradigm

    Directory of Open Access Journals (Sweden)

    Yung-Hao Yang

    2011-10-01

    Full Text Available Despite several demonstrations of a crossmodal semantic-congruency effect, it remains controversial whether it is a genuine perceptual phenomenon or whether it actually results from post-perceptual response bias such as decisions or strategies (de Gelder and Bertelson, 2003). Here we combined invisible stimuli with sounds to exclude the participants' awareness of the relation between visual and auditory stimuli. We rendered the visual events invisible by adopting the continuous flash suppression paradigm (Tsuchiya and Koch, 2005), in which dynamic high-contrast visual patches are presented to one eye to suppress a target presented to the other eye. The semantic congruency between visual and auditory stimuli was manipulated and participants had to detect any part of the visual target. The results showed that the time needed to detect the visual target (i.e., the release from suppression) was shorter when it was accompanied by a semantically congruent sound than by an incongruent one. This study therefore demonstrates genuine multisensory integration at the semantic level. Furthermore, it also extends findings from previous studies with neglect blindsight patients (e.g., de Gelder, Pourtois, and Weiskrantz, 2002) to normal participants, based on their unawareness of the relation between visual and auditory information.

  3. Historia audiovisual para una sociedad audiovisual [Audiovisual history for an audiovisual society]

    Directory of Open Access Journals (Sweden)

    Julio Montero Díaz

    2013-04-01

    Full Text Available This article analyzes the possibilities of presenting an audiovisual history in a society in which audiovisual media have progressively gained greater prominence. We analyze specific cases of films and historical documentaries and we assess the difficulties faced by historians in understanding the keys of audiovisual language, and by filmmakers in understanding and incorporating history into their productions. We conclude that it would not be possible to disseminate history in the western world without audiovisual resources circulated through various types of screens (cinema, television, computer, mobile phone, video games).

  4. Interaction and representational integration: Evidence from speech errors

    OpenAIRE

    Goldrick, Matthew; Baker, H. Ross; Murphy, Amanda; Baese-Berk, Melissa

    2011-01-01

    We examine the mechanisms that support interaction between lexical, phonological and phonetic processes during language production. Studies of the phonetics of speech errors have provided evidence that partially activated lexical and phonological representations influence phonetic processing. We examine how these interactive effects are modulated by lexical frequency. Previous research has demonstrated that during lexical access, the processing of high frequency words is facilitated; in contr...

  5. Susceptibility to a multisensory speech illusion in older persons is driven by perceptual processes

    Directory of Open Access Journals (Sweden)

    Annalisa Setti

    2013-09-01

    Full Text Available Recent studies suggest that multisensory integration is enhanced in older adults but it is not known whether this enhancement is solely driven by perceptual processes or affected by cognitive processes. Using the ‘McGurk illusion’, in Experiment 1 we found that audio-visual integration of incongruent audio-visual words was higher in older adults than in younger adults, although the recognition of either audio- or visual-only presented words was the same across groups. In Experiment 2 we tested recall of sentences within which an incongruent audio-visual speech word was embedded. The overall semantic meaning of the sentence was compatible with either one of the unisensory components of the target word and/or with the illusory percept. Older participants recalled more illusory audio-visual words in sentences than younger adults, however, there was no differential effect of word compatibility on recall for the two groups. Our findings suggest that the relatively high susceptibility to the audio-visual speech illusion in older participants is due more to perceptual than cognitive processing.

  6. Transcranial Magnetic Stimulation over Left Inferior Frontal and Posterior Temporal Cortex Disrupts Gesture-Speech Integration.

    Science.gov (United States)

    Zhao, Wanying; Riggs, Kevin; Schindler, Igor; Holle, Henning

    2018-02-21

    Language and action naturally occur together in the form of cospeech gestures, and there is now convincing evidence that listeners display a strong tendency to integrate semantic information from both domains during comprehension. A contentious question, however, has been which brain areas are causally involved in this integration process. In previous neuroimaging studies, left inferior frontal gyrus (IFG) and posterior middle temporal gyrus (pMTG) have emerged as candidate areas; however, it is currently not clear whether these areas are causally or merely epiphenomenally involved in gesture-speech integration. In the present series of experiments, we directly tested for a potential critical role of IFG and pMTG by observing the effect of disrupting activity in these areas using transcranial magnetic stimulation in a mixed gender sample of healthy human volunteers. The outcome measure was performance on a Stroop-like gesture task (Kelly et al., 2010a), which provides a behavioral index of gesture-speech integration. Our results provide clear evidence that disrupting activity in IFG and pMTG selectively impairs gesture-speech integration, suggesting that both areas are causally involved in the process. These findings are consistent with the idea that these areas play a joint role in gesture-speech integration, with IFG regulating strategic semantic access via top-down signals acting upon temporal storage areas. SIGNIFICANCE STATEMENT Previous neuroimaging studies suggest an involvement of inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech integration, but findings have been mixed and due to methodological constraints did not allow inferences of causality. By adopting a virtual lesion approach involving transcranial magnetic stimulation, the present study provides clear evidence that both areas are causally involved in combining semantic information arising from gesture and speech. These findings support the view that, rather than being

  7. Suppression of the µ rhythm during speech and non-speech discrimination revealed by independent component analysis: implications for sensorimotor integration in speech processing.

    Science.gov (United States)

    Bowers, Andrew; Saltuklaroglu, Tim; Harkrider, Ashley; Cuellar, Megan

    2013-01-01

    Constructivist theories propose that articulatory hypotheses about incoming phonetic targets may function to enhance perception by limiting the possibilities for sensory analysis. To provide evidence for this proposal, it is necessary to map ongoing, high-temporal-resolution changes in sensorimotor activity (i.e., the sensorimotor µ rhythm) to accurate speech and non-speech discrimination performance (i.e., correct trials). Sixteen participants (15 female and 1 male) were asked to passively listen to or actively identify speech and tone-sweeps in a two-alternative forced-choice discrimination task while the electroencephalograph (EEG) was recorded from 32 channels. The stimuli were presented at signal-to-noise ratios (SNRs) at which discrimination accuracy was high (i.e., 80-100%) and at low SNRs producing discrimination performance at chance. EEG data were decomposed using independent component analysis and clustered across participants using principal component methods in EEGLAB. ICA revealed left and right sensorimotor µ components for 14/16 and 13/16 participants respectively that were identified on the basis of scalp topography, spectral peaks, and localization to the precentral and postcentral gyri. Time-frequency analysis of left and right lateralized µ component clusters revealed significant (pFDR < .05) event-related desynchronization for correct speech discrimination trials relative to chance trials following stimulus offset. Findings are consistent with constructivist, internal model theories proposing that early forward motor models generate predictions about likely phonemic units that are then synthesized with incoming sensory cues during active as opposed to passive processing. Future directions and possible translational value for clinical populations in which sensorimotor integration may play a functional role are discussed.

  8. Integrating Information from Speech and Physiological Signals to Achieve Emotional Sensitivity

    DEFF Research Database (Denmark)

    Kim, Jonghwa; André, Elisabeth; Rehm, Matthias

    2005-01-01

    Recently, there has been a significant amount of work on the recognition of emotions from speech and biosignals. Most approaches to emotion recognition so far concentrate on a single modality and do not take advantage of the fact that an integrated multimodal analysis may help to resolve...

  9. Audiovisual quality assessment and prediction for videotelephony

    CERN Document Server

    Belmudez, Benjamin

    2015-01-01

    The work presented in this book focuses on modeling audiovisual quality as perceived by the users of IP-based solutions for video communication like videotelephony. It also extends the current framework for the parametric prediction of audiovisual call quality. The book addresses several aspects related to the quality perception of entire video calls, namely, the quality estimation of the single audio and video modalities in an interactive context, the audiovisual quality integration of these modalities and the temporal pooling of short sample-based quality scores to account for the perceptual quality impact of time-varying degradations.

  10. Speech Enhancement of Mobile Devices Based on the Integration of a Dual Microphone Array and a Background Noise Elimination Algorithm.

    Science.gov (United States)

    Chen, Yung-Yue

    2018-05-08

    Mobile devices are often used in our daily lives for speech and communication. The speech quality of mobile devices is always degraded by the environmental noise surrounding mobile device users, and an effective background noise reduction solution cannot easily be developed for this speech enhancement problem. For these reasons, a methodology is systematically proposed to eliminate the effects of background noise on the speech communication of mobile devices. This methodology integrates a dual microphone array with a background noise elimination algorithm. The proposed background noise elimination algorithm includes a whitening process, a speech modelling method and an H₂ estimator. Due to the adoption of the dual microphone array, a low-cost design can be obtained for the speech enhancement of mobile devices. Practical tests have shown that the proposed method is immune to random background noise, and noiseless speech can be obtained after this denoising process.
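
    The record does not reproduce the whitening/speech-model/H₂-estimator chain, so the sketch below substitutes a much simpler two-microphone scheme for illustration: the aligned microphones are averaged and a spectral-subtraction gain is applied using a noise estimate taken from a speech-free lead-in. It is a stand-in for the idea of dual-microphone noise elimination, not the authors' algorithm.

        # Simplified dual-microphone enhancement: average + spectral subtraction.
        import numpy as np

        def enhance(mic1, mic2, fs, frame=512, noise_frames=10):
            x = 0.5 * (mic1 + mic2)                      # assume mics already time-aligned
            hop = frame // 2
            window = np.hanning(frame)
            # Noise magnitude spectrum estimated from the first (speech-free) frames.
            noise = np.mean([np.abs(np.fft.rfft(window * x[i*hop:i*hop+frame]))
                             for i in range(noise_frames)], axis=0)
            out = np.zeros_like(x)
            for i in range((len(x) - frame) // hop):
                seg = window * x[i*hop:i*hop+frame]
                spec = np.fft.rfft(seg)
                mag = np.maximum(np.abs(spec) - noise, 0.05 * np.abs(spec))  # subtract with a floor
                out[i*hop:i*hop+frame] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)))
            return out

        fs = 16000
        t = np.arange(fs) / fs
        clean = np.concatenate([np.zeros(fs // 2), np.sin(2 * np.pi * 220 * t)])  # silence, then a tone
        m1 = clean + 0.3 * np.random.randn(len(clean))
        m2 = clean + 0.3 * np.random.randn(len(clean))
        denoised = enhance(m1, m2, fs)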

  11. Speech Enhancement of Mobile Devices Based on the Integration of a Dual Microphone Array and a Background Noise Elimination Algorithm

    Directory of Open Access Journals (Sweden)

    Yung-Yue Chen

    2018-05-01

    Full Text Available Mobile devices are often used in our daily lives for speech and communication. The speech quality of mobile devices is always degraded by the environmental noise surrounding mobile device users, and an effective background noise reduction solution cannot easily be developed for this speech enhancement problem. For these reasons, a methodology is systematically proposed to eliminate the effects of background noise on the speech communication of mobile devices. This methodology integrates a dual microphone array with a background noise elimination algorithm. The proposed background noise elimination algorithm includes a whitening process, a speech modelling method and an H2 estimator. Due to the adoption of the dual microphone array, a low-cost design can be obtained for the speech enhancement of mobile devices. Practical tests have shown that the proposed method is immune to random background noise, and noiseless speech can be obtained after this denoising process.

  12. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English

    Science.gov (United States)

    Russo, Frank A.

    2018-01-01

    The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced consisting of 24 professional actors, vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. The set of 7356 recordings were each rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intrarater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are made freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976. PMID:29768426

  13. Multisensory and Modality Specific Processing of Visual Speech in Different Regions of the Premotor Cortex

    Directory of Open Access Journals (Sweden)

    Daniel Callan

    2014-05-01

    Full Text Available Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex has been shown to be active during both observation and execution of action (‘Mirror System’ properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker’s articulating face and heard her voice), visual only (only saw the speaker’s articulating face), and audio only (only heard the speaker’s voice) conditions with varying audio signal-to-noise ratios in order to determine the regions of the premotor cortex involved with multisensory and modality-specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli to control for task difficulty and differences in intelligibility. The results of the fMRI analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and premotor cortex. The left ventral inferior premotor cortex showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas more superior and dorsal regions of the premotor cortex are involved with mapping unimodal (in this case visual) sensory features of the speech signal with

  14. Speech misperception: speaking and seeing interfere differently with hearing.

    Directory of Open Access Journals (Sweden)

    Takemi Mochida

    Full Text Available Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] was degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But it was not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.

  15. Integrating Automatic Speech Recognition and Machine Translation for Better Translation Outputs

    DEFF Research Database (Denmark)

    Liyanapathirana, Jeevanthi

    translations, combining machine translation with computer assisted translation has drawn attention in current research. This combines two prospects: the opportunity of ensuring high quality translation along with a significant performance gain. Automatic Speech Recognition (ASR) is another important area......, which caters important functionalities in language processing and natural language understanding tasks. In this work we integrate automatic speech recognition and machine translation in parallel. We aim to avoid manual typing of possible translations as dictating the translation would take less time...... to the n-best list rescoring, we also use word graphs with the expectation of arriving at a tighter integration of ASR and MT models. Integration methods include constraining ASR models using language and translation models of MT, and vice versa. We currently develop and experiment different methods...
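
    One of the integration strategies mentioned, n-best list rescoring, can be illustrated as a log-linear combination of recognizer and translation-side scores; the weights and score definitions below are illustrative assumptions.

        # Re-rank ASR hypotheses with a score from the machine-translation side.
        from dataclasses import dataclass

        @dataclass
        class Hypothesis:
            text: str
            asr_score: float      # log-probability from the speech recognizer
            mt_score: float       # log-probability under the translation/language models

        def rescore(nbest, asr_weight=1.0, mt_weight=0.8):
            """Log-linear combination of scores; best hypothesis first."""
            return sorted(nbest,
                          key=lambda h: asr_weight * h.asr_score + mt_weight * h.mt_score,
                          reverse=True)

        nbest = [Hypothesis("recognise the speech", -12.1, -20.5),
                 Hypothesis("wreck a nice beach",   -11.8, -35.2)]
        print(rescore(nbest)[0].text)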

  16. An analysis of machine translation and speech synthesis in speech-to-speech translation system

    OpenAIRE

    Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.

    2011-01-01

    This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...

  17. What Iconic Gesture Fragments Reveal about Gesture-Speech Integration: When Synchrony Is Lost, Memory Can Help

    Science.gov (United States)

    Obermeier, Christian; Holle, Henning; Gunter, Thomas C.

    2011-01-01

    The present series of experiments explores several issues related to gesture-speech integration and synchrony during sentence processing. To be able to more precisely manipulate gesture-speech synchrony, we used gesture fragments instead of complete gestures, thereby avoiding the usual long temporal overlap of gestures with their coexpressive…

  18. Sensory integration dysfunction affects efficacy of speech therapy on children with functional articulation disorders

    Directory of Open Access Journals (Sweden)

    Tung LC

    2013-01-01

    Full Text Available Li-Chen Tung,1,# Chin-Kai Lin,2,# Ching-Lin Hsieh,3,4 Ching-Chi Chen,1 Chin-Tsan Huang,1 Chun-Hou Wang5,6 1Department of Physical Medicine and Rehabilitation, Chi Mei Medical Center, Tainan; 2Program of Early Intervention, Department of Early Childhood Education, National Taichung University of Education, Taichung; 3School of Occupational Therapy, College of Medicine, National Taiwan University, Taipei; 4Department of Physical Medicine and Rehabilitation, National Taiwan University Hospital, Taipei; 5School of Physical Therapy, College of Medical Science and Technology, Chung Shan Medical University, Taichung; 6Physical Therapy Room, Chung Shan Medical University Hospital, Taichung, Taiwan. #These authors contributed equally. Background: Articulation disorders in young children are due to defects occurring at a certain stage in sensory and motor development. Some children with functional articulation disorders may also have sensory integration dysfunction (SID). We hypothesized that speech therapy would be less efficacious in children with SID than in those without SID. Hence, the purpose of this study was to compare the efficacy of speech therapy in two groups of children with functional articulation disorders: those without and those with SID. Method: A total of 30 young children with functional articulation disorders were divided into two groups, the no-SID group (15 children) and the SID group (15 children). The number of pronunciation mistakes was evaluated before and after speech therapy. Results: There were no statistically significant differences in age, sex, sibling order, education of parents, and pretest number of mistakes in pronunciation between the two groups (P > 0.05). The mean and standard deviation in the pre- and posttest number of mistakes in pronunciation were 10.5 ± 3.2 and 3.3 ± 3.3 in the no-SID group, and 10.1 ± 2.9 and 6.9 ± 3.5 in the SID group, respectively. Results showed great changes after speech therapy treatment (F

  19. Audiovisual materials are effective for enhancing the correction of articulation disorders in children with cleft palate.

    Science.gov (United States)

    Pamplona, María Del Carmen; Ysunza, Pablo Antonio; Morales, Santiago

    2017-02-01

    Children with cleft palate frequently show speech disorders known as compensatory articulation. Compensatory articulation requires a prolonged period of speech intervention that should include reinforcement at home. However, relatives frequently do not know how to work with their children at home. To study whether the use of audiovisual materials especially designed for complementing speech pathology treatment in children with compensatory articulation can be effective for stimulating articulation practice at home and consequently enhancing speech normalization in children with cleft palate. Eighty-two patients with compensatory articulation were studied. Patients were randomly divided into two groups. Both groups received speech pathology treatment aimed to correct articulation placement. In addition, patients from the active group received a set of audiovisual materials to be used at home. Parents were instructed about strategies and ideas about how to use the materials with their children. Severity of compensatory articulation was compared at the onset and at the end of the speech intervention. After the speech therapy period, the group of patients using audiovisual materials at home demonstrated significantly greater improvement in articulation, as compared with the patients receiving speech pathology treatment on-site without audiovisual supporting materials. The results of this study suggest that audiovisual materials especially designed for practicing adequate articulation placement at home can be effective for reinforcing and enhancing speech pathology treatment of patients with cleft palate and compensatory articulation. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  20. Comparing the influence of spectro-temporal integration in computational speech segregation

    DEFF Research Database (Denmark)

    Bentsen, Thomas; May, Tobias; Kressner, Abigail Anne

    2016-01-01

    The goal of computational speech segregation systems is to automatically segregate a target speaker from interfering maskers. Typically, these systems include a feature extraction stage in the front-end and a classification stage in the back-end. A spectro-temporal integration strategy can be applied either in the front-end, using the so-called delta features, or in the back-end, using a second classifier that exploits the posterior probability of speech from the first classifier across a spectro-temporal window. This study systematically analyzes the influence of such stages on segregation ... metric that comprehensively predicts computational segregation performance and correlates well with intelligibility. The outcome of this study could help to identify the most effective spectro-temporal integration strategy for computational segregation systems.
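
    The front-end variant of spectro-temporal integration, delta features, can be illustrated with the standard regression-based computation below; the window width and the choice of base features are assumptions, not necessarily those used in the study.

        # Append delta (local temporal slope) features to a static feature matrix.
        import numpy as np

        def delta(features, width=2):
            """features: (n_frames, n_dims), e.g. log ratemap or log-mel frames."""
            n_frames, _ = features.shape
            padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
            denom = 2 * sum(k * k for k in range(1, width + 1))
            out = np.zeros_like(features, dtype=float)
            for k in range(1, width + 1):
                out += k * (padded[width + k : width + k + n_frames]
                            - padded[width - k : width - k + n_frames])
            return out / denom

        frames = np.random.rand(100, 32)                  # stand-in for the static features
        augmented = np.hstack([frames, delta(frames)])    # static + delta for the classifier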

  1. Monkey Lipsmacking Develops Like the Human Speech Rhythm

    Science.gov (United States)

    Morrill, Ryan J.; Paukner, Annika; Ferrari, Pier F.; Ghazanfar, Asif A.

    2012-01-01

    Across all languages studied to date, audiovisual speech exhibits a consistent rhythmic structure. This rhythm is critical to speech perception. Some have suggested that the speech rhythm evolved "de novo" in humans. An alternative account--the one we explored here--is that the rhythm of speech evolved through the modification of rhythmic facial…

  2. The development of multisensory speech perception continues into the late childhood years.

    Science.gov (United States)

    Ross, Lars A; Molholm, Sophie; Blanco, Daniella; Gomez-Ramirez, Manuel; Saint-Amour, Dave; Foxe, John J

    2011-06-01

    Observing a speaker's articulations substantially improves the intelligibility of spoken speech, especially under noisy listening conditions. This multisensory integration of speech inputs is crucial to effective communication. Appropriate development of this ability has major implications for children in classroom and social settings, and deficits in it have been linked to a number of neurodevelopmental disorders, especially autism. It is clear from structural imaging studies that there is a prolonged maturational course within regions of the perisylvian cortex that persists into late childhood, and these regions have been firmly established as being crucial to speech and language functions. Given this protracted maturational timeframe, we reasoned that multisensory speech processing might well show a similarly protracted developmental course. Previous work in adults has shown that audiovisual enhancement in word recognition is most apparent within a restricted range of signal-to-noise ratios (SNRs). Here, we investigated when these properties emerge during childhood by testing multisensory speech recognition abilities in typically developing children aged between 5 and 14 years, and comparing them with those of adults. By parametrically varying SNRs, we found that children benefited significantly less from observing visual articulations, displaying considerably less audiovisual enhancement. The findings suggest that improvement in the ability to recognize speech-in-noise and in audiovisual integration during speech perception continues quite late into the childhood years. The implication is that a considerable amount of multisensory learning remains to be achieved during the later schooling years, and that explicit efforts to accommodate this learning may well be warranted. European Journal of Neuroscience © 2011 Federation of European Neuroscience Societies and Blackwell Publishing Ltd. No claim to original US government works.

  3. Development of Sensitivity to Audiovisual Temporal Asynchrony during Midchildhood

    Science.gov (United States)

    Kaganovich, Natalya

    2016-01-01

    Temporal proximity is one of the key factors determining whether events in different modalities are integrated into a unified percept. Sensitivity to audiovisual temporal asynchrony has been studied in adults in great detail. However, how such sensitivity matures during childhood is poorly understood. We examined perception of audiovisual temporal…

  4. Audiovisual Simultaneity Judgment and Rapid Recalibration throughout the Lifespan.

    Science.gov (United States)

    Noel, Jean-Paul; De Niear, Matthew; Van der Burg, Erik; Wallace, Mark T

    2016-01-01

    Multisensory interactions are well established to convey an array of perceptual and behavioral benefits. One of the key features of multisensory interactions is the temporal structure of the stimuli combined. In an effort to better characterize how temporal factors influence multisensory interactions across the lifespan, we examined audiovisual simultaneity judgment and the degree of rapid recalibration to paired audiovisual stimuli (Flash-Beep and Speech) in a sample of 220 participants ranging from 7 to 86 years of age. Results demonstrate a surprisingly protracted developmental time-course for both audiovisual simultaneity judgment and rapid recalibration, with neither reaching maturity until well into adolescence. Interestingly, correlational analyses revealed that audiovisual simultaneity judgments (i.e., the size of the audiovisual temporal window of simultaneity) and rapid recalibration significantly co-varied as a function of age. Together, our results represent the most complete description of age-related changes in audiovisual simultaneity judgments to date, as well as being the first to describe changes in the degree of rapid recalibration as a function of age. We propose that the developmental time-course of rapid recalibration scaffolds the maturation of more durable audiovisual temporal representations.

  5. Audiovisual sentence repetition as a clinical criterion for auditory development in Persian-language children with hearing loss.

    Science.gov (United States)

    Oryadi-Zanjani, Mohammad Majid; Vahab, Maryam; Rahimi, Zahra; Mayahi, Anis

    2017-02-01

    It is important for clinicians such as speech-language pathologists and audiologists to develop more efficient procedures to assess the development of auditory, speech and language skills in children using a hearing aid and/or cochlear implant compared to their peers with normal hearing. The aim of the study was therefore to compare the performance of 5-to-7-year-old Persian-language children with and without hearing loss in visual-only, auditory-only, and audiovisual presentations of a sentence repetition task. The research was administered as a cross-sectional study. The sample comprised 92 Persian 5-to-7-year-old children: 60 with normal hearing and 32 with hearing loss. The children with hearing loss were recruited from the Soroush rehabilitation center for Persian-language children with hearing loss in Shiraz, Iran, through a consecutive sampling method. All these children had a unilateral cochlear implant or bilateral hearing aids. The assessment tool was the Sentence Repetition Test. The study included three computer-based experiments: visual-only, auditory-only, and audiovisual. The scores were compared within and among the three groups through statistical tests at α = 0.05. The sentence repetition scores for the V-only, A-only, and AV presentations were significantly different in the three groups; in other words, the highest to lowest scores belonged respectively to the audiovisual, auditory-only, and visual-only formats in the children with normal hearing (P < 0.05). Visual-only scores were not correlated with audiovisual sentence repetition scores in all the 5-to-7-year-old children (r = 0.179, n = 92, P = 0.088), but audiovisual sentence repetition scores were found to be strongly correlated with auditory-only scores in all the 5-to-7-year-old children (r = 0.943, n = 92, P = 0.000). According to the study's findings, audiovisual integration occurs in 5-to-7-year-old Persian children using a hearing aid or cochlear implant during sentence repetition, similar to their peers with normal hearing.

  6. Digital audiovisual archives

    CERN Document Server

    Stockinger, Peter

    2013-01-01

    Today, huge quantities of digital audiovisual resources are already available - everywhere and at any time - through Web portals, online archives and libraries, and video blogs. One central question with respect to this huge amount of audiovisual data is how it can be used in specific (social, pedagogical, etc.) contexts and what its potential interest is for target groups (communities, professionals, students, researchers, etc.). This book examines the question of the (creative) exploitation of digital audiovisual archives from a theoretical, methodological, technical and practical

  7. Effects of congruent and incongruent visual cues on speech perception and brain activity in cochlear implant users.

    Science.gov (United States)

    Song, Jae-Jin; Lee, Hyo-Jeong; Kang, Hyejin; Lee, Dong Soo; Chang, Sun O; Oh, Seung Ha

    2015-03-01

    While deafness-induced plasticity has been investigated in the visual and auditory domains, not much is known about language processing in audiovisual multimodal environments for patients with restored hearing via cochlear implant (CI) devices. Here, we examined the effect of agreeing or conflicting visual inputs on auditory processing in deaf patients equipped with degraded artificial hearing. Ten post-lingually deafened CI users with good performance, along with matched control subjects, underwent H2(15)O positron emission tomography scans while carrying out a behavioral task requiring the extraction of speech information from unimodal auditory stimuli, bimodal audiovisual congruent stimuli, and incongruent stimuli. Regardless of congruency, the control subjects demonstrated activation of the auditory and visual sensory cortices, as well as the superior temporal sulcus, the classical multisensory integration area, indicating a bottom-up multisensory processing strategy. Compared to CI users, the control subjects exhibited activation of the right ventral premotor-supramarginal pathway. In contrast, CI users activated primarily the visual cortices more in the congruent audiovisual condition than in the null condition. In addition, compared to controls, CI users displayed an activation focus in the right amygdala for congruent audiovisual stimuli. The most notable difference between the two groups was an activation focus in the left inferior frontal gyrus in CI users confronted with incongruent audiovisual stimuli, suggesting top-down cognitive modulation for audiovisual conflict. Correlation analysis revealed that good speech performance was positively correlated with right amygdala activity for the congruent condition, but negatively correlated with bilateral visual cortices regardless of congruency. Taken together, these results suggest that for multimodal inputs, cochlear implant users are more vision-reliant when processing congruent stimuli and are disturbed

  8. Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.

    Science.gov (United States)

    Gebru, Israel D; Ba, Sileye; Li, Xiaofei; Horaud, Radu

    2018-05-01

    Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes in a principled way speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset, which contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the-art diarization algorithms.
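
    As a toy stand-in for the speech-to-person association step, the sketch below assigns each speech frame to the tracked person whose azimuth best matches the frame's estimated direction of arrival, with a Viterbi pass that discourages implausibly fast speaker switches. The paper's Bayesian spatiotemporal model is considerably richer than this.

        # Frame-wise speech-to-person assignment with temporal smoothing.
        import numpy as np

        def associate(frame_doas, person_azimuths, switch_penalty=2.0):
            """Return one person index per speech frame."""
            F, P = len(frame_doas), len(person_azimuths)
            # Local evidence: angular distance between frame DOA and each person.
            cost = np.abs(np.subtract.outer(frame_doas, person_azimuths))  # (F, P)
            best = cost[0].copy()
            back = np.zeros((F, P), dtype=int)
            for t in range(1, F):
                trans = best[:, None] + switch_penalty * (1 - np.eye(P))   # staying is free
                back[t] = trans.argmin(axis=0)
                best = trans.min(axis=0) + cost[t]
            path = np.zeros(F, dtype=int)
            path[-1] = best.argmin()
            for t in range(F - 1, 0, -1):
                path[t - 1] = back[t, path[t]]
            return path

        doas = np.array([10.0, 12.0, 11.0, 55.0, 54.0, 53.0])   # degrees, per frame
        people = np.array([8.0, 50.0])                          # tracked azimuths from video
        print(associate(doas, people))                          # -> [0 0 0 1 1 1]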

  9. Multisensory Integration in Cochlear Implant Recipients.

    Science.gov (United States)

    Stevenson, Ryan A; Sheffield, Sterling W; Butera, Iliza M; Gifford, René H; Wallace, Mark T

    Speech perception is inherently a multisensory process involving integration of auditory and visual cues. Multisensory integration in cochlear implant (CI) recipients is a unique circumstance in that the integration occurs after auditory deprivation and the provision of hearing via the CI. Despite the clear importance of multisensory cues for perception, in general, and for speech intelligibility, specifically, the topic of multisensory perceptual benefits in CI users has only recently begun to emerge as an area of inquiry. We review the research that has been conducted on multisensory integration in CI users to date and suggest a number of areas needing further research. The overall pattern of results indicates that many CI recipients show at least some perceptual gain that can be attributable to multisensory integration. The extent of this gain, however, varies based on a number of factors, including age of implantation and specific task being assessed (e.g., stimulus detection, phoneme perception, word recognition). Although both children and adults with CIs obtain audiovisual benefits for phoneme, word, and sentence stimuli, neither group shows demonstrable gain for suprasegmental feature perception. Additionally, only early-implanted children and the highest performing adults obtain audiovisual integration benefits similar to individuals with normal hearing. Increasing age of implantation in children is associated with poorer gains resultant from audiovisual integration, suggesting a sensitive period in development for the brain networks that subserve these integrative functions, as well as length of auditory experience. This finding highlights the need for early detection of and intervention for hearing loss, not only in terms of auditory perception, but also in terms of the behavioral and perceptual benefits of audiovisual processing. Importantly, patterns of auditory, visual, and audiovisual responses suggest that underlying integrative processes may be

  10. What Information Is Necessary for Speech Categorization? Harnessing Variability in the Speech Signal by Integrating Cues Computed Relative to Expectations

    Science.gov (United States)

    McMurray, Bob; Jongman, Allard

    2011-01-01

    Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, the type of information subserving this mapping. This is crucial in speech perception where the signal is variable and context dependent. This study assessed the…

  11. Speech understanding in noise with integrated in-ear and muff-style hearing protection systems

    Directory of Open Access Journals (Sweden)

    Sharon M Abel

    2011-01-01

    Full Text Available Integrated hearing protection systems are designed to enhance free field and radio communications during military operations while protecting against the damaging effects of high-level noise exposure. A study was conducted to compare the effect of increasing the radio volume on the intelligibility of speech over the radios of two candidate systems, in-ear and muff-style, in 85-dBA speech babble noise presented free field. Twenty normal-hearing, English-fluent subjects, half male and half female, were tested in same gender pairs. Alternating as talker and listener, their task was to discriminate consonant-vowel-consonant syllables that contrasted either the initial or final consonant. Percent correct consonant discrimination increased with increases in the radio volume. At the highest volume, subjects achieved 79% with the in-ear device but only 69% with the muff-style device, averaged across the gender of listener/talker pairs and consonant position. Although there was no main effect of gender, female listener/talkers showed a 10% advantage for the final consonant and male listener/talkers showed a 1% advantage for the initial consonant. These results indicate that normal hearing users can achieve reasonably high radio communication scores with integrated in-ear hearing protection in moderately high-level noise that provides both energetic and informational masking. The adequacy of the range of available radio volumes for users with hearing loss has yet to be determined.

  12. Joint evaluation of communication quality and user experience in an audio-visual virtual reality meeting

    DEFF Research Database (Denmark)

    Møller, Anders Kalsgaard; Hoffmann, Pablo F.; Carrozzino, Marcello

    2013-01-01

    The state-of-the-art speech intelligibility tests are created with the purpose of evaluating acoustic communication devices and not for evaluating audio-visual virtual reality systems. This paper present a novel method to evaluate a communication situation based on both the speech intelligibility...

  13. Quantifying temporal ventriloquism in audiovisual synchrony perception

    NARCIS (Netherlands)

    Kuling, I.A.; Kohlrausch, A.G.; Juola, J.F.

    2013-01-01

    The integration of visual and auditory inputs in the human brain works properly only if the components are perceived in close temporal proximity. In the present study, we quantified cross-modal interactions in the human brain for audiovisual stimuli with temporal asynchronies, using a paradigm from

  14. Sensitivity to audio-visual synchrony and its relation to language abilities in children with and without ASD.

    Science.gov (United States)

    Righi, Giulia; Tenenbaum, Elena J; McCormick, Carolyn; Blossom, Megan; Amso, Dima; Sheinkopf, Stephen J

    2018-04-01

    Autism Spectrum Disorder (ASD) is often accompanied by deficits in speech and language processing. Speech processing relies heavily on the integration of auditory and visual information, and it has been suggested that the ability to detect correspondence between auditory and visual signals helps to lay the foundation for successful language development. The goal of the present study was to examine whether young children with ASD show reduced sensitivity to temporal asynchronies in a speech processing task when compared to typically developing controls, and to examine how this sensitivity might relate to language proficiency. Using automated eye tracking methods, we found that children with ASD failed to demonstrate sensitivity to asynchronies of 0.3s, 0.6s, or 1.0s between a video of a woman speaking and the corresponding audio track. In contrast, typically developing children who were language-matched to the ASD group, were sensitive to both 0.6s and 1.0s asynchronies. We also demonstrated that individual differences in sensitivity to audiovisual asynchronies and individual differences in orientation to relevant facial features were both correlated with scores on a standardized measure of language abilities. Results are discussed in the context of attention to visual language and audio-visual processing as potential precursors to language impairment in ASD. Autism Res 2018, 11: 645-653. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. Speech processing relies heavily on the integration of auditory and visual information, and it has been suggested that the ability to detect correspondence between auditory and visual signals helps to lay the foundation for successful language development. The goal of the present study was to explore whether children with ASD process audio-visual synchrony in ways comparable to their typically developing peers, and the relationship between preference for synchrony and language ability. Results showed that

  15. Integrating Music Therapy Services and Speech-Language Therapy Services for Children with Severe Communication Impairments: A Co-Treatment Model

    Science.gov (United States)

    Geist, Kamile; McCarthy, John; Rodgers-Smith, Amy; Porter, Jessica

    2008-01-01

    The literature offers little documentation of how music therapy can be integrated with speech-language therapy services for children with communication delay. In this article, a collaborative model with procedures, experiences, and communication outcomes of integrating music therapy with the existing speech-language services is given. Using…

  16. The benefit obtained from visually displayed text from an automatic speech recognizer during listening to speech presented in noise

    NARCIS (Netherlands)

    Zekveld, A.A.; Kramer, S.E.; Kessens, J.M.; Vlaming, M.S.M.G.; Houtgast, T.

    2008-01-01

    OBJECTIVES: The aim of this study was to evaluate the benefit that listeners obtain from visually presented output from an automatic speech recognition (ASR) system during listening to speech in noise. DESIGN: Auditory-alone and audiovisual speech reception thresholds (SRTs) were measured. The SRT

  17. Temporal dynamics of sensorimotor integration in speech perception and production: Independent component analysis of EEG data

    Directory of Open Access Journals (Sweden)

    David eJenson

    2014-07-01

    Full Text Available Activity in premotor and sensorimotor cortices is found in speech production and some perception tasks. Yet, how sensorimotor integration supports these functions is unclear due to a lack of data examining the timing of activity from these regions. Beta (~20 Hz) and alpha (~10 Hz) spectral power within the EEG µ rhythm are considered indices of motor and somatosensory activity, respectively. In the current study, perception conditions required discrimination (same/different) of syllable pairs (/ba/ and /da/) in quiet and noisy conditions. Production conditions required covert and overt syllable productions and overt word production. Independent component analysis was performed on EEG data obtained during these conditions to (1) identify clusters of µ components common to all conditions and (2) examine real-time event-related spectral perturbations (ERSP) within alpha and beta bands. Seventeen and 15 out of 20 participants produced left and right µ-components, respectively, localized to precentral gyri. Discrimination conditions were characterized by significant (pFDR < .05) early alpha event-related synchronization (ERS) prior to and during stimulus presentation and later alpha event-related desynchronization (ERD) following stimulus offset. Beta ERD began early and gained strength across time. Differences were found between quiet and noisy discrimination conditions. Both overt syllable and word productions yielded similar alpha/beta ERD that began prior to production and was strongest during muscle activity. Findings during covert production were weaker than during overt production. One explanation for these findings is that µ-beta ERD indexes early predictive coding (e.g., internal modeling) and/or overt and covert attentional/motor processes. µ-alpha ERS may index inhibitory input to the premotor cortex from sensory regions prior to and during discrimination, while µ-alpha ERD may index re-afferent sensory feedback during speech rehearsal and production.
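
    The ERSP measure referred to in this record can be illustrated with a minimal sketch: average time-frequency power across trials and express it in dB relative to a pre-stimulus baseline. The array shapes, window lengths, and baseline interval below are illustrative assumptions, not parameters taken from the study.

```python
import numpy as np
from scipy.signal import spectrogram

def ersp_db(epochs, fs, tmin, baseline=(-0.5, 0.0)):
    """Event-related spectral perturbation (ERSP) in dB for one channel.

    epochs   : (n_trials, n_samples) array of single-trial EEG
    fs       : sampling rate in Hz
    tmin     : time (s) of the first sample relative to stimulus onset
    baseline : (start, end) window in seconds relative to stimulus onset
    """
    trial_power = []
    for trial in epochs:
        freqs, times, sxx = spectrogram(trial, fs=fs,
                                        nperseg=int(0.5 * fs),
                                        noverlap=int(0.45 * fs))
        trial_power.append(sxx)
    power = np.mean(trial_power, axis=0)               # (n_freqs, n_times)
    times = times + tmin                               # re-reference to stimulus onset
    in_base = (times >= baseline[0]) & (times < baseline[1])
    base = power[:, in_base].mean(axis=1, keepdims=True)
    return freqs, times, 10.0 * np.log10(power / base)  # dB change from baseline

# Band summaries analogous to the mu-alpha (~10 Hz) and mu-beta (~20 Hz) indices:
# alpha_ersp = ersp[(freqs >= 8) & (freqs <= 13)].mean(axis=0), etc.
```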

  18. Experienced speech-language pathologists' responses to ethical dilemmas: an integrated approach to ethical reasoning.

    Science.gov (United States)

    Kenny, Belinda; Lincoln, Michelle; Balandin, Susan

    2010-05-01

    To investigate the approaches of experienced speech-language pathologists (SLPs) to ethical reasoning and the processes they use to resolve ethical dilemmas. Ten experienced SLPs participated in in-depth interviews. A narrative approach was used to guide participants' descriptions of how they resolved ethical dilemmas. Individual narrative transcriptions were analyzed by using the participant's words to develop an ethical story that described and interpreted their responses to dilemmas. Key concepts from individual stories were then coded into group themes to reflect participants' reasoning processes. Five major themes reflected participants' approaches to ethical reasoning: (a) focusing on the well-being of the client, (b) fulfilling professional roles and responsibilities, (c) attending to professional relationships, (d) managing resources, and (e) integrating personal and professional values. SLPs demonstrated a range of ethical reasoning processes: applying bioethical principles, casuistry, and narrative reasoning when managing ethical dilemmas in the workplace. The results indicate that experienced SLPs adopted an integrated approach to ethical reasoning. They supported clients' rights to make health care choices. Bioethical principles, casuistry, and narrative reasoning provided useful frameworks for facilitating health professionals' application of codes of ethics to complex professional practice issues.

  19. Audiovisual integration of stimulus transients

    DEFF Research Database (Denmark)

    Andersen, Tobias; Mamassian, Pascal

    2008-01-01

    A change in sound intensity can facilitate luminance change detection. We found that this effect did not depend on whether sound intensity and luminance increased or decreased. In contrast, luminance identification was strongly influenced by the congruence of luminance and sound intensity change ...

  20. The Neural Basis of Speech Perception through Lipreading and Manual Cues: Evidence from Deaf Native Users of Cued Speech

    Science.gov (United States)

    Aparicio, Mario; Peigneux, Philippe; Charlier, Brigitte; Balériaux, Danielle; Kavec, Martin; Leybaert, Jacqueline

    2017-01-01

    We present here the first neuroimaging data for perception of Cued Speech (CS) by deaf adults who are native users of CS. CS is a visual mode of communicating a spoken language through a set of manual cues which accompany lipreading and disambiguate it. With CS, sublexical units of the oral language are conveyed clearly and completely through the visual modality without requiring hearing. The comparison of neural processing of CS in deaf individuals with processing of audiovisual (AV) speech in normally hearing individuals represents a unique opportunity to explore the similarities and differences in neural processing of an oral language delivered in a visuo-manual vs. an AV modality. The study included deaf adult participants who were early CS users and native hearing users of French who process speech audiovisually. Words were presented in an event-related fMRI design. Three conditions were presented to each group of participants. The deaf participants saw CS words (manual + lipread), words presented as manual cues alone, and words presented to be lipread without manual cues. The hearing group saw AV spoken words, audio-alone and lipread-alone. Three findings are highlighted. First, the middle and superior temporal gyrus (excluding Heschl’s gyrus) and left inferior frontal gyrus pars triangularis constituted a common, amodal neural basis for AV and CS perception. Second, integration was inferred in posterior parts of superior temporal sulcus for audio and lipread information in AV speech, but in the occipito-temporal junction, including MT/V5, for the manual cues and lipreading in CS. Third, the perception of manual cues showed a much greater overlap with the regions activated by CS (manual + lipreading) than lipreading alone did. This supports the notion that manual cues play a larger role than lipreading for CS processing. The present study contributes to a better understanding of the role of manual cues as support of visual speech perception in the framework

  1. A randomized controlled trial on the beneficial effects of training letter-speech sound integration on reading fluency in children with dyslexia

    NARCIS (Netherlands)

    Fraga González, G.; Žarić, G.; Tijms, J.; Bonte, M.; Blomert, L.; van der Molen, M.W.

    2015-01-01

    A recent account of dyslexia assumes that a failure to develop automated letter-speech sound integration might be responsible for the observed lack of reading fluency. This study uses a pre-test-training-post-test design to evaluate the effects of a training program based on letter-speech sound

  2. Elevated audiovisual temporal interaction in patients with migraine without aura

    Science.gov (United States)

    2014-01-01

    Background Photophobia and phonophobia are the most prominent symptoms in patients with migraine without aura. Hypersensitivity to visual stimuli can lead to greater hypersensitivity to auditory stimuli, which suggests that the interaction between visual and auditory stimuli may play an important role in the pathogenesis of migraine. However, audiovisual temporal interactions in migraine have not been well studied. Therefore, our aim was to examine auditory and visual interactions in migraine. Methods In this study, visual, auditory, and audiovisual stimuli with different temporal intervals between the visual and auditory stimuli were randomly presented to the left or right hemispace. During this time, the participants were asked to respond promptly to target stimuli. We used cumulative distribution functions to analyze the response times as a measure of audiovisual integration. Results Our results showed that audiovisual integration was significantly elevated in the migraineurs compared with the normal controls, whereas audiovisual suppression was weaker in the migraineurs compared with the normal controls (p < 0.05). Conclusions Our findings further objectively support the notion that migraineurs without aura are hypersensitive to external visual and auditory stimuli. Our study offers a new quantitative and objective method to evaluate hypersensitivity to audio-visual stimuli in patients with migraine. PMID:24961903
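
    Response-time CDFs of this kind are commonly compared against Miller's race-model bound to quantify audiovisual integration. The sketch below, using made-up response-time samples, shows one conventional way such an analysis is set up; it is not the study's exact procedure.

```python
import numpy as np

def empirical_cdf(rts, t_grid):
    """P(RT <= t) estimated from a sample of response times (in seconds)."""
    rts = np.sort(np.asarray(rts))
    return np.searchsorted(rts, t_grid, side="right") / rts.size

# Hypothetical response-time samples for unimodal and bimodal target trials.
rng = np.random.default_rng(0)
rt_a = rng.normal(0.42, 0.06, 200)    # auditory-only
rt_v = rng.normal(0.45, 0.06, 200)    # visual-only
rt_av = rng.normal(0.36, 0.05, 200)   # audiovisual

t_grid = np.linspace(0.2, 0.7, 101)
cdf_a, cdf_v, cdf_av = (empirical_cdf(rt, t_grid) for rt in (rt_a, rt_v, rt_av))

# Miller's race-model bound: CDF_AV(t) <= CDF_A(t) + CDF_V(t).
# Positive values indicate integration beyond mere statistical facilitation.
violation = cdf_av - np.minimum(cdf_a + cdf_v, 1.0)
print(f"Maximum race-model violation: {violation.max():.3f}")
```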

  3. The development of co-speech gesture and its semantic integration with speech in 6- to 12-year-old children with autism spectrum disorders.

    Science.gov (United States)

    So, Wing-Chee; Wong, Miranda Kit-Yi; Lui, Ming; Yip, Virginia

    2015-11-01

    Previous work leaves open the question of whether children with autism spectrum disorders aged 6-12 years have delay in producing gestures compared to their typically developing peers. This study examined gestural production among school-aged children in a naturalistic context and how their gestures are semantically related to the accompanying speech. Delay in gestural production was found in children with autism spectrum disorders through their middle to late childhood. Compared to their typically developing counterparts, children with autism spectrum disorders gestured less often and used fewer types of gestures, in particular markers, which carry culture-specific meaning. Typically developing children's gestural production was related to language and cognitive skills, but among children with autism spectrum disorders, gestural production was more strongly related to the severity of socio-communicative impairment. Gesture impairment also included the failure to integrate speech with gesture: in particular, supplementary gestures are absent in children with autism spectrum disorders. The findings extend our understanding of gestural production in school-aged children with autism spectrum disorders during spontaneous interaction. The results can help guide new therapies for gestural production for children with autism spectrum disorders in middle and late childhood. © The Author(s) 2014.

  4. Audiovisual perceptual learning with multiple speakers.

    Science.gov (United States)

    Mitchel, Aaron D; Gerfen, Chip; Weiss, Daniel J

    2016-05-01

    One challenge for speech perception is between-speaker variability in the acoustic parameters of speech. For example, the same phoneme (e.g. the vowel in "cat") may have substantially different acoustic properties when produced by two different speakers and yet the listener must be able to interpret these disparate stimuli as equivalent. Perceptual tuning, the use of contextual information to adjust phonemic representations, may be one mechanism that helps listeners overcome obstacles they face due to this variability during speech perception. Here we test whether visual contextual cues to speaker identity may facilitate the formation and maintenance of distributional representations for individual speakers, allowing listeners to adjust phoneme boundaries in a speaker-specific manner. We familiarized participants to an audiovisual continuum between /aba/ and /ada/. During familiarization, the "B-face" mouthed /aba/ when an ambiguous token was played, while the "D-face" mouthed /ada/. At test, the same ambiguous token was more likely to be identified as /aba/ when paired with a stilled image of the "B-face" than with an image of the "D-face." This was not the case in the control condition when the two faces were paired equally with the ambiguous token. Together, these results suggest that listeners may form speaker-specific phonemic representations using facial identity cues.

  5. Venezuela: Nueva Experiencia Audiovisual

    Directory of Open Access Journals (Sweden)

    Revista Chasqui

    2015-01-01

    Full Text Available In 1986, the Universidad Simón Bolívar (USB) created the Fundación para el Desarrollo del Arte Audiovisual (ARTEVISION). Its general objective is the promotion and sale of services and products for television, radio, film, design, and photography of high artistic and technical quality, all without neglecting the theoretical and academic aspects of these disciplines.

  6. The influence of age, hearing, and working memory on the speech comprehension benefit derived from an automatic speech recognition system.

    Science.gov (United States)

    Zekveld, Adriana A; Kramer, Sophia E; Kessens, Judith M; Vlaming, Marcel S M G; Houtgast, Tammo

    2009-04-01

    The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that speech comprehension in noise by young listeners with normal hearing improves when presenting partly incorrect, automatically generated subtitles. The current study focused on the effects of age, hearing loss, visual working memory capacity, and linguistic skills on the benefit obtained from automatically generated subtitles during listening to speech in noise. In order to investigate the effects of age and hearing loss, three groups of participants were included: 22 young persons with normal hearing (YNH, mean age = 21 years), 22 middle-aged adults with normal hearing (MA-NH, mean age = 55 years) and 30 middle-aged adults with hearing impairment (MA-HI, mean age = 57 years). The benefit from automatic subtitling was measured by Speech Reception Threshold (SRT) tests (Plomp & Mimpen, 1979). Both unimodal auditory and bimodal audiovisual SRT tests were performed. In the audiovisual tests, the subtitles were presented simultaneously with the speech, whereas in the auditory test, only speech was presented. The difference between the auditory and audiovisual SRT was defined as the audiovisual benefit. Participants additionally rated the listening effort. We examined the influences of ASR accuracy level and text delay on the audiovisual benefit and the listening effort using a repeated measures General Linear Model analysis. In a correlation analysis, we evaluated the relationships between age, auditory SRT, visual working memory capacity and the audiovisual benefit and listening effort. The automatically generated subtitles improved speech comprehension in noise for all ASR accuracies and delays covered by the current study. Higher ASR accuracy levels resulted in more benefit obtained
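
    As a concrete illustration of the benefit measure defined in this record (the difference between the auditory and audiovisual SRT), with made-up values rather than data from the study and assuming the conventional sign (auditory minus audiovisual, so that a positive value indicates benefit):

```python
# Hypothetical speech reception thresholds (SRTs), in dB signal-to-noise ratio;
# lower values mean speech was understood in poorer listening conditions.
srt_auditory_only = -3.5   # speech alone
srt_audiovisual = -7.0     # speech plus automatically generated subtitles

# Audiovisual benefit: auditory SRT minus audiovisual SRT, so a positive
# value means the subtitles helped (sentences understood at a lower SNR).
audiovisual_benefit = srt_auditory_only - srt_audiovisual   # 3.5 dB
print(f"Audiovisual benefit: {audiovisual_benefit:.1f} dB")
```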

  7. Audio-visual temporal recalibration can be constrained by content cues regardless of spatial overlap

    Directory of Open Access Journals (Sweden)

    Warrick eRoseboom

    2013-04-01

    Full Text Available It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated, and opposing, estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated if this was necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex (audio visual speech; Experiment 1 and simple stimuli (high and low pitch audio matched with either vertically or horizontally oriented Gabors; Experiment 2 we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap.

  8. AN EXPERIMENTAL EVALUATION OF AUDIO-VISUAL METHODS--CHANGING ATTITUDES TOWARD EDUCATION.

    Science.gov (United States)

    LOWELL, EDGAR L.; AND OTHERS

    Audiovisual programs for parents of deaf children were developed and evaluated. Eighteen sound films and accompanying records presented information on hearing, lipreading and speech, and attempted to change parental attitudes toward children and spouses. Two versions of the films and records were narrated by (1) "stars" who were…

  9. ON INTEGRATED COURSE “SOCIAL AND SPEECH COMMUNICATIONS” FOR STUDENTS OF ART HIGHER EDUCATIONAL ESTABLISHMENT

    Directory of Open Access Journals (Sweden)

    Elena Nicolaevna Klemenova

    2013-11-01

    Full Text Available The article describes the experience of teaching the course “Social and Speech Communication”. Through the training, students are expected to master a range of means for effective communication, grounded in linguistic communication and its bearer, the language personality; to gain knowledge about the complex processes of information exchange; to discover the psychological peculiarities of verbal and non-verbal communication; and to learn how to communicate in order to solve professional and personal problems. Fluent command of all kinds of speech activity, correct and intelligent communication in various spheres and settings, and linguistic analysis of speech events, including their aesthetic value, together represent the unity of the systemic and individual approaches in the humanities training of future architects, designers and managers. DOI: http://dx.doi.org/10.12731/2218-7405-2013-7-43

  10. Speech Entrainment Compensates for Broca's Area Damage

    Science.gov (United States)

    Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris

    2015-01-01

    Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response to SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. Number of different words per minute was calculated as a speech output measure for each task, with the difference between SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage in order to predict lesion locations associated with the fluency-inducing response to speech entrainment. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during speech entrainment versus spontaneous speech. A similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed that damage to the inferior frontal gyrus predicted this response. Results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of speech entrainment to improve speech production and may help select patients for speech entrainment treatment. PMID:25989443
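
    The voxel-wise lesion-symptom mapping step described here amounts to a mass-univariate comparison of the behavioural measure between patients with and without damage at each voxel. Below is a deliberately simplified sketch of that logic (binary lesion masks, a plain t-test, and none of the permutation or multiple-comparison corrections a real VLSM analysis would require); the array names are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind

def vlsm_tmap(lesions, scores, min_patients=5):
    """Toy voxel-wise lesion-symptom map.

    lesions : (n_patients, n_voxels) binary lesion masks (1 = voxel damaged)
    scores  : (n_patients,) behavioural measure (e.g. fluency improvement)

    Returns one t-value per voxel comparing scores of lesioned vs. spared
    patients (NaN where too few patients are lesioned to run the test).
    """
    n_patients, n_voxels = lesions.shape
    tmap = np.full(n_voxels, np.nan)
    for v in range(n_voxels):
        damaged = lesions[:, v].astype(bool)
        if damaged.sum() >= min_patients and (~damaged).sum() >= min_patients:
            tmap[v], _ = ttest_ind(scores[damaged], scores[~damaged])
    return tmap
```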

  11. Effects of Early Bilingual Experience with a Tone and a Non-Tone Language on Speech-Music Integration.

    Directory of Open Access Journals (Sweden)

    Salomi S Asaridou

    Full Text Available We investigated music and language processing in a group of early bilinguals who spoke a tone language and a non-tone language (Cantonese and Dutch). We assessed online speech-music processing interactions, that is, interactions that occur when speech and music are processed simultaneously in songs, with a speeded classification task. In this task, participants judged sung pseudowords either musically (based on the direction of the musical interval) or phonologically (based on the identity of the sung vowel). We also assessed longer-term effects of linguistic experience on musical ability, that is, the influence of extensive prior experience with language when processing music. These effects were assessed with a task in which participants had to learn to identify musical intervals and with four pitch-perception tasks. Our hypothesis was that due to their experience in two different languages using lexical versus intonational tone, the early Cantonese-Dutch bilinguals would outperform the Dutch control participants. In online processing, the Cantonese-Dutch bilinguals processed speech and music more holistically than controls. This effect seems to be driven by experience with a tone language, in which integration of segmental and pitch information is fundamental. Regarding longer-term effects of linguistic experience, we found no evidence for a bilingual advantage in either the music-interval learning task or the pitch-perception tasks. Together, these results suggest that being a Cantonese-Dutch bilingual does not have any measurable longer-term effects on pitch and music processing, but does have consequences for how speech and music are processed jointly.

  12. Computationally Efficient Clustering of Audio-Visual Meeting Data

    Science.gov (United States)

    Hung, Hayley; Friedland, Gerald; Yeo, Chuohao

    This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors, comprising a limited number of cameras and microphones. We first demonstrate computationally efficient algorithms that can identify who spoke and when, a problem in speech processing known as speaker diarization. We also extract visual activity features efficiently from MPEG4 video by taking advantage of the processing that was already done for video compression. Then, we present a method of associating the audio-visual data together so that the content of each participant can be managed individually. The methods presented in this article can be used as a principal component that enables many higher-level semantic analysis tasks needed in search, retrieval, and navigation.
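
    Speaker diarization ("who spoke when") as described in this record is far more involved than can be shown here, but the toy sketch below conveys the basic shape of such a pipeline: short-term spectral features are extracted and clustered into speaker labels. The feature choice (raw log spectra), the clustering method (k-means), and all parameters are simplifying assumptions, not the authors' computationally efficient algorithms.

```python
import numpy as np
from scipy.signal import spectrogram
from sklearn.cluster import KMeans

def toy_diarization(audio, fs, n_speakers=2, win_s=0.5):
    """Assign each analysis window to one of n_speakers clusters based on
    its log-spectral shape -- a crude stand-in for real diarization features."""
    freqs, times, sxx = spectrogram(audio, fs=fs,
                                    nperseg=int(win_s * fs), noverlap=0)
    features = np.log(sxx + 1e-10).T        # one log-spectrum per window
    labels = KMeans(n_clusters=n_speakers, n_init=10,
                    random_state=0).fit_predict(features)
    return times, labels                    # window centre times, speaker labels
```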

  13. Integrating Speech-Language Pathology Services in Palliative End-of-Life Care

    Science.gov (United States)

    Pollens, Robin D.

    2012-01-01

    Clinical speech-language pathologists (SLPs) may receive referrals to consult with teams serving patients who have a severe and/or terminal disease. Palliative care focuses on the prevention or relief of suffering to maximize quality of life for these patients and their families. This article describes how the role of the SLP in palliative care…

  14. Experienced Speech-Language Pathologists' Responses to Ethical Dilemmas: An Integrated Approach to Ethical Reasoning

    Science.gov (United States)

    Kenny, Belinda; Lincoln, Michelle; Balandin, Susan

    2010-01-01

    Purpose: To investigate the approaches of experienced speech-language pathologists (SLPs) to ethical reasoning and the processes they use to resolve ethical dilemmas. Method: Ten experienced SLPs participated in in-depth interviews. A narrative approach was used to guide participants' descriptions of how they resolved ethical dilemmas. Individual…

  15. Benefits to Speech Perception in Noise From the Binaural Integration of Electric and Acoustic Signals in Simulated Unilateral Deafness.

    Science.gov (United States)

    Ma, Ning; Morris, Saffron; Kitterick, Pádraig Thomas

    2016-01-01

    This study used vocoder simulations with normal-hearing (NH) listeners to (1) measure their ability to integrate speech information from an NH ear and a simulated cochlear implant (CI), and (2) investigate whether binaural integration is disrupted by a mismatch in the delivery of spectral information between the ears arising from a misalignment in the mapping of frequency to place. Eight NH volunteers participated in the study and listened to sentences embedded in background noise via headphones. Stimuli presented to the left ear were unprocessed. Stimuli presented to the right ear (referred to as the CI-simulation ear) were processed using an eight-channel noise vocoder with one of the three processing strategies. An Ideal strategy simulated a frequency-to-place map across all channels that matched the delivery of spectral information between the ears. A Realistic strategy created a misalignment in the mapping of frequency to place in the CI-simulation ear where the size of the mismatch between the ears varied across channels. Finally, a Shifted strategy imposed a similar degree of misalignment in all channels, resulting in consistent mismatch between the ears across frequency. The ability to report key words in sentences was assessed under monaural and binaural listening conditions and at signal to noise ratios (SNRs) established by estimating speech-reception thresholds in each ear alone. The SNRs ensured that the monaural performance of the left ear never exceeded that of the CI-simulation ear. The advantages of binaural integration were calculated by comparing binaural performance with monaural performance using the CI-simulation ear alone. Thus, these advantages reflected the additional use of the experimentally constrained left ear and were not attributable to better-ear listening. Binaural performance was as accurate as, or more accurate than, monaural performance with the CI-simulation ear alone. When both ears supported a similar level of monaural
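
    A compact sketch of an n-channel noise vocoder of the general kind used in such simulations: the signal is analysed into log-spaced bands, each band's envelope is extracted, and the envelopes re-modulate band-limited noise carriers. The `shift_oct` parameter crudely mimics a frequency-to-place mismatch by shifting the carrier bands upward; the filter orders, band edges, and envelope cutoff are illustrative assumptions, not the study's 'Ideal', 'Realistic', or 'Shifted' maps.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocoder(x, fs, n_channels=8, f_lo=100.0, f_hi=7000.0, shift_oct=0.0):
    """Eight-channel (by default) noise vocoder sketch.

    Analyses x into log-spaced bands, extracts each band's envelope, and
    re-synthesises with band-limited noise carriers. shift_oct moves the
    carrier bands upward (in octaves) to mimic a frequency-to-place mismatch.
    Assumes fs is at least 16 kHz so the top band stays below Nyquist.
    """
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    env_lpf = butter(2, 50.0, btype="low", fs=fs, output="sos")  # envelope filter
    rng = np.random.default_rng(0)
    out = np.zeros(len(x), dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        analysis = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        envelope = sosfiltfilt(env_lpf, np.abs(sosfiltfilt(analysis, x)))
        c_lo = lo * 2.0 ** shift_oct
        c_hi = min(hi * 2.0 ** shift_oct, 0.45 * fs)
        carrier_band = butter(4, [c_lo, c_hi], btype="band", fs=fs, output="sos")
        carrier = sosfiltfilt(carrier_band, rng.standard_normal(len(x)))
        out += envelope * carrier
    return out / np.max(np.abs(out))
```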

  16. Age-related audiovisual interactions in the superior colliculus of the rat.

    Science.gov (United States)

    Costa, M; Piché, M; Lepore, F; Guillemot, J-P

    2016-04-21

    It is well established that multisensory integration is a functional characteristic of the superior colliculus that disambiguates external stimuli and therefore reduces the reaction times toward simple audiovisual targets in space. However, in a condition where a complex audiovisual stimulus is used, such as the optical flow in the presence of modulated audio signals, little is known about the processing of the multisensory integration in the superior colliculus. Furthermore, since visual and auditory deficits constitute hallmark signs during aging, we sought to gain some insight into whether audiovisual processes in the superior colliculus are altered with age. Extracellular single-unit recordings were conducted in the superior colliculus of anesthetized Sprague-Dawley adult (10-12 months) and aged (21-22 months) rats. Looming circular concentric sinusoidal (CCS) gratings were presented alone and in the presence of sinusoidally amplitude modulated white noise. In both groups of rats, two different audiovisual response interactions were encountered in the spatial domain: superadditive and suppressive. In contrast, additive audiovisual interactions were found only in adult rats. Hence, superior colliculus audiovisual interactions were more numerous in adult rats (38%) than in aged rats (8%). These results suggest that intersensory interactions in the superior colliculus play an essential role in space processing toward audiovisual moving objects during self-motion. Moreover, aging has a deleterious effect on complex audiovisual interactions. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.
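
    Classifications of multisensory interactions as enhancing or suppressive are conventionally based on an index that compares the bimodal response with the best unisensory response; a minimal sketch of that standard index (not necessarily the authors' exact criterion) is shown below with made-up spike counts.

```python
def multisensory_enhancement(av_response, a_response, v_response):
    """Percent change of the audiovisual response relative to the best
    unisensory response; positive values indicate enhancement (superadditive
    if AV also exceeds A + V), negative values indicate suppression."""
    best_unisensory = max(a_response, v_response)
    return 100.0 * (av_response - best_unisensory) / best_unisensory

# Made-up spike counts: AV = 30, A = 12, V = 18  ->  ~66.7% enhancement
print(multisensory_enhancement(30, 12, 18))
```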

  17. Decoding Speech With Integrated Hybrid Signals Recorded From the Human Ventral Motor Cortex

    Directory of Open Access Journals (Sweden)

    Kenji Ibayashi

    2018-04-01

    Full Text Available Restoration of speech communication for locked-in patients by means of brain computer interfaces (BCIs) is currently an important area of active research. Among the neural signals obtained from intracranial recordings, single/multi-unit activity (SUA/MUA), local field potential (LFP), and electrocorticography (ECoG) are good candidates for an input signal for BCIs. However, the question of which signal or which combination of the three signal modalities is best suited for decoding speech production remains unverified. In order to record SUA, LFP, and ECoG simultaneously from a highly localized area of human ventral sensorimotor cortex (vSMC), we fabricated a 7 by 13 mm electrode containing sparsely arranged microneedle contacts and conventional macro contacts. We determined which signal modality is the most capable of decoding speech production, and tested if the combination of these signals could improve the decoding accuracy of spoken phonemes. Feature vectors were constructed from spike frequency obtained from SUAs and event-related spectral perturbation derived from ECoG and LFP signals, then input to the decoder. The results showed that the decoding accuracy for five spoken vowels was highest when features from multiple signals were combined and optimized for each subject, and reached 59% when averaged across all six subjects. This result suggests that multi-scale signals convey complementary information for speech articulation. The current study demonstrated that simultaneous recording of multi-scale neuronal activities could raise decoding accuracy even though the recording area is limited to a small portion of cortex, which is advantageous for future implementation of speech-assisting BCIs.
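
    The feature-fusion decoding step described here can be sketched schematically: per-trial spike-rate features and spectral-perturbation features are concatenated and fed to a cross-validated classifier. The array names, the classifier choice (LDA in a standard scikit-learn pipeline), and the five-fold cross-validation are illustrative assumptions rather than the decoder actually used in the study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def decode_vowels(sua_rates, ecog_ersp, lfp_ersp, labels):
    """Estimate cross-validated accuracy for the five spoken vowels from
    concatenated multi-scale features (each array is trials x features)."""
    X = np.hstack([sua_rates, ecog_ersp, lfp_ersp])   # feature fusion across scales
    clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
    return cross_val_score(clf, X, labels, cv=5).mean()
```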

  18. Text as a Supplement to Speech in Young and Older Adults.

    Science.gov (United States)

    Krull, Vidya; Humes, Larry E

    2016-01-01

    The purpose of this experiment was to quantify the contribution of visual text to auditory speech recognition in background noise. Specifically, the authors tested the hypothesis that partially accurate visual text from an automatic speech recognizer could be used successfully to supplement speech understanding in difficult listening conditions in older adults, with normal or impaired hearing. The working hypotheses were based on what is known regarding audiovisual speech perception in the elderly from speechreading literature. We hypothesized that (1) combining auditory and visual text information will result in improved recognition accuracy compared with auditory or visual text information alone, (2) benefit from supplementing speech with visual text (auditory and visual enhancement) in young adults will be greater than that in older adults, and (3) individual differences in performance on perceptual measures would be associated with cognitive abilities. Fifteen young adults with normal hearing, 15 older adults with normal hearing, and 15 older adults with hearing loss participated in this study. All participants completed sentence recognition tasks in auditory-only, text-only, and combined auditory-text conditions. The auditory sentence stimuli were spectrally shaped to restore audibility for the older participants with impaired hearing. All participants also completed various cognitive measures, including measures of working memory, processing speed, verbal comprehension, perceptual and cognitive speed, processing efficiency, inhibition, and the ability to form wholes from parts. Group effects were examined for each of the perceptual and cognitive measures. Audiovisual benefit was calculated relative to performance in the auditory-only and text-only conditions. Finally, the relationship between perceptual measures and other independent measures was examined using principal-component factor analyses, followed by regression analyses. Both young and older adults

  19. Top-Down Modulation of Auditory-Motor Integration during Speech Production: The Role of Working Memory.

    Science.gov (United States)

    Guo, Zhiqiang; Wu, Xiuqin; Li, Weifeng; Jones, Jeffery A; Yan, Nan; Sheft, Stanley; Liu, Peng; Liu, Hanjun

    2017-10-25

    Although working memory (WM) is considered as an emergent property of the speech perception and production systems, the role of WM in sensorimotor integration during speech processing is largely unknown. We conducted two event-related potential experiments with female and male young adults to investigate the contribution of WM to the neurobehavioural processing of altered auditory feedback during vocal production. A delayed match-to-sample task that required participants to indicate whether the pitch feedback perturbations they heard during vocalizations in test and sample sequences matched, elicited significantly larger vocal compensations, larger N1 responses in the left middle and superior temporal gyrus, and smaller P2 responses in the left middle and superior temporal gyrus, inferior parietal lobule, somatosensory cortex, right inferior frontal gyrus, and insula compared with a control task that did not require memory retention of the sequence of pitch perturbations. On the other hand, participants who underwent extensive auditory WM training produced suppressed vocal compensations that were correlated with improved auditory WM capacity, and enhanced P2 responses in the left middle frontal gyrus, inferior parietal lobule, right inferior frontal gyrus, and insula that were predicted by pretraining auditory WM capacity. These findings indicate that WM can enhance the perception of voice auditory feedback errors while inhibiting compensatory vocal behavior to prevent voice control from being excessively influenced by auditory feedback. This study provides the first evidence that auditory-motor integration for voice control can be modulated by top-down influences arising from WM, rather than modulated exclusively by bottom-up and automatic processes. SIGNIFICANCE STATEMENT One outstanding question that remains unsolved in speech motor control is how the mismatch between predicted and actual voice auditory feedback is detected and corrected. The present study

  20. Speech Perception as a Multimodal Phenomenon

    OpenAIRE

    Rosenblum, Lawrence D.

    2008-01-01

    Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal s...

  1. Enhancing audiovisual experience with haptic feedback: a survey on HAV.

    Science.gov (United States)

    Danieau, F; Lecuyer, A; Guillotel, P; Fleureau, J; Mollet, N; Christie, M

    2013-01-01

    Haptic technology has been widely employed in applications ranging from teleoperation and medical simulation to art and design, including entertainment, flight simulation, and virtual reality. Today there is a growing interest among researchers in integrating haptic feedback into audiovisual systems. A new medium emerges from this effort: haptic-audiovisual (HAV) content. This paper presents the techniques, formalisms, and key results pertinent to this medium. We first review the three main stages of the HAV workflow: the production, distribution, and rendering of haptic effects. We then highlight the pressing necessity for evaluation techniques in this context and discuss the key challenges in the field. By building on existing technologies and tackling the specific challenges of the enhancement of audiovisual experience with haptics, we believe the field presents exciting research perspectives whose financial and societal stakes are significant.

  2. Integration of literacy into speech-language therapy: a descriptive analysis of treatment practices.

    Science.gov (United States)

    Tambyraja, Sherine R; Schmitt, Mary Beth; Justice, Laura M; Logan, Jessica A R; Schwarz, Sadie

    2014-01-01

    The purpose of the present study was: (a) to examine the extent to which speech-language therapy provided to children with language disorders in the schools targets code-based literacy skills (e.g., alphabet knowledge and phonological awareness) during business-as-usual treatment sessions, and (b) to determine whether literacy-focused therapy time was associated with factors specific to children and/or speech-language pathologists (SLPs). Participants were 151 kindergarten and first-grade children and 40 SLPs. Video-recorded therapy sessions were coded to determine the amount of time that addressed literacy. Assessments of children's literacy skills were administered as well as questionnaires regarding characteristics of SLPs (e.g., service delivery, professional development). Results showed that time spent addressing code-related literacy across therapy sessions was variable. Significant predictors included SLP years of experience, therapy location, and therapy session duration, such that children receiving services from SLPs with more years of experience, and/or who utilized the classroom for therapy, received more literacy-focused time. Additionally, children in longer therapy sessions received more therapy time on literacy skills. There is considerable variability in the extent to which children received literacy-focused time in therapy; however, SLP-level factors predict time spent in literacy more than child-level factors. Further research is needed to understand the nature of literacy-focused therapy in the public schools. Readers will be able to: (a) define code-based literacy skills, (b) discuss the role that speech-language pathologists have in fostering children's literacy development, and (c) identify key factors that may currently influence the inclusion of literacy targets in school-based speech-language therapy. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. The integration of prosodic speech in high functioning autism: a preliminary FMRI study.

    Directory of Open Access Journals (Sweden)

    Isabelle Hesling

    2010-07-01

    Full Text Available Autism is a neurodevelopmental disorder characterized by a specific triad of symptoms such as abnormalities in social interaction, abnormalities in communication and restricted activities and interests. While verbal autistic subjects may present a correct mastery of the formal aspects of speech, they have difficulties in prosody (the music of speech), leading to communication disorders. A few behavioural studies have revealed a prosodic impairment in children with autism, and among the few fMRI studies aiming at assessing the neural network involved in language, none has specifically studied prosodic speech. The aim of the present study was to characterize specific prosodic components such as linguistic prosody (intonation, rhythm and emphasis) and emotional prosody and to correlate them with the neural network underlying them. We used a behavioural test (Profiling Elements of the Prosodic System, PEPS) and fMRI to characterize prosodic deficits and investigate the neural network underlying prosodic processing. Results revealed the existence of a link between perceptive and productive prosodic deficits for some prosodic components (rhythm, emphasis and affect) in HFA and also revealed that the neural network involved in prosodic speech perception exhibits abnormal activation in the left SMG as compared to controls (activation positively correlated with intonation and emphasis) and an absence of deactivation patterns in regions involved in the default mode. These prosodic impairments may result not only from abnormal activation patterns but also from an inability to adequately use the strategy of default-network inhibition; both mechanisms have to be considered when accounting for decreased task performance in High Functioning Autism.

  4. Boosting pitch encoding with audiovisual interactions in congenital amusia.

    Science.gov (United States)

    Albouy, Philippe; Lévêque, Yohana; Hyde, Krista L; Bouchet, Patrick; Tillmann, Barbara; Caclin, Anne

    2015-01-01

    The combination of information across senses can enhance perception, as revealed for example by decreased reaction times or improved stimulus detection. Interestingly, these facilitatory effects have been shown to be maximal when responses to unisensory modalities are weak. The present study investigated whether audiovisual facilitation can be observed in congenital amusia, a music-specific disorder primarily ascribed to impairments of pitch processing. Amusic individuals and their matched controls performed two tasks. In Task 1, they were required to detect auditory, visual, or audiovisual stimuli as rapidly as possible. In Task 2, they were required to detect as accurately and as rapidly as possible a pitch change within an otherwise monotonic 5-tone sequence that was presented either only auditorily (A condition), or simultaneously with a temporally congruent, but otherwise uninformative visual stimulus (AV condition). Results of Task 1 showed that amusics exhibit typical auditory and visual detection, and typical audiovisual integration capacities: both amusics and controls exhibited shorter response times for audiovisual stimuli than for either auditory stimuli or visual stimuli. Results of Task 2 revealed that both groups benefited from simultaneous uninformative visual stimuli to detect pitch changes: accuracy was higher and response times shorter in the AV condition than in the A condition. The audiovisual improvements of response times were observed for different pitch interval sizes depending on the group. These results suggest that both typical listeners and amusic individuals can benefit from multisensory integration to improve their pitch processing abilities and that this benefit varies as a function of task difficulty. These findings constitute the first step towards the perspective to exploit multisensory paradigms to reduce pitch-related deficits in congenital amusia, notably by suggesting that audiovisual paradigms are effective in an appropriate

  5. Audio-Visual and Meaningful Semantic Context Enhancements in Older and Younger Adults.

    Directory of Open Access Journals (Sweden)

    Kirsten E Smayda

    Full Text Available Speech perception is critical to everyday life. Oftentimes noise can degrade a speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18-35) and thirty-three older adults (ages 60-90) to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audio-visual conditions. We hypothesized that young adults would outperform older adults across SNRs, modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audio-visual relative to audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts. However, we predicted that older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audiovisual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and meaningful benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audio-visual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger

  6. Efficient visual search from synchronized auditory signals requires transient audiovisual events.

    Directory of Open Access Journals (Sweden)

    Erik Van der Burg

    Full Text Available BACKGROUND: A prevailing view is that audiovisual integration requires temporally coincident signals. However, a recent study failed to find any evidence for audiovisual integration in visual search even when using synchronized audiovisual events. An important question is what information is critical to observe audiovisual integration. METHODOLOGY/PRINCIPAL FINDINGS: Here we demonstrate that temporal coincidence (i.e., synchrony) of auditory and visual components can trigger audiovisual interaction in cluttered displays and consequently produce very fast and efficient target identification. In visual search experiments, subjects found a modulating visual target vastly more efficiently when it was paired with a synchronous auditory signal. By manipulating the kind of temporal modulation (sine wave vs. square wave vs. difference wave; harmonic sine-wave synthesis; gradient of onset/offset ramps) we show that abrupt visual events are required for this search efficiency to occur, and that sinusoidal audiovisual modulations do not support efficient search. CONCLUSIONS/SIGNIFICANCE: Thus, audiovisual temporal alignment will only lead to benefits in visual search if the changes in the component signals are both synchronized and transient. We propose that transient signals are necessary in synchrony-driven binding to avoid spurious interactions with unrelated signals when these occur close together in time.
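
    The contrast between smooth and transient temporal modulation that drives the effect described here can be sketched directly: the same modulation rate applied as a sinusoid produces no abrupt events, whereas a square wave produces synchronized transients in both the auditory and the visual signal. Sample rate, carrier frequency, and modulation rate below are arbitrary examples, not the study's stimulus parameters.

```python
import numpy as np

fs = 44100          # audio sample rate (Hz)
duration = 2.0      # seconds
f_mod = 1.5         # modulation rate (Hz)
t = np.arange(int(fs * duration)) / fs

sine_mod = 0.5 * (1.0 + np.sin(2 * np.pi * f_mod * t))           # smooth modulation
square_mod = (np.sin(2 * np.pi * f_mod * t) > 0).astype(float)   # abrupt on/off modulation

carrier = np.sin(2 * np.pi * 500.0 * t)        # 500-Hz tone
am_tone_smooth = sine_mod * carrier            # no sharp audiovisual transients
am_tone_transient = square_mod * carrier       # synchronized abrupt events
luminance_transient = square_mod               # visual target follows the same square wave
```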

  7. Plantilla 1: El documento audiovisual: elementos importantes

    OpenAIRE

    Alemany, Dolores

    2011-01-01

    The concept of the audiovisual document and of audiovisual documentation, examining in depth the distinction between documentation of moving images (with the possible incorporation of sound) and the concept of audiovisual documentation as proposed by Jorge Caldera. Differentiation between audiovisual documents, audiovisual works and audiovisual heritage according to Félix del Valle.

  8. Multisensory speech perception in autism spectrum disorder: From phoneme to whole-word perception.

    Science.gov (United States)

    Stevenson, Ryan A; Baum, Sarah H; Segers, Magali; Ferber, Susanne; Barense, Morgan D; Wallace, Mark T

    2017-07-01

    Speech perception in noisy environments is boosted when a listener can see the speaker's mouth and integrate the auditory and visual speech information. Autistic children have a diminished capacity to integrate sensory information across modalities, which contributes to core symptoms of autism, such as impairments in social communication. We investigated the abilities of autistic and typically-developing (TD) children to integrate auditory and visual speech stimuli in various signal-to-noise ratios (SNR). Measurements of both whole-word and phoneme recognition were recorded. At the level of whole-word recognition, autistic children exhibited reduced performance in both the auditory and audiovisual modalities. Importantly, autistic children showed reduced behavioral benefit from multisensory integration with whole-word recognition, specifically at low SNRs. At the level of phoneme recognition, autistic children exhibited reduced performance relative to their TD peers in auditory, visual, and audiovisual modalities. However, and in contrast to their performance at the level of whole-word recognition, both autistic and TD children showed benefits from multisensory integration for phoneme recognition. In accordance with the principle of inverse effectiveness, both groups exhibited greater benefit at low SNRs relative to high SNRs. Thus, while autistic children showed typical multisensory benefits during phoneme recognition, these benefits did not translate to typical multisensory benefit of whole-word recognition in noisy environments. We hypothesize that sensory impairments in autistic children raise the SNR threshold needed to extract meaningful information from a given sensory input, resulting in subsequent failure to exhibit behavioral benefits from additional sensory information at the level of whole-word recognition. Autism Res 2017, 10: 1280-1290. © 2017 International Society for Autism Research, Wiley Periodicals, Inc.

  9. Dissociating verbal and nonverbal audiovisual object processing.

    Science.gov (United States)

    Hocking, Julia; Price, Cathy J

    2009-02-01

    This fMRI study investigates how audiovisual integration differs for verbal stimuli that can be matched at a phonological level and nonverbal stimuli that can be matched at a semantic level. Subjects were presented simultaneously with one visual and one auditory stimulus and were instructed to decide whether these stimuli referred to the same object or not. Verbal stimuli were simultaneously presented spoken and written object names, and nonverbal stimuli were photographs of objects simultaneously presented with naturally occurring object sounds. Stimulus differences were controlled by including two further conditions that paired photographs of objects with spoken words and object sounds with written words. Verbal matching, relative to all other conditions, increased activation in a region of the left superior temporal sulcus that has previously been associated with phonological processing. Nonverbal matching, relative to all other conditions, increased activation in a right fusiform region that has previously been associated with structural and conceptual object processing. Thus, we demonstrate how brain activation for audiovisual integration depends on the verbal content of the stimuli, even when stimulus and task processing differences are controlled.

  10. Being First Matters: Topographical Representational Similarity Analysis of ERP Signals Reveals Separate Networks for Audiovisual Temporal Binding Depending on the Leading Sense.

    Science.gov (United States)

    Cecere, Roberto; Gross, Joachim; Willis, Ashleigh; Thut, Gregor

    2017-05-24

    inputs in one modality enhance stimulus processing in another modality. Our research demonstrates that evaluating synchrony of auditory-leading (AV) versus visual-leading (VA) audiovisual stimulus pairs is characterized by two distinct patterns of brain activity. This suggests that audiovisual integration is not a unitary process and that different binding mechanisms are recruited in the brain based on the leading sense. These mechanisms may be relevant for supporting different classes of multisensory operations, for example, auditory enhancement of visual attention (AV) and visual enhancement of auditory speech (VA). Copyright © 2017 Cecere et al.

  11. Contributions of speech-language therapy to the integration of individuals with Down syndrome in the workplace.

    Science.gov (United States)

    Barbosa, Talita Maria Monteiro Farias; Lima, Ivonaldo Leidson Barbosa; Alves, Giorvan Ânderson Dos Santos; Delgado, Isabelle Cahino

    2018-03-01

    To analyze the contributions of speech-language therapy in the integration of young individuals with Down syndrome (DS) into the workplace, with reference to their professionalization. A questionnaire was distributed to eight undergraduate students (tutors) who participated in a project with individuals with DS, five mothers of individuals with DS, and five employees from the institution in which the present study was conducted. The questionnaire assessed the communication, memory, behavior, social interaction, autonomy and independence of the participants with DS, called "trainees". The trainees were employed in one of five routine work sectors at the university that conducted the present study. The data collected in this descriptive and cross-sectional study were analyzed quantitatively and qualitatively. The Research Ethics Committee of the affiliated institute approved the project. Mothers and tutors rated the trainees' language skills as "good". However, their ratings differed from those of the participating employees. After the trainees with DS were placed in a work environment, significant changes were observed in their communication and autonomy. There was no improvement in the trainees' independence, but after training noticeable changes were observed in their social behavior and autonomy. Speech-language therapy during vocational training led to positive changes in the social behavior of individuals with DS, as evidenced by an increase in their autonomy and communication.

  12. Auditory-visual speech integration by prelinguistic infants: perception of an emergent consonant in the McGurk effect.

    Science.gov (United States)

    Burnham, Denis; Dodd, Barbara

    2004-12-01

    The McGurk effect, in which auditory [ba] dubbed onto [ga] lip movements is perceived as "da" or "tha," was employed in a real-time task to investigate auditory-visual speech perception in prelingual infants. Experiments 1A and 1B established the validity of real-time dubbing for producing the effect. In Experiment 2, 4 1/2-month-olds were tested in a habituation-test paradigm, in which an auditory-visual stimulus was presented contingent upon visual fixation of a live face. The experimental group was habituated to a McGurk stimulus (auditory [ba] visual [ga]), and the control group to matching auditory-visual [ba]. Each group was then presented with three auditory-only test trials, [ba], [da], and [ða] (as in then). Visual-fixation durations in test trials showed that the experimental group treated the emergent percept in the McGurk effect, [da] or [ða], as familiar (even though they had not heard these sounds previously) and [ba] as novel. For control group infants [da] and [ða] were no more familiar than [ba]. These results are consistent with infants' perception of the McGurk effect, and support the conclusion that prelinguistic infants integrate auditory and visual speech information. Copyright 2004 Wiley Periodicals, Inc.

  13. Subjective Evaluation of Audiovisual Signals

    Directory of Open Access Journals (Sweden)

    F. Fikejz

    2010-01-01

    Full Text Available This paper deals with subjective evaluation of audiovisual signals, with emphasis on the interaction between acoustic and visual quality. The subjective test is realized by a simple rating method. The audiovisual signal used in this test is a combination of images compressed by the JPEG compression codec and sound samples compressed by MPEG-1 Layer III. Images and sounds have various contents. It simulates a real situation when the subject listens to compressed music and watches compressed pictures without access to the original, i.e. uncompressed, signals.

  14. Severe Multisensory Speech Integration Deficits in High-Functioning School-Aged Children with Autism Spectrum Disorder (ASD) and Their Resolution During Early Adolescence

    Science.gov (United States)

    Foxe, John J.; Molholm, Sophie; Del Bene, Victor A.; Frey, Hans-Peter; Russo, Natalie N.; Blanco, Daniella; Saint-Amour, Dave; Ross, Lars A.

    2015-01-01

    Under noisy listening conditions, visualizing a speaker's articulations substantially improves speech intelligibility. This multisensory speech integration ability is crucial to effective communication, and the appropriate development of this capacity greatly impacts a child's ability to successfully navigate educational and social settings. Research shows that multisensory integration abilities continue developing late into childhood. The primary aim here was to track the development of these abilities in children with autism, since multisensory deficits are increasingly recognized as a component of the autism spectrum disorder (ASD) phenotype. The abilities of high-functioning ASD children (n = 84) to integrate seen and heard speech were assessed cross-sectionally, while environmental noise levels were systematically manipulated, comparing them with age-matched neurotypical children (n = 142). Severe integration deficits were uncovered in ASD, which were increasingly pronounced as background noise increased. These deficits were evident in school-aged ASD children (5–12 year olds), but were fully ameliorated in ASD children entering adolescence (13–15 year olds). The severity of multisensory deficits uncovered has important implications for educators and clinicians working in ASD. We consider the observation that the multisensory speech system recovers substantially in adolescence as an indication that it is likely amenable to intervention during earlier childhood, with potentially profound implications for the development of social communication abilities in ASD children. PMID:23985136
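
    The multisensory benefit in studies like this one is typically summarized as the gain in word-recognition accuracy when the talker's face is visible, computed separately at each signal-to-noise ratio. The sketch below illustrates that computation; all numbers are hypothetical placeholders, not data from the study.

    ```python
    # Hypothetical illustration of the audiovisual (AV) gain measure used in
    # speech-in-noise studies: accuracy with the talker's face visible minus
    # auditory-only accuracy, at each signal-to-noise ratio (SNR).
    import numpy as np

    snr_db = np.array([-12, -9, -6, -3, 0])                  # noise levels (illustrative)
    audio_only = np.array([0.10, 0.25, 0.45, 0.65, 0.80])    # proportion of words correct
    audiovisual = np.array([0.35, 0.55, 0.70, 0.82, 0.90])

    av_gain = audiovisual - audio_only                       # raw multisensory benefit per SNR
    for snr, gain in zip(snr_db, av_gain):
        print(f"SNR {int(snr):+d} dB: AV gain = {gain:.2f}") # gain is largest at the poorest SNRs
    ```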

  15. Speech Problems

    Science.gov (United States)

    KidsHealth / For Teens / Speech Problems. ... a person's ability to speak clearly. Some Common Speech and Language Disorders: Stuttering is a problem that ...

  16. Behavioural evidence for separate mechanisms of audiovisual temporal binding as a function of leading sensory modality.

    Science.gov (United States)

    Cecere, Roberto; Gross, Joachim; Thut, Gregor

    2016-06-01

    The ability to integrate auditory and visual information is critical for effective perception and interaction with the environment, and is thought to be abnormal in some clinical populations. Several studies have investigated the time window over which audiovisual events are integrated, also called the temporal binding window, and revealed asymmetries depending on the order of audiovisual input (i.e. the leading sense). When judging audiovisual simultaneity, the binding window appears narrower and non-malleable for auditory-leading stimulus pairs and wider and trainable for visual-leading pairs. Here we specifically examined the level of independence of binding mechanisms when auditory-before-visual vs. visual-before-auditory input is bound. Three groups of healthy participants practiced audiovisual simultaneity detection with feedback, selectively training on auditory-leading stimulus pairs (group 1), visual-leading stimulus pairs (group 2) or both (group 3). Subsequently, we tested for learning transfer (crossover) from trained stimulus pairs to non-trained pairs with opposite audiovisual input. Our data confirmed the known asymmetry in size and trainability for auditory-visual vs. visual-auditory binding windows. More importantly, practicing one type of audiovisual integration (e.g. auditory-visual) did not affect the other type (e.g. visual-auditory), even if trainable by within-condition practice. Together, these results provide crucial evidence that audiovisual temporal binding mechanisms for auditory-leading vs. visual-leading stimulus pairs are independent, possibly tapping into different circuits for audiovisual integration due to engagement of different multisensory sampling mechanisms depending on leading sense. Our results have implications for informing the study of multisensory interactions in healthy participants and clinical populations with dysfunctional multisensory integration. © 2016 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.

  17. Search in audiovisual broadcast archives

    NARCIS (Netherlands)

    Huurnink, B.

    2010-01-01

    Documentary makers, journalists, news editors, and other media professionals routinely require previously recorded audiovisual material for new productions. For example, a news editor might wish to reuse footage from overseas services for the evening news, or a documentary maker describing the

  18. Sistema audiovisual para reconocimiento de comandos / Audiovisual system for recognition of commands

    Directory of Open Access Journals (Sweden)

    Alexander Ceballos

    2011-08-01

    Full Text Available We present the development of an automatic audiovisual speech recognition system focused on the recognition of commands. The audio signal was represented by Mel cepstral coefficients and their first two temporal derivatives. To characterize the video, high-level visual features were tracked automatically throughout the sequence. Automatic initialization of the algorithm used color transformations and active contour models based on Gradient Vector Flow ("GVF snakes") on the lip region, whereas visual tracking used similarity measures across neighborhoods and morphological constraints defined in the MPEG-4 standard. We first present the design of the automatic speech recognition system using only audio information (ASR), based on Hidden Markov Models (HMMs) and an isolated-word approach; we then present the design of the systems using only video features (VSR) and using combined audio and video features (AVSR). Finally, the results of the three systems are compared on an in-house database in Spanish and French, and the influence of acoustic noise is analyzed, showing that the AVSR system is more robust than ASR and VSR.
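
    As a rough illustration of the audio front end described above (Mel cepstral coefficients plus their first two temporal derivatives), the sketch below uses librosa; the file name, sampling rate and number of coefficients are illustrative assumptions, not the parameters used in the paper.

    ```python
    # Per-frame MFCCs stacked with their first and second temporal derivatives,
    # the standard audio representation for HMM-based isolated-word recognition.
    import numpy as np
    import librosa

    def audio_features(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
        """Return one feature vector per frame: [MFCC, delta, delta-delta]."""
        y, sr = librosa.load(wav_path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, n_frames)
        d1 = librosa.feature.delta(mfcc, order=1)               # first derivative (delta)
        d2 = librosa.feature.delta(mfcc, order=2)               # second derivative (delta-delta)
        return np.vstack([mfcc, d1, d2]).T                      # (n_frames, 3 * n_mfcc)

    # Example (hypothetical file name):
    # features = audio_features("command_001.wav")
    ```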

  19. Temporal processing of audiovisual stimuli is enhanced in musicians: evidence from magnetoencephalography (MEG).

    Directory of Open Access Journals (Sweden)

    Yao Lu

    Full Text Available Numerous studies have demonstrated that the structural and functional differences between professional musicians and non-musicians are not only found within a single modality, but also with regard to multisensory integration. In this study we have combined psychophysical and neurophysiological measurements to investigate the processing of non-musical, synchronous or various levels of asynchronous audiovisual events. We hypothesize that long-term multisensory experience alters temporal audiovisual processing already at a non-musical stage. Behaviorally, musicians scored significantly better than non-musicians in judging whether the auditory and visual stimuli were synchronous or asynchronous. At the neural level, the statistical analysis for the audiovisual asynchronous response revealed three clusters of activations including the ACC and the SFG and two bilaterally located activations in IFG and STG in both groups. Musicians, in comparison to the non-musicians, responded to synchronous audiovisual events with enhanced neuronal activity in a broad left posterior temporal region that covers the STG, the insula and the Postcentral Gyrus. Musicians also showed significantly greater activation in the left Cerebellum, when confronted with an audiovisual asynchrony. Taken together, our MEG results form a strong indication that long-term musical training alters the basic audiovisual temporal processing already in an early stage (directly after the auditory N1 wave), while the psychophysical results indicate that musical training may also provide behavioral benefits in the accuracy of the estimates regarding the timing of audiovisual events.

  20. Temporal processing of audiovisual stimuli is enhanced in musicians: evidence from magnetoencephalography (MEG).

    Science.gov (United States)

    Lu, Yao; Paraskevopoulos, Evangelos; Herholz, Sibylle C; Kuchenbuch, Anja; Pantev, Christo

    2014-01-01

    Numerous studies have demonstrated that the structural and functional differences between professional musicians and non-musicians are not only found within a single modality, but also with regard to multisensory integration. In this study we have combined psychophysical and neurophysiological measurements to investigate the processing of non-musical, synchronous or various levels of asynchronous audiovisual events. We hypothesize that long-term multisensory experience alters temporal audiovisual processing already at a non-musical stage. Behaviorally, musicians scored significantly better than non-musicians in judging whether the auditory and visual stimuli were synchronous or asynchronous. At the neural level, the statistical analysis for the audiovisual asynchronous response revealed three clusters of activations including the ACC and the SFG and two bilaterally located activations in IFG and STG in both groups. Musicians, in comparison to the non-musicians, responded to synchronous audiovisual events with enhanced neuronal activity in a broad left posterior temporal region that covers the STG, the insula and the Postcentral Gyrus. Musicians also showed significantly greater activation in the left Cerebellum, when confronted with an audiovisual asynchrony. Taken together, our MEG results form a strong indication that long-term musical training alters the basic audiovisual temporal processing already in an early stage (directly after the auditory N1 wave), while the psychophysical results indicate that musical training may also provide behavioral benefits in the accuracy of the estimates regarding the timing of audiovisual events.

  1. Impact of a PACS/RIS-integrated speech recognition system on radiology reporting time and report availability

    International Nuclear Information System (INIS)

    Trumm, C.G.; Glaser, C.; Paasche, V.; Kuettner, B.; Francke, M.; Nissen-Meyer, S.; Reiser, M.; Crispin, A.; Popp, P.

    2006-01-01

    Purpose: Quantification of the impact of a PACS/RIS-integrated speech recognition system (SRS) on the time expenditure for radiology reporting and on hospital-wide report availability (RA) in a university institution. Material and Methods: In a prospective pilot study, the following parameters were assessed for 669 radiographic examinations (CR): 1. the time requirement per report dictation (TED: dictation time in seconds divided by the number of images per examination and the number of words per report), using either PACS with tape-based dictation (TD: analog dictation device/minicassette/transcription) or PACS/RIS with speech recognition (RR: remote recognition/transcription; OR: online recognition/self-correction by the radiologist), and 2. the report turnaround time (RTT), defined as the interval from the entry of the first image into the PACS to the availability of the RIS/HIS report. Two equal time periods were chosen retrospectively from the RIS database: 11/2002-2/2003 (TD only) and 11/2003-2/2004 (RR or OR with the SRS only). The mid-term (≥24 h, in 24 h intervals) and short-term (<24 h, in 1 h intervals) RA after examination completion were calculated for all modalities together and for CR, CT, MR and XA/DS separately. The relative increase in mid-term RA (RIMRA: relative to the total number of examinations in each time period) and the increase in short-term RA (ISRA: ratio of available reports during the 1st to 24th hour) were calculated. Results: Prospectively, there was a significant difference between TD/RR/OR (n=151/257/261) regarding mean TED (0.44/0.54/0.62 s per word and image) and mean RTT (10.47/6.65/1.27 h), respectively. Retrospectively, 37 898/39 680 reports were computed from the RIS database for the periods 11/2002-2/2003 and 11/2003-2/2004. For CR/CT there was a shift of the short-term RA to the first 6 hours after examination completion (mean cumulative RA 20% higher) with a more than three-fold increase in the total number of available
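
    The two reporting metrics defined above can be made concrete with a small sketch; the function names, field names and example values are hypothetical stand-ins for actual RIS/PACS records.

    ```python
    # Sketch of the two reporting metrics defined above (TED and RTT);
    # all values below are hypothetical examples.
    from datetime import datetime

    def ted_seconds_per_word_and_image(dictation_time_s: float,
                                       n_images: int, n_words: int) -> float:
        """Time expenditure per dictation, normalized per word and per image."""
        return dictation_time_s / (n_images * n_words)

    def report_turnaround_hours(first_image_in_pacs: datetime,
                                report_available_in_ris: datetime) -> float:
        """Interval from the first image entering the PACS to the available RIS/HIS report."""
        return (report_available_in_ris - first_image_in_pacs).total_seconds() / 3600.0

    # A 90 s dictation covering 2 images and 120 words gives
    # TED = 90 / (2 * 120) = 0.375 s per word and image.
    ted = ted_seconds_per_word_and_image(90.0, n_images=2, n_words=120)
    rtt = report_turnaround_hours(datetime(2004, 1, 5, 8, 0), datetime(2004, 1, 5, 9, 16))
    print(f"TED = {ted:.3f} s, RTT = {rtt:.2f} h")
    ```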

  2. Speech entrainment enables patients with Broca’s aphasia to produce fluent speech

    Science.gov (United States)

    Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

    2012-01-01

    A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production

  3. Segmentation of the Speaker's Face Region with Audiovisual Correlation

    Science.gov (United States)

    Liu, Yuyu; Sato, Yoichi

    The ability to find the speaker's face region in a video is useful for various applications. In this work, we develop a novel technique to find this region within different time windows, which is robust against changes of view, scale, and background. The main thrust of our technique is to integrate audiovisual correlation analysis into a video segmentation framework. We analyze the audiovisual correlation locally by computing quadratic mutual information between our audiovisual features. The computation of quadratic mutual information is based on the probability density functions estimated by kernel density estimation with adaptive kernel bandwidth. The results of this audiovisual correlation analysis are incorporated into graph cut-based video segmentation to obtain a globally optimal extraction of the speaker's face region. The setting of any heuristic threshold in this segmentation is avoided by learning the correlation distributions of speaker and background by expectation maximization. Experimental results demonstrate that our method can detect the speaker's face region accurately and robustly for different views, scales, and backgrounds.
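
    For readers unfamiliar with the correlation measure mentioned above, the following is a minimal sketch of Euclidean-distance quadratic mutual information between two one-dimensional feature streams, estimated with Gaussian kernels. The fixed kernel bandwidths stand in for the adaptive bandwidth selection used by the authors, and the suggested features (per-frame audio energy vs. a visual motion measure) are an assumption for illustration.

    ```python
    # Euclidean-distance quadratic mutual information (QMI) between two 1-D
    # feature streams, estimated with Gaussian kernels (information-theoretic
    # learning style plug-in estimator). Bandwidths are fixed for simplicity.
    import numpy as np

    def _pairwise_gauss(x: np.ndarray, sigma: float) -> np.ndarray:
        """Gaussian kernel on all pairwise differences of a 1-D feature stream."""
        d = x[:, None] - x[None, :]
        # kernel of a *difference* of two samples: variance doubles (bandwidth sigma * sqrt(2))
        return np.exp(-d ** 2 / (4.0 * sigma ** 2)) / (2.0 * sigma * np.sqrt(np.pi))

    def quadratic_mutual_information(a: np.ndarray, v: np.ndarray,
                                     sigma_a: float = 1.0, sigma_v: float = 1.0) -> float:
        """QMI_ED = V_joint + V_marginal - 2 * V_cross (>= 0; larger = more dependent)."""
        Ka = _pairwise_gauss(a, sigma_a)
        Kv = _pairwise_gauss(v, sigma_v)
        v_joint = np.mean(Ka * Kv)                            # integral of squared joint density
        v_marg = np.mean(Ka) * np.mean(Kv)                    # integral of squared product of marginals
        v_cross = np.mean(Ka.mean(axis=1) * Kv.mean(axis=1))  # cross term
        return float(v_joint + v_marg - 2.0 * v_cross)

    # Example: per-frame audio energy should yield a larger QMI with the true
    # speaker's lip-motion signal than with background motion.
    ```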

  4. Learning to Match Auditory and Visual Speech Cues: Social Influences on Acquisition of Phonological Categories

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Grossmann, Tobias

    2015-01-01

    Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential…

  5. Evolution of non-speech sound memory in postlingual deafness: implications for cochlear implant rehabilitation.

    Science.gov (United States)

    Lazard, D S; Giraud, A L; Truy, E; Lee, H J

    2011-07-01

    Neurofunctional patterns assessed before or after cochlear implantation (CI) are informative markers of implantation outcome. Because phonological memory reorganization in post-lingual deafness is predictive of the outcome, we investigated, using a cross-sectional approach, whether memory of non-speech sounds (NSS) produced by animals or objects (i.e. non-human sounds) is also reorganized, and how this relates to speech perception after CI. We used an fMRI auditory imagery task in which sounds were evoked by pictures of noisy items for post-lingual deaf candidates for CI and for normal-hearing subjects. When deaf subjects imagined sounds, the left inferior frontal gyrus, the right posterior temporal gyrus and the right amygdala were less activated compared to controls. Activity levels in these regions decreased with duration of auditory deprivation, indicating declining NSS representations. Whole brain correlations with duration of auditory deprivation and with speech scores after CI showed an activity decline in dorsal, fronto-parietal, cortical regions, and an activity increase in ventral cortical regions, the right anterior temporal pole and the hippocampal gyrus. Both dorsal and ventral reorganizations predicted poor speech perception outcome after CI. These results suggest that post-CI speech perception relies, at least partially, on the integrity of a neural system used for processing NSS that is based on audio-visual and articulatory mapping processes. When this neural system is reorganized, post-lingual deaf subjects resort to inefficient semantic- and memory-based strategies. These results complement those of other studies on speech processing, suggesting that both speech and NSS representations need to be maintained during deafness to ensure the success of CI. Copyright © 2011 Elsevier Ltd. All rights reserved.

  6. Online integration of information from speech and gesture: Insights from event related potentials

    NARCIS (Netherlands)

    Özyürek, A.; Willems, R.M.; Kita, S.; Hagoort, P.

    2007-01-01

    During language comprehension, listeners use the global semantic representation from previous sentence or discourse context to immediately integrate the meaning of each upcoming word into the unfolding message-level representation. Here we investigate whether communicative gestures that often

  7. Speech Compression

    Directory of Open Access Journals (Sweden)

    Jerry D. Gibson

    2016-06-01

    Full Text Available Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.
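
    Since the linear prediction model is named above as the basis of most speech coding standards, here is a minimal sketch of LPC analysis for a single frame using the autocorrelation method and the Levinson-Durbin recursion; the frame length, model order and test signal are illustrative choices, not values from any particular standard.

    ```python
    # Minimal linear-prediction (LPC) analysis for one speech frame:
    # autocorrelation method solved with the Levinson-Durbin recursion.
    import numpy as np

    def lpc_coefficients(frame: np.ndarray, order: int = 10) -> np.ndarray:
        """Return predictor coefficients a[1..order] with s[n] ~= sum_k a[k] * s[n-k]."""
        n = len(frame)
        # autocorrelation at lags 0..order (frame assumed pre-windowed and non-silent)
        r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
        a = np.zeros(order)
        err = r[0]
        for i in range(order):
            # reflection coefficient for model order i + 1
            k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
            a_prev = a[:i].copy()
            a[:i] = a_prev - k * a_prev[::-1]
            a[i] = k
            err *= 1.0 - k * k
        return a

    # Illustration: the prediction residual carries far less energy than the
    # signal itself, which is exactly what a predictive speech coder exploits.
    rng = np.random.default_rng(1)
    s = np.sin(2 * np.pi * 200 * np.arange(400) / 8000) + 0.05 * rng.standard_normal(400)
    order = 10
    a = lpc_coefficients(s * np.hamming(400), order=order)
    pred = np.zeros_like(s)
    for nn in range(order, len(s)):
        pred[nn] = a @ s[nn - order:nn][::-1]       # predict from the previous `order` samples
    residual = s - pred
    print(np.var(residual[order:]) / np.var(s[order:]))  # residual-to-signal energy ratio << 1
    ```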

  8. Copyright for audiovisual work and analysis of websites offering audiovisual works

    OpenAIRE

    Chrastecká, Nicolle

    2014-01-01

    This Bachelor's thesis deals with the matter of audiovisual piracy. It discusses the question of audiovisual piracy being caused not by the wrong interpretation of law but by the lack of competitiveness among websites with legal audiovisual content. This thesis questions the quality of legal interpretation in the matter of audiovisual piracy and focuses on its sufficiency. It analyses the responsibility of website providers, providers of the illegal content, the responsibility of illegal cont...

  9. Tackling the complexity in speech

    DEFF Research Database (Denmark)

    section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore..., we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about... the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations...

  10. Learning sparse generative models of audiovisual signals

    OpenAIRE

    Monaci, Gianluca; Sommer, Friedrich T.; Vandergheynst, Pierre

    2008-01-01

    This paper presents a novel framework to learn sparse representations for audiovisual signals. An audiovisual signal is modeled as a sparse sum of audiovisual kernels. The kernels are bimodal functions made of synchronous audio and video components that can be positioned independently and arbitrarily in space and time. We design an algorithm capable of learning sets of such audiovisual, synchronous, shift-invariant functions by alternatingly solving a coding and a learning problem...

  11. Net neutrality and audiovisual services

    OpenAIRE

    van Eijk, N.; Nikoltchev, S.

    2011-01-01

    Net neutrality is high on the European agenda. New regulations for the communication sector provide a legal framework for net neutrality and need to be implemented on both a European and a national level. The key element is not just about blocking or slowing down traffic across communication networks: the control over the distribution of audiovisual services constitutes a vital part of the problem. In this contribution, the phenomenon of net neutrality is described first. Next, the European a...

  12. Electrophysiological correlates of predictive coding of auditory location in the perception of natural audiovisual events.

    Science.gov (United States)

    Stekelenburg, Jeroen J; Vroomen, Jean

    2012-01-01

    In many natural audiovisual events (e.g., a clap of the two hands), the visual signal precedes the sound and thus allows observers to predict when, where, and which sound will occur. Previous studies have reported that there are distinct neural correlates of temporal (when) versus phonetic/semantic (which) content on audiovisual integration. Here we examined the effect of visual prediction of auditory location (where) in audiovisual biological motion stimuli by varying the spatial congruency between the auditory and visual parts. Visual stimuli were presented centrally, whereas auditory stimuli were presented either centrally or at 90° azimuth. Typical sub-additive amplitude reductions (AV - V audiovisual interaction was also found at 40-60 ms (P50) in the spatially congruent condition, while no effect of congruency was found on the suppression of the P2. This indicates that visual prediction of auditory location can be coded very early in auditory processing.

  13. Audiovisual spoken word recognition as a clinical criterion for sensory aids efficiency in Persian-language children with hearing loss.

    Science.gov (United States)

    Oryadi-Zanjani, Mohammad Majid; Vahab, Maryam; Bazrafkan, Mozhdeh; Haghjoo, Asghar

    2015-12-01

    The aim of this study was to examine the role of audiovisual speech recognition as a clinical criterion of cochlear implant or hearing aid efficiency in Persian-language children with severe-to-profound hearing loss. This research was administered as a cross-sectional study. The sample consisted of 60 Persian-speaking 5-7-year-old children. The assessment tool was one of the subtests of the Persian version of the Test of Language Development-Primary 3. The study included two experiments: auditory-only and audiovisual presentation conditions. The test was a closed-set task including 30 words that were orally presented by a speech-language pathologist. The scores for audiovisual word perception were significantly higher than for the auditory-only condition in the children with normal hearing (P... audiovisual presentation conditions (P>0.05). Audiovisual spoken word recognition can be applied as a clinical criterion to assess children with severe to profound hearing loss in order to find out whether a cochlear implant or hearing aid has been efficient for them; i.e. if a child with hearing impairment who uses a CI or HA obtains higher scores for audiovisual spoken word recognition than for the auditory-only condition, his/her auditory skills have developed appropriately due to an effective CI or HA, one of the main factors of auditory habilitation. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  14. Quality models for audiovisual streaming

    Science.gov (United States)

    Thang, Truong Cong; Kim, Young Suk; Kim, Cheon Seog; Ro, Yong Man

    2006-01-01

    Quality is an essential factor in multimedia communication, especially in compression and adaptation. Quality metrics can be divided into three categories: within-modality quality, cross-modality quality, and multi-modality quality. Most research has so far focused on within-modality quality. Moreover, quality is normally considered only from the perceptual perspective. In practice, content may be drastically adapted, even converted to another modality. In this case, we should consider quality from the semantic perspective as well. In this work, we investigate multi-modality quality from the semantic perspective. To model semantic quality, we apply the concept of the "conceptual graph", which consists of semantic nodes and relations between the nodes. As a typical multi-modality example, we focus on an audiovisual streaming service. Specifically, we evaluate the amount of information conveyed by audiovisual content in which both the video and audio channels may be strongly degraded, and the audio may even be converted to text. In the experiments, we also consider a perceptual quality model of audiovisual content, so as to see how it differs from the semantic quality model.
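
    As a toy illustration of the conceptual-graph idea described above, semantic quality can be scored by how many concept nodes and relations of the original content survive adaptation (for example, when video is dropped or audio is converted to text). The graph contents and weights below are invented for illustration and are not the authors' model.

    ```python
    # Invented example: nodes are concepts, relations are (subject, predicate, object).
    original = {
        "nodes": {"reporter", "flood", "city", "rescue_team"},
        "relations": {("reporter", "reports", "flood"), ("rescue_team", "operates_in", "city")},
    }
    adapted = {   # e.g., video dropped and audio converted to text during adaptation
        "nodes": {"reporter", "flood", "city"},
        "relations": {("reporter", "reports", "flood")},
    }

    def semantic_quality(orig: dict, adapt: dict, w_nodes: float = 0.5, w_rels: float = 0.5) -> float:
        """Fraction of original concepts and relations preserved, combined with weights."""
        node_recall = len(orig["nodes"] & adapt["nodes"]) / len(orig["nodes"])
        rel_recall = len(orig["relations"] & adapt["relations"]) / len(orig["relations"])
        return w_nodes * node_recall + w_rels * rel_recall

    print(semantic_quality(original, adapted))   # 0.5 * 0.75 + 0.5 * 0.5 = 0.625
    ```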

  15. Applications in accessibility of text-to-speech synthesis for South African languages: Initial system integration and user engagement

    CSIR Research Space (South Africa)

    Schlünz, Georg I

    2017-09-01

    Full Text Available Text-to-speech synthesis can give a voice to people with little or no functional speech to speak out loud. Screen readers and accessible e-books allow a print-disabled (visually-impaired, partially-sighted or dyslexic) individual to read text material by listening to audio versions. Text-to-speech synthesis...

  16. The influence of phonetic dimensions on aphasic speech perception

    NARCIS (Netherlands)

    de Kok, D.A.; Jonkers, R.; Bastiaanse, Y.R.M.

    2010-01-01

    Individuals with aphasia have more problems detecting small differences between speech sounds than larger ones. This paper reports how phonemic processing is impaired and how this is influenced by speechreading. A non-word discrimination task was carried out with 'audiovisual', 'auditory only' and

  17. Visual Information Can Hinder Working Memory Processing of Speech

    Science.gov (United States)

    Mishra, Sushmit; Lunner, Thomas; Stenfelt, Stefan; Ronnberg, Jerker; Rudner, Mary

    2013-01-01

    Purpose: The purpose of the present study was to evaluate the new Cognitive Spare Capacity Test (CSCT), which measures aspects of working memory capacity for heard speech in the audiovisual and auditory-only modalities of presentation. Method: In Experiment 1, 20 young adults with normal hearing performed the CSCT and an independent battery of…

  18. Audiovisual Temporal Processing and Synchrony Perception in the Rat.

    Science.gov (United States)

    Schormans, Ashley L; Scott, Kaela E; Vo, Albert M Q; Tyker, Anna; Typlt, Marei; Stolzberg, Daniel; Allman, Brian L

    2016-01-01

    Extensive research on humans has improved our understanding of how the brain integrates information from our different senses, and has begun to uncover the brain regions and large-scale neural activity that contributes to an observer's ability to perceive the relative timing of auditory and visual stimuli. In the present study, we developed the first behavioral tasks to assess the perception of audiovisual temporal synchrony in rats. Modeled after the parameters used in human studies, separate groups of rats were trained to perform: (1) a simultaneity judgment task in which they reported whether audiovisual stimuli at various stimulus onset asynchronies (SOAs) were presented simultaneously or not; and (2) a temporal order judgment task in which they reported whether they perceived the auditory or visual stimulus to have been presented first. Furthermore, using in vivo electrophysiological recordings in the lateral extrastriate visual (V2L) cortex of anesthetized rats, we performed the first investigation of how neurons in the rat multisensory cortex integrate audiovisual stimuli presented at different SOAs. As predicted, rats ( n = 7) trained to perform the simultaneity judgment task could accurately (~80%) identify synchronous vs. asynchronous (200 ms SOA) trials. Moreover, the rats judged trials at 10 ms SOA to be synchronous, whereas the majority (~70%) of trials at 100 ms SOA were perceived to be asynchronous. During the temporal order judgment task, rats ( n = 7) perceived the synchronous audiovisual stimuli to be "visual first" for ~52% of the trials, and calculation of the smallest timing interval between the auditory and visual stimuli that could be detected in each rat (i.e., the just noticeable difference (JND)) ranged from 77 ms to 122 ms. Neurons in the rat V2L cortex were sensitive to the timing of audiovisual stimuli, such that spiking activity was greatest during trials when the visual stimulus preceded the auditory by 20-40 ms. Ultimately, given
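
    The JND reported above is the kind of quantity that is usually derived by fitting a psychometric function to temporal-order-judgment data. The sketch below fits a cumulative Gaussian to the proportion of "visual first" responses across SOAs; the data points are made-up illustrative values, not the rats' results.

    ```python
    # Fit a cumulative Gaussian psychometric function to temporal-order-judgment
    # data and derive the point of subjective simultaneity (PSS) and a JND.
    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.stats import norm

    # Made-up proportions of "visual first" responses at each SOA (ms);
    # negative SOA = auditory stimulus leading.
    soa_ms = np.array([-200, -100, -40, -10, 0, 10, 40, 100, 200], dtype=float)
    p_visual_first = np.array([0.05, 0.15, 0.35, 0.45, 0.52, 0.60, 0.70, 0.85, 0.95])

    def cum_gauss(x, pss, sigma):
        return norm.cdf(x, loc=pss, scale=sigma)

    (pss, sigma), _ = curve_fit(cum_gauss, soa_ms, p_visual_first, p0=[0.0, 80.0])

    # One common JND definition: half the 25%-75% interquartile range of the fit.
    jnd = sigma * norm.ppf(0.75)
    print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")
    ```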

  19. Auditory and audio-visual processing in patients with cochlear, auditory brainstem, and auditory midbrain implants: An EEG study.

    Science.gov (United States)

    Schierholz, Irina; Finke, Mareike; Kral, Andrej; Büchner, Andreas; Rach, Stefan; Lenarz, Thomas; Dengler, Reinhard; Sandmann, Pascale

    2017-04-01

    There is substantial variability in speech recognition ability across patients with cochlear implants (CIs), auditory brainstem implants (ABIs), and auditory midbrain implants (AMIs). To better understand how this variability is related to central processing differences, the current electroencephalography (EEG) study compared hearing abilities and auditory-cortex activation in patients with electrical stimulation at different sites of the auditory pathway. Three different groups of patients with auditory implants (Hannover Medical School; ABI: n = 6, CI: n = 6; AMI: n = 2) performed a speeded response task and a speech recognition test with auditory, visual, and audio-visual stimuli. Behavioral performance and cortical processing of auditory and audio-visual stimuli were compared between groups. ABI and AMI patients showed prolonged response times to auditory and audio-visual stimuli compared with normal-hearing (NH) listeners and CI patients. This was confirmed by prolonged N1 latencies and reduced N1 amplitudes in ABI and AMI patients. However, patients with central auditory implants showed a remarkable gain in performance when visual and auditory input was combined, in both speech and non-speech conditions, which was reflected by a strong visual modulation of auditory-cortex activation in these individuals. In sum, the results suggest that the behavioral improvement for audio-visual conditions in central auditory implant patients is based on enhanced audio-visual interactions in the auditory cortex. These findings may have important implications for the optimization of electrical stimulation and rehabilitation strategies in patients with central auditory prostheses. Hum Brain Mapp 38:2206-2225, 2017. © 2017 Wiley Periodicals, Inc.

  20. Speech Matters

    DEFF Research Database (Denmark)

    Hasse Jørgensen, Stina

    2011-01-01

    About Speech Matters, the Greek curator Katarina Gregos' exhibition at the Danish Pavilion, Venice Biennale 2011.

  1. Fusion for Audio-Visual Laughter Detection

    NARCIS (Netherlands)

    Reuderink, B.

    2007-01-01

    Laughter is a highly variable signal, and can express a spectrum of emotions. This makes the automatic detection of laughter a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is performed

  2. Decreased BOLD responses in audiovisual processing

    NARCIS (Netherlands)

    Wiersinga-Post, Esther; Tomaskovic, Sonja; Slabu, Lavinia; Renken, Remco; de Smit, Femke; Duifhuis, Hendrikus

    2010-01-01

    Audiovisual processing was studied in a functional magnetic resonance imaging study using the McGurk effect. Perceptual responses and the brain activity patterns were measured as a function of audiovisual delay. In several cortical and subcortical brain areas, BOLD responses correlated negatively

  3. Audiovisual signs and information science: an evaluation

    Directory of Open Access Journals (Sweden)

    Jalver Bethônico

    2006-12-01

    Full Text Available This work evaluates the relationship between Information Science and audiovisual signs, pointing out conceptual limitations, the difficulties imposed by the verbal foundation of knowledge, their limited use within libraries, and paths toward a more consistent analysis of audiovisual media, supported by the semiotics of Charles Peirce.

  4. Speech-to-Speech Relay Service

    Science.gov (United States)

    Consumer Guide: Speech-to-Speech Relay Service. Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...

  5. Cinco discursos da digitalidade audiovisual

    Directory of Open Access Journals (Sweden)

    Gerbase, Carlos

    2001-01-01

    Full Text Available Michel Foucault teaches that all systematic discourse - including discourse that claims to be "neutral" or "a disinterested, objective view of what happens" - is in fact a mechanism for articulating knowledge and, in turn, for forming power. The appearance of new technologies, especially digital ones, in the field of audiovisual production provokes an avalanche of statements by filmmakers, essays by academics, and predictions by media demiurges.

  6. Compliments in Audiovisual Translation – issues in character identity

    Directory of Open Access Journals (Sweden)

    Isabel Fernandes Silva

    2011-12-01

    Full Text Available Over the last decades, audiovisual translation has gained increased significance in Translation Studies and as an interdisciplinary subject within other fields (media, cinema studies, etc.). Although many articles have been published on communicative aspects of translation such as politeness, only recently have scholars taken an interest in the translation of compliments. This study will focus on both these areas from a multimodal and pragmatic perspective, emphasizing the links between these fields and how this multidisciplinary approach will evidence the polysemiotic nature of the translation process. In audiovisual translation both text and image are at play; therefore, the translation of speech produced by the characters may either omit information (because it is provided by visual-gestural signs) or emphasize it. A selection was made of the compliments present in the film What Women Want, our focus being on subtitles which did not successfully convey the compliment expressed in the source text, as well as on the reasons for this, namely differences in register, Culture Specific Items and repetitions. These differences lead to a different portrayal/identity/perception of the main character in the English version (original soundtrack) and in the subtitled versions in Portuguese and Italian.

  7. Sex differences in multisensory speech processing in both typically developing children and those on the autism spectrum.

    Directory of Open Access Journals (Sweden)

    Lars A. Ross

    2015-05-01

    Full Text Available Background: Previous work has revealed sizeable deficits in the abilities of children with an autism spectrum disorder (ASD) to integrate auditory and visual speech signals, with clear implications for social communication in this population. There is a strong male preponderance in ASD, with approximately four affected males for every female. The presence of sex differences in ASD symptoms suggests a sexual dimorphism in the ASD phenotype, and raises the question of whether this dimorphism extends to ASD traits in the neurotypical population. Here, we investigated possible sexual dimorphism in multisensory speech integration in both ASD and neurotypical individuals. Methods: We assessed whether males and females differed in their ability to benefit from visual speech when target words were presented under varying signal-to-noise ratios, in samples of neurotypical children and adults, and in children diagnosed with an ASD. Results: In typically developing (TD) children and children with ASD, females (n = 47 and n = 15, respectively) were significantly superior in their ability to recognize words under audiovisual listening conditions compared to males (n = 55 and n = 58, respectively). This sex difference was absent in our sample of neurotypical adults (n = 28 females; n = 28 males). Conclusions: We propose that the development of audiovisual integration is delayed in male relative to female children, a delay that is also observed in ASD. In neurotypicals, these sex differences disappear in early adulthood when females approach their performance maximum and males catch up. Our findings underline the importance of considering sex differences in the search for autism endophenotypes and strongly encourage increased efforts to study the underrepresented population of females within ASD.

  8. Apraxia of Speech

    Science.gov (United States)

    Health Info » Voice, Speech, and Language » Apraxia of Speech. What is apraxia of speech? Apraxia of speech (AOS), also known as acquired ...

  9. A Randomized Controlled Trial on The Beneficial Effects of Training Letter-Speech Sound Integration on Reading Fluency in Children with Dyslexia.

    Directory of Open Access Journals (Sweden)

    Gorka Fraga González

    Full Text Available A recent account of dyslexia assumes that a failure to develop automated letter-speech sound integration might be responsible for the observed lack of reading fluency. This study uses a pre-test-training-post-test design to evaluate the effects of a training program based on letter-speech sound associations, with a special focus on gains in reading fluency. A sample of 44 children with dyslexia and 23 typical readers, aged 8 to 9, was recruited. Children with dyslexia were randomly allocated to either the training program group (n = 23) or a waiting-list control group (n = 21). The training intensively focused on letter-speech sound mapping and consisted of 34 individual sessions of 45 minutes over a five-month period. The children with dyslexia showed substantial reading gains for the main word reading and spelling measures after training, improving at a faster rate than typical readers and waiting-list controls. The results are interpreted within the conceptual framework assuming a multisensory integration deficit as the most proximal cause of dysfluent reading in dyslexia. ISRCTN register: ISRCTN12783279.

  10. Audiovisual Rehabilitation in Hemianopia: A Model-Based Theoretical Investigation.

    Science.gov (United States)

    Magosso, Elisa; Cuppini, Cristiano; Bertini, Caterina

    2017-01-01

    Hemianopic patients exhibit improved visual detection in the blind field when audiovisual stimuli are presented in spatiotemporal coincidence. Beyond this "online" multisensory improvement, there is evidence of long-lasting, "offline" effects induced by audiovisual training: patients show improved visual detection and orientation after being trained to detect and saccade toward visual targets presented in spatiotemporal proximity with auditory stimuli. These effects are ascribed to the Superior Colliculus (SC), which is spared in these patients and plays a pivotal role in audiovisual integration and oculomotor behavior. Recently, we developed a neural network model of audiovisual cortico-collicular loops, including interconnected areas representing the retina, striate and extrastriate visual cortices, auditory cortex, and SC. The network simulated a unilateral V1 lesion with possible spared tissue and reproduced the "online" effects. Here, we extend the previous network to shed light on the circuits, plastic mechanisms, and synaptic reorganization that can mediate the training effects and functionally implement visual rehabilitation. The network is enriched with the oculomotor SC-brainstem route and Hebbian mechanisms of synaptic plasticity, and is used to test different training paradigms (audiovisual/visual stimulation in eye-movements/fixed-eyes conditions) on simulated patients. The results predict different training effects and associate them with synaptic changes in specific circuits. Thanks to the SC multisensory enhancement, audiovisual training is able to effectively strengthen the retina-SC route, which in turn can foster reinforcement of the SC-brainstem route (this occurs only in the eye-movements condition) and of the SC-extrastriate route (this occurs in the presence of spared V1 tissue, regardless of eye condition). The retina-SC-brainstem circuit may mediate compensatory effects: the model assumes that reinforcement of this circuit can translate visual
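
    As a generic illustration of the Hebbian plasticity invoked above, the sketch below strengthens a feedforward weight matrix (standing in for, e.g., the retina-to-SC projection) whenever pre- and post-synaptic activity coincide during simulated audiovisual training trials. The rule, learning rate and toy network are assumptions for illustration, not the authors' equations.

    ```python
    # Generic Hebbian strengthening of a feedforward projection during paired
    # audiovisual "training trials"; parameters and network size are illustrative.
    import numpy as np

    def hebbian_step(W: np.ndarray, pre: np.ndarray, post: np.ndarray,
                     lr: float = 0.01, w_max: float = 1.0) -> np.ndarray:
        """Strengthen W[i, j] in proportion to post[i] * pre[j]; keep weights bounded."""
        return np.clip(W + lr * np.outer(post, pre), 0.0, w_max)

    rng = np.random.default_rng(0)
    n = 20
    W = rng.uniform(0.0, 0.1, size=(n, n))                           # post x pre (e.g., SC x retina)
    visual_input = np.exp(-0.5 * ((np.arange(n) - 10) / 2.0) ** 2)   # localized retinal activity

    for _ in range(100):                                             # repeated training trials
        # SC response: feedforward visual drive plus a crude multisensory boost
        # from a spatially coincident auditory cue
        sc_response = W @ visual_input + 0.5 * visual_input
        W = hebbian_step(W, pre=visual_input, post=sc_response)

    # After training, the weights driven by the stimulated retinal locations have
    # grown toward w_max, i.e. the retina-SC route has been strengthened.
    ```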

  11. 29 CFR 2.13 - Audiovisual coverage prohibited.

    Science.gov (United States)

    2010-07-01

    Office of the Secretary of Labor, General Regulations, Audiovisual Coverage of Administrative Hearings, 29 CFR § 2.13 (Audiovisual coverage prohibited): The Department shall not permit audiovisual coverage of the...

  12. Introductory speeches

    International Nuclear Information System (INIS)

    2001-01-01

    This CD is a multimedia presentation of the safety upgrading programme of the Bohunice V1 NPP. This chapter consists of an introductory commentary and four introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; (4) Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering

  13. Perception of co-speech gestures in aphasic patients: a visual exploration study during the observation of dyadic conversations.

    Science.gov (United States)

    Preisig, Basil C; Eggenberger, Noëmi; Zito, Giuseppe; Vanbellingen, Tim; Schumacher, Rahel; Hopfner, Simone; Nyffeler, Thomas; Gutbrod, Klemens; Annoni, Jean-Marie; Bohlhalter, Stephan; Müri, René M

    2015-03-01

    Co-speech gestures are part of nonverbal communication during conversations. They either support the verbal message or provide the interlocutor with additional information. Furthermore, as nonverbal cues they prompt the cooperative process of turn taking. In the present study, we investigated the influence of co-speech gestures on the perception of dyadic dialogue in aphasic patients. In particular, we analysed the impact of co-speech gestures on gaze direction (towards speaker or listener) and fixation of body parts. We hypothesized that aphasic patients, who are restricted in verbal comprehension, adapt their visual exploration strategies. Sixteen aphasic patients and 23 healthy control subjects participated in the study. Visual exploration behaviour was measured by means of a contact-free infrared eye-tracker while subjects were watching videos depicting spontaneous dialogues between two individuals. Cumulative fixation duration and mean fixation duration were calculated for the factors co-speech gesture (present and absent), gaze direction (to the speaker or to the listener), and region of interest (ROI), including hands, face, and body. Both aphasic patients and healthy controls mainly fixated the speaker's face. We found a significant co-speech gesture × ROI interaction, indicating that the presence of a co-speech gesture encouraged subjects to look at the speaker. Further, there was a significant gaze direction × ROI × group interaction revealing that aphasic patients showed reduced cumulative fixation duration on the speaker's face compared to healthy controls. Co-speech gestures guide the observer's attention towards the speaker, the source of semantic input. It is discussed whether an underlying semantic processing deficit or a deficit in integrating audio-visual information may cause aphasic patients to explore the speaker's face less. Copyright © 2014 Elsevier Ltd. All rights reserved.
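
    The two eye-tracking measures analyzed above reduce to simple per-ROI aggregates over a list of fixations, as in the sketch below; the fixation records and ROI labels are hypothetical, and real eye-tracker exports differ in format.

    ```python
    # Cumulative and mean fixation duration per region of interest (ROI),
    # computed from a list of (roi, duration) fixation records.
    from collections import defaultdict

    # (region of interest, fixation duration in ms) -- hypothetical placeholder data
    fixations = [
        ("speaker_face", 320), ("speaker_hands", 150),
        ("speaker_face", 410), ("listener_face", 200),
    ]

    totals = defaultdict(float)
    counts = defaultdict(int)
    for roi, duration_ms in fixations:
        totals[roi] += duration_ms
        counts[roi] += 1

    cumulative_fixation_duration = dict(totals)                                   # summed per ROI
    mean_fixation_duration = {roi: totals[roi] / counts[roi] for roi in totals}   # average per ROI
    print(cumulative_fixation_duration, mean_fixation_duration)
    ```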

  14. Sustainable models of audiovisual commons

    Directory of Open Access Journals (Sweden)

    Mayo Fuster Morell

    2013-03-01

    Full Text Available This paper addresses an emerging phenomenon characterized by continuous change and experimentation: the collaborative commons creation of audiovisual content online. The analysis focuses on models of sustainability of collaborative online creation, paying particular attention to the use of different forms of advertising. This article is an excerpt from a larger investigation whose units of analysis are Online Creation Communities that take the Catalan territory as their central node of activity. From the 22 selected cases, the methodology combines quantitative analysis, through a questionnaire delivered to all cases, with qualitative analysis, through face-to-face interviews conducted in 8 of the cases studied. The research, whose conclusions we summarize in this article, leads us to conclude that the sustainability of a project depends largely on relationships of trust and interdependence between the different voluntary agents, on non-monetary contributions and rewards, and on freely usable resources and infrastructure. Altogether, this leads us to understand that this is and will be a very important area for the future of audiovisual content and its sustainability, which will imply changes in the policies that govern it.

  15. Comparison of audio and audiovisual measures of adult stuttering: Implications for clinical trials.

    Science.gov (United States)

    O'Brian, Sue; Jones, Mark; Onslow, Mark; Packman, Ann; Menzies, Ross; Lowe, Robyn

    2015-04-15

    This study investigated whether measures of percentage syllables stuttered (%SS) and stuttering severity ratings with a 9-point scale differ when made from audiovisual compared with audio-only recordings. Four experienced speech-language pathologists measured %SS and assigned stuttering severity ratings to 10-minute audiovisual and audio-only recordings of 36 adults. There was a mean 18% increase in %SS scores when samples were presented in audiovisual compared with audio-only mode. This result was consistent across both higher and lower %SS scores and was found to be directly attributable to counts of stuttered syllables rather than the total number of syllables. There was no significant difference between stuttering severity ratings made from the two modes. In clinical trials research, when using %SS as the primary outcome measure, audiovisual samples would be preferred as long as clear, good quality, front-on images can be easily captured. Alternatively, stuttering severity ratings may be a more valid measure to use as they correlate well with %SS and values are not influenced by the presentation mode.
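
    Percentage of syllables stuttered is simply the stuttered-syllable count divided by the total syllable count, times 100. The sketch below also shows the kind of audio-only versus audiovisual comparison reported above, with made-up counts rather than the study's data.

    ```python
    # %SS and the audio-only vs. audiovisual comparison, with invented counts.
    def percent_syllables_stuttered(stuttered: int, total: int) -> float:
        """%SS: stuttered syllables as a percentage of all syllables spoken."""
        return 100.0 * stuttered / total

    # Made-up counts for one 10-minute sample scored twice by the same judge
    audio_only = percent_syllables_stuttered(stuttered=42, total=1400)    # 3.0 %SS
    audiovisual = percent_syllables_stuttered(stuttered=50, total=1400)   # ~3.6 %SS

    relative_increase = 100.0 * (audiovisual - audio_only) / audio_only   # ~19%, same order as the ~18% reported
    print(f"{audio_only:.1f} vs {audiovisual:.1f} %SS ({relative_increase:.0f}% higher audiovisually)")
    ```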

  16. Neurofunctional Underpinnings of Audiovisual Emotion Processing in Teens with Autism Spectrum Disorders

    Science.gov (United States)

    Doyle-Thomas, Krissy A.R.; Goldberg, Jeremy; Szatmari, Peter; Hall, Geoffrey B.C.

    2013-01-01

    Despite successful performance on some audiovisual emotion tasks, hypoactivity has been observed in frontal and temporal integration cortices in individuals with autism spectrum disorders (ASD). Little is understood about the neurofunctional network underlying this ability in individuals with ASD. Research suggests that there may be processing biases in individuals with ASD, based on their ability to obtain meaningful information from the face and/or the voice. This functional magnetic resonance imaging study examined brain activity in teens with ASD (n = 18) and typically developing controls (n = 16) during audiovisual and unimodal emotion processing. Teens with ASD had a significantly lower accuracy when matching an emotional face to an emotion label. However, no differences in accuracy were observed between groups when matching an emotional voice or face-voice pair to an emotion label. In both groups brain activity during audiovisual emotion matching differed significantly from activity during unimodal emotion matching. Between-group analyses of audiovisual processing revealed significantly greater activation in teens with ASD in a parietofrontal network believed to be implicated in attention, goal-directed behaviors, and semantic processing. In contrast, controls showed greater activity in frontal and temporal association cortices during this task. These results suggest that in the absence of engaging integrative emotional networks during audiovisual emotion matching, teens with ASD may have recruited the parietofrontal network as an alternate compensatory system. PMID:23750139

  17. Visual Temporal Acuity Is Related to Auditory Speech Perception Abilities in Cochlear Implant Users.

    Science.gov (United States)

    Jahn, Kelly N; Stevenson, Ryan A; Wallace, Mark T

    Despite significant improvements in speech perception abilities following cochlear implantation, many prelingually deafened cochlear implant (CI) recipients continue to rely heavily on visual information to develop speech and language. Increased reliance on visual cues for understanding spoken language could lead to the development of unique audiovisual integration and visual-only processing abilities in these individuals. Brain imaging studies have demonstrated that good CI performers, as indexed by auditory-only speech perception abilities, have different patterns of visual cortex activation in response to visual and auditory stimuli as compared with poor CI performers. However, no studies have examined whether speech perception performance is related to any type of visual processing abilities following cochlear implantation. The purpose of the present study was to provide a preliminary examination of the relationship between clinical, auditory-only speech perception tests, and visual temporal acuity in prelingually deafened adult CI users. It was hypothesized that prelingually deafened CI users, who exhibit better (i.e., more acute) visual temporal processing abilities would demonstrate better auditory-only speech perception performance than those with poorer visual temporal acuity. Ten prelingually deafened adult CI users were recruited for this study. Participants completed a visual temporal order judgment task to quantify visual temporal acuity. To assess auditory-only speech perception abilities, participants completed the consonant-nucleus-consonant word recognition test and the AzBio sentence recognition test. Results were analyzed using two-tailed partial Pearson correlations, Spearman's rho correlations, and independent samples t tests. Visual temporal acuity was significantly correlated with auditory-only word and sentence recognition abilities. In addition, proficient CI users, as assessed via auditory-only speech perception performance, demonstrated

  18. Audiovisual Styling and the Film Experience

    DEFF Research Database (Denmark)

    Langkjær, Birger

    2015-01-01

    Approaches to music and audiovisual meaning in film appear to be very different in nature and scope when considered from the point of view of experimental psychology or humanistic studies. Nevertheless, this article argues that experimental studies square with ideas of audiovisual perception... and meaning in humanistic film music studies in two ways: through studies of vertical synchronous interaction and through studies of horizontal narrative effects. Also, it is argued that the combination of insights from quantitative experimental studies and qualitative audiovisual film analysis may actually... be combined into a more complex understanding of how audiovisual features interact in the minds of their audiences. This is demonstrated through a review of a series of experimental studies. Yet, it is also argued that textual analysis and concepts from within film and music studies can provide insights...

  19. Neural circuits in auditory and audiovisual memory.

    Science.gov (United States)

    Plakke, B; Romanski, L M

    2016-06-01

    Working memory is the ability to employ recently seen or heard stimuli and apply them to changing cognitive context. Although much is known about language processing and visual working memory, the neurobiological basis of auditory working memory is less clear. Historically, part of the problem has been the difficulty in obtaining a robust animal model to study auditory short-term memory. In recent years there have been neurophysiological and lesion studies indicating a cortical network involving both temporal and frontal cortices. Studies specifically targeting the role of the prefrontal cortex (PFC) in auditory working memory have suggested that dorsal and ventral prefrontal regions perform different roles during the processing of auditory mnemonic information, with the dorsolateral PFC performing similar functions for both auditory and visual working memory. In contrast, the ventrolateral PFC (VLPFC), which contains cells that respond robustly to auditory stimuli and that process both face and vocal stimuli, may be an essential locus for both auditory and audiovisual working memory. These findings suggest a critical role for the VLPFC in the processing, integrating, and retaining of communication information. This article is part of a Special Issue entitled SI: Auditory working memory. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Development of Trivia Game for speech understanding in background noise.

    Science.gov (United States)

    Schwartz, Kathryn; Ringleb, Stacie I; Sandberg, Hilary; Raymer, Anastasia; Watson, Ginger S

    2015-01-01

    Listening in noise is an everyday activity and poses a challenge for many people. To improve the ability to understand speech in noise, a computerized auditory rehabilitation game was developed. In Trivia Game players are challenged to answer trivia questions spoken aloud. As players progress through the game, the level of background noise increases. A study using Trivia Game was conducted as a proof-of-concept investigation in healthy participants. College students with normal hearing were randomly assigned to a control (n = 13) or a treatment (n = 14) group. Treatment participants played Trivia Game 12 times over a 4-week period. All participants completed objective (auditory-only and audiovisual formats) and subjective listening in noise measures at baseline and 4 weeks later. There were no statistical differences between the groups at baseline. At post-test, the treatment group significantly improved their overall speech understanding in noise in the audiovisual condition and reported significant benefits in their functional listening abilities. Playing Trivia Game improved speech understanding in noise in healthy listeners. Significant findings for the audiovisual condition suggest that participants improved face-reading abilities. Trivia Game may be a platform for investigating changes in speech understanding in individuals with sensory, linguistic and cognitive impairments.
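
    The core mechanic described above, presenting spoken questions in progressively louder background noise, amounts to mixing speech and noise at a target signal-to-noise ratio that decreases with game level. The level-to-SNR mapping and the signals in the sketch below are illustrative assumptions, not the game's actual parameters.

    ```python
    # Mix a speech signal with noise at a target SNR that drops with game level.
    import numpy as np

    def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
        """Scale the noise so that speech-to-noise power equals the target SNR, then mix."""
        noise = noise[: len(speech)]
        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2)
        gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
        return speech + gain * noise

    def snr_for_level(level: int, start_db: float = 15.0, step_db: float = 3.0) -> float:
        """Hypothetical difficulty curve: each level lowers the SNR by a fixed step."""
        return start_db - step_db * (level - 1)

    # Example: a level-6 question would be presented at 0 dB SNR.
    rng = np.random.default_rng(0)
    speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)   # stand-in for a spoken question
    noise = rng.standard_normal(16000)
    mixed = mix_at_snr(speech, noise, snr_for_level(6))
    ```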

  1. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage that the speech signal was corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely on the basis of a binary decision. Hence the end-to-end performance of a digital link becomes essentially independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service-provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding usually refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized from the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the
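    As a hedged illustration of the waveform-coding family mentioned above (a minimal sketch, not drawn from this record), the snippet below applies the standard G.711 mu-law companding curve, F(x) = sign(x)·ln(1 + mu|x|)/ln(1 + mu) with mu = 255, to quantize speech samples to 8-bit codes and reconstruct them; the toy waveform and sampling choices are illustrative assumptions.

```python
import numpy as np

def mu_law_encode(x, mu=255, bits=8):
    """Compress samples in [-1, 1] with the mu-law curve and quantize them."""
    x = np.clip(x, -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Map [-1, 1] onto integer codes 0 .. 2**bits - 1.
    return np.round((compressed + 1) / 2 * (2**bits - 1)).astype(np.int32)

def mu_law_decode(codes, mu=255, bits=8):
    """Invert the quantization and the mu-law compression."""
    compressed = codes.astype(np.float64) / (2**bits - 1) * 2 - 1
    return np.sign(compressed) * ((1 + mu) ** np.abs(compressed) - 1) / mu

# Toy "speech" waveform: a decaying 200 Hz tone sampled at 8 kHz.
t = np.arange(0, 0.05, 1 / 8000)
speech = 0.8 * np.sin(2 * np.pi * 200 * t) * np.exp(-20 * t)

codes = mu_law_encode(speech)
reconstructed = mu_law_decode(codes)
print("max reconstruction error:", np.max(np.abs(speech - reconstructed)))
```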

  2. Mild developmental foreign accent syndrome and psychiatric comorbidity: Altered white matter integrity in speech and emotion regulation networks

    Directory of Open Access Journals (Sweden)

    Marcelo L Berthier

    2016-08-01

    Full Text Available Foreign accent syndrome (FAS) is a speech disorder that is defined by the emergence of a peculiar manner of articulation and intonation which is perceived as foreign. In most cases of acquired FAS (AFAS) the new accent is secondary to small focal lesions involving components of the bilaterally distributed neural network for speech production. In the past few years FAS has also been described in different psychiatric conditions (conversion disorder, bipolar disorder, schizophrenia) as well as in developmental disorders (specific language impairment, apraxia of speech). In the present study, two adult males, one with atypical phonetic production and the other one with cluttering, reported having developmental FAS (DFAS) since their adolescence. Perceptual analysis by naïve judges could not confirm the presence of foreign accent, possibly due to the mildness of the speech disorder. However, detailed linguistic analysis provided evidence of prosodic and segmental errors previously reported in AFAS cases. Cognitive testing showed reduced communication in activities of daily living and mild deficits related to psychiatric disorders. Psychiatric evaluation revealed long-lasting internalizing disorders (neuroticism, anxiety, obsessive-compulsive disorder, social phobia, depression, alexithymia, hopelessness, and apathy) in both subjects. Diffusion tensor imaging (DTI) data from each subject with DFAS were compared with data from a group of 21 age- and gender-matched healthy control subjects. Diffusion parameters (MD, AD, and RD) in predefined regions of interest showed changes of white matter microstructure in regions previously related with AFAS and psychiatric disorders. In conclusion, the present findings militate against the possibility that these two subjects have FAS of psychogenic origin. Rather, our findings provide evidence that mild DFAS occurring in the context of subtle, yet persistent, developmental speech disorders may be associated with

  3. Semantic congruency but not temporal synchrony enhances long-term memory performance for audio-visual scenes.

    Science.gov (United States)

    Meyerhoff, Hauke S; Huff, Markus

    2016-04-01

    Human long-term memory for visual objects and scenes is tremendous. Here, we test how auditory information contributes to long-term memory performance for realistic scenes. In a total of six experiments, we manipulated the presentation modality (auditory, visual, audio-visual) as well as semantic congruency and temporal synchrony between auditory and visual information of brief filmic clips. Our results show that audio-visual clips generally elicit more accurate memory performance than unimodal clips. This advantage even increases with congruent visual and auditory information. However, violations of audio-visual synchrony hardly have any influence on memory performance. Memory performance remained intact even with a sequential presentation of auditory and visual information, but finally declined when the matching tracks of one scene were presented separately with intervening tracks during learning. With respect to memory performance, our results therefore show that audio-visual integration is sensitive to semantic congruency but remarkably robust against asynchronies between the different modalities.

  4. Electrophysiological correlates of predictive coding of auditory location in the perception of natural audiovisual events

    Directory of Open Access Journals (Sweden)

    Jeroen Stekelenburg

    2012-05-01

    Full Text Available In many natural audiovisual events (e.g., a clap of the two hands), the visual signal precedes the sound and thus allows observers to predict when, where, and which sound will occur. Previous studies have already reported that there are distinct neural correlates of temporal (when) versus phonetic/semantic (which) content on audiovisual integration. Here we examined the effect of visual prediction of auditory location (where) in audiovisual biological motion stimuli by varying the spatial congruency between the auditory and visual part of the audiovisual stimulus. Visual stimuli were presented centrally, whereas auditory stimuli were presented either centrally or at 90° azimuth. Typical subadditive amplitude reductions (AV – V < A) were found for the auditory N1 and P2 for spatially congruent and incongruent conditions. The new finding is that the N1 suppression was larger for spatially congruent stimuli. A very early audiovisual interaction was also found at 30-50 ms in the spatially congruent condition, while no effect of congruency was found on the suppression of the P2. This indicates that visual prediction of auditory location can be coded very early in auditory processing.

  5. A Novel Audiovisual Brain-Computer Interface and Its Application in Awareness Detection

    Science.gov (United States)

    Wang, Fei; He, Yanbin; Pan, Jiahui; Xie, Qiuyou; Yu, Ronghao; Zhang, Rui; Li, Yuanqing

    2015-01-01

    Currently, detecting awareness in patients with disorders of consciousness (DOC) is a challenging task, which is commonly addressed through behavioral observation scales such as the JFK Coma Recovery Scale-Revised. Brain-computer interfaces (BCIs) provide an alternative approach to detect awareness in patients with DOC. However, these patients have a much lower capability of using BCIs compared to healthy individuals. This study proposed a novel BCI using temporally, spatially, and semantically congruent audiovisual stimuli involving numbers (i.e., visual and spoken numbers). Subjects were instructed to selectively attend to the target stimuli cued by instruction. Ten healthy subjects first participated in the experiment to evaluate the system. The results indicated that the audiovisual BCI system outperformed auditory-only and visual-only systems. Through event-related potential analysis, we observed audiovisual integration effects for target stimuli, which enhanced the discriminability between brain responses for target and nontarget stimuli and thus improved the performance of the audiovisual BCI. This system was then applied to detect the awareness of seven DOC patients, five of whom exhibited command following as well as number recognition. Thus, this audiovisual BCI system may be used as a supportive bedside tool for awareness detection in patients with DOC. PMID:26123281

  6. Effects of Audiovisual Media on L2 Listening Comprehension: A Preliminary Study in French

    Science.gov (United States)

    Becker, Shannon R.; Sturm, Jessica L.

    2017-01-01

    The purpose of the present study was to determine whether integrating online audiovisual materials into the listening instruction of L2 French learners would have a measurable impact on their listening comprehension development. Students from two intact sections of second-semester French were tested on their listening comprehension before and…

  7. Visual Cues Contribute Differentially to Audiovisual Perception of Consonants and Vowels in Improving Recognition and Reducing Cognitive Demands in Listeners With Hearing Impairment Using Hearing Aids.

    Science.gov (United States)

    Moradi, Shahram; Lidestam, Björn; Danielsson, Henrik; Ng, Elaine Hoi Ning; Rönnberg, Jerker

    2017-09-18

    We sought to examine the contribution of visual cues in audiovisual identification of consonants and vowels, in terms of isolation points (the shortest time required for correct identification of a speech stimulus), accuracy, and cognitive demands, in listeners with hearing impairment using hearing aids. The study comprised 199 participants with hearing impairment (mean age = 61.1 years) with bilateral, symmetrical, mild-to-severe sensorineural hearing loss. Gated Swedish consonants and vowels were presented aurally and audiovisually to participants. Linear amplification was adjusted for each participant to assure audibility. The reading span test was used to measure participants' working memory capacity. Audiovisual presentation resulted in shortened isolation points and improved accuracy for consonants and vowels relative to auditory-only presentation. This benefit was more evident for consonants than vowels. In addition, correlations and subsequent analyses revealed that listeners with higher scores on the reading span test identified both consonants and vowels earlier in auditory-only presentation, but only vowels (not consonants) in audiovisual presentation. Consonants and vowels differed in terms of the benefits afforded from their associative visual cues, as indicated by the degree of audiovisual benefit and reduction in cognitive demands linked to the identification of consonants and vowels presented audiovisually.

  8. Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

    Science.gov (United States)

    Borrie, Stephanie A

    2015-03-01

    This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal (the AV advantage) has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.

  9. Hysteresis in audiovisual synchrony perception.

    Directory of Open Access Journals (Sweden)

    Jean-Rémy Martin

    Full Text Available The effect of stimulation history on the perception of a current event can yield two opposite effects, namely: adaptation or hysteresis. The perception of the current event thus goes in the opposite or in the same direction as prior stimulation, respectively. In audiovisual (AV) synchrony perception, adaptation effects have primarily been reported. Here, we tested if perceptual hysteresis could also be observed over adaptation in AV timing perception by varying different experimental conditions. Participants were asked to judge the synchrony of the last (test) stimulus of an AV sequence with either constant or gradually changing AV intervals (constant and dynamic condition, respectively). The onset timing of the test stimulus could be cued or not (prospective vs. retrospective condition, respectively). We observed hysteretic effects for AV synchrony judgments in the retrospective condition that were independent of the constant or dynamic nature of the adapted stimuli; these effects disappeared in the prospective condition. The present findings suggest that knowing when to estimate a stimulus property has a crucial impact on perceptual simultaneity judgments. Our results extend beyond AV timing perception, and have strong implications regarding the comparative study of hysteresis and adaptation phenomena.

  10. A promessa do audiovisual interativo

    Directory of Open Access Journals (Sweden)

    João Baptista Winck

    Full Text Available The audiovisual production chain uses cultural capital, especially creativity, as its main source of resources, inaugurating what has come to be called the creative economy. This value chain manufactures inventiveness as raw material, transforming ideas into objects of large-scale consumption. The television industry is embedded in a larger conglomerate of industries, such as fashion, the arts, music and so on. This gigantic technological park brings together activities that take creation as their value, its production at scale as their means, and the growth of intellectual property as an end in itself. The industrialization of creativity is gradually altering the body of theory concerning how we think about labor relations, tools and, above all, the concept of goods as products of intelligence.

  11. On the Role of Crossmodal Prediction in Audiovisual Emotion Perception

    Directory of Open Access Journals (Sweden)

    Sarah Jessen

    2013-07-01

    Full Text Available Humans rely on multiple sensory modalities to determine the emotional state of others. In fact, such multisensory perception may be one of the mechanisms explaining the ease and efficiency by which others’ emotions are recognized. But how and when exactly do the different modalities interact? One aspect in multisensory perception that has received increasing interest in recent years is the concept of crossmodal prediction. In emotion perception, as in most other settings, visual information precedes the auditory one. Thereby, leading in visual information can facilitate subsequent auditory processing. While this mechanism has often been described in audiovisual speech perception, it has not been addressed so far in audiovisual emotion perception. Based on the current state of the art in (a) crossmodal prediction and (b) multisensory emotion perception research, we propose that it is essential to consider the former in order to fully understand the latter. Focusing on electroencephalographic (EEG) and magnetoencephalographic (MEG) studies, we provide a brief overview of the current research in both fields. In discussing these findings, we suggest that emotional visual information may allow for a more reliable prediction of auditory information compared to non-emotional visual information. In support of this hypothesis, we present a re-analysis of a previous data set that shows an inverse correlation between the N1 response in the EEG and the duration of visual emotional but not non-emotional information. If the assumption that emotional content allows for more reliable predictions can be corroborated in future studies, crossmodal prediction is a crucial factor in our understanding of multisensory emotion perception.

  12. On the role of crossmodal prediction in audiovisual emotion perception.

    Science.gov (United States)

    Jessen, Sarah; Kotz, Sonja A

    2013-01-01

    Humans rely on multiple sensory modalities to determine the emotional state of others. In fact, such multisensory perception may be one of the mechanisms explaining the ease and efficiency by which others' emotions are recognized. But how and when exactly do the different modalities interact? One aspect in multisensory perception that has received increasing interest in recent years is the concept of cross-modal prediction. In emotion perception, as in most other settings, visual information precedes the auditory information. Thereby, leading in visual information can facilitate subsequent auditory processing. While this mechanism has often been described in audiovisual speech perception, so far it has not been addressed in audiovisual emotion perception. Based on the current state of the art in (a) cross-modal prediction and (b) multisensory emotion perception research, we propose that it is essential to consider the former in order to fully understand the latter. Focusing on electroencephalographic (EEG) and magnetoencephalographic (MEG) studies, we provide a brief overview of the current research in both fields. In discussing these findings, we suggest that emotional visual information may allow more reliable prediction of auditory information compared to non-emotional visual information. In support of this hypothesis, we present a re-analysis of a previous data set that shows an inverse correlation between the N1 EEG response and the duration of visual emotional, but not non-emotional information. If the assumption that emotional content allows more reliable prediction can be corroborated in future studies, cross-modal prediction is a crucial factor in our understanding of multisensory emotion perception.

  13. Neural pathways for visual speech perception

    Directory of Open Access Journals (Sweden)

    Lynne E Bernstein

    2014-12-01

    Full Text Available This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread and diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to the multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.

  14. The effectiveness of Speech-Music Therapy for Aphasia (SMTA) in five speakers with Apraxia of Speech and aphasia

    NARCIS (Netherlands)

    Hurkmans, Joost; Jonkers, Roel; de Bruijn, Madeleen; Boonstra, Anne M.; Hartman, Paul P.; Arendzen, Hans; Reinders - Messelink, Heelen

    2015-01-01

    Background: Several studies using musical elements in the treatment of neurological language and speech disorders have reported improvement of speech production. One such programme, Speech-Music Therapy for Aphasia (SMTA), integrates speech therapy and music therapy (MT) to treat the individual with

  15. Audiovisual preservation strategies, data models and value-chains

    OpenAIRE

    Addis, Matthew; Wright, Richard

    2010-01-01

    This is a report on preservation strategies, models and value-chains for digital file-based audiovisual content. The report includes: (a) current and emerging value-chains and business-models for audiovisual preservation; (b) a comparison of preservation strategies for audiovisual content, including their strengths and weaknesses; and (c) a review of current preservation metadata models, and requirements for extension to support audiovisual files.

  16. A Catalan code of best practices for the audiovisual sector

    OpenAIRE

    Teodoro, Emma; Casanovas, Pompeu

    2010-01-01

    In spite of a new general law regarding Audiovisual Communication, the regulatory framework of the audiovisual sector in Spain can still be defined as huge, dispersed and obsolete. The first part of this paper provides an overview of the major challenges of the Spanish audiovisual sector as a result of the convergence of platforms, services and operators, paying special attention to the audiovisual sector in Catalonia. In the second part, we will present an example of self-regulation through...

  17. A simple and efficient method to enhance audiovisual binding tendencies

    Directory of Open Access Journals (Sweden)

    Brian Odegaard

    2017-04-01

    Full Text Available Individuals vary in their tendency to bind signals from multiple senses. For the same set of sights and sounds, one individual may frequently integrate multisensory signals and experience a unified percept, whereas another individual may rarely bind them and often experience two distinct sensations. Thus, while this binding/integration tendency is specific to each individual, it is not clear how plastic this tendency is in adulthood, and how sensory experiences may cause it to change. Here, we conducted an exploratory investigation which provides evidence that (1) the brain’s tendency to bind in spatial perception is plastic, (2) that it can change following brief exposure to simple audiovisual stimuli, and (3) that exposure to temporally synchronous, spatially discrepant stimuli provides the most effective method to modify it. These results can inform current theories about how the brain updates its internal model of the surrounding sensory world, as well as future investigations seeking to increase integration tendencies.

  18. 29 CFR 2.12 - Audiovisual coverage permitted.

    Science.gov (United States)

    2010-07-01

    29 CFR 2.12 (Labor, Office of the Secretary of Labor, General Regulations, Audiovisual Coverage of Administrative Hearings), Audiovisual coverage permitted: The following are the types of hearings where the Department...

  19. Hearing faces: how the infant brain matches the face it sees with the speech it hears.

    Science.gov (United States)

    Bristow, Davina; Dehaene-Lambertz, Ghislaine; Mattout, Jeremie; Soares, Catherine; Gliga, Teodora; Baillet, Sylvain; Mangin, Jean-François

    2009-05-01

    Speech is not a purely auditory signal. From around 2 months of age, infants are able to correctly match the vowel they hear with the appropriate articulating face. However, there is no behavioral evidence of integrated audiovisual perception until 4 months of age, at the earliest, when an illusory percept can be created by the fusion of the auditory stimulus and of the facial cues (McGurk effect). To understand how infants initially match the articulatory movements they see with the sounds they hear, we recorded high-density ERPs in response to auditory vowels that followed a congruent or incongruent silently articulating face in 10-week-old infants. In a first experiment, we determined that auditory-visual integration occurs during the early stages of perception as in adults. The mismatch response was similar in timing and in topography whether the preceding vowels were presented visually or aurally. In the second experiment, we studied audiovisual integration in the linguistic (vowel perception) and nonlinguistic (gender perception) domain. We observed a mismatch response for both types of change at similar latencies. Their topographies were significantly different, demonstrating that cross-modal integration of these features is computed in parallel by two different networks. Indeed, brain source modeling revealed that phoneme and gender computations were lateralized toward the left and toward the right hemisphere, respectively, suggesting that each hemisphere possesses an early processing bias. We also observed repetition suppression in temporal regions and repetition enhancement in frontal regions. These results underscore the complexity and structure of the human cortical organization that sustains communication from the first weeks of life onward.

  20. Bayesian calibration of simultaneity in audiovisual temporal order judgments.

    Directory of Open Access Journals (Sweden)

    Shinya Yamamoto

    Full Text Available After repeated exposures to two successive audiovisual stimuli presented in one frequent order, participants eventually perceive a pair separated by some lag time in the same order as occurring simultaneously (lag adaptation). In contrast, we previously found that perceptual changes occurred in the opposite direction in response to tactile stimuli, conforming to Bayesian integration theory (Bayesian calibration). We further showed, in theory, that the effect of Bayesian calibration cannot be observed when the lag adaptation was fully operational. This led to the hypothesis that Bayesian calibration affects judgments regarding the order of audiovisual stimuli, but that this effect is concealed behind the lag adaptation mechanism. In the present study, we showed that lag adaptation is pitch-insensitive using two sounds at 1046 and 1480 Hz. This enabled us to cancel lag adaptation by associating one pitch with sound-first stimuli and the other with light-first stimuli. When we presented each type of stimulus (high- or low-tone) in a different block, the point of simultaneity shifted to "sound-first" for the pitch associated with sound-first stimuli, and to "light-first" for the pitch associated with light-first stimuli. These results are consistent with lag adaptation. In contrast, when we delivered each type of stimulus in a randomized order, the point of simultaneity shifted to "light-first" for the pitch associated with sound-first stimuli, and to "sound-first" for the pitch associated with light-first stimuli. The results clearly show that Bayesian calibration is pitch-specific and is at work behind pitch-insensitive lag adaptation during temporal order judgment of audiovisual stimuli.
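    As a hedged illustration of the Bayesian-calibration idea summarized above (a minimal sketch under simplifying Gaussian assumptions; this is not the authors' model, and all numerical values are invented), the snippet below shows how a prior over audiovisual lag, re-centered on recently experienced sound-first lags, pulls the perceived lag of a new pair toward that prior, so that a physically synchronous pair is perceived as sound-leading and the point of subjective simultaneity shifts toward light-first values.

```python
import numpy as np

def perceived_lag(measured_lag_ms, prior_mean_ms, sigma_meas=60.0, sigma_prior=80.0):
    """Posterior mean of the true lag under a Gaussian prior and Gaussian likelihood."""
    w_prior = sigma_meas**2 / (sigma_meas**2 + sigma_prior**2)
    return w_prior * prior_mean_ms + (1 - w_prior) * measured_lag_ms

# Exposure phase: mostly sound-first pairs (negative lag = sound leads).
exposure_lags = np.random.default_rng(0).normal(loc=-100, scale=20, size=50)
adapted_prior = exposure_lags.mean()  # the prior drifts toward the exposed lags

# Test phase: a physically synchronous pair (lag = 0) is now perceived as sound-leading,
# so the point of subjective simultaneity shifts toward light-first lags.
print("perceived lag of a synchronous pair:",
      round(perceived_lag(0.0, adapted_prior), 1), "ms")
```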

  1. Neural Entrainment to Speech Modulates Speech Intelligibility

    NARCIS (Netherlands)

    Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne

    2018-01-01

    Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and

  2. Congruent and Incongruent Cues in Highly Familiar Audiovisual Action Sequences: An ERP Study

    Directory of Open Access Journals (Sweden)

    SM Wuerger

    2012-07-01

    Full Text Available In a previous fMRI study we found significant differences in BOLD responses for congruent and incongruent semantic audio-visual action sequences (whole-body actions and speech actions) in bilateral pSTS, left SMA, left IFG, and IPL (Meyer, Greenlee, & Wuerger, JOCN, 2011). Here, we present results from a 128-channel ERP study that examined the time-course of these interactions using a one-back task. ERPs in response to congruent and incongruent audio-visual actions were compared to identify regions and latencies of differences. Responses to congruent and incongruent stimuli differed between 240–280 ms, 340–420 ms, and 460–660 ms after stimulus onset. A dipole analysis revealed that the difference around 250 ms can be partly explained by a modulation of sources in the vicinity of the superior temporal area, while the responses after 400 ms are consistent with sources in inferior frontal areas. Our results are in line with a model that postulates early recognition of congruent audiovisual actions in the pSTS, perhaps as a sensory memory buffer, and a later role of the IFG, perhaps in a generative capacity, in reconciling incongruent signals.

  3. Reproducibility and discriminability of brain patterns of semantic categories enhanced by congruent audiovisual stimuli.

    Directory of Open Access Journals (Sweden)

    Yuanqing Li

    Full Text Available One of the central questions in cognitive neuroscience is the precise neural representation, or brain pattern, associated with a semantic category. In this study, we explored the influence of audiovisual stimuli on the brain patterns of concepts or semantic categories through a functional magnetic resonance imaging (fMRI) experiment. We used a pattern search method to extract brain patterns corresponding to two semantic categories: "old people" and "young people." These brain patterns were elicited by semantically congruent audiovisual, semantically incongruent audiovisual, unimodal visual, and unimodal auditory stimuli belonging to the two semantic categories. We calculated the reproducibility index, which measures the similarity of the patterns within the same category. We also decoded the semantic categories from these brain patterns. The decoding accuracy reflects the discriminability of the brain patterns between two categories. The results showed that both the reproducibility index of brain patterns and the decoding accuracy were significantly higher for semantically congruent audiovisual stimuli than for unimodal visual and unimodal auditory stimuli, while the semantically incongruent stimuli did not elicit brain patterns with a significantly higher reproducibility index or decoding accuracy. Thus, the semantically congruent audiovisual stimuli enhanced the within-class reproducibility of brain patterns and the between-class discriminability of brain patterns, facilitating the neural representation of semantic categories or concepts. Furthermore, we analyzed the brain activity in the superior temporal sulcus and middle temporal gyrus (STS/MTG). The strength of the fMRI signal and the reproducibility index were enhanced by the semantically congruent audiovisual stimuli. Our results support the use of the reproducibility index as a potential tool to supplement the fMRI signal amplitude for evaluating multimodal integration.
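    As a hedged illustration of the two measures named above (a minimal sketch on synthetic patterns; the trial counts, voxel count and classifier choice are assumptions, not details from the study), the snippet below computes a within-class reproducibility index as the mean pairwise correlation of patterns from one category, and a cross-validated decoding accuracy between the two categories.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 40, 200  # assumed sizes for the toy example

# Synthetic "brain patterns": each category is a noisy copy of its own template.
template_old = rng.normal(size=n_voxels)
template_young = rng.normal(size=n_voxels)
patterns_old = template_old + rng.normal(scale=1.5, size=(n_trials, n_voxels))
patterns_young = template_young + rng.normal(scale=1.5, size=(n_trials, n_voxels))

def reproducibility_index(patterns):
    """Mean pairwise correlation between patterns of the same category."""
    corr = np.corrcoef(patterns)  # trial-by-trial correlation matrix
    return corr[np.triu_indices_from(corr, k=1)].mean()

X = np.vstack([patterns_old, patterns_young])
y = np.array([0] * n_trials + [1] * n_trials)

print("reproducibility (old):", round(reproducibility_index(patterns_old), 3))
print("decoding accuracy:", cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean())
```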

  4. Effect of Perceptual Load on Semantic Access by Speech in Children

    Science.gov (United States)

    Jerger, Susan; Damian, Markus F.; Mills, Candice; Bartlett, James; Tye-Murray, Nancy; Abdi, Herve

    2013-01-01

    Purpose: To examine whether semantic access by speech requires attention in children. Method: Children (N = 200) named pictures and ignored distractors on a cross-modal (distractors: auditory-no face) or multimodal (distractors: auditory-static face and audiovisual-dynamic face) picture word task. The cross-modal task had a low load,…

  5. Visual speech alters the discrimination and identification of non-intact auditory speech in children with hearing loss.

    Science.gov (United States)

    Jerger, Susan; Damian, Markus F; McAlpine, Rachel P; Abdi, Hervé

    2017-03-01

    Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes, yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets) as, for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to non-intact onset/rhyme in the auditory track (/-B/aa or /-B/az). The items started with an easy-to-speechread /B/ or difficult-to-speechread /G/ onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/-B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more same (as opposed to different) responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g., /-B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz (as opposed to az) responses in the audiovisual than auditory mode. Performance in the audiovisual mode showed more same

  6. Visual Speech Alters the Discrimination and Identification of Non-Intact Auditory Speech in Children with Hearing Loss

    Science.gov (United States)

    Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Hervé

    2017-01-01

    Objectives Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. Methods Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets) as, for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to non-intact onset/rhyme in the auditory track (/–B/aa or /–B/az). The items started with an easy-to-speechread /B/ or difficult-to-speechread /G/ onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/–B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more same—as opposed to different—responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g., /–B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz—as opposed to az— responses in the audiovisual than auditory mode. Results

  7. Speak, Move, Play and Learn with Children on the Autism Spectrum: Activities to Boost Communication Skills, Sensory Integration and Coordination Using Simple Ideas from Speech and Language Pathology and Occupational Therapy

    Science.gov (United States)

    Brady, Lois Jean; Gonzalez, America X.; Zawadzki, Maciej; Presley, Corinda

    2012-01-01

    This practical resource is brimming with ideas and guidance for using simple ideas from speech and language pathology and occupational therapy to boost communication, sensory integration, and coordination skills in children on the autism spectrum. Suitable for use in the classroom, at home, and in community settings, it is packed with…

  8. Audiovisual Blindsight: Audiovisual learning in the absence of primary visual cortex

    OpenAIRE

    Mehrdad Seirafi; Peter De Weerd; Alan J Pegna; Beatrice de Gelder

    2016-01-01

    Learning audiovisual associations is mediated by the primary cortical areas; however, recent animal studies suggest that such learning can take place even in the absence of the primary visual cortex. Other studies have demonstrated the involvement of extra-geniculate pathways and especially the superior colliculus (SC) in audiovisual association learning. Here, we investigated such learning in a rare human patient with complete loss of the bilateral striate cortex. We carried out an implicit...

  9. Audio-visual identification of place of articulation and voicing in white and babble noise.

    Science.gov (United States)

    Alm, Magnus; Behne, Dawn M; Wang, Yue; Eg, Ragnhild

    2009-07-01

    Research shows that noise and phonetic attributes influence the degree to which auditory and visual modalities are used in audio-visual speech perception (AVSP). Research has, however, mainly focused on white noise and single phonetic attributes, thus neglecting the more common babble noise and possible interactions between phonetic attributes. This study explores whether white and babble noise differentially influence AVSP and whether these differences depend on phonetic attributes. White and babble noise of 0 and -12 dB signal-to-noise ratio were added to congruent and incongruent audio-visual stop consonant-vowel stimuli. The audio (A) and video (V) of incongruent stimuli differed either in place of articulation (POA) or voicing. Responses from 15 young adults show that, compared to white noise, babble resulted in more audio responses for POA stimuli, and fewer for voicing stimuli. Voiced syllables received more audio responses than voiceless syllables. Results can be attributed to discrepancies in the acoustic spectra of both the noise and speech target. Voiced consonants may be more auditorily salient than voiceless consonants which are more spectrally similar to white noise. Visual cues contribute to identification of voicing, but only if the POA is visually salient and auditorily susceptible to the noise type.
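    As a hedged illustration of the stimulus-preparation step described above (a minimal sketch using the standard scaling rule, not the authors' procedure; the toy signals are invented), the snippet below scales a noise signal so that it is mixed with a speech signal at a target signal-to-noise ratio such as 0 or -12 dB.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech + noise has the requested SNR in dB."""
    speech_power = np.mean(speech**2)
    noise_power = np.mean(noise**2)
    # Required noise power for the target SNR: SNR_dB = 10*log10(Ps / Pn).
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + scaled_noise

# Toy example: a 1 kHz tone standing in for speech, white noise as the masker.
fs = 16000
t = np.arange(0, 1.0, 1 / fs)
speech = 0.1 * np.sin(2 * np.pi * 1000 * t)
noise = np.random.default_rng(0).normal(size=t.size)

for snr in (0, -12):
    mixed = mix_at_snr(speech, noise, snr)
    achieved = 10 * np.log10(np.mean(speech**2) / np.mean((mixed - speech)**2))
    print(f"target {snr} dB SNR -> achieved {achieved:.1f} dB")
```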

  10. Speech Research

    Science.gov (United States)

    Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: a biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic frictions.

  11. Hate speech

    Directory of Open Access Journals (Sweden)

    Anne Birgitta Nilsen

    2014-12-01

    Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance. The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience. The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the

  12. Search in audiovisual broadcast archives : doctoral abstract

    NARCIS (Netherlands)

    Huurnink, B.

    Documentary makers, journalists, news editors, and other media professionals routinely require previously recorded audiovisual material for new productions. For example, a news editor might wish to reuse footage shot by overseas services for the evening news, or a documentary maker might require

  13. Planning and Producing Audiovisual Materials. Third Edition.

    Science.gov (United States)

    Kemp, Jerrold E.

    A revised edition of this handbook provides illustrated, step-by-step explanations of how to plan and produce audiovisual materials. Included are sections on the fundamental skills--photography, graphics and recording sound--followed by individual sections on photographic print series, slide series, filmstrips, tape recordings, overhead…

  14. Longevity and Depreciation of Audiovisual Equipment.

    Science.gov (United States)

    Post, Richard

    1987-01-01

    Describes results of survey of media service directors at public universities in Ohio to determine the expected longevity of audiovisual equipment. Use of the Delphi technique for estimates is explained, results are compared with an earlier survey done in 1977, and use of spreadsheet software to calculate depreciation is discussed. (LRW)
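    As a hedged illustration of the depreciation arithmetic the survey touches on (a minimal sketch; the schedule and figures are hypothetical and not taken from the survey), the snippet below computes a straight-line depreciation schedule for a piece of audiovisual equipment from its purchase cost, salvage value, and expected service life.

```python
def straight_line_schedule(cost, salvage, years):
    """Yearly book values under straight-line depreciation."""
    annual = (cost - salvage) / years
    return [round(cost - annual * y, 2) for y in range(years + 1)]

# Hypothetical projector: $2,400 purchase cost, $200 salvage value, 8-year expected life.
print(straight_line_schedule(2400, 200, 8))
# -> [2400, 2125.0, 1850.0, 1575.0, 1300.0, 1025.0, 750.0, 475.0, 200.0]
```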

  15. Speech enhancement

    CERN Document Server

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red

  16. Visual feedback of tongue movement for novel speech sound learning

    Directory of Open Access Journals (Sweden)

    William F Katz

    2015-11-01

    Full Text Available Pronunciation training studies have yielded important information concerning the processing of audiovisual (AV) information. Second language (L2) learners show increased reliance on bottom-up, multimodal input for speech perception (compared to monolingual individuals). However, little is known about the role of viewing one’s own speech articulation processes during speech training. The current study investigated whether real-time, visual feedback for tongue movement can improve a speaker’s learning of non-native speech sounds. An interactive 3D tongue visualization system based on electromagnetic articulography (EMA) was used in a speech training experiment. Native speakers of American English produced a novel speech sound (/ɖ̠/, a voiced, coronal, palatal stop) before, during, and after trials in which they viewed their own speech movements using the 3D model. Talkers’ productions were evaluated using kinematic (tongue-tip spatial positioning) and acoustic (burst spectra) measures. The results indicated a rapid gain in accuracy associated with visual feedback training. The findings are discussed with respect to neural models for multimodal speech processing.

  17. Co-speech gestures influence neural activity in brain regions associated with processing semantic information.

    Science.gov (United States)

    Dick, Anthony Steven; Goldin-Meadow, Susan; Hasson, Uri; Skipper, Jeremy I; Small, Steven L

    2009-11-01

    Everyday communication is accompanied by visual information from several sources, including co-speech gestures, which provide semantic information listeners use to help disambiguate the speaker's message. Using fMRI, we examined how gestures influence neural activity in brain regions associated with processing semantic information. The BOLD response was recorded while participants listened to stories under three audiovisual conditions and one auditory-only (speech alone) condition. In the first audiovisual condition, the storyteller produced gestures that naturally accompany speech. In the second, the storyteller made semantically unrelated hand movements. In the third, the storyteller kept her hands still. In addition to inferior parietal and posterior superior and middle temporal regions, bilateral posterior superior temporal sulcus and left anterior inferior frontal gyrus responded more strongly to speech when it was further accompanied by gesture, regardless of the semantic relation to speech. However, the right inferior frontal gyrus was sensitive to the semantic import of the hand movements, demonstrating more activity when hand movements were semantically unrelated to the accompanying speech. These findings show that perceiving hand movements during speech modulates the distributed pattern of neural activation involved in both biological motion perception and discourse comprehension, suggesting listeners attempt to find meaning, not only in the words speakers produce, but also in the hand movements that accompany speech.

  18. Reduced audiovisual recalibration in the elderly.

    Science.gov (United States)

    Chan, Yu Man; Pianta, Michael J; McKendrick, Allison M

    2014-01-01

    Perceived synchrony of visual and auditory signals can be altered by exposure to a stream of temporally offset stimulus pairs. Previous literature suggests that adapting to audiovisual temporal offsets is an important recalibration to correctly combine audiovisual stimuli into a single percept across a range of source distances. Healthy aging results in synchrony perception over a wider range of temporally offset visual and auditory signals, independent of age-related unisensory declines in vision and hearing sensitivities. However, the impact of aging on audiovisual recalibration is unknown. Audiovisual synchrony perception for sound-lead and sound-lag stimuli was measured for 15 younger (22-32 years old) and 15 older (64-74 years old) healthy adults using a method-of-constant-stimuli, after adapting to a stream of visual and auditory pairs. The adaptation pairs were either synchronous or asynchronous (sound-lag of 230 ms). The adaptation effect for each observer was computed as the shift in the mean of the individually fitted psychometric functions after adapting to asynchrony. Post-adaptation to synchrony, the younger and older observers had average window widths (±standard deviation) of 326 (±80) and 448 (±105) ms, respectively. There was no adaptation effect for sound-lead pairs. Both the younger and older observers, however, perceived more sound-lag pairs as synchronous. The magnitude of the adaptation effect in the older observers was not correlated with how often they saw the adapting sound-lag stimuli as asynchronous. Our finding demonstrates that audiovisual synchrony perception adapts less with advancing age.
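    As a hedged illustration of the analysis step described above (a minimal sketch on simulated judgments, not the authors' code; the window shape, SOA grid and noise levels are assumptions), the snippet below fits a Gaussian-shaped synchrony window to the proportion of "synchronous" responses at each stimulus-onset asynchrony and reports the adaptation effect as the shift of the fitted window centre after adapting to sound-lag pairs.

```python
import numpy as np
from scipy.optimize import curve_fit

def synchrony_window(soa, peak, mu, sigma):
    """Gaussian-shaped proportion of 'synchronous' responses as a function of SOA."""
    return peak * np.exp(-(soa - mu) ** 2 / (2 * sigma ** 2))

soas = np.array([-300, -200, -100, 0, 100, 200, 300], dtype=float)  # ms, positive = sound lag

# Simulated response proportions before and after adapting to a 230 ms sound-lag
# stream (after adaptation the window centre drifts toward sound-lag SOAs).
p_baseline = synchrony_window(soas, 0.95, 0, 120) + np.random.default_rng(1).normal(0, 0.02, soas.size)
p_adapted = synchrony_window(soas, 0.95, 60, 120) + np.random.default_rng(2).normal(0, 0.02, soas.size)

popt_base, _ = curve_fit(synchrony_window, soas, p_baseline, p0=[0.9, 0.0, 100.0])
popt_adapt, _ = curve_fit(synchrony_window, soas, p_adapted, p0=[0.9, 0.0, 100.0])
print(f"adaptation effect (shift of the window centre): {popt_adapt[1] - popt_base[1]:.1f} ms")
```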

  19. Enhanced audio-visual interactions in the auditory cortex of elderly cochlear-implant users.

    Science.gov (United States)

    Schierholz, Irina; Finke, Mareike; Schulte, Svenja; Hauthal, Nadine; Kantzke, Christoph; Rach, Stefan; Büchner, Andreas; Dengler, Reinhard; Sandmann, Pascale

    2015-10-01

    Auditory deprivation and the restoration of hearing via a cochlear implant (CI) can induce functional plasticity in auditory cortical areas. How these plastic changes affect the ability to integrate combined auditory (A) and visual (V) information is not yet well understood. In the present study, we used electroencephalography (EEG) to examine whether age, temporary deafness and altered sensory experience with a CI can affect audio-visual (AV) interactions in post-lingually deafened CI users. Young and elderly CI users and age-matched NH listeners performed a speeded response task on basic auditory, visual and audio-visual stimuli. Regarding the behavioral results, a redundant signals effect, that is, faster response times to cross-modal (AV) than to both of the two modality-specific stimuli (A, V), was revealed for all groups of participants. Moreover, in all four groups, we found evidence for audio-visual integration. Regarding event-related responses (ERPs), we observed a more pronounced visual modulation of the cortical auditory response at N1 latency (approximately 100 ms after stimulus onset) in the elderly CI users when compared with young CI users and elderly NH listeners. Thus, elderly CI users showed enhanced audio-visual binding which may be a consequence of compensatory strategies developed due to temporary deafness and/or degraded sensory input after implantation. These results indicate that the combination of aging, sensory deprivation and CI facilitates the coupling between the auditory and the visual modality. We suggest that this enhancement in multisensory interactions could be used to optimize auditory rehabilitation, especially in elderly CI users, by the application of strong audio-visually based rehabilitation strategies after implant switch-on. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Does dynamic information about the speaker's face contribute to semantic speech processing? ERP evidence.

    Science.gov (United States)

    Hernández-Gutiérrez, David; Abdel Rahman, Rasha; Martín-Loeches, Manuel; Muñoz, Francisco; Schacht, Annekathrin; Sommer, Werner

    2018-07-01

    Face-to-face interactions characterize communication in social contexts. These situations are typically multimodal, requiring the integration of linguistic auditory input with facial information from the speaker. In particular, eye gaze and visual speech provide the listener with social and linguistic information, respectively. Despite the importance of this context for an ecological study of language, research on audiovisual integration has mainly focused on the phonological level, leaving aside effects on semantic comprehension. Here we used event-related potentials (ERPs) to investigate the influence of facial dynamic information on semantic processing of connected speech. Participants were presented with either a video or a still picture of the speaker, concomitant to auditory sentences. Along three experiments, we manipulated the presence or absence of the speaker's dynamic facial features (mouth and eyes) and compared the amplitudes of the semantic N400 elicited by unexpected words. Contrary to our predictions, the N400 was not modulated by dynamic facial information; therefore, semantic processing seems to be unaffected by the speaker's gaze and visual speech. Even though, during the processing of expected words, dynamic faces elicited a long-lasting late posterior positivity compared to the static condition. This effect was significantly reduced when the mouth of the speaker was covered. Our findings may indicate an increase of attentional processing to richer communicative contexts. The present findings also demonstrate that in natural communicative face-to-face encounters, perceiving the face of a speaker in motion provides supplementary information that is taken into account by the listener, especially when auditory comprehension is non-demanding. Copyright © 2018 Elsevier Ltd. All rights reserved.

  1. Speech Intelligibility

    Science.gov (United States)

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena, such as the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, the benefit of using hearing aids, or combinations of these things.

  2. Catching Audiovisual Interactions With a First-Person Fisherman Video Game.

    Science.gov (United States)

    Sun, Yile; Hickey, Timothy J; Shinn-Cunningham, Barbara; Sekuler, Robert

    2017-07-01

    The human brain is excellent at integrating information from different sources across multiple sensory modalities. To examine one particularly important form of multisensory interaction, we manipulated the temporal correlation between visual and auditory stimuli in a first-person fisherman video game. Subjects saw rapidly swimming fish whose size oscillated, either at 6 or 8 Hz. Subjects categorized each fish according to its rate of size oscillation, while trying to ignore a concurrent broadband sound seemingly emitted by the fish. In three experiments, categorization was faster and more accurate when the rate at which a fish oscillated in size matched the rate at which the accompanying, task-irrelevant sound was amplitude modulated. Control conditions showed that the difference between responses to matched and mismatched audiovisual signals reflected a performance gain in the matched condition, rather than a cost from the mismatched condition. The performance advantage with matched audiovisual signals was remarkably robust over changes in task demands between experiments. Performance with matched or unmatched audiovisual signals improved over successive trials at about the same rate, emblematic of perceptual learning in which visual oscillation rate becomes more discriminable with experience. Finally, analysis at the level of individual subjects' performance pointed to differences in the rates at which subjects can extract information from audiovisual stimuli.

  3. Audiovisual preconditioning enhances the efficacy of an anatomical dissection course: A randomised study.

    Science.gov (United States)

    Collins, Anne M; Quinlan, Christine S; Dolan, Roisin T; O'Neill, Shane P; Tierney, Paul; Cronin, Kevin J; Ridgway, Paul F

    2015-07-01

    The benefits of incorporating audiovisual materials into learning are well recognised. The outcome of integrating such a modality into anatomical education has not been reported previously. The aim of this randomised study was to determine whether audiovisual preconditioning is a useful adjunct to learning at an upper limb dissection course. Prior to instruction, participants completed a standardised pre-course multiple-choice questionnaire (MCQ). The intervention group was subsequently shown a video with a pre-recorded commentary. Following initial dissection, both groups completed a second MCQ. The final MCQ was completed at the conclusion of the course. Statistical analysis confirmed a significant improvement in the performance of both groups over the duration of the three MCQs. The intervention group significantly outperformed their control group counterparts immediately following audiovisual preconditioning and in the post-course MCQ. Audiovisual preconditioning is a practical and effective tool that should be incorporated into future course curricula to optimise learning. Level of evidence: This study appraises an intervention in medical education. Kirkpatrick Level 2b (modification of knowledge). Copyright © 2015 British Association of Plastic, Reconstructive and Aesthetic Surgeons. Published by Elsevier Ltd. All rights reserved.

  4. The audiovisual communication policy of the socialist Government (2004-2009): A neoliberal turn

    Directory of Open Access Journals (Sweden)

    Ramón Zallo, Ph. D.

    2010-01-01

    Full Text Available The first legislature of Jose Luis Rodriguez Zapatero’s government (2004-08) generated important initiatives for some progressive changes in the public communicative system. However, all of these initiatives were dissolved in the second legislature to give way to a non-regulated and privatizing model that is detrimental to the public service. Three phases can be distinguished chronologically: the first is characterized by interesting reforms; it is followed by contradictory reforms and, in the second legislature, by an accumulation of counter-reforms that lead the system towards a communicative model completely different from the one devised in the first legislature. This indicates that there have been not one but two different audiovisual policies, running the cyclical route of audiovisual policy from one end to the other. The emphasis has changed from the public service to private concentration; from decentralization to centralization; from the diffusion of knowledge to the accumulation and appropriation of cognitive capital; from the Keynesian model - combined with the Schumpeterian model and a preference for social access - to a delayed return to the neoliberal model, after having distorted the market through public decisions that benefit the most important audiovisual service providers. All this seems to crystallize in the impressive process of concentration occurring among audiovisual service providers into two large groups, one integrating Mediaset and Sogecable and another - still under negotiation - between Antena 3 and Imagina. A combination of neo-statist restructuring of the market and neo-liberalism.

  5. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    ...-Speech Services for Individuals with Hearing and Speech Disabilities, Report and Order (Order), document...] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...

  6. 77 FR 16561 - Certain Audiovisual Components and Products Containing the Same; Notice of Receipt of Complaint...

    Science.gov (United States)

    2012-03-21

    ... INTERNATIONAL TRADE COMMISSION [DN 2884] Certain Audiovisual Components and Products Containing.... International Trade Commission has received a complaint entitled Certain Audiovisual Components and Products... audiovisual components and products containing the same. The complaint names as respondents Funai Electric...

  7. 77 FR 16560 - Certain Audiovisual Components and Products Containing the Same; Notice of Receipt of Complaint...

    Science.gov (United States)

    2012-03-21

    ... INTERNATIONAL TRADE COMMISSION [DN 2884] Certain Audiovisual Components and Products Containing.... International Trade Commission has received a complaint entitled Certain Audiovisual Components and Products... audiovisual components and products containing the same. The complaint names as respondents Funai Electric...

  8. Automated social skills training with audiovisual information.

    Science.gov (United States)

    Tanaka, Hiroki; Sakti, Sakriani; Neubig, Graham; Negoro, Hideki; Iwasaka, Hidemi; Nakamura, Satoshi

    2016-08-01

    People with social communication difficulties tend to have superior skills using computers, and as a result computer-based social skills training systems are flourishing. Social skills training, performed by human trainers, is a well-established method for acquiring appropriate skills in social interaction. Previous works have attempted to automate one or several parts of social skills training through human-computer interaction. However, while previous work on simulating social skills training considered only acoustic and linguistic features, human social skills trainers also take into account visual features (e.g. facial expression, posture). In this paper, we create and evaluate a social skills training system that closes this gap by considering audiovisual features: the ratio of smiling, head yaw, and head pitch. An experimental evaluation measures the difference in the effectiveness of social skills training when using audio features versus audiovisual features. Results showed that the visual features were effective in improving users' social skills.

  9. Inactivation of Primate Prefrontal Cortex Impairs Auditory and Audiovisual Working Memory.

    Science.gov (United States)

    Plakke, Bethany; Hwang, Jaewon; Romanski, Lizabeth M

    2015-07-01

    The prefrontal cortex is associated with cognitive functions that include planning, reasoning, decision-making, working memory, and communication. Neurophysiology and neuropsychology studies have established that the dorsolateral prefrontal cortex is essential in spatial working memory while the ventral frontal lobe processes language and communication signals. Single-unit recordings in nonhuman primates have shown that ventral prefrontal (VLPFC) neurons integrate face and vocal information and are active during audiovisual working memory. However, whether VLPFC is essential in remembering face and voice information is unknown. We therefore trained nonhuman primates in an audiovisual working memory paradigm using naturalistic face-vocalization movies as memoranda. We inactivated VLPFC, with reversible cortical cooling, and examined performance when faces, vocalizations, or both faces and vocalizations had to be remembered. We found that VLPFC inactivation impaired subjects' performance in audiovisual and auditory-alone versions of the task. In contrast, VLPFC inactivation did not disrupt visual working memory. Our studies demonstrate the importance of VLPFC in auditory and audiovisual working memory for social stimuli but suggest a different role for VLPFC in unimodal visual processing. The ventral frontal lobe, or inferior frontal gyrus, plays an important role in audiovisual communication in the human brain. Studies with nonhuman primates have found that neurons within ventral prefrontal cortex (VLPFC) encode both faces and vocalizations and that VLPFC is active when animals need to remember these social stimuli. In the present study, we temporarily inactivated VLPFC by cooling the cortex while nonhuman primates performed a working memory task. This impaired the ability of subjects to remember a face and vocalization pair or just the vocalization alone. Our work highlights the importance of the primate VLPFC in the processing of faces and vocalizations in a manner that

  10. Alterations in audiovisual simultaneity perception in amblyopia

    OpenAIRE

    Richards, Michael D.; Goltz, Herbert C.; Wong, Agnes M. F.

    2017-01-01

    Amblyopia is a developmental visual impairment that is increasingly recognized to affect higher-level perceptual and multisensory processes. To further investigate the audiovisual (AV) perceptual impairments associated with this condition, we characterized the temporal interval in which asynchronous auditory and visual stimuli are perceived as simultaneous 50% of the time (i.e., the AV simultaneity window). Adults with unilateral amblyopia (n = 17) and visually normal controls (n = 17) judged...

  11. Unimodal Learning Enhances Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2017-01-01

    Crossmodal sensory integration is a fundamental feature of the brain that aids in forming a coherent and unified representation of observed events in the world. Spatiotemporally correlated sensory stimuli brought about by rich sensorimotor experiences drive the development of crossmodal integrat...... a non-holonomic robotic agent towards a moving audio-visual target. Simulation results demonstrate that unimodal learning enhances crossmodal learning and improves both the overall accuracy and precision of multisensory orientation response.

  12. Unimodal Learning Enhances Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2018-01-01

    Crossmodal sensory integration is a fundamental feature of the brain that aids in forming a coherent and unified representation of observed events in the world. Spatiotemporally correlated sensory stimuli brought about by rich sensorimotor experiences drive the development of crossmodal integrat...... a non-holonomic robotic agent towards a moving audio-visual target. Simulation results demonstrate that unimodal learning enhances crossmodal learning and improves both the overall accuracy and precision of multisensory orientation response.

  13. [Accommodation effects of the audiovisual stimulation in the patients experiencing eyestrain with the concomitant disturbances of psychological adaptation].

    Science.gov (United States)

    Shakula, A V; Emel'ianov, G A

    2014-01-01

    The present study was designed to evaluate the effectiveness of audiovisual stimulation on the state of the eye accommodation system in patients experiencing eyestrain with concomitant disturbances of psychological adaptation. It was shown that a course of audiovisual stimulation (watching a psychorelaxing film accompanied by appropriate music) results in positive (5.9-21.9%) dynamics of the objective accommodation parameters and of the subjective status (4.5-33.2%). Taken together, these findings allow this method to be regarded as a "relaxing preparation" in the integral complex of measures for the preservation of professional vision in this group of patients.

  14. Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Georgios Mantokoudis

    Full Text Available OBJECTIVE: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. METHODS: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. RESULTS: Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). CONCLUSION: Webcameras have the potential to improve telecommunication of hearing-impaired individuals.
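    A minimal sketch of how the reported thresholds could be turned into a simple configuration check for a video call; the function name and example values are hypothetical, and the thresholds are simply the ones quoted in the abstract (>7 fps, >640 × 480 px, <100 ms picture/sound delay).

```python
def meets_speechreading_thresholds(width_px, height_px, fps, av_delay_ms):
    """Check a video-call configuration against the thresholds the study
    associated with higher speech-perception scores: frame rate above 7 fps,
    resolution above 640 x 480 px, and picture/sound delay under 100 ms."""
    return fps > 7 and (width_px * height_px) > (640 * 480) and av_delay_ms < 100

# Example: a 1280x720 call at 20 fps with 80 ms delay passes all three criteria.
print(meets_speechreading_thresholds(1280, 720, fps=20, av_delay_ms=80))  # True
```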

  15. Seeing the Talker's Face Improves Free Recall of Speech for Young Adults With Normal Hearing but Not Older Adults With Hearing Loss.

    Science.gov (United States)

    Rudner, Mary; Mishra, Sushmit; Stenfelt, Stefan; Lunner, Thomas; Rönnberg, Jerker

    2016-06-01

    Seeing the talker's face improves speech understanding in noise, possibly releasing resources for cognitive processing. We investigated whether it improves free recall of spoken two-digit numbers. Twenty younger adults with normal hearing and 24 older adults with hearing loss listened to and subsequently recalled lists of 13 two-digit numbers, with alternating male and female talkers. Lists were presented in quiet as well as in stationary and speech-like noise at a signal-to-noise ratio giving approximately 90% intelligibility. Amplification compensated for loss of audibility. Seeing the talker's face improved free recall performance for the younger but not the older group. Poorer performance in background noise was contingent on individual differences in working memory capacity. The effect of seeing the talker's face did not differ in quiet and noise. We have argued that the absence of an effect of seeing the talker's face for older adults with hearing loss may be due to modulation of audiovisual integration mechanisms caused by an interaction between task demands and participant characteristics. In particular, we suggest that executive task demands and interindividual executive skills may play a key role in determining the benefit of seeing the talker's face during a speech-based cognitive task.

  16. Commencement Speech as a Hybrid Polydiscursive Practice

    Directory of Open Access Journals (Sweden)

    Светлана Викторовна Иванова

    2017-12-01

    Full Text Available Discourse and media communication researchers note that popular discursive and communicative practices tend towards hybridization and convergence. Discourse, understood as language in use, is flexible; consequently, one and the same text can represent several types of discourse. A vivid example of this tendency is the American commencement speech (also called a commencement address or graduation speech). A commencement speech is a speech addressed to university graduates which, in line with the modern trend, is delivered by outstanding media personalities (politicians, athletes, actors, etc.). The objective of this study is to define the specificity of the realization of polydiscursive practices within commencement speech. The research involves discursive, contextual, stylistic and definitive analyses. Methodologically, the study is based on discourse analysis theory; in particular, the notion of a discursive practice as a verbalized social practice makes up the conceptual basis of the research. The research draws upon a hundred commencement speeches delivered by prominent representatives of American society from the 1980s until now. In brief, commencement speech belongs to the institutional discourse that public speech embodies. Its institutional parameters are well represented in speeches delivered by people in power, such as American and university presidents. Nevertheless, as the results of the research indicate, the institutional character of commencement speech is not its only feature. Conceptual information analysis makes it possible to relate commencement speech to didactic discourse, as it is aimed at teaching university graduates how to deal with the challenges life is rich in. Discursive practices of personal discourse are also actively integrated into commencement speech discourse. More than that, existential discursive practices also find their way into the discourse under study. Commencement

  17. The Galker test of speech reception in noise

    DEFF Research Database (Denmark)

    Lauritsen, Maj-Britt Glenn; Söderström, Margareta; Kreiner, Svend

    2016-01-01

    PURPOSE: We tested "the Galker test", a speech reception in noise test developed for primary care for Danish preschool children, to explore whether the children's ability to hear and understand speech was associated with gender, age, middle ear status, and the level of background noise. METHODS......: The Galker test is a 35-item audio-visual, computerized word discrimination test in background noise. Included were 370 normally developed children attending day care centers. The children were examined with the Galker test, tympanometry, audiometry, and the Reynell test of verbal comprehension. Parents...... and daycare teachers completed questionnaires on the children's ability to hear and understand speech. As most of the variables were not assessed using interval scales, non-parametric statistics (Goodman-Kruskal's gamma) were used for analyzing associations with the Galker test score. For comparisons...
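    The Goodman-Kruskal gamma statistic named in the methods can be computed from concordant and discordant pairs of observations. The sketch below is a minimal illustration with hypothetical ordinal data, not the study's analysis code.

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal's gamma for two ordinal variables:
    (concordant - discordant) / (concordant + discordant), ignoring ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    if concordant + discordant == 0:
        return 0.0
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical ordinal data: age group (1-3) vs. Galker score band (1-4).
age_group  = [1, 1, 2, 2, 3, 3, 3]
score_band = [1, 2, 2, 3, 3, 4, 4]
print(round(goodman_kruskal_gamma(age_group, score_band), 2))  # 1.0 here: fully concordant
```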

  18. Speech disorders - children

    Science.gov (United States)

    ... disorder; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... evaluation tools that can help identify and diagnose speech disorders: Denver Articulation Screening Examination Goldman-Fristoe Test of ...

  19. Speech Processing.

    Science.gov (United States)

    1983-05-01

    The VDE system developed had the capability of recognizing up to 248 separate words in syntactic structures. The two systems described are isolated... Contents include: ...and Speaker Recognition, by M.J. Hunt; Assessment of Speech Systems, by R.K. Moore; A Survey of Current Equipment and Research, by J.S. Bridle; ...Technology in Navy Training Systems, by R. Breaux, M. Blind and R. Lynchard; General Review of Military Applications of Voice Processing, by Dr. Bruno...

  20. Speech Recognition

    Directory of Open Access Journals (Sweden)

    Adrian Morariu

    2009-01-01

    Full Text Available This paper presents a method of speech recognition using pattern recognition techniques. Learning consists of determining the unique characteristics of a word (cepstral coefficients) by eliminating those characteristics that are different from one word to another. For learning and recognition, the system builds a dictionary of words by determining the characteristics of each word to be used in the recognition. Determining the characteristics of an audio signal consists of the following steps: noise removal, sampling, applying a Hamming window, switching to the frequency domain through the Fourier transform, calculating the magnitude spectrum, filtering the data, and determining the cepstral coefficients.
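    A minimal sketch of the per-frame pipeline listed above (Hamming window, Fourier transform, magnitude spectrum, cepstral coefficients), assuming a real-cepstrum formulation; the frame length, hop size, number of coefficients, and the toy signal are illustrative assumptions, and the paper's noise-removal and filtering steps are not reproduced here.

```python
import numpy as np

def frame_cepstrum(frame, num_coeffs=13):
    """Cepstral coefficients for a single speech frame, following the
    generic pipeline in the abstract: Hamming window -> FFT -> magnitude
    spectrum -> log -> inverse transform (real cepstrum)."""
    windowed = frame * np.hamming(len(frame))
    spectrum = np.fft.rfft(windowed)
    log_magnitude = np.log(np.abs(spectrum) + 1e-10)  # avoid log(0)
    cepstrum = np.fft.irfft(log_magnitude)
    return cepstrum[:num_coeffs]

def extract_features(signal, fs=16000, frame_ms=25, hop_ms=10, num_coeffs=13):
    """Slice a (pre-cleaned) signal into overlapping frames and stack
    the per-frame cepstral coefficients into a feature matrix."""
    frame_len = int(fs * frame_ms / 1000)
    hop_len = int(fs * hop_ms / 1000)
    starts = range(0, len(signal) - frame_len + 1, hop_len)
    return np.array([frame_cepstrum(signal[s:s + frame_len], num_coeffs) for s in starts])

# Example: a synthetic 0.5 s "word" at 16 kHz yields a (frames x 13) feature matrix.
t = np.arange(0, 0.5, 1 / 16000)
toy_word = np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.randn(t.size)
print(extract_features(toy_word).shape)
```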

  1. The Effects of Audiovisual Stimulation on the Acceptance of Background Noise.

    Science.gov (United States)

    Plyler, Patrick N; Lang, Rowan; Monroe, Amy L; Gaudiano, Paul

    2015-05-01

    Previous examinations of noise acceptance have been conducted using an auditory stimulus only; however, the effect of visual speech supplementation of the auditory stimulus on acceptance of noise remains limited. The purpose of the present study was to determine the effect of audiovisual stimulation on the acceptance of noise in listeners with normal and impaired hearing. A repeated measures design was utilized. A total of 92 adult participants were recruited for this experiment. Of these participants, 54 were listeners with normal hearing and 38 were listeners with sensorineural hearing impairment. Most comfortable levels and acceptable noise levels (ANL) were obtained using auditory and auditory-visual stimulation modes for the unaided listening condition for each participant and for the aided listening condition for 35 of the participants with impaired hearing that owned hearing aids. Speech reading ability was assessed using the Utley test for each participant. The addition of visual input did not impact the most comfortable level values for listeners in either group; however, visual input improved unaided ANL values for listeners with normal hearing and aided ANL values in listeners with impaired hearing. ANL benefit received from visual speech input was related to the auditory ANL in listeners in each group; however, it was not related to speech reading ability for either listener group in any experimental condition. Visual speech input can significantly impact measures of noise acceptance. The current ANL measure may not accurately reflect acceptance of noise values when in more realistic environments, where the signal of interest is both audible and visible to the listener. American Academy of Audiology.
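    Assuming the conventional definition of the acceptable noise level (the most comfortable listening level minus the highest accepted background noise level), the sketch below shows how an ANL value, and the kind of audiovisual benefit reported above, would be computed; the listener values are hypothetical.

```python
def acceptable_noise_level(mcl_db, bnl_db):
    """Acceptable noise level, using the conventional definition:
    most comfortable listening level (MCL) minus the highest background
    noise level (BNL) the listener is willing to accept, both in dB HL.
    Lower ANL values indicate greater acceptance of background noise."""
    return mcl_db - bnl_db

# Hypothetical listener: MCL of 55 dB HL, accepted noise up to 47 dB HL in the
# auditory-only condition but up to 51 dB HL with visual speech added.
print(acceptable_noise_level(55, 47))  # 8 dB  (auditory-only ANL)
print(acceptable_noise_level(55, 51))  # 4 dB  (audiovisual ANL: improved acceptance)
```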

  2. Understanding the basics of audiovisual archiving in Africa and the ...

    African Journals Online (AJOL)

    In the developed world, the cultural value of the audiovisual media gained legitimacy and widening acceptance after World War II, and this is what Africa still requires. There are a lot of problems in Africa, and because of this, activities such as preservation of a historical record, especially in the audiovisual media are seen as ...

  3. Trigger videos on the Web: Impact of audiovisual design

    NARCIS (Netherlands)

    Verleur, R.; Heuvelman, A.; Verhagen, Pleunes Willem

    2011-01-01

    Audiovisual design might impact emotional responses, as studies from the 1970s and 1980s on movie and television content show. Given today's abundant presence of web-based videos, this study investigates whether audiovisual design will impact web-video content in a similar way. The study is

  4. Audiovisual Archive Exploitation in the Networked Information Society

    NARCIS (Netherlands)

    Ordelman, Roeland J.F.

    2011-01-01

    Safeguarding the massive body of audiovisual content, including rich music collections, in audiovisual archives and enabling access for various types of user groups is a prerequisite for unlocking the social-economic value of these collections. Data quantities and the need for specific content

  5. Decision-level fusion for audio-visual laughter detection

    NARCIS (Netherlands)

    Reuderink, B.; Poel, M.; Truong, K.; Poppe, R.; Pantic, M.

    2008-01-01

    Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is

  6. Haptic and Audio-visual Stimuli: Enhancing Experiences and Interaction

    NARCIS (Netherlands)

    Nijholt, Antinus; Dijk, Esko O.; Lemmens, Paul M.C.; Luitjens, S.B.

    2010-01-01

    The intention of the symposium on Haptic and Audio-visual stimuli at the EuroHaptics 2010 conference is to deepen the understanding of the effect of combined Haptic and Audio-visual stimuli. The knowledge gained will be used to enhance experiences and interactions in daily life. To this end, a

  7. Knowledge Generated by Audiovisual Narrative Action Research Loops

    Science.gov (United States)

    Bautista Garcia-Vera, Antonio

    2012-01-01

    We present data collected from the research project funded by the Ministry of Education and Science of Spain entitled "Audiovisual Narratives and Intercultural Relations in Education." One of the aims of the research was to determine the nature of thought processes occurring during audiovisual narratives. We studied the possibility of…

  8. Use of Audiovisual Texts in University Education Process

    Science.gov (United States)

    Aleksandrov, Evgeniy P.

    2014-01-01

    Audio-visual learning technologies offer great opportunities in the development of students' analytical and projective abilities. These technologies can be used in classroom activities and for homework. This article discusses the features of using audiovisual media texts in a series of social sciences and humanities courses in the university curriculum.

  9. Trigger Videos on the Web: Impact of Audiovisual Design

    Science.gov (United States)

    Verleur, Ria; Heuvelman, Ard; Verhagen, Plon W.

    2011-01-01

    Audiovisual design might impact emotional responses, as studies from the 1970s and 1980s on movie and television content show. Given today's abundant presence of web-based videos, this study investigates whether audiovisual design will impact web-video content in a similar way. The study is motivated by the potential influence of video-evoked…

  10. Audiovisual consumption and its social logics on the web

    OpenAIRE

    Rose Marie Santini; Juan C. Calvi

    2013-01-01

    This article analyzes the social logics underlying audiovisual consumption on digital networks. We retrieved some data on the Internet global traffic of audiovisual files since 2008 to identify formats, modes of distribution and consumption of audiovisual contents that tend to prevail on the Web. This research shows the types of social practices which are dominant among users and their relation to what we designate as “Internet culture”.

  11. EXPLICITATION AND ADDITION TECHNIQUES IN AUDIOVISUAL TRANSLATION: A MULTIMODAL APPROACH OF ENGLISH-INDONESIAN SUBTITLES

    Directory of Open Access Journals (Sweden)

    Ichwan Suyudi

    2017-12-01

    Full Text Available In audiovisual translation, the multimodality of the audiovisual text is both a challenge and a resource for subtitlers. This paper illustrates how multi-modes provide information that helps subtitlers gain a better understanding of the meaning-making practices that will influence their decision-making in translating a given verbal text. Subtitlers may explicitate, add, and condense the texts based on the multi-modes as seen in the visual frames. Subtitlers have to consider the distribution and integration of the meanings of the multi-modes in order to create comprehensive equivalence between the source and target texts. Excerpts of visual frames in this paper are taken from the English films Forrest Gump (drama, 1996) and James Bond (thriller, 2010).

  12. Vicarious audiovisual learning in perfusion education.

    Science.gov (United States)

    Rath, Thomas E; Holt, David W

    2010-12-01

    Perfusion technology is a mechanical and visual science traditionally taught with didactic instruction combined with clinical experience. It is difficult to provide perfusion students the opportunity to experience difficult clinical situations, set up complex perfusion equipment, or observe corrective measures taken during catastrophic events because of patient safety concerns. Although high-fidelity simulators offer exciting opportunities for future perfusion training, we explore the use of a less costly, low-fidelity form of simulation instruction: vicarious audiovisual learning. Two low-fidelity modes of instruction, a description with text and a vicarious, first-person audiovisual production depicting the same content, were compared. Students (n = 37) sampled from five North American perfusion schools were prospectively randomized to one of two online learning modules, text or video. These modules described the setup and operation of the MAQUET ROTAFLOW stand-alone centrifugal console and pump. Using a 10-question multiple-choice test, students were assessed immediately after viewing the module (test #1) and then again 2 weeks later (test #2) to determine cognition and recall of the module content. In addition, students completed a questionnaire assessing the learning preferences of today's perfusion student. Mean test scores from test #1 for video learners (n = 18) were significantly higher (88.89%) than for text learners (n = 19) (74.74%). Vicarious audiovisual learning modules may be an efficacious, low-cost means of delivering perfusion training on subjects such as equipment setup and operation. Video learning appears to improve cognition and retention of learned content and may play an important role in how we teach perfusion in the future, as simulation technology becomes more prevalent.

  13. Audiovisual Interaction in Time Perception

    Directory of Open Access Journals (Sweden)

    Kuan-Ming Chen

    2011-10-01

    Full Text Available We examined the cross-modal effect of an irrelevant sound (or disk) on the perceived visual (or auditory) duration, and how visual and auditory signals are integrated when perceiving duration. Participants conducted a duration discrimination task with a 2-Interval-Forced-Choice procedure, with one interval containing the standard duration and the other the comparison duration. In study 1, the standard and comparison durations were either in the same modality or with another modality added. The point-of-subjective-equality and threshold were measured from the psychometric functions. Results showed that sound expanded the perceived visual duration at the intermediate durations, but there was no effect of the disk on the perceived auditory duration. In study 2, bimodal signals were used in both the standard and comparison durations, and the Maximum-Likelihood-Estimation (MLE) model was used to predict bimodal performance from the observed unimodal results. The contribution of auditory signals to the bimodal estimate of duration was greater than that predicted by the MLE model, and so was the contribution of visual signals when these signals were temporally informative (i.e., looming disks). We propose a hybrid model that considers both the prior bias for the auditory signal and the reliability of both auditory and visual signals to explain the results.
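    The MLE model referred to above predicts the bimodal estimate as a reliability-weighted average of the unimodal estimates, with weights inversely proportional to the unimodal variances. The sketch below illustrates that prediction with hypothetical unimodal values; it is not the authors' analysis code.

```python
def mle_prediction(est_a, sigma_a, est_v, sigma_v):
    """Maximum-likelihood-estimation (MLE) cue combination: the bimodal
    duration estimate is a reliability-weighted average of the unimodal
    estimates, and the predicted bimodal variability is lower than either
    unimodal variability."""
    w_a = sigma_v**2 / (sigma_a**2 + sigma_v**2)   # weight on the auditory estimate
    w_v = 1.0 - w_a                                # weight on the visual estimate
    est_av = w_a * est_a + w_v * est_v
    sigma_av = (sigma_a**2 * sigma_v**2 / (sigma_a**2 + sigma_v**2)) ** 0.5
    return est_av, sigma_av

# Hypothetical unimodal results (ms): a precise auditory estimate and a noisier visual one.
est_av, sigma_av = mle_prediction(est_a=500, sigma_a=40, est_v=540, sigma_v=80)
print(round(est_av, 1), round(sigma_av, 1))  # 508.0 35.8 -> the auditory cue dominates
```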

  14. Speech and Language Delay

    Science.gov (United States)

    What is a speech and language delay? A speech and language delay ...

  15. Categorization of natural dynamic audiovisual scenes.

    Directory of Open Access Journals (Sweden)

    Olli Rummukainen

    Full Text Available This work analyzed the perceptual attributes of natural dynamic audiovisual scenes. We presented thirty participants with 19 natural scenes in a similarity categorization task, followed by a semi-structured interview. The scenes were reproduced with an immersive audiovisual display. Natural scene perception has been studied mainly with unimodal settings, which have identified motion as one of the most salient attributes related to visual scenes, and sound intensity along with pitch trajectories related to auditory scenes. However, controlled laboratory experiments with natural multimodal stimuli are still scarce. Our results show that humans pay attention to similar perceptual attributes in natural scenes, and a two-dimensional perceptual map of the stimulus scenes and perceptual attributes was obtained in this work. The exploratory results show the amount of movement, perceived noisiness, and eventfulness of the scene to be the most important perceptual attributes in naturalistically reproduced real-world urban environments. We found the scene gist properties openness and expansion to remain as important factors in scenes with no salient auditory or visual events. We propose that the study of scene perception should move forward to understand better the processes behind multimodal scene processing in real-world environments. We publish our stimulus scenes as spherical video recordings and sound field recordings in a publicly available database.

  16. Alterations in audiovisual simultaneity perception in amblyopia.

    Science.gov (United States)

    Richards, Michael D; Goltz, Herbert C; Wong, Agnes M F

    2017-01-01

    Amblyopia is a developmental visual impairment that is increasingly recognized to affect higher-level perceptual and multisensory processes. To further investigate the audiovisual (AV) perceptual impairments associated with this condition, we characterized the temporal interval in which asynchronous auditory and visual stimuli are perceived as simultaneous 50% of the time (i.e., the AV simultaneity window). Adults with unilateral amblyopia (n = 17) and visually normal controls (n = 17) judged the simultaneity of a flash and a click presented with both eyes viewing. The signal onset asynchrony (SOA) varied from 0 ms to 450 ms for auditory-lead and visual-lead conditions. A subset of participants with amblyopia (n = 6) was tested monocularly. Compared to the control group, the auditory-lead side of the AV simultaneity window was widened by 48 ms (36%; p = 0.002), whereas that of the visual-lead side was widened by 86 ms (37%; p = 0.02). The overall mean window width was 500 ms, compared to 366 ms among controls (37% wider; p = 0.002). Among participants with amblyopia, the simultaneity window parameters were unchanged by viewing condition, but subgroup analysis revealed differential effects on the parameters by amblyopia severity, etiology, and foveal suppression status. Possible mechanisms to explain these findings include visual temporal uncertainty, interocular perceptual latency asymmetry, and disruption of normal developmental tuning of sensitivity to audiovisual asynchrony.
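    A minimal sketch of how a 50%-simultaneity window of this kind could be estimated from group judgment proportions, assuming a Gaussian-shaped simultaneity curve; the SOA grid, response proportions, and parameterization are illustrative, not the study's data or fitting procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def simultaneity_curve(soa, amplitude, center, width):
    """Gaussian-shaped probability of a 'simultaneous' response as a
    function of stimulus onset asynchrony (negative SOA = auditory lead)."""
    return amplitude * np.exp(-((soa - center) ** 2) / (2 * width ** 2))

# Hypothetical group data: proportion of 'simultaneous' judgments per SOA (ms).
soas = np.array([-450, -300, -150, -50, 0, 50, 150, 300, 450], dtype=float)
p_simultaneous = np.array([0.05, 0.20, 0.60, 0.85, 0.95, 0.90, 0.75, 0.35, 0.10])

params, _ = curve_fit(simultaneity_curve, soas, p_simultaneous, p0=[1.0, 0.0, 150.0])
amplitude, center, width = params

# The AV simultaneity window: the SOA range over which the fitted curve
# exceeds 0.5 (the 50%-simultaneous criterion used in the study).
half_window = width * np.sqrt(2 * np.log(amplitude / 0.5))
print(f"window: {center - half_window:.0f} ms to {center + half_window:.0f} ms")
```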

  17. Alterations in audiovisual simultaneity perception in amblyopia.

    Directory of Open Access Journals (Sweden)

    Michael D Richards

    Full Text Available Amblyopia is a developmental visual impairment that is increasingly recognized to affect higher-level perceptual and multisensory processes. To further investigate the audiovisual (AV) perceptual impairments associated with this condition, we characterized the temporal interval in which asynchronous auditory and visual stimuli are perceived as simultaneous 50% of the time (i.e., the AV simultaneity window). Adults with unilateral amblyopia (n = 17) and visually normal controls (n = 17) judged the simultaneity of a flash and a click presented with both eyes viewing. The signal onset asynchrony (SOA) varied from 0 ms to 450 ms for auditory-lead and visual-lead conditions. A subset of participants with amblyopia (n = 6) was tested monocularly. Compared to the control group, the auditory-lead side of the AV simultaneity window was widened by 48 ms (36%; p = 0.002), whereas that of the visual-lead side was widened by 86 ms (37%; p = 0.02). The overall mean window width was 500 ms, compared to 366 ms among controls (37% wider; p = 0.002). Among participants with amblyopia, the simultaneity window parameters were unchanged by viewing condition, but subgroup analysis revealed differential effects on the parameters by amblyopia severity, etiology, and foveal suppression status. Possible mechanisms to explain these findings include visual temporal uncertainty, interocular perceptual latency asymmetry, and disruption of normal developmental tuning of sensitivity to audiovisual asynchrony.

  18. Feature Fusion Based Audio-Visual Speaker Identification Using Hidden Markov Model under Different Lighting Variations

    Directory of Open Access Journals (Sweden)

    Md. Rabiul Islam

    2014-01-01

    Full Text Available The aim of the paper is to propose a feature fusion based Audio-Visual Speaker Identification (AVSI) system with varied conditions of illumination environments. Among the different fusion strategies, feature level fusion has been used for the proposed AVSI system where Hidden Markov Model (HMM) is used for learning and classification. Since the feature set contains richer information about the raw biometric data than any other levels, integration at feature level is expected to provide better authentication results. In this paper, both Mel Frequency Cepstral Coefficients (MFCCs) and Linear Prediction Cepstral Coefficients (LPCCs) are combined to get the audio feature vectors and Active Shape Model (ASM) based appearance and shape facial features are concatenated to take the visual feature vectors. These combined audio and visual features are used for the feature-fusion. To reduce the dimension of the audio and visual feature vectors, the Principal Component Analysis (PCA) method is used. The VALID audio-visual database is used to measure the performance of the proposed system where four different illumination levels of lighting conditions are considered. Experimental results focus on the significance of the proposed audio-visual speaker identification system with various combinations of audio and visual features.
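    A minimal sketch of the feature-level fusion step described above: per-frame audio features (MFCC + LPCC) and visual features (ASM shape and appearance) are concatenated and reduced with PCA. The feature dimensions and data are hypothetical; in a real system the PCA projection would be fitted on training data and the reduced sequences modeled with one HMM per enrolled speaker.

```python
import numpy as np
from sklearn.decomposition import PCA

def fuse_audio_visual(mfcc, lpcc, shape_feats, appearance_feats, n_components=30):
    """Feature-level fusion: per-frame audio features (MFCC + LPCC) and
    visual features (ASM shape + appearance) are concatenated into one
    vector per frame, then reduced with PCA. The reduced sequences would
    then be modeled with one HMM per enrolled speaker (not shown here)."""
    audio = np.hstack([mfcc, lpcc])                      # (frames, audio_dims)
    visual = np.hstack([shape_feats, appearance_feats])  # (frames, visual_dims)
    fused = np.hstack([audio, visual])                   # (frames, audio + visual dims)
    return PCA(n_components=n_components).fit_transform(fused)

# Hypothetical per-frame features for a 200-frame utterance.
rng = np.random.default_rng(0)
reduced = fuse_audio_visual(mfcc=rng.normal(size=(200, 13)),
                            lpcc=rng.normal(size=(200, 12)),
                            shape_feats=rng.normal(size=(200, 40)),
                            appearance_feats=rng.normal(size=(200, 40)))
print(reduced.shape)  # (200, 30)
```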

  19. Audio-visual assistance in co-creating transition knowledge

    Science.gov (United States)

    Hezel, Bernd; Broschkowski, Ephraim; Kropp, Jürgen P.

    2013-04-01

    Earth system and climate impact research results point to the tremendous ecologic, economic and societal implications of climate change. Specifically, people will have to adopt lifestyles that are very different from those they currently strive for in order to mitigate severe changes of our known environment. It will most likely not suffice to transfer the scientific findings into international agreements and appropriate legislation. A transition is rather reliant on pioneers that define new role models, on change agents that mainstream the concept of sufficiency and on narratives that make different futures appealing. In order for the research community to be able to provide sustainable transition pathways that are viable, an integration of the physical constraints and the societal dynamics is needed. Hence the necessary transition knowledge is to be co-created by social and natural science and society. To this end, the Climate Media Factory - in itself a massively transdisciplinary venture - strives to provide an audio-visual connection between the different scientific cultures and a bi-directional link to stakeholders and society. Since the methodology, particular language and knowledge level of those involved are not the same, we develop new entertaining formats on the basis of a "complexity on demand" approach. They present scientific information in an integrated and entertaining way with different levels of detail that provide entry points to users with different requirements. Two examples shall illustrate the advantages and restrictions of the approach.

  20. Automatic Speech Recognition from Neural Signals: A Focused Review

    Directory of Open Access Journals (Sweden)

    Christian Herff

    2016-09-01

    Full Text Available Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the need not to bother bystanders, or the inability to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable not to speak but to simply envision oneself saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to their low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data, with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques used with neural signals, we discuss the Brain-to-text system.

  1. Audiovisual Interval Size Estimation Is Associated with Early Musical Training.

    Directory of Open Access Journals (Sweden)

    Mary Kathryn Abel

    Full Text Available Although pitch is a fundamental attribute of auditory perception, substantial individual differences exist in our ability to perceive differences in pitch. Little is known about how these individual differences in the auditory modality might affect crossmodal processes such as audiovisual perception. In this study, we asked whether individual differences in pitch perception might affect audiovisual perception, as it relates to age of onset and number of years of musical training. Fifty-seven subjects made subjective ratings of interval size when given point-light displays of audio, visual, and audiovisual stimuli of sung intervals. Audiovisual stimuli were divided into congruent and incongruent (audiovisual-mismatched) stimuli. Participants' ratings correlated strongly with interval size in audio-only, visual-only, and audiovisual-congruent conditions. In the audiovisual-incongruent condition, ratings correlated more with audio than with visual stimuli, particularly for subjects who had better pitch perception abilities and higher nonverbal IQ scores. To further investigate the effects of age of onset and length of musical training, subjects were divided into musically trained and untrained groups. Results showed that among subjects with musical training, the degree to which participants' ratings correlated with auditory interval size during incongruent audiovisual perception was correlated with both nonverbal IQ and age of onset of musical training. After partialing out nonverbal IQ, pitch discrimination thresholds were no longer associated with incongruent audio scores, whereas age of onset of musical training remained associated with incongruent audio scores. These findings invite future research on the developmental effects of musical training, particularly those relating to the process of audiovisual perception.

  2. Audiovisual Interval Size Estimation Is Associated with Early Musical Training.

    Science.gov (United States)

    Abel, Mary Kathryn; Li, H Charles; Russo, Frank A; Schlaug, Gottfried; Loui, Psyche

    2016-01-01

    Although pitch is a fundamental attribute of auditory perception, substantial individual differences exist in our ability to perceive differences in pitch. Little is known about how these individual differences in the auditory modality might affect crossmodal processes such as audiovisual perception. In this study, we asked whether individual differences in pitch perception might affect audiovisual perception, as it relates to age of onset and number of years of musical training. Fifty-seven subjects made subjective ratings of interval size when given point-light displays of audio, visual, and audiovisual stimuli of sung intervals. Audiovisual stimuli were divided into congruent and incongruent (audiovisual-mismatched) stimuli. Participants' ratings correlated strongly with interval size in audio-only, visual-only, and audiovisual-congruent conditions. In the audiovisual-incongruent condition, ratings correlated more with audio than with visual stimuli, particularly for subjects who had better pitch perception abilities and higher nonverbal IQ scores. To further investigate the effects of age of onset and length of musical training, subjects were divided into musically trained and untrained groups. Results showed that among subjects with musical training, the degree to which participants' ratings correlated with auditory interval size during incongruent audiovisual perception was correlated with both nonverbal IQ and age of onset of musical training. After partialing out nonverbal IQ, pitch discrimination thresholds were no longer associated with incongruent audio scores, whereas age of onset of musical training remained associated with incongruent audio scores. These findings invite future research on the developmental effects of musical training, particularly those relating to the process of audiovisual perception.
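    The "partialing out" step mentioned above can be implemented by regressing the control variable out of both measures and correlating the residuals. The sketch below illustrates this with simulated data in which both measures depend on nonverbal IQ; the variable names and effect sizes are assumptions, not the study's data.

```python
import numpy as np

def partial_correlation(x, y, control):
    """Correlation between x and y after partialing out a control variable:
    regress the control out of each variable and correlate the residuals."""
    x, y, control = (np.asarray(a, dtype=float) for a in (x, y, control))

    def residualize(v):
        slope, intercept = np.polyfit(control, v, 1)
        return v - (slope * control + intercept)

    return np.corrcoef(residualize(x), residualize(y))[0, 1]

# Simulated data: pitch thresholds and incongruent-audio scores both depend on nonverbal IQ.
rng = np.random.default_rng(1)
iq = rng.normal(100, 15, size=57)
pitch_threshold = -0.04 * iq + rng.normal(0, 0.5, size=57)
incongruent_audio = 0.01 * iq + rng.normal(0, 0.2, size=57)
print(round(np.corrcoef(pitch_threshold, incongruent_audio)[0, 1], 2))       # raw correlation
print(round(partial_correlation(pitch_threshold, incongruent_audio, iq), 2))  # shrinks toward 0
```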

  3. Impact of audio-visual storytelling in simulation learning experiences of undergraduate nursing students.

    Science.gov (United States)

    Johnston, Sandra; Parker, Christina N; Fox, Amanda

    2017-09-01

    Use of high fidelity simulation has become increasingly popular in nursing education to the extent that it is now an integral component of most nursing programs. Anecdotal evidence suggests that students have difficulty engaging with simulation manikins due to their unrealistic appearance. Introduction of the manikin as a 'real patient' with the use of an audio-visual narrative may engage students in the simulated learning experience and impact on their learning. A paucity of literature currently exists on the use of audio-visual narratives to enhance simulated learning experiences. This study aimed to determine if viewing an audio-visual narrative during a simulation pre-brief altered undergraduate nursing student perceptions of the learning experience. A quasi-experimental post-test design was utilised with a convenience sample of final year baccalaureate nursing students at a large metropolitan university. Participants completed a modified version of the Student Satisfaction with Simulation Experiences survey. This 12-item questionnaire contained questions relating to the ability to transfer skills learned in simulation to the real clinical world, the realism of the simulation and the overall value of the learning experience. Descriptive statistics were used to summarise demographic information. Two-tailed, independent group t-tests were used to determine statistical differences within the categories. Findings indicated that students reported high levels of value, realism and transferability in relation to the viewing of an audio-visual narrative. A statistically significant result (t = 2.38) was found for the transferability of learning from simulation to clinical practice. The subgroups of age and gender, although not significant, indicated some interesting results. High satisfaction with simulation was indicated by all students in relation to value and realism. There was a significant finding in relation to transferability of knowledge, and this is vital to quality educational outcomes. Copyright © 2017. Published by

  4. The neural processing of foreign-accented speech and its relationship to listener bias

    Directory of Open Access Journals (Sweden)

    Han-Gyol Yi

    2014-10-01

    Full Text Available Foreign-accented speech often presents a challenging listening condition. In addition to deviations from the target speech norms related to the inexperience of the nonnative speaker, listener characteristics may play a role in determining intelligibility levels. We have previously shown that an implicit visual bias for associating East Asian faces and foreignness predicts listeners' perceptual ability to process Korean-accented English audiovisual speech (Yi et al., 2013). Here, we examine the neural mechanism underlying the influence of listener bias toward foreign faces on speech perception. In a functional magnetic resonance imaging (fMRI) study, native English speakers listened to native- and Korean-accented English sentences, with or without faces. The participants' Asian-foreign association was measured using an implicit association test (IAT), conducted outside the scanner. We found that foreign-accented speech evoked greater activity in the bilateral primary auditory cortices and the inferior frontal gyri, potentially reflecting greater computational demand. Higher IAT scores, indicating greater bias, were associated with increased BOLD response to foreign-accented speech with faces in the primary auditory cortex, the early node for spectrotemporal analysis. We conclude the following: (1) foreign-accented speech perception places greater demand on the neural systems underlying speech perception; (2) the face of the talker can exaggerate the perceived foreignness of foreign-accented speech; (3) implicit Asian-foreign association is associated with decreased neural efficiency in early spectrotemporal processing.

  5. Neuronal basis of speech comprehension.

    Science.gov (United States)

    Specht, Karsten

    2014-01-01

    Verbal communication does not rely only on the simple perception of auditory signals. It is rather a parallel and integrative processing of linguistic and non-linguistic information, involving temporal and frontal areas in particular. This review describes the inherent complexity of auditory speech comprehension from a functional-neuroanatomical perspective. The review is divided into two parts. In the first part, structural and functional asymmetry of language-relevant structures will be discussed. The second part of the review discusses recent neuroimaging studies, which coherently demonstrate that speech comprehension processes rely on a hierarchical network involving the temporal, parietal, and frontal lobes. Further, the results support the dual-stream model for speech comprehension, with a dorsal stream for auditory-motor integration, and a ventral stream for extracting meaning but also the processing of sentences and narratives. Specific patterns of functional asymmetry between the left and right hemisphere can also be demonstrated. The review article concludes with a discussion on interactions between the dorsal and ventral streams, particularly the involvement of motor-related areas in speech perception processes, and outlines some remaining unresolved issues. This article is part of a Special Issue entitled Human Auditory Neuroimaging. Copyright © 2013 Elsevier B.V. All rights reserved.

  6. Sensorimotor Representation of Speech Perception. Cross-Decoding of Place of Articulation Features during Selective Attention to Syllables in 7T fMRI

    NARCIS (Netherlands)

    Archila-Meléndez, Mario E.; Valente, Giancarlo; Correia, Joao M.; Rouhl, Rob P. W.; van Kranen-Mastenbroek, Vivianne H.; Jansma, Bernadette M.

    2018-01-01

    Sensorimotor integration, the translation between acoustic signals and motoric programs, may constitute a crucial mechanism for speech. During speech perception, the acoustic-motoric translations include the recruitment of cortical areas for the representation of speech articulatory features, such

  7. Audiovisual interpretative skills: between textual culture and formalized literacy

    Directory of Open Access Journals (Sweden)

    Estefanía Jiménez, Ph. D.

    2010-01-01

    Full Text Available This paper presents the results of a study on the process of acquiring interpretative skills to decode audiovisual texts among adolescents and youth. Based on the conception of such competence as the ability to understand the meanings connoted beneath the literal discourses of audiovisual texts, this study compared two variables: on the one hand, the acquisition of such skills from personal and social experience in the consumption of audiovisual products (which is affected by age differences), and, on the other hand, the differences marked by the existence of formalized processes of media literacy. Based on focus groups of young students, the research assesses the existing academic debate about these processes of acquiring skills to interpret audiovisual materials.

  8. Exposure to audiovisual programs as sources of authentic language ...

    African Journals Online (AJOL)

    Exposure to audiovisual programs as sources of authentic language input and second ... Southern African Linguistics and Applied Language Studies ... The findings of the present research contribute more insights on the type and amount of ...

  9. On-line repository of audiovisual material feminist research methodology

    Directory of Open Access Journals (Sweden)

    Lena Prado

    2014-12-01

    Full Text Available This paper includes a collection of audiovisual material available in the repository of the Interdisciplinary Seminar of Feminist Research Methodology SIMReF (http://www.simref.net).

  10. Proper Use of Audio-Visual Aids: Essential for Educators.

    Science.gov (United States)

    Dejardin, Conrad

    1989-01-01

    Criticizes educators as the worst users of audio-visual aids and among the worst public speakers. Offers guidelines for the proper use of an overhead projector and the development of transparencies. (DMM)

  11. An Instrumented Glove for Control Audiovisual Elements in Performing Arts

    Directory of Open Access Journals (Sweden)

    Rafael Tavares

    2018-02-01

    Full Text Available The use of cutting-edge technologies such as wearable devices to control reactive audiovisual systems is rarely applied in more conventional stage performances, such as opera. This work reports a cross-disciplinary approach to the research and development of the WMTSensorGlove, a data-glove used in an opera performance to control audiovisual elements on stage through gestural movements. A system architecture of the interaction between the wireless wearable device and the different audiovisual systems is presented, taking advantage of the Open Sound Control (OSC) protocol. The developed wearable system was used as an audiovisual controller in "As sete mulheres de Jeremias Epicentro", a Portuguese opera by Quarteto Contratempus, which premiered in September 2017.

  12. Audiovisual consumption and its social logics on the web

    Directory of Open Access Journals (Sweden)

    Rose Marie Santini

    2013-06-01

    Full Text Available This article analyzes the social logics underlying audiovisualconsumption on digital networks. We retrieved some data on the Internet globaltraffic of audiovisual files since 2008 to identify formats, modes of distributionand consumption of audiovisual contents that tend to prevail on the Web. Thisresearch shows the types of social practices which are dominant among usersand its relation to what we designate as “Internet culture”.

  13. Narrativa audiovisual. Estrategias y recursos [Reseña]

    OpenAIRE

    Cuenca Jaramillo, María Dolores

    2011-01-01

    Review of the book "Narrativa audiovisual. Estrategias y recursos" by Fernando Canet and Josep Prósper. Cuenca Jaramillo, MD. (2011). Narrativa audiovisual. Estrategias y recursos [Reseña]. Vivat Academia. Revista de Comunicación. Año XIV(117):125-130. http://hdl.handle.net/10251/46210

  14. [Audio-visual communication in the history of psychiatry].

    Science.gov (United States)

    Farina, B; Remoli, V; Russo, F

    1993-12-01

    The authors analyse the evolution of visual communication in the history of psychiatry. From 18th-century oil paintings to the first daguerreotype prints, and on to cinematography and modern audiovisual systems, they observed an increasing diffusion of the new communication techniques in psychiatry and described the use of the different techniques in psychiatric practice. The article ends with a brief review of the current applications of audiovisual media in therapy, training, teaching, and research.

  15. Plan empresa productora de audiovisuales : La Central Audiovisual y Publicidad

    OpenAIRE

    Arroyave Velasquez, Alejandro

    2015-01-01

    This document corresponds to the business plan for the creation of the company La Central Publicidad y Audiovisual, a company dedicated to the pre-production, production and post-production of audiovisual material. The company will be located in the city of Cali, and its target market comprises the different types of companies in the city, including small, medium and large enterprises.

  16. Audiovisual Association Learning in the Absence of Primary Visual Cortex

    OpenAIRE

    Seirafi, Mehrdad; De Weerd, Peter; Pegna, Alan J.; de Gelder, Beatrice

    2016-01-01

    Learning audiovisual associations is mediated by the primary cortical areas; however, recent animal studies suggest that such learning can take place even in the absence of the primary visual cortex. Other studies have demonstrated the involvement of extra-geniculate pathways and especially the superior colliculus (SC) in audiovisual association learning. Here, we investigated such learning in a rare human patient with complete loss of the bilateral striate cortex. We carried out an implicit ...

  17. THE ONTOGENESIS OF SPEECH DEVELOPMENT

    Directory of Open Access Journals (Sweden)

    T. E. Braudo

    2017-01-01

    Full Text Available The purpose of this article is to acquaint specialists working with children who have developmental disorders with the age-related norms of speech development. Many well-known linguists and psychologists have studied speech ontogenesis (logogenesis). Speech is a higher mental function which integrates many functional systems. Speech development in infants during the first months after birth is ensured by innate hearing and the emerging ability to fix the gaze on the face of an adult. Innate emotional reactions also develop during this period, turning into nonverbal forms of communication. At about 6 months a baby starts to pronounce some syllables; at 7–9 months the baby repeats various sound combinations pronounced by adults. At 10–11 months a baby begins to react to words addressed to him or her. The first words usually appear at the age of 1 year; this is the start of the stage of active speech development. At this time it is acceptable if a child confuses or rearranges sounds, distorts or omits them. By the age of 1.5 years a child begins to understand the abstract explanations of adults. Significant vocabulary enlargement occurs between 2 and 3 years; the grammatical structures of the language are formed during this period (a child starts to use phrases and sentences). Preschool age (3–7 y. o.) is characterized by incorrect but steadily improving pronunciation of sounds and phonemic perception. The vocabulary increases; abstract speech and retelling are formed. Children over 7 y. o. continue to improve grammar, writing and reading skills. The described stages may not have strict age boundaries, since they depend not only on the environment but also on the child's mental constitution, heredity and character.

  18. Cortical activity patterns predict robust speech discrimination ability in noise

    Science.gov (United States)

    Shetake, Jai A.; Wolf, Jordan T.; Cheung, Ryan J.; Engineer, Crystal T.; Ram, Satyananda K.; Kilgard, Michael P.

    2012-01-01

    The neural mechanisms that support speech discrimination in noisy conditions are poorly understood. In quiet conditions, spike timing information appears to be used in the discrimination of speech sounds. In this study, we evaluated the hypothesis that spike timing is also used to distinguish between speech sounds in noisy conditions that significantly degrade neural responses to speech sounds. We tested speech sound discrimination in rats and recorded primary auditory cortex (A1) responses to speech sounds in background noise of different intensities and spectral compositions. Our behavioral results indicate that rats, like humans, are able to accurately discriminate consonant sounds even in the presence of background noise that is as loud as the speech signal. Our neural recordings confirm that speech sounds evoke degraded but detectable responses in noise. Finally, we developed a novel neural classifier that mimics behavioral discrimination. The classifier discriminates between speech sounds by comparing the A1 spatiotemporal activity patterns evoked on single trials with the average spatiotemporal patterns evoked by known sounds. Unlike classifiers in most previous studies, this classifier is not provided with the stimulus onset time. Neural activity analyzed with the use of relative spike timing was well correlated with behavioral speech discrimination in quiet and in noise. Spike timing information integrated over longer intervals was required to accurately predict rat behavioral speech discrimination in noisy conditions. The similarity of neural and behavioral discrimination of speech in noise suggests that humans and rats may employ similar brain mechanisms to solve this problem. PMID:22098331
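
    The classifier idea described above, comparing a single trial's spatiotemporal activity pattern against the average pattern evoked by each known sound, can be sketched as a nearest-template rule. The code below is a hedged illustration with toy data; the array shapes, the Euclidean metric and the alignment of trials are assumptions, and the paper's key feature of withholding stimulus onset time is not modeled here.

      import numpy as np

      def classify_trial(trial, templates):
          """Assign a single-trial pattern (recording sites x time bins) to the closest class template."""
          distances = {label: np.linalg.norm(trial - tpl) for label, tpl in templates.items()}
          return min(distances, key=distances.get)

      # Toy usage: two consonant sounds, 20 recording sites, 40 time bins
      rng = np.random.default_rng(0)
      templates = {"dad": rng.normal(size=(20, 40)), "tad": rng.normal(size=(20, 40))}
      noisy_trial = templates["dad"] + 0.5 * rng.normal(size=(20, 40))
      print(classify_trial(noisy_trial, templates))  # expected "dad" at this noise level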

  19. Neurophysiological evidence for the interplay of speech segmentation and word-referent mapping during novel word learning.

    Science.gov (United States)

    François, Clément; Cunillera, Toni; Garcia, Enara; Laine, Matti; Rodriguez-Fornells, Antoni

    2017-04-01

    Learning a new language requires the identification of word units from continuous speech (the speech segmentation problem) and mapping them onto conceptual representation (the word to world mapping problem). Recent behavioral studies have revealed that the statistical properties found within and across modalities can serve as cues for both processes. However, segmentation and mapping have been largely studied separately, and thus it remains unclear whether both processes can be accomplished at the same time and if they share common neurophysiological features. To address this question, we recorded EEG of 20 adult participants during both an audio alone speech segmentation task and an audiovisual word-to-picture association task. The participants were tested for both the implicit detection of online mismatches (structural auditory and visual semantic violations) as well as for the explicit recognition of words and word-to-picture associations. The ERP results from the learning phase revealed a delayed learning-related fronto-central negativity (FN400) in the audiovisual condition compared to the audio alone condition. Interestingly, while online structural auditory violations elicited clear MMN/N200 components in the audio alone condition, visual-semantic violations induced meaning-related N400 modulations in the audiovisual condition. The present results support the idea that speech segmentation and meaning mapping can take place in parallel and act in synergy to enhance novel word learning. Copyright © 2016 Elsevier Ltd. All rights reserved.

  20. Teleconferences and Audiovisual Materials in Earth Science Education

    Science.gov (United States)

    Cortina, L. M.

    2007-05-01

    Unidad de Educacion Continua y a Distancia, Universidad Nacional Autonoma de Mexico, Coyoacán 04510, Mexico. As stated in the special session description, 21st-century undergraduate education has access to resources and experiences that go beyond university classrooms. However, in some cases these resources go largely unused for reasons such as logistic problems, restricted internet and telecommunication access, and misinformation. We present and comment on our efforts and experiences at the National University of Mexico in a new unit dedicated to teleconferences and audio-visual materials. The unit forms part of the geosciences institutes, located on the central UNAM campus and on campuses in other states. The use of teleconferencing in formal graduate and undergraduate education allows teachers and lecturers to distribute course material as in classrooms. Courses by teleconference require learning and effort from students and teachers without physical contact, but participants have access to multimedia to support the presentation. Well-selected multimedia material allows students to identify and recognize digital information that aids the understanding of natural phenomena integral to the Earth Sciences. Cooperation with international partners providing access to new materials, experiences and field practices will greatly add to our efforts. We will present specific examples of our experiences at the Earth Sciences Postgraduate Program of UNAM with the use of technology in geoscience education.

  1. Speech and Communication Disorders

    Science.gov (United States)

    ... to being completely unable to speak or understand speech. Causes include Hearing disorders and deafness Voice problems, ... or those caused by cleft lip or palate Speech problems like stuttering Developmental disabilities Learning disorders Autism ...

  2. 36 CFR 1237.14 - What are the additional scheduling requirements for audiovisual, cartographic, and related records?

    Science.gov (United States)

    2010-07-01

    ... scheduling requirements for audiovisual, cartographic, and related records? 1237.14 Section 1237.14 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION RECORDS MANAGEMENT AUDIOVISUAL... audiovisual, cartographic, and related records? The disposition instructions should also provide that...

  3. Multisensory integration of emotional faces and voices in schizophrenics

    NARCIS (Netherlands)

    Gelder, B. de; Vroomen, J.H.M.; Jong, S. de; Masthoff, E.D.M.; Trompenaars, F.J.; Hodiamont, P.P.G.

    2005-01-01

    In their natural environment, organisms receive information through multiple sensory channels and these inputs from different sensory systems are routinely combined into integrated percepts. Previously, we reported that in a population of schizophrenics, deficits in audiovisual integration were

  4. Multisensory integration of emotional faces and voices in schizophrenics

    NARCIS (Netherlands)

    Gelder, B. de; Vroomen, J.H.M.; Jong, S.J. de; Masthoff, E.D.M.; Trompenaars, F.J.; Hodiamont, P.P.G.

    2005-01-01

    In their natural environment, organisms receive information through multiple sensory channels and these inputs from different sensory systems are routinely combined into integrated percepts. Previously, we reported that in a population of schizophrenics, deficits in audiovisual integration were

  5. Free Speech Yearbook 1978.

    Science.gov (United States)

    Phifer, Gregg, Ed.

    The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…

  6. Audiovisual Modulation in Mouse Primary Visual Cortex Depends on Cross-Modal Stimulus Configuration and Congruency.

    Science.gov (United States)

    Meijer, Guido T; Montijn, Jorrit S; Pennartz, Cyriel M A; Lansink, Carien S

    2017-09-06

    The sensory neocortex is a highly connected associative network that integrates information from multiple senses, even at the level of the primary sensory areas. Although a growing body of empirical evidence supports this view, the neural mechanisms of cross-modal integration in primary sensory areas, such as the primary visual cortex (V1), are still largely unknown. Using two-photon calcium imaging in awake mice, we show that the encoding of audiovisual stimuli in V1 neuronal populations is highly dependent on the features of the stimulus constituents. When the visual and auditory stimulus features were modulated at the same rate (i.e., temporally congruent), neurons responded with either an enhancement or suppression compared with unisensory visual stimuli, and their prevalence was balanced. Temporally incongruent tones or white-noise bursts included in audiovisual stimulus pairs resulted in predominant response suppression across the neuronal population. Visual contrast did not influence multisensory processing when the audiovisual stimulus pairs were congruent; however, when white-noise bursts were used, neurons generally showed response suppression when the visual stimulus contrast was high whereas this effect was absent when the visual contrast was low. Furthermore, a small fraction of V1 neurons, predominantly those located near the lateral border of V1, responded to sound alone. These results show that V1 is involved in the encoding of cross-modal interactions in a more versatile way than previously thought. SIGNIFICANCE STATEMENT The neural substrate of cross-modal integration is not limited to specialized cortical association areas but extends to primary sensory areas. Using two-photon imaging of large groups of neurons, we show that multisensory modulation of V1 populations is strongly determined by the individual and shared features of cross-modal stimulus constituents, such as contrast, frequency, congruency, and temporal structure. Congruent

  7. Integrating Clinical Neuropsychology into the Undergraduate Curriculum.

    Science.gov (United States)

    Puente, Antonio E.; And Others

    1991-01-01

    Claims little information exists in undergraduate education about clinical neuropsychology. Outlines an undergraduate neuropsychology course and proposes ways to integrate the subject into existing undergraduate psychology courses. Suggests developing specialized audio-visual materials for telecourses or existing courses. (NL)

  8. The Functional Connectome of Speech Control.

    Directory of Open Access Journals (Sweden)

    Stefan Fuertinger

    2015-07-01

    Full Text Available In the past few years, several studies have been directed to understanding the complexity of functional interactions between different brain regions during various human behaviors. Among these, neuroimaging research installed the notion that speech and language require an orchestration of brain regions for comprehension, planning, and integration of a heard sound with a spoken word. However, these studies have been largely limited to mapping the neural correlates of separate speech elements and examining distinct cortical or subcortical circuits involved in different aspects of speech control. As a result, the complexity of the brain network machinery controlling speech and language remained largely unknown. Using graph theoretical analysis of functional MRI (fMRI) data in healthy subjects, we quantified the large-scale speech network topology by constructing functional brain networks of increasing hierarchy from the resting state to motor output of meaningless syllables to complex production of real-life speech as well as compared to non-speech-related sequential finger tapping and pure tone discrimination networks. We identified a segregated network of highly connected local neural communities (hubs) in the primary sensorimotor and parietal regions, which formed a commonly shared core hub network across the examined conditions, with the left area 4p playing an important role in speech network organization. These sensorimotor core hubs exhibited features of flexible hubs based on their participation in several functional domains across different networks and ability to adaptively switch long-range functional connectivity depending on task content, resulting in a distinct community structure of each examined network. Specifically, compared to other tasks, speech production was characterized by the formation of six distinct neural communities with specialized recruitment of the prefrontal cortex, insula, putamen, and thalamus, which collectively

  9. Inner Speech: Development, Cognitive Functions, Phenomenology, and Neurobiology

    Science.gov (United States)

    2015-01-01

    Inner speech—also known as covert speech or verbal thinking—has been implicated in theories of cognitive development, speech monitoring, executive function, and psychopathology. Despite a growing body of knowledge on its phenomenology, development, and function, approaches to the scientific study of inner speech have remained diffuse and largely unintegrated. This review examines prominent theoretical approaches to inner speech and methodological challenges in its study, before reviewing current evidence on inner speech in children and adults from both typical and atypical populations. We conclude by considering prospects for an integrated cognitive science of inner speech, and present a multicomponent model of the phenomenon informed by developmental, cognitive, and psycholinguistic considerations. Despite its variability among individuals and across the life span, inner speech appears to perform significant functions in human cognition, which in some cases reflect its developmental origins and its sharing of resources with other cognitive processes. PMID:26011789

  10. Dissociated roles of the inferior frontal gyrus and superior temporal sulcus in audiovisual processing: top-down and bottom-up mismatch detection.

    Science.gov (United States)

    Uno, Takeshi; Kawai, Kensuke; Sakai, Katsuyuki; Wakebe, Toshihiro; Ibaraki, Takuya; Kunii, Naoto; Matsuo, Takeshi; Saito, Nobuhito

    2015-01-01

    Visual inputs can distort auditory perception, and accurate auditory processing requires the ability to detect and ignore visual input that is simultaneous and incongruent with auditory information. However, the neural basis of this auditory selection from audiovisual information is unknown, whereas the integration of audiovisual inputs has been intensively researched. Here, we tested the hypothesis that the inferior frontal gyrus (IFG) and superior temporal sulcus (STS) are involved in top-down and bottom-up processing, respectively, of target auditory information from audiovisual inputs. We recorded high gamma activity (HGA), which is associated with neuronal firing in local brain regions, using electrocorticography while patients with epilepsy judged the syllable spoken by a voice while looking at a voice-congruent or -incongruent lip movement from the speaker. The STS exhibited stronger HGA when the patient was presented with large audiovisual incongruence than with small incongruence, especially if the auditory information was correctly identified. On the other hand, the IFG exhibited stronger HGA in trials with small audiovisual incongruence when patients correctly perceived the auditory information than when patients incorrectly perceived the auditory information due to the mismatched visual information. These results indicate that the IFG and STS have dissociated roles in selective auditory processing, and suggest that the neural basis of selective auditory processing changes dynamically in accordance with the degree of incongruity between auditory and visual information.

  11. Audio-visual feedback improves the BCI performance in the navigational control of a humanoid robot

    Directory of Open Access Journals (Sweden)

    Emmanuele eTidoni

    2014-06-01

    Full Text Available Advancement in brain computer interface (BCI) technology allows people to actively interact in the world through surrogates. Controlling real humanoid robots using BCI as intuitively as we control our body represents a challenge for current research in robotics and neuroscience. In order to successfully interact with the environment, the brain integrates multiple sensory cues to form a coherent representation of the world. Cognitive neuroscience studies demonstrate that multisensory integration may imply a gain with respect to a single modality and ultimately improve the overall sensorimotor performance. For example, reactivity to simultaneous visual and auditory stimuli may be higher than to the sum of the same stimuli delivered in isolation or in temporal sequence. Yet, knowledge about whether audio-visual integration may improve the control of a surrogate is meager. To explore this issue, we provided human footstep sounds as audio feedback to BCI users while controlling a humanoid robot. Participants were asked to steer their robot surrogate and perform a pick-and-place task through BCI-SSVEPs. We found that audio-visual synchrony between footstep sounds and the humanoid’s actual walk reduces the time required for steering the robot. Thus, auditory feedback congruent with the humanoid actions may improve motor decisions of the BCI user and help in the feeling of control over it. Our results shed light on the possibility to increase robot control through the combination of multisensory feedback to a BCI user.

  12. Internet Video Telephony Allows Speech Reading by Deaf Individuals and Improves Speech Perception by Cochlear Implant Users

    Science.gov (United States)

    Mantokoudis, Georgios; Dähler, Claudia; Dubach, Patrick; Kompis, Martin; Caversaccio, Marco D.; Senn, Pascal

    2013-01-01

    Objective To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. Methods Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280×720, 640×480, 320×240, 160×120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0–500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. Results Higher frame rate (>7 fps), higher camera resolution (>640×480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). Conclusion Webcameras have the potential to improve telecommunication of hearing-impaired individuals. PMID:23359119

  13. Speech in spinocerebellar ataxia.

    Science.gov (United States)

    Schalling, Ellika; Hartelius, Lena

    2013-12-01

    Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. Interdependent processing and encoding of speech and concurrent background noise.

    Science.gov (United States)

    Cooper, Angela; Brouwer, Susanne; Bradlow, Ann R

    2015-05-01

    Speech processing can often take place in adverse listening conditions that involve the mixing of speech and background noise. In this study, we investigated processing dependencies between background noise and indexical speech features, using a speeded classification paradigm (Garner, 1974; Exp. 1), and whether background noise is encoded and represented in memory for spoken words in a continuous recognition memory paradigm (Exp. 2). Whether or not the noise spectrally overlapped with the speech signal was also manipulated. The results of Experiment 1 indicated that background noise and indexical features of speech (gender, talker identity) cannot be completely segregated during processing, even when the two auditory streams are spectrally nonoverlapping. Perceptual interference was asymmetric, whereby irrelevant indexical feature variation in the speech signal slowed noise classification to a greater extent than irrelevant noise variation slowed speech classification. This asymmetry may stem from the fact that speech features have greater functional relevance to listeners, and are thus more difficult to selectively ignore than background noise. Experiment 2 revealed that a recognition cost for words embedded in different types of background noise on the first and second occurrences only emerged when the noise and the speech signal were spectrally overlapping. Together, these data suggest integral processing of speech and background noise, modulated by the level of processing and the spectral separation of the speech and noise.

  15. Digital speech processing using Matlab

    CERN Document Server

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.

  16. Memory and learning with rapid audiovisual sequences

    Science.gov (United States)

    Keller, Arielle S.; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193

  17. Memory and learning with rapid audiovisual sequences.

    Science.gov (United States)

    Keller, Arielle S; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed.

  18. Electrophysiological Correlates of Semantic Dissimilarity Reflect the Comprehension of Natural, Narrative Speech.

    Science.gov (United States)

    Broderick, Michael P; Anderson, Andrew J; Di Liberto, Giovanni M; Crosse, Michael J; Lalor, Edmund C

    2018-03-05

    People routinely hear and understand speech at rates of 120-200 words per minute [1, 2]. Thus, speech comprehension must involve rapid, online neural mechanisms that process words' meanings in an approximately time-locked fashion. However, electrophysiological evidence for such time-locked processing has been lacking for continuous speech. Although valuable insights into semantic processing have been provided by the "N400 component" of the event-related potential [3-6], this literature has been dominated by paradigms using incongruous words within specially constructed sentences, with less emphasis on natural, narrative speech comprehension. Building on the discovery that cortical activity "tracks" the dynamics of running speech [7-9] and psycholinguistic work demonstrating [10-12] and modeling [13-15] how context impacts on word processing, we describe a new approach for deriving an electrophysiological correlate of natural speech comprehension. We used a computational model [16] to quantify the meaning carried by words based on how semantically dissimilar they were to their preceding context and then regressed this measure against electroencephalographic (EEG) data recorded from subjects as they listened to narrative speech. This produced a prominent negativity at a time lag of 200-600 ms on centro-parietal EEG channels, characteristics common to the N400. Applying this approach to EEG datasets involving time-reversed speech, cocktail party attention, and audiovisual speech-in-noise demonstrated that this response was very sensitive to whether or not subjects understood the speech they heard. These findings demonstrate that, when successfully comprehending natural speech, the human brain responds to the contextual semantic content of each word in a relatively time-locked fashion. Copyright © 2018 Elsevier Ltd. All rights reserved.
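
    The core analysis step, regressing a lagged word-level semantic-dissimilarity predictor against continuous EEG, can be sketched as a ridge-regularized temporal response function. The code below is a hedged illustration with toy data; the sampling rate, lag window, ridge parameter and the circular-shift shortcut are assumptions, not the authors' pipeline.

      import numpy as np

      def lagged_ridge(predictor, eeg, lags, lam=100.0):
          """Fit per-channel kernels mapping a stimulus feature to EEG at the given sample lags.

          predictor: (time,) impulse train, e.g. semantic dissimilarity at word onsets
          eeg:       (time, channels)
          Returns weights of shape (len(lags), channels)."""
          # np.roll wraps around at the edges; acceptable for a sketch, not for real analyses.
          X = np.stack([np.roll(predictor, lag) for lag in lags], axis=1)
          XtX = X.T @ X + lam * np.eye(X.shape[1])
          return np.linalg.solve(XtX, X.T @ eeg)

      # Toy usage: 10 s at 250 Hz, 64 channels, 30 'word onsets'
      rng = np.random.default_rng(1)
      dissim = np.zeros(2500)
      dissim[rng.choice(2500, 30, replace=False)] = rng.random(30)
      eeg = rng.normal(size=(2500, 64))
      weights = lagged_ridge(dissim, eeg, range(0, 150))  # lags covering roughly 0-600 ms
      print(weights.shape)  # (150, 64)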

  19. Gesture facilitates the syntactic analysis of speech

    Directory of Open Access Journals (Sweden)

    Henning eHolle

    2012-03-01

    Full Text Available Recent research suggests that the brain routinely binds together information from gesture and speech. However, most of this research focused on the integration of representational gestures with the semantic content of speech. Much less is known about how other aspects of gesture, such as emphasis, influence the interpretation of the syntactic relations in a spoken message. Here, we investigated whether beat gestures alter which syntactic structure is assigned to ambiguous spoken German sentences. The P600 component of the Event Related Brain Potential indicated that the more complex syntactic structure is easier to process when the speaker emphasizes the subject of a sentence with a beat. Thus, a simple flick of the hand can change our interpretation of who has been doing what to whom in a spoken sentence. We conclude that gestures and speech are an integrated system. Unlike previous studies, which have shown that the brain effortlessly integrates semantic information from gesture and speech, our study is the first to demonstrate that this integration also occurs for syntactic information. Moreover, the effect appears to be gesture-specific and was not found for other stimuli that draw attention to certain parts of speech, including prosodic emphasis, or a moving visual stimulus with the same trajectory as the gesture. This suggests that only visual emphasis produced with a communicative intention in mind (that is, beat gestures) influences language comprehension, but not a simple visual movement lacking such an intention.

  20. Challenges and opportunities for audiovisual diversity in the Internet

    Directory of Open Access Journals (Sweden)

    Trinidad García Leiva

    2017-06-01

    Full Text Available http://dx.doi.org/10.5007/2175-7984.2017v16n35p132 At the gates of the first quarter of the XXI century, nobody doubts that the value chain of the audiovisual industry has undergone important transformations. The digital era presents opportunities for cultural enrichment as well as new challenges. After presenting a general portrait of the audiovisual industries in the digital era, taking the Spanish case as a point of departure and paying attention to the players and logics in tension, this paper presents some notes on the advantages and disadvantages that exist for the diversity of audiovisual production, distribution and consumption online. It is argued here that the diversity of the audiovisual sector online is not guaranteed, because the formula that has made some players successful and powerful is based on walled-garden models to monetize content (which, in addition, add restrictions to its reproduction and circulation by and among consumers). The final objective is to present some ideas about the elements that prevent the strengthening of the diversity of the audiovisual industry in the digital scenario. The barriers to overcome are classified as technological, financial, social, legal and political.

  1. The production of audiovisual teaching tools in minimally invasive surgery.

    Science.gov (United States)

    Tolerton, Sarah K; Hugh, Thomas J; Cosman, Peter H

    2012-01-01

    Audiovisual learning resources have become valuable adjuncts to formal teaching in surgical training. This report discusses the process and challenges of preparing an audiovisual teaching tool for laparoscopic cholecystectomy. The relative value in surgical education and training, for both the creator and the viewer, is addressed. This audiovisual teaching resource was prepared as part of the Master of Surgery program at the University of Sydney, Australia. The different methods of video production used to create operative teaching tools are discussed. Collating and editing material for an audiovisual teaching resource can be a time-consuming and technically challenging process. However, quality learning resources can now be produced even with limited prior video editing experience. With minimal cost and suitable guidance to ensure clinically relevant content, most surgeons should be able to produce short, high-quality education videos of both open and minimally invasive surgery. Despite the challenges faced during production of audiovisual teaching tools, these resources are now relatively easy to produce using readily available software. These resources are particularly attractive to surgical trainees when real-time operative footage is used. They serve as valuable adjuncts to formal teaching, particularly in the setting of minimally invasive surgery. Copyright © 2012 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.

  2. Parametric packet-based audiovisual quality model for IPTV services

    CERN Document Server

    Garcia, Marie-Neige

    2014-01-01

    This volume presents a parametric packet-based audiovisual quality model for Internet Protocol TeleVision (IPTV) services. The model is composed of three quality modules for the respective audio, video and audiovisual components. The audio and video quality modules take as input a parametric description of the audiovisual processing path, and deliver an estimate of the audio and video quality. These outputs are sent to the audiovisual quality module which provides an estimate of the audiovisual quality. Estimates of perceived quality are typically used both in the network planning phase and as part of the quality monitoring. The same audio quality model is used for both these phases, while two variants of the video quality model have been developed for addressing the two application scenarios. The addressed packetization scheme is MPEG2 Transport Stream over Real-time Transport Protocol over Internet Protocol. In the case of quality monitoring, that is the case for which the network is already set-up, the aud...
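
    Models of this family typically fuse the per-modality estimates with a low-order polynomial that includes an audio-video interaction term. The sketch below shows that common functional form; the coefficients are illustrative placeholders, not values from this volume, and real deployments fit them to subjective test data for each application scenario.

      def audiovisual_quality(q_audio, q_video, a=0.30, b=0.15, c=0.25, d=0.12):
          """Combine audio and video quality estimates (1-5 MOS scale) into one audiovisual score."""
          q_av = a + b * q_audio + c * q_video + d * q_audio * q_video
          return min(5.0, max(1.0, q_av))  # clip to the MOS range

      print(audiovisual_quality(4.2, 3.1))  # ~3.27 with these illustrative coefficients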

  3. Audiovisual Association Learning in the Absence of Primary Visual Cortex.

    Science.gov (United States)

    Seirafi, Mehrdad; De Weerd, Peter; Pegna, Alan J; de Gelder, Beatrice

    2015-01-01

    Learning audiovisual associations is mediated by the primary cortical areas; however, recent animal studies suggest that such learning can take place even in the absence of the primary visual cortex. Other studies have demonstrated the involvement of extra-geniculate pathways and especially the superior colliculus (SC) in audiovisual association learning. Here, we investigated such learning in a rare human patient with complete loss of the bilateral striate cortex. We carried out an implicit audiovisual association learning task with two different colors, red and purple (the latter color known to minimally activate the extra-geniculate pathway). Interestingly, the patient learned the association between an auditory cue and a visual stimulus only when the unseen visual stimulus was red, but not when it was purple. The current study presents the first evidence showing the possibility of audiovisual association learning in humans with lesioned striate cortex. Furthermore, in line with animal studies, it supports an important role for the SC in audiovisual associative learning.

  4. The Fungible Audio-Visual Mapping and its Experience

    Directory of Open Access Journals (Sweden)

    Adriana Sa

    2014-12-01

    Full Text Available This article draws a perceptual approach to audio-visual mapping. Clearly perceivable cause and effect relationships can be problematic if one desires the audience to experience the music. Indeed perception would bias those sonic qualities that fit previous concepts of causation, subordinating other sonic qualities, which may form the relations between the sounds themselves. The question is, how can an audio-visual mapping produce a sense of causation, and simultaneously confound the actual cause-effect relationships. We call this a fungible audio-visual mapping. Our aim here is to glean its constitution and aspect. We will report a study, which draws upon methods from experimental psychology to inform audio-visual instrument design and composition. The participants are shown several audio-visual mapping prototypes, after which we pose quantitative and qualitative questions regarding their sense of causation, and their sense of understanding the cause-effect relationships. The study shows that a fungible mapping requires both synchronized and seemingly non-related components – sufficient complexity to be confusing. As the specific cause-effect concepts remain inconclusive, the sense of causation embraces the whole. 

  5. Media Aid Beyond the Factual: Culture, Development, and Audiovisual Assistance

    Directory of Open Access Journals (Sweden)

    Benjamin A. J. Pearson

    2015-01-01

    Full Text Available This paper discusses audiovisual assistance, a form of development aid that focuses on the production and distribution of cultural and entertainment media such as fictional films and TV shows. While the first audiovisual assistance program dates back to UNESCO’s International Fund for the Promotion of Culture in the 1970s, the past two decades have seen a proliferation of audiovisual assistance that, I argue, is related to a growing concern for culture in post-2015 global development agendas. In this paper, I examine the aims and motivations behind the EU’s audiovisual assistance programs to countries in the Global South, using data from policy documents and semi-structured, in-depth interviews with Program Managers and administrative staff in Brussels. These programs prioritize forms of audiovisual content that are locally specific, yet globally tradable. Furthermore, I argue that they have an ambivalent relationship with traditional notions of international development, one that conceptualizes media not only as a means to achieve economic development and human rights aims, but as a form of development itself.

  6. A Network Model of Observation and Imitation of Speech

    Science.gov (United States)

    Mashal, Nira; Solodkin, Ana; Dick, Anthony Steven; Chen, E. Elinor; Small, Steven L.

    2012-01-01

    Much evidence has now accumulated demonstrating and quantifying the extent of shared regional brain activation for observation and execution of speech. However, the nature of the actual networks that implement these functions, i.e., both the brain regions and the connections among them, and the similarities and differences across these networks has not been elucidated. The current study aims to characterize formally a network for observation and imitation of syllables in the healthy adult brain and to compare their structure and effective connectivity. Eleven healthy participants observed or imitated audiovisual syllables spoken by a human actor. We constructed four structural equation models to characterize the networks for observation and imitation in each of the two hemispheres. Our results show that the network models for observation and imitation comprise the same essential structure but differ in important ways from each other (in both hemispheres) based on connectivity. In particular, our results show that the connections from posterior superior temporal gyrus and sulcus to ventral premotor, ventral premotor to dorsal premotor, and dorsal premotor to primary motor cortex in the left hemisphere are stronger during imitation than during observation. The first two connections are implicated in a putative dorsal stream of speech perception, thought to involve translating auditory speech signals into motor representations. Thus, the current results suggest that flow of information during imitation, starting at the posterior superior temporal cortex and ending in the motor cortex, enhances input to the motor cortex in the service of speech execution. PMID:22470360

  7. Documentary management of the sport audio-visual information in the generalist televisions

    OpenAIRE

    Jorge Caldera Serrano; Felipe Alonso

    2007-01-01

    The management of sport audio-visual documentation in the information systems of state, regional and local television channels is analyzed. To this end, the documentary chain through which sport audio-visual information passes is traced in order to analyze each of its parameters, leading to a series of recommendations and norms for the preparation of the sport audio-visual record. Evidently, sport audio-visual documentation differs i...

  8. 36 CFR 1237.26 - What materials and processes must agencies use to create audiovisual records?

    Science.gov (United States)

    2010-07-01

    ... must agencies use to create audiovisual records? 1237.26 Section 1237.26 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION RECORDS MANAGEMENT AUDIOVISUAL, CARTOGRAPHIC, AND RELATED RECORDS MANAGEMENT § 1237.26 What materials and processes must agencies use to create audiovisual...

  9. 36 CFR 1237.20 - What are special considerations in the maintenance of audiovisual records?

    Science.gov (United States)

    2010-07-01

    ... considerations in the maintenance of audiovisual records? 1237.20 Section 1237.20 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION RECORDS MANAGEMENT AUDIOVISUAL, CARTOGRAPHIC, AND RELATED RECORDS MANAGEMENT § 1237.20 What are special considerations in the maintenance of audiovisual...

  10. 36 CFR 1237.18 - What are the environmental standards for audiovisual records storage?

    Science.gov (United States)

    2010-07-01

    ... standards for audiovisual records storage? 1237.18 Section 1237.18 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION RECORDS MANAGEMENT AUDIOVISUAL, CARTOGRAPHIC, AND RELATED RECORDS MANAGEMENT § 1237.18 What are the environmental standards for audiovisual records storage? (a...

  11. 77 FR 22803 - Certain Audiovisual Components and Products Containing the Same; Institution of Investigation...

    Science.gov (United States)

    2012-04-17

    ... INTERNATIONAL TRADE COMMISSION [Inv. No. 337-TA-837] Certain Audiovisual Components and Products... importation of certain audiovisual components and products containing the same by reason of infringement of... importation, or the sale within the United States after importation of certain audiovisual components and...

  12. 36 CFR 1237.16 - How do agencies store audiovisual records?

    Science.gov (United States)

    2010-07-01

    ... audiovisual records? 1237.16 Section 1237.16 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION RECORDS MANAGEMENT AUDIOVISUAL, CARTOGRAPHIC, AND RELATED RECORDS MANAGEMENT § 1237.16 How do agencies store audiovisual records? Agencies must maintain appropriate storage conditions for permanent...

  13. 36 CFR 1237.10 - How must agencies manage their audiovisual, cartographic, and related records?

    Science.gov (United States)

    2010-07-01

    ... their audiovisual, cartographic, and related records? 1237.10 Section 1237.10 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION RECORDS MANAGEMENT AUDIOVISUAL, CARTOGRAPHIC, AND RELATED RECORDS MANAGEMENT § 1237.10 How must agencies manage their audiovisual, cartographic, and related...

  14. Finding the Correspondence of Audio-Visual Events by Object Manipulation

    Science.gov (United States)

    Nishibori, Kento; Takeuchi, Yoshinori; Matsumoto, Tetsuya; Kudo, Hiroaki; Ohnishi, Noboru

    A human being understands the objects in the environment by integrating information obtained by the senses of sight, hearing and touch. In this integration, active manipulation of objects plays an important role. We propose a method for finding the correspondence of audio-visual events by manipulating an object. The method uses the general grouping rules of Gestalt psychology, i.e. “simultaneity” and “similarity” among the motion command, sound onsets and the motion of the object in images. In experiments, we used a microphone, a camera, and a robot which has a hand manipulator. The robot grasps an object like a bell and shakes it, or grasps an object like a stick and beats a drum, in a periodic or non-periodic motion. The object then emits periodic or non-periodic events. To create a more realistic scenario, we put another event source (a metronome) in the environment. As a result, we obtained a success rate of 73.8 percent in finding the correspondence between audio-visual events (afferent signal) related to robot motion (efferent signal).
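
    The grouping idea, accepting an audio source as belonging to the manipulated object when its onsets coincide in time with the object's visually detected motion onsets, can be sketched as a simple coincidence score. The tolerance window, toy data and scoring below are illustrative assumptions, not the authors' algorithm.

      import numpy as np

      def coincidence_score(audio_onsets, motion_onsets, tolerance=0.05):
          """Fraction of audio onsets with a motion onset within `tolerance` seconds."""
          audio_onsets = np.asarray(audio_onsets)
          motion_onsets = np.asarray(motion_onsets)
          if audio_onsets.size == 0:
              return 0.0
          hits = sum(np.any(np.abs(motion_onsets - t) <= tolerance) for t in audio_onsets)
          return hits / audio_onsets.size

      shake_motion = np.array([0.49, 1.01, 1.50, 2.02])   # motion onsets of the shaken bell (s)
      bell = np.array([0.50, 1.00, 1.52, 2.01])           # onsets of the bell sound
      metronome = np.array([0.33, 0.93, 1.53, 2.13])      # unrelated periodic source
      print(coincidence_score(bell, shake_motion))        # 1.0  -> matches the manipulation
      print(coincidence_score(metronome, shake_motion))   # 0.25 -> rejected as unrelated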

  15. Future-saving audiovisual content for Data Science: Preservation of geoinformatics video heritage with the TIB|AV-Portal

    Science.gov (United States)

    Löwe, Peter; Plank, Margret; Ziedorn, Frauke

    2015-04-01

    of Science and Technology. The web-based portal allows for extended search capabilities based on enhanced metadata derived by automated video analysis. By combining state-of-the-art multimedia retrieval techniques such as speech, text, and image recognition with semantic analysis, content-based access to videos at the segment level is provided. Further, by using the open standard Media Fragment Identifier (MFID), a citable Digital Object Identifier is displayed for each video segment. In addition to the continuously growing footprint of contemporary content, the importance of vintage audiovisual information needs to be considered: this paper showcases the successful application of the TIB|AV-Portal in the preservation and provision of a newly discovered version of a GRASS GIS promotional video produced by the US Army Corps of Engineers Laboratory (US-CERL) in 1987. The video provides insight into the constraints of the very early days of the GRASS GIS project, the oldest active Free and Open Source Software (FOSS) GIS project, which has been active for over thirty years. GRASS itself has turned into a collaborative scientific platform, a repository of scientific peer-reviewed code and an algorithm/knowledge hub for future generations of scientists [1]. This is a reference case for future preservation activities regarding semantic-enhanced Web 2.0 content from geospatial software projects within academia and beyond. References: [1] Chemin, Y., Petras, V., Petrasova, A., Landa, M., Gebbert, S., Zambelli, P., Neteler, M., Löwe, P.: GRASS GIS: a peer-reviewed scientific platform and future research repository, Geophysical Research Abstracts, Vol. 17, EGU2015-8314-1, 2015 (submitted)
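
    For illustration, segment-level citation of this kind can be expressed with the W3C Media Fragments temporal syntax appended to a resolved video URL. In the sketch below the base URL is a hypothetical placeholder; only the #t=start,end fragment form is taken from the standard.

      # Hedged sketch: build a citable link to seconds 120-180 of a video.
      # The base URL is a placeholder, not an actual TIB|AV-Portal address.
      def segment_uri(video_url: str, start_s: int, end_s: int) -> str:
          return f"{video_url}#t={start_s},{end_s}"

      print(segment_uri("https://av.example.org/media/12345", 120, 180))
      # -> https://av.example.org/media/12345#t=120,180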