WorldWideScience

Sample records for integrating audiovisual speech

  1. Speech cues contribute to audiovisual spatial integration.

    Directory of Open Access Journals (Sweden)

    Christopher W Bishop

    Speech is the most important form of human communication but ambient sounds and competing talkers often degrade its acoustics. Fortunately the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech-cues interact with audiovisual spatial integration mechanisms. Here, we combine two well established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech-cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral 'what' and dorsal 'where' pathways.

  2. Audiovisual integration in speech perception: a multi-stage process

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2011-01-01

    investigate whether the integration of auditory and visual speech observed in these two audiovisual integration effects are specific traits of speech perception. We further ask whether audiovisual integration is undertaken in a single processing stage or multiple processing stages....

  3. Speech-specificity of two audiovisual integration effects

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2010-01-01

    Seeing the talker’s articulatory mouth movements can influence the auditory speech percept both in speech identification and detection tasks. Here we show that these audiovisual integration effects also occur for sine wave speech (SWS), which is an impoverished speech signal that naïve observers...... often fail to perceive as speech. While audiovisual integration in the identification task only occurred when observers were informed of the speech-like nature of SWS, integration occurred in the detection task both for informed and naïve observers. This shows that both speech-specific and general...... mechanisms underlie audiovisual integration of speech....

  4. Multistage audiovisual integration of speech: dissociating identification and detection.

    Science.gov (United States)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias S

    2011-02-01

    Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion where seeing the talking face influences the auditory phonetic percept and by the audiovisual detection advantage where seeing the talking face influences the detectability of the acoustic speech signal. Here, we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers informed of its speech-like nature recognize as speech. While the McGurk illusion only occurred for informed observers, the audiovisual detection advantage occurred for naïve observers as well. This finding supports a multistage account of audiovisual integration of speech in which the many attributes of the audiovisual speech signal are integrated by separate integration processes.

  5. Multistage audiovisual integration of speech: dissociating identification and detection

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Tuomainen, Jyrki; Andersen, Tobias

    2011-01-01

    Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion where seeing the talking face influences the auditory phonetic percept and by the audiovisual detection advantage where seeing the talking face influences the detectability of the acoustic speech...... signal. Here we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli, sine wave speech (SWS), which is an impoverished speech signal that only observers...... informed of its speech-like nature recognize as speech. While the McGurk illusion only occurred for informed observers the audiovisual detection advantage occurred for naïve observers as well. This finding supports a multi-stage account of audiovisual integration of speech in which the many attributes...

  6. Electrophysiological evidence for speech-specific audiovisual integration.

    Science.gov (United States)

    Baart, Martijn; Stekelenburg, Jeroen J; Vroomen, Jean

    2014-01-01

    Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency. In order to disentangle speech-specific (phonetic) integration from non-speech integration, we used Sine-Wave Speech (SWS) that was perceived as speech by half of the participants (they were in speech-mode), while the other half was in non-speech mode. Results showed that the N1 obtained with audiovisual stimuli peaked earlier than the N1 evoked by auditory-only stimuli. This lip-read induced speeding up of the N1 occurred for listeners in speech and non-speech mode. In contrast, if listeners were in speech-mode, lip-read speech also modulated the auditory P2, but not if listeners were in non-speech mode, thus revealing speech-specific audiovisual binding. Comparing ERPs for phonetically congruent audiovisual stimuli with ERPs for incongruent stimuli revealed an effect of phonetic stimulus congruency that started at ~200 ms after (in)congruence became apparent. Critically, akin to the P2 suppression, congruency effects were only observed if listeners were in speech mode, and not if they were in non-speech mode. Using identical stimuli, we thus confirm that audiovisual binding involves (partially) different neural mechanisms for sound processing in speech and non-speech mode. © 2013 Published by Elsevier Ltd.

  7. Electrophysiological evidence for speech-specific audiovisual integration

    NARCIS (Netherlands)

    Baart, M.; Stekelenburg, J.J.; Vroomen, J.

    2014-01-01

    Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were

  8. Electrophysiological assessment of audiovisual integration in speech perception

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Dau, Torsten

    Speech perception integrates signal from ear and eye. This is witnessed by a wide range of audiovisual integration effects, such as ventriloquism and the McGurk illusion. Some behavioral evidence suggests that audiovisual integration of specific aspects is special for speech perception. However, our...... knowledge of such bimodal integration would be strengthened if the phenomena could be investigated by objective, neurally based methods. One key question of the present work is if perceptual processing of audiovisual speech can be gauged with a specific signature of neurophysiological activity...... on the auditory speech percept? In two experiments, which both combine behavioral and neurophysiological measures, an uncovering of the relation between perception of faces and of audiovisual integration is attempted. Behavioral findings suggest a strong effect of face perception, whereas the MMN results are less...

  9. Audiovisual integration of speech in a patient with Broca's Aphasia

    Science.gov (United States)

    Andersen, Tobias S.; Starrfelt, Randi

    2015-01-01

    Lesions to Broca's area cause aphasia characterized by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca's area is also involved in speech perception. While these studies have focused on auditory speech perception other studies have shown that Broca's area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca's aphasia did not experience the McGurk illusion suggesting that an intact Broca's area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca's area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke's aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing that are not specific to Broca's aphasia. PMID:25972819

  10. Modeling the Development of Audiovisual Cue Integration in Speech Perception.

    Science.gov (United States)

    Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C

    2017-03-21

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.
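    The modelling approach described above lends itself to a compact illustration. The sketch below fits a Gaussian mixture model to simulated two-dimensional audiovisual cue data and then reads out category posteriors for a test stimulus; the cue names, distributions, and parameter values are illustrative assumptions, not the simulation details reported by the authors.

```python
# Hypothetical sketch: statistical learning of audiovisual cue categories with a GMM.
# Cue names and distributions are illustrative placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Simulate two phonological categories, each defined by an auditory cue (VOT-like)
# and a visual cue (lip-aperture-like).
cat_a = rng.normal(loc=[10.0, 0.2], scale=[5.0, 0.1], size=(500, 2))
cat_b = rng.normal(loc=[50.0, 0.8], scale=[8.0, 0.1], size=(500, 2))
cues = np.vstack([cat_a, cat_b])

# Unsupervised learning: the model discovers two clusters from the cue distributions alone.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(cues)

# Categorize an ambiguous auditory cue paired with a clearer visual cue.
test = np.array([[30.0, 0.75]])
print("posterior over categories:", gmm.predict_proba(test).round(3))
```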

  11. Audiovisual integration of speech falters under high attention demands.

    Science.gov (United States)

    Alsius, Agnès; Navarra, Jordi; Campbell, Ruth; Soto-Faraco, Salvador

    2005-05-10

    One of the most commonly cited examples of human multisensory integration occurs during exposure to natural speech, when the vocal and the visual aspects of the signal are integrated in a unitary percept. Audiovisual association of facial gestures and vocal sounds has been demonstrated in nonhuman primates and in prelinguistic children, arguing for a general basis for this capacity. One critical question, however, concerns the role of attention in such multisensory integration. Although both behavioral and neurophysiological studies have converged on a preattentive conceptualization of audiovisual speech integration, this mechanism has rarely been measured under conditions of high attentional load, when the observers' attention resources are depleted. We tested the extent to which audiovisual integration was modulated by the amount of available attentional resources by measuring the observers' susceptibility to the classic McGurk illusion in a dual-task paradigm. The proportion of visually influenced responses was severely, and selectively, reduced if participants were concurrently performing an unrelated visual or auditory task. In contrast with the assumption that crossmodal speech integration is automatic, our results suggest that these multisensory binding processes are subject to attentional demands.

  12. Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech and music

    Directory of Open Access Journals (Sweden)

    Hwee Ling eLee

    2014-08-01

    This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogues of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past three years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and to a marginally significant degree to natural speech.
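    A temporal integration window of the kind estimated here is commonly summarized by fitting a function of stimulus onset asynchrony to the proportion of "synchronous" responses and taking its width. The sketch below fits a Gaussian to made-up response proportions; the data values and the symmetric Gaussian form are assumptions, not the analysis reported in the paper.

```python
# Hedged sketch: estimate a temporal integration window by fitting a Gaussian to the
# proportion of "synchronous" responses across audiovisual onset asynchronies (SOAs).
# The response proportions below are invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

soas = np.array([-360, -300, -240, -180, -120, -60, 0, 60, 120, 180, 240, 300, 360], float)
p_sync = np.array([0.05, 0.10, 0.20, 0.45, 0.75, 0.92, 0.97,
                   0.90, 0.70, 0.40, 0.18, 0.08, 0.04])

def gauss(soa, amp, mu, sigma):
    return amp * np.exp(-0.5 * ((soa - mu) / sigma) ** 2)

(amp, mu, sigma), _ = curve_fit(gauss, soas, p_sync, p0=[1.0, 0.0, 120.0])

# Window width at half height (FWHM); a narrower window means finer AV temporal sensitivity.
fwhm = 2.355 * sigma
print(f"centre = {mu:.1f} ms, width (FWHM) = {fwhm:.1f} ms")
```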

  13. Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

    DEFF Research Database (Denmark)

    Andersen, Tobias; Starrfelt, Randi

    2015-01-01

    's area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical......, which could be due to Broca's area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke's aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing...

  14. Atypical audiovisual speech integration in infants at risk for autism.

    Directory of Open Access Journals (Sweden)

    Jeanne A Guiraud

    The language difficulties often seen in individuals with autism might stem from an inability to integrate audiovisual information, a skill important for language development. We investigated whether 9-month-old siblings of older children with autism, who are at an increased risk of developing autism, are able to integrate audiovisual speech cues. We used an eye-tracker to record where infants looked when shown a screen displaying two faces of the same model, where one face is articulating /ba/ and the other /ga/, with one face congruent with the syllable sound being presented simultaneously, the other face incongruent. This method was successful in showing that infants at low risk can integrate audiovisual speech: they looked for the same amount of time at the mouths in both the fusible visual /ga/ - audio /ba/ and the congruent visual /ba/ - audio /ba/ displays, indicating that the auditory and visual streams fuse into a McGurk-type of syllabic percept in the incongruent condition. It also showed that low-risk infants could perceive a mismatch between auditory and visual cues: they looked longer at the mouth in the mismatched, non-fusible visual /ba/ - audio /ga/ display compared with the congruent visual /ga/ - audio /ga/ display, demonstrating that they perceive an uncommon, and therefore interesting, speech-like percept when looking at the incongruent mouth (repeated ANOVA: displays x fusion/mismatch conditions interaction: F(1,16) = 17.153, p = 0.001). The looking behaviour of high-risk infants did not differ according to the type of display, suggesting difficulties in matching auditory and visual information (repeated ANOVA, displays x conditions interaction: F(1,25) = 0.09, p = 0.767), in contrast to low-risk infants (repeated ANOVA: displays x conditions x low/high-risk groups interaction: F(1,41) = 4.466, p = 0.041). In some cases this reduced ability might lead to the poor communication skills characteristic of autism.

  15. Audiovisual integration for speech during mid-childhood: Electrophysiological evidence

    Science.gov (United States)

    Kaganovich, Natalya; Schumaker, Jennifer

    2014-01-01

    Previous studies have demonstrated that the presence of visual speech cues reduces the amplitude and latency of the N1 and P2 event-related potential (ERP) components elicited by speech stimuli. However, the developmental trajectory of this effect is not yet fully mapped. We examined ERP responses to auditory, visual, and audiovisual speech in two groups of school-age children (7–8-year-olds and 10–11-year-olds) and in adults. Audiovisual speech led to the attenuation of the N1 and P2 components in all groups of participants, suggesting that the neural mechanisms underlying these effects are functional by early school years. Additionally, while the reduction in N1 was largest over the right scalp, the P2 attenuation was largest over the left and midline scalp. The difference in the hemispheric distribution of the N1 and P2 attenuation supports the idea that these components index at least somewhat disparate neural processes within the context of audiovisual speech perception. PMID:25463815

  16. Neural dynamics of audiovisual speech integration under variable listening conditions: an individual participant analysis.

    Science.gov (United States)

    Altieri, Nicholas; Wenger, Michael J

    2013-01-01

    Speech perception engages both auditory and visual modalities. Limitations of traditional accuracy-only approaches in the investigation of audiovisual speech perception have motivated the use of new methodologies. In an audiovisual speech identification task, we utilized capacity (Townsend and Nozawa, 1995), a dynamic measure of efficiency, to quantify audiovisual integration. Capacity was used to compare RT distributions from audiovisual trials to RT distributions from auditory-only and visual-only trials across three listening conditions: clear auditory signal, S/N ratio of -12 dB, and S/N ratio of -18 dB. The purpose was to obtain EEG recordings in conjunction with capacity to investigate how a late ERP co-varies with integration efficiency. Results showed efficient audiovisual integration for low auditory S/N ratios, but inefficient audiovisual integration when the auditory signal was clear. The ERP analyses showed evidence for greater audiovisual amplitude compared to the unisensory signals for lower auditory S/N ratios (higher capacity/efficiency) compared to the high S/N ratio (low capacity/inefficient integration). The data are consistent with an interactive framework of integration, where auditory recognition is influenced by speech-reading as a function of signal clarity.
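    The capacity coefficient of Townsend and Nozawa (1995) referenced above compares the integrated hazard of the redundant audiovisual condition with the sum of the unimodal integrated hazards, C(t) = H_AV(t) / (H_A(t) + H_V(t)), where H(t) = -log S(t); values above 1 indicate efficient (super-capacity) integration. A rough sketch with simulated response times is below; the RT values and the simple empirical survivor estimator are illustrative, not the authors' pipeline.

```python
# Rough sketch of the capacity coefficient C(t) = H_AV(t) / (H_A(t) + H_V(t)),
# where H(t) = -log S(t) is the cumulative hazard estimated from RT samples.
# The simulated RTs below are placeholders, not data from the study.
import numpy as np

rng = np.random.default_rng(1)
rt_av = rng.normal(550, 60, 200)   # audiovisual trials
rt_a = rng.normal(620, 70, 200)    # auditory-only trials
rt_v = rng.normal(700, 80, 200)    # visual-only trials

def cumulative_hazard(rts, t):
    """H(t) = -log S(t), with S(t) taken from the empirical survivor function."""
    s = np.mean(rts[:, None] > t[None, :], axis=0)
    return -np.log(np.clip(s, 1e-6, 1.0))

t = np.linspace(450, 750, 7)
capacity = cumulative_hazard(rt_av, t) / (cumulative_hazard(rt_a, t) + cumulative_hazard(rt_v, t))
for ti, ci in zip(t, capacity):
    print(f"t = {ti:5.0f} ms  C(t) = {ci:4.2f}  ({'super' if ci > 1 else 'limited'} capacity)")
```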

  17. Audiovisual integration in children listening to spectrally degraded speech.

    Science.gov (United States)

    Maidment, David W; Kang, Hi Jee; Stewart, Hannah J; Amitay, Sygal

    2015-02-01

    The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Children (n=69) and adults (n=15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in auditory-only or audiovisual conditions. The number of bands was adaptively varied to modulate the degradation of the auditory signal, with the number of bands required for approximately 79% correct identification calculated as the threshold. The youngest children (4- to 5-year-olds) did not benefit from accompanying visual information, in comparison to 6- to 11-year-old children and adults. Audiovisual gain also increased with age in the child sample. The current data suggest that children younger than 6 years of age do not fully utilize visual speech cues to enhance speech perception when the auditory signal is degraded. This evidence not only has implications for understanding the development of speech perception skills in children with normal hearing but may also inform the development of new treatment and intervention strategies that aim to remediate speech perception difficulties in pediatric cochlear implant users.
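    Noise-vocoded sentences of the kind used in this study are typically generated by splitting the signal into frequency bands, extracting each band's amplitude envelope, and using the envelopes to modulate band-limited noise. A hedged sketch of such a vocoder follows; the band spacing, filter orders, and envelope cutoff are illustrative choices rather than the study's exact parameters.

```python
# Hedged sketch of a noise vocoder; band layout and filter settings are illustrative.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_bands=8, f_lo=100.0, f_hi=5000.0, env_cutoff=30.0):
    rng = np.random.default_rng(0)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)  # logarithmically spaced band edges
    out = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, signal)
        # Amplitude envelope of the band, smoothed with a low-pass filter.
        env = np.abs(hilbert(band))
        env_sos = butter(4, env_cutoff, btype="lowpass", fs=fs, output="sos")
        env = np.maximum(sosfiltfilt(env_sos, env), 0.0)
        # Modulate band-limited noise with the envelope and add it to the output.
        noise = sosfiltfilt(band_sos, rng.standard_normal(len(signal)))
        out += env * noise
    return out

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    demo = np.sin(2 * np.pi * 300 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))  # stand-in for speech
    print(noise_vocode(demo, fs, n_bands=4).shape)
```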

  18. Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

    DEFF Research Database (Denmark)

    Andersen, Tobias; Starrfelt, Randi

    2015-01-01

    perception. While these studies have focused on auditory speech perception other studies have shown that Broca's area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca's aphasia did not experience the McGurk illusion suggesting that an intact Broca......Lesions to Broca's area cause aphasia characterized by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca's area is also involved in speech......'s area is necessary for audiovisual integration of speech. Here we describe a patient with Broca's aphasia who experienced the McGurk illusion. This indicates that an intact Broca's area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical...

  19. Audiovisual Integration of Speech in a Patient with Broca’s Aphasia

    Directory of Open Access Journals (Sweden)

    Tobias Søren Andersen

    2015-04-01

    Lesions to Broca’s area cause aphasia characterised by a severe impairment of the ability to speak, with comparatively intact speech perception. However, some studies have found effects on speech perception under adverse listening conditions, indicating that Broca’s area is also involved in speech perception. While these studies have focused on auditory speech perception other studies have shown that Broca’s area is activated by visual speech perception. Furthermore, one preliminary report found that a patient with Broca’s aphasia did not experience the McGurk illusion suggesting that an intact Broca’s area is necessary for audiovisual integration of speech. Here we describe a patient with Broca’s aphasia who experienced the McGurk illusion. This indicates that an intact Broca’s area is not necessary for audiovisual integration of speech. The McGurk illusions this patient experienced were atypical, which could be due to Broca’s area having a more subtle role in audiovisual integration of speech. The McGurk illusions of a control subject with Wernicke’s aphasia were, however, also atypical. This indicates that the atypical McGurk illusions were due to deficits in speech processing that are not specific to Broca’s aphasia.

  20. The level of audiovisual print-speech integration deficits in dyslexia.

    Science.gov (United States)

    Kronschnabel, Jens; Brem, Silvia; Maurer, Urs; Brandeis, Daniel

    2014-09-01

    The classical phonological deficit account of dyslexia is increasingly linked to impairments in grapho-phonological conversion, and to dysfunctions in superior temporal regions associated with audiovisual integration. The present study investigates mechanisms of audiovisual integration in typical and impaired readers at the critical developmental stage of adolescence. Congruent and incongruent audiovisual as well as unimodal (visual only and auditory only) material was presented. Audiovisual presentations were single letters and three-letter (consonant-vowel-consonant) stimuli accompanied by matching or mismatching speech sounds. Three-letter stimuli exhibited fast phonetic transitions as in real-life language processing and reading. Congruency effects, i.e. different brain responses to congruent and incongruent stimuli were taken as an indicator of audiovisual integration at a phonetic level (grapho-phonological conversion). Comparisons of unimodal and audiovisual stimuli revealed basic, more sensory aspects of audiovisual integration. By means of these two criteria of audiovisual integration, the generalizability of audiovisual deficits in dyslexia was tested. Moreover, it was expected that the more naturalistic three-letter stimuli are superior to single letters in revealing group differences. Electrophysiological and hemodynamic (EEG and fMRI) data were acquired simultaneously in a simple target detection task. Applying the same statistical models to event-related EEG potentials and fMRI responses allowed comparing the effects detected by the two techniques at a descriptive level. Group differences in congruency effects (congruent against incongruent) were observed in regions involved in grapho-phonological processing, including the left inferior frontal and angular gyri and the inferotemporal cortex. Importantly, such differences also emerged in superior temporal key regions. Three-letter stimuli revealed stronger group differences than single letters. No

  1. Electrophysiological evidence for a self-processing advantage during audiovisual speech integration.

    Science.gov (United States)

    Treille, Avril; Vilain, Coriandre; Kandel, Sonia; Sato, Marc

    2017-09-01

    Previous electrophysiological studies have provided strong evidence for early multisensory integrative mechanisms during audiovisual speech perception. From these studies, one unanswered issue is whether hearing our own voice and seeing our own articulatory gestures facilitate speech perception, possibly through a better processing and integration of sensory inputs with our own sensory-motor knowledge. The present EEG study examined the impact of self-knowledge during the perception of auditory (A), visual (V) and audiovisual (AV) speech stimuli that were previously recorded from the participant or from a speaker he/she had never met. Audiovisual interactions were estimated by comparing N1 and P2 auditory evoked potentials during the bimodal condition (AV) with the sum of those observed in the unimodal conditions (A + V). In line with previous EEG studies, our results revealed an amplitude decrease of P2 auditory evoked potentials in AV compared to A + V conditions. Crucially, a temporal facilitation of N1 responses was observed during the visual perception of self speech movements compared to those of another speaker. This facilitation was negatively correlated with the saliency of visual stimuli. These results provide evidence for a temporal facilitation of the integration of auditory and visual speech signals when the visual situation involves our own speech gestures.

  2. Neural networks supporting audiovisual integration for speech: A large-scale lesion study.

    Science.gov (United States)

    Hickok, Gregory; Rogalsky, Corianne; Matchin, William; Basilakos, Alexandra; Cai, Julia; Pillay, Sara; Ferrill, Michelle; Mickelsen, Soren; Anderson, Steven W; Love, Tracy; Binder, Jeffrey; Fridriksson, Julius

    2018-06-01

    Auditory and visual speech information are often strongly integrated resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched, the McGurk-MacDonald effect. Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure (auditory vs visual capture) can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor related frontal regions do not appear to play a role in AV speech integration. Copyright © 2018 Elsevier Ltd. All rights reserved.

  3. An ALE meta-analysis on the audiovisual integration of speech signals.

    Science.gov (United States)

    Erickson, Laura C; Heeg, Elizabeth; Rauschecker, Josef P; Turkeltaub, Peter E

    2014-11-01

    The brain improves speech processing through the integration of audiovisual (AV) signals. Situations involving AV speech integration may be crudely dichotomized into those where auditory and visual inputs contain (1) equivalent, complementary signals (validating AV speech) or (2) inconsistent, different signals (conflicting AV speech). This simple framework may allow the systematic examination of broad commonalities and differences between AV neural processes engaged by various experimental paradigms frequently used to study AV speech integration. We conducted an activation likelihood estimation meta-analysis of 22 functional imaging studies comprising 33 experiments, 311 subjects, and 347 foci examining "conflicting" versus "validating" AV speech. Experimental paradigms included content congruency, timing synchrony, and perceptual measures, such as the McGurk effect or synchrony judgments, across AV speech stimulus types (sublexical to sentence). Colocalization of conflicting AV speech experiments revealed consistency across at least two contrast types (e.g., synchrony and congruency) in a network of dorsal stream regions in the frontal, parietal, and temporal lobes. There was consistency across all contrast types (synchrony, congruency, and percept) in the bilateral posterior superior/middle temporal cortex. Although fewer studies were available, validating AV speech experiments were localized to other regions, such as ventral stream visual areas in the occipital and inferior temporal cortex. These results suggest that while equivalent, complementary AV speech signals may evoke activity in regions related to the corroboration of sensory input, conflicting AV speech signals recruit widespread dorsal stream areas likely involved in the resolution of conflicting sensory signals. Copyright © 2014 Wiley Periodicals, Inc.

  4. The early maximum likelihood estimation model of audiovisual integration in speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias

    2015-01-01

    integration to speech perception along with three model variations. In early MLE, integration is based on a continuous internal representation before categorization, which can make the model more parsimonious by imposing constraints that reflect experimental designs. The study also shows that cross......Speech perception is facilitated by seeing the articulatory mouth movements of the talker. This is due to perceptual audiovisual integration, which also causes the McGurk−MacDonald illusion, and for which a comprehensive computational account is still lacking. Decades of research have largely......-validation can evaluate models of audiovisual integration based on typical data sets taking both goodness-of-fit and model flexibility into account. All models were tested on a published data set previously used for testing the FLMP. Cross-validation favored the early MLE while more conventional error measures...
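    The maximum likelihood estimation framework referred to above combines the unimodal estimates by weighting each with its reliability (inverse variance), which also yields the variance of the fused estimate. A minimal sketch of that computation on a continuous internal phonetic dimension is shown below; the numbers are placeholders and the sketch does not reproduce the specific model variants compared in the paper.

```python
# Sketch of reliability-weighted (maximum likelihood) cue fusion on a continuous
# internal representation; categorization would follow fusion in an "early" model.
# Numeric values are placeholders for illustration only.
def mle_fuse(s_a, var_a, s_v, var_v):
    """Fuse auditory and visual estimates by inverse-variance weighting."""
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_v)
    w_v = 1.0 - w_a
    fused = w_a * s_a + w_v * s_v
    fused_var = (var_a * var_v) / (var_a + var_v)
    return fused, fused_var

# Example: an ambiguous auditory cue (halfway between /b/ at 0 and /d/ at 1) combined
# with a clearer visual cue pulls the fused estimate toward the visual category.
fused, fused_var = mle_fuse(s_a=0.5, var_a=0.04, s_v=0.9, var_v=0.01)
print(f"fused estimate = {fused:.2f}, fused variance = {fused_var:.4f}")
```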

  5. Audiovisual speech integration in the superior temporal region is dysfunctional in dyslexia.

    Science.gov (United States)

    Ye, Zheng; Rüsseler, Jascha; Gerth, Ivonne; Münte, Thomas F

    2017-07-25

    Dyslexia is an impairment of reading and spelling that affects both children and adults even after many years of schooling. Dyslexic readers have deficits in the integration of auditory and visual inputs but the neural mechanisms of the deficits are still unclear. This fMRI study examined the neural processing of auditorily presented German numbers 0-9 and videos of lip movements of a German native speaker voicing numbers 0-9 in unimodal (auditory or visual) and bimodal (always congruent) conditions in dyslexic readers and their matched fluent readers. We confirmed results of previous studies that the superior temporal gyrus/sulcus plays a critical role in audiovisual speech integration: fluent readers showed greater superior temporal activations for combined audiovisual stimuli than auditory-/visual-only stimuli. Importantly, such an enhancement effect was absent in dyslexic readers. Moreover, the auditory network (bilateral superior temporal regions plus medial PFC) was dynamically modulated during audiovisual integration in fluent, but not in dyslexic readers. These results suggest that superior temporal dysfunction may underlie poor audiovisual speech integration in readers with dyslexia. Copyright © 2017 IBRO. Published by Elsevier Ltd. All rights reserved.

  6. Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

    Science.gov (United States)

    Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

    2015-05-01

    The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although the brain network is supposed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering, such as single linkage distance, to analyze the connected component of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white-noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationship, similar connected components were observed in bimodal and unimodal speech conditions during filtration. However, during speech perception by congruent audiovisual stimuli, the tighter couplings of left anterior temporal gyrus-anterior insula component and right premotor-visual components were observed than auditory or visual speech cue conditions, respectively. Interestingly, visual speech is perceived under white noise by tight negative coupling in the left inferior frontal region-right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus, right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.
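    For zero-dimensional homology, the network filtration described above amounts to tracking how connected components merge as the connectivity threshold is relaxed, which is equivalent to single-linkage hierarchical clustering on a distance matrix. The sketch below illustrates this with a random correlation matrix standing in for real fMRI connectivity data.

```python
# Hedged sketch: 0-dimensional network filtration (connected components across all
# thresholds) via single-linkage clustering, as in persistent-homology analyses.
# The random correlation matrix is a stand-in for real connectivity data.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(2)
signals = rng.standard_normal((100, 10))       # 100 time points, 10 regions
corr = np.corrcoef(signals, rowvar=False)

# Convert correlations to distances; small distance = strong positive coupling.
dist = 1.0 - corr
np.fill_diagonal(dist, 0.0)
condensed = squareform(dist, checks=False)

# Single-linkage merge heights are the filtration values at which components die,
# i.e. the 0-dimensional persistence barcode of the network.
merges = linkage(condensed, method="single")
for i, (_, _, height, size) in enumerate(merges):
    print(f"merge {i}: components joined at threshold distance {height:.3f} (cluster size {int(size)})")
```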

  7. Audiovisual Asynchrony Detection in Human Speech

    Science.gov (United States)

    Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

    2011-01-01

    Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

  8. Audiovisual Integration in Children Listening to Spectrally Degraded Speech

    Science.gov (United States)

    Maidment, David W.; Kang, Hi Jee; Stewart, Hannah J.; Amitay, Sygal

    2015-01-01

    Purpose: The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Method: Children (n = 69) and adults (n = 15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in…

  9. Audiovisual Discrimination between Laughter and Speech

    NARCIS (Netherlands)

    Petridis, Stavros; Pantic, Maja

    Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audiovisual approach to distinguishing laughter from speech and we show that integrating the information from audio and video leads to an improved reliability of audiovisual approach in

  10. Some Behavioral and Neurobiological Constraints on Theories of Audiovisual Speech Integration: A Review and Suggestions for New Directions

    Science.gov (United States)

    Altieri, Nicholas; Pisoni, David B.; Townsend, James T.

    2012-01-01

    Summerfield (1987) proposed several accounts of audiovisual speech perception, a field of research that has burgeoned in recent years. The proposed accounts included the integration of discrete phonetic features, vectors describing the values of independent acoustical and optical parameters, the filter function of the vocal tract, and articulatory dynamics of the vocal tract. The latter two accounts assume that the representations of audiovisual speech perception are based on abstract gestures, while the former two assume that the representations consist of symbolic or featural information obtained from visual and auditory modalities. Recent converging evidence from several different disciplines reveals that the general framework of Summerfield’s feature-based theories should be expanded. An updated framework building upon the feature-based theories is presented. We propose a processing model arguing that auditory and visual brain circuits provide facilitatory information when the inputs are correctly timed, and that auditory and visual speech representations do not necessarily undergo translation into a common code during information processing. Future research on multisensory processing in speech perception should investigate the connections between auditory and visual brain regions, and utilize dynamic modeling tools to further understand the timing and information processing mechanisms involved in audiovisual speech integration. PMID:21968081

  11. Ordinal models of audiovisual speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias

    2011-01-01

    Audiovisual information is integrated in speech perception. One manifestation of this is the McGurk illusion in which watching the articulating face alters the auditory phonetic percept. Understanding this phenomenon fully requires a computational model with predictive power. Here, we describe...

  12. Specialization in audiovisual speech perception: a replication study

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual as evidenced by bimodal integration in the McGurk effect. This integration effect may be specific to speech or be applied to all stimuli in general. To investigate this, Tuomainen et al. (2005) used sine-wave speech, which naïve observers may perceive as non......-speech, but hear as speech once informed of the linguistic origin of the signal. Combinations of sine-wave speech and incongruent video of the talker elicited a McGurk effect only for informed observers. This indicates that the audiovisual integration effect is specific to speech perception. However, observers...... that observers did look near the mouth. We conclude that eye-movements did not influence the results of Tuomainen et al. and that their results thus can be taken as evidence of a speech specific mode of audiovisual integration underlying the McGurk illusion....

  13. Prediction and constraint in audiovisual speech perception.

    Science.gov (United States)

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  14. Prediction and constraint in audiovisual speech perception

    Science.gov (United States)

    Peelle, Jonathan E.; Sommers, Mitchell S.

    2015-01-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported

  15. Large Scale Functional Brain Networks Underlying Temporal Integration of Audio-Visual Speech Perception: An EEG Study.

    Science.gov (United States)

    Kumar, G Vinodh; Halder, Tamesh; Jaiswal, Amit K; Mukherjee, Abhishek; Roy, Dipanjan; Banerjee, Arpan

    2016-01-01

    Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk-effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, whether and how the network across the whole brain participates in multisensory perceptual processing remains an open question. We posit that a large-scale functional connectivity among the neural population situated in distributed brain sites may provide valuable insights involved in processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent audio-visual (AV) speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs was computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception around a temporal window of 300-600 ms following onset of stimuli. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times along with pre-stimulus decreases of lower frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus, our
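    As a rough illustration of sensor-level coherence summaries of this kind, the sketch below computes band-limited pairwise coherence over sliding windows and averages it across all sensor pairs into a single time course. This is a simplified stand-in for the time-frequency global coherence used in the study; the window length, frequency band, and averaging step are assumptions.

```python
# Simplified stand-in for a "global coherence" time course: band-limited pairwise
# coherence over sliding windows, averaged across all sensor pairs.
import numpy as np
from itertools import combinations
from scipy.signal import coherence

rng = np.random.default_rng(3)
fs = 250
eeg = rng.standard_normal((8, 5 * fs))          # 8 sensors, 5 s of placeholder data

win, step, band = fs, fs // 2, (30, 45)         # 1 s windows, 0.5 s step, gamma band
for start in range(0, eeg.shape[1] - win + 1, step):
    seg = eeg[:, start:start + win]
    vals = []
    for i, j in combinations(range(seg.shape[0]), 2):
        f, cxy = coherence(seg[i], seg[j], fs=fs, nperseg=win // 4)
        vals.append(cxy[(f >= band[0]) & (f <= band[1])].mean())
    print(f"t = {start / fs:.1f}-{(start + win) / fs:.1f} s  mean gamma coherence = {np.mean(vals):.3f}")
```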

  16. The role of visual spatial attention in audiovisual speech perception

    DEFF Research Database (Denmark)

    Andersen, Tobias; Tiippana, K.; Laarni, J.

    2009-01-01

    Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre-attentive b...... from each of the faces and from the voice on the auditory speech percept. We found that directing visual spatial attention towards a face increased the influence of that face on auditory perception. However, the influence of the voice on auditory perception did not change suggesting that audiovisual...... integration did not change. Visual spatial attention was also able to select between the faces when lip reading. This suggests that visual spatial attention acts at the level of visual speech perception prior to audiovisual integration and that the effect propagates through audiovisual integration...

  17. Audiovisual speech facilitates voice learning.

    Science.gov (United States)

    Sheffert, Sonya M; Olson, Elizabeth

    2004-02-01

    In this research, we investigated the effects of voice and face information on the perceptual learning of talkers and on long-term memory for spoken words. In the first phase, listeners were trained over several days to identify voices from words presented auditorily or audiovisually. The training data showed that visual information about speakers enhanced voice learning, revealing cross-modal connections in talker processing akin to those observed in speech processing. In the second phase, the listeners completed an auditory or audiovisual word recognition memory test in which equal numbers of words were spoken by familiar and unfamiliar talkers. The data showed that words presented by familiar talkers were more likely to be retrieved from episodic memory, regardless of modality. Together, these findings provide new information about the representational code underlying familiar talker recognition and the role of stimulus familiarity in episodic word recognition.

  18. Speech-specific audiovisual perception affects identification but not detection of speech

    DEFF Research Database (Denmark)

    Eskelund, Kasper; Andersen, Tobias

    Speech perception is audiovisual as evidenced by the McGurk effect in which watching incongruent articulatory mouth movements can change the phonetic auditory speech percept. This type of audiovisual integration may be specific to speech or be applied to all stimuli in general. To investigate...... of audiovisual integration specific to speech perception. However, the results of Tuomainen et al. might have been influenced by another effect. When observers were naïve, they had little motivation to look at the face. When informed, they knew that the face was relevant for the task and this could increase...... visual detection task. In our first experiment, observers presented with congruent and incongruent audiovisual sine-wave speech stimuli did only show a McGurk effect when informed of the speech nature of the stimulus. Performance on the secondary visual task was very good, thus supporting the finding...

  19. Effects of Audio-Visual Integration on the Detection of Masked Speech and Non-Speech Sounds

    Science.gov (United States)

    Eramudugolla, Ranmalee; Henderson, Rachel; Mattingley, Jason B.

    2011-01-01

    Integration of simultaneous auditory and visual information about an event can enhance our ability to detect that event. This is particularly evident in the perception of speech, where the articulatory gestures of the speaker's lips and face can significantly improve the listener's detection and identification of the message, especially when that…

  20. Audiovisual Speech Synchrony Measure: Application to Biometrics

    Directory of Open Access Journals (Sweden)

    Gérard Chollet

    2007-01-01

    Speech is a means of communication which is intrinsically bimodal: the audio signal originates from the dynamics of the articulators. This paper reviews recent works in the field of audiovisual speech, and more specifically techniques developed to measure the level of correspondence between audio and visual speech. It overviews the most common audio and visual speech front-end processing, transformations performed on audio, visual, or joint audiovisual feature spaces, and the actual measure of correspondence between audio and visual speech. Finally, the use of a synchrony measure for biometric identity verification based on talking faces is evaluated on the BANCA database.
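    Correspondence measures of the kind reviewed here typically project audio and visual feature streams into a shared space and score how strongly they co-vary. The sketch below uses canonical correlation analysis on synthetic feature streams as one such measure; the features and the choice of CCA are illustrative, not the specific front-ends or measures evaluated in the paper.

```python
# Illustrative audio-visual correspondence score using canonical correlation analysis
# (one of several possible synchrony measures); the features below are synthetic.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(4)
n_frames = 300
latent = rng.standard_normal((n_frames, 1))     # shared articulation dynamics
audio_feats = latent @ rng.standard_normal((1, 12)) + 0.5 * rng.standard_normal((n_frames, 12))
visual_feats = latent @ rng.standard_normal((1, 6)) + 0.5 * rng.standard_normal((n_frames, 6))

cca = CCA(n_components=1).fit(audio_feats, visual_feats)
u, v = cca.transform(audio_feats, visual_feats)
print(f"audio-visual synchrony score: {np.corrcoef(u[:, 0], v[:, 0])[0, 1]:.2f}")

# A mismatched pairing (e.g., shuffled visual frames) should yield a lower score,
# which is the basis for liveness checks in talking-face biometrics.
shuffled = visual_feats[rng.permutation(n_frames)]
u2, v2 = CCA(n_components=1).fit(audio_feats, shuffled).transform(audio_feats, shuffled)
print(f"shuffled pairing score: {np.corrcoef(u2[:, 0], v2[:, 0])[0, 1]:.2f}")
```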

  1. Influences of selective adaptation on perception of audiovisual speech

    Science.gov (United States)

    Dias, James W.; Cook, Theresa C.; Rosenblum, Lawrence D.

    2016-01-01

    Research suggests that selective adaptation in speech is a low-level process dependent on sensory-specific information shared between the adaptor and test-stimuli. However, previous research has only examined how adaptors shift perception of unimodal test stimuli, either auditory or visual. In the current series of experiments, we investigated whether adaptation to cross-sensory phonetic information can influence perception of integrated audio-visual phonetic information. We examined how selective adaptation to audio and visual adaptors shift perception of speech along an audiovisual test continuum. This test-continuum consisted of nine audio-/ba/-visual-/va/ stimuli, ranging in visual clarity of the mouth. When the mouth was clearly visible, perceivers “heard” the audio-visual stimulus as an integrated “va” percept 93.7% of the time (e.g., McGurk & MacDonald, 1976). As visibility of the mouth became less clear across the nine-item continuum, the audio-visual “va” percept weakened, resulting in a continuum ranging in audio-visual percepts from /va/ to /ba/. Perception of the test-stimuli was tested before and after adaptation. Changes in audiovisual speech perception were observed following adaptation to visual-/va/ and audiovisual-/va/, but not following adaptation to auditory-/va/, auditory-/ba/, or visual-/ba/. Adaptation modulates perception of integrated audio-visual speech by modulating the processing of sensory-specific information. The results suggest that auditory and visual speech information are not completely integrated at the level of selective adaptation. PMID:27041781

  2. Causal inference of asynchronous audiovisual speech

    Directory of Open Access Journals (Sweden)

    John F Magnotti

    2013-11-01

    During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
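    The core computation in a causal inference model of this kind is the posterior probability that the voice and face share a common cause, given a noisy measurement of their asynchrony and the prior plausibility of each causal structure. The schematic sketch below uses assumed Gaussian and uniform components with placeholder parameters; it is not the fitted model from the paper.

```python
# Schematic causal-inference sketch: posterior probability of a common cause given a
# measured audiovisual asynchrony. Distributional forms and parameters are assumptions.
import numpy as np
from scipy.stats import norm

def p_common_cause(measured_soa_ms, sensory_sd=60.0, common_sd=80.0,
                   soa_range=1000.0, prior_common=0.5):
    # Likelihood under a common cause: true asynchrony clusters near zero.
    like_c1 = norm.pdf(measured_soa_ms, loc=0.0, scale=np.hypot(sensory_sd, common_sd))
    # Likelihood under separate causes: asynchrony roughly uniform over a wide range.
    like_c2 = 1.0 / soa_range
    return like_c1 * prior_common / (like_c1 * prior_common + like_c2 * (1 - prior_common))

for soa in (0, 100, 200, 400):
    print(f"SOA = {soa:3d} ms -> P(common cause) = {p_common_cause(soa):.2f}")
```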

  3. Lip movements affect infants' audiovisual speech perception.

    Science.gov (United States)

    Yeung, H Henny; Werker, Janet F

    2013-05-01

    Speech is robustly audiovisual from early in infancy. Here we show that audiovisual speech perception in 4.5-month-old infants is influenced by sensorimotor information related to the lip movements they make while chewing or sucking. Experiment 1 consisted of a classic audiovisual matching procedure, in which two simultaneously displayed talking faces (visual [i] and [u]) were presented with a synchronous vowel sound (audio /i/ or /u/). Infants' looking patterns were selectively biased away from the audiovisual matching face when the infants were producing lip movements similar to those needed to produce the heard vowel. Infants' looking patterns returned to those of a baseline condition (no lip movements, looking longer at the audiovisual matching face) when they were producing lip movements that did not match the heard vowel. Experiment 2 confirmed that these sensorimotor effects interacted with the heard vowel, as looking patterns differed when infants produced these same lip movements while seeing and hearing a talking face producing an unrelated vowel (audio /a/). These findings suggest that the development of speech perception and speech production may be mutually informative.

  4. Brain responses and looking behaviour during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life.

    Directory of Open Access Journals (Sweden)

    Elena V Kushnerenko

    2013-07-01

    Full Text Available The use of visual cues during the processing of audiovisual speech is known to be less efficient in children and adults with language difficulties, and such difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6-9 months to 14-16 months of age. We used eye-tracking to examine whether individual differences in visual attention during audiovisual processing of speech in 6- to 9-month-old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6- to 9-month-old infants also participated in an event-related potential (ERP) audiovisual task within the same experimental session. Language development was then followed up at the age of 14-16 months, using two measures of language development, the Preschool Language Scale (PLS) and the Oxford Communicative Development Inventory (CDI). The results show that those infants who were less efficient in auditory speech processing at the age of 6-9 months had lower receptive language scores at 14-16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audio-visually incongruent stimuli at 6-9 months were both significantly associated with language development at 14-16 months. These findings add to the understanding of individual differences in neural signatures of audiovisual processing and associated looking behaviour in infants.

  5. Psychophysics of the McGurk and Other Audiovisual Speech Integration Effects

    Science.gov (United States)

    Jiang, Jintao; Bernstein, Lynne E.

    2011-01-01

    When the auditory and visual components of spoken audiovisual nonsense syllables are mismatched, perceivers produce four different types of perceptual responses: auditory correct, visual correct, fusion (the so-called McGurk effect), and combination (i.e., two consonants are reported). Here, quantitative measures were developed to account for the distribution of types of perceptual responses to 384 different stimuli from four talkers. The measures included the mutual information between the presented acoustic signal and the acoustic signal recorded with the presented video, and the correlation between the presented acoustic and video stimuli. In Experiment 1, open-set perceptual responses were obtained for acoustic /bA/ or /lA/ dubbed to video /bA, dA, gA, vA, zA, lA, wA, ΔA/. The talker, the video syllable, and the acoustic syllable significantly influenced the type of response. In Experiment 2, the best predictors of response category proportions were a subset of the physical stimulus measures, with the variance accounted for in the perceptual response category proportions ranging between 17% and 52%. That audiovisual stimulus relationships can account for response distributions supports the possibility that internal representations are based on modality-specific stimulus relationships. PMID:21574741

  6. Rapid, generalized adaptation to asynchronous audiovisual speech.

    Science.gov (United States)

    Van der Burg, Erik; Goodbourn, Patrick T

    2015-04-07

    The brain is adaptive. The speed of propagation through air, and of low-level sensory processing, differs markedly between auditory and visual stimuli; yet the brain can adapt to compensate for the resulting cross-modal delays. Studies investigating temporal recalibration to audiovisual speech have used prolonged adaptation procedures, suggesting that adaptation is sluggish. Here, we show that adaptation to asynchronous audiovisual speech occurs rapidly. Participants viewed a brief clip of an actor pronouncing a single syllable. The voice was either advanced or delayed relative to the corresponding lip movements, and participants were asked to make a synchrony judgement. Although we did not use an explicit adaptation procedure, we demonstrate rapid recalibration based on a single audiovisual event. We find that the point of subjective simultaneity on each trial is highly contingent upon the modality order of the preceding trial. We find compelling evidence that rapid recalibration generalizes across different stimuli, and different actors. Finally, we demonstrate that rapid recalibration occurs even when auditory and visual events clearly belong to different actors. These results suggest that rapid temporal recalibration to audiovisual speech is primarily mediated by basic temporal factors, rather than higher-order factors such as perceived simultaneity and source identity. © 2015 The Author(s) Published by the Royal Society. All rights reserved.

  7. Audiovisual Integration in High Functioning Adults with Autism

    Science.gov (United States)

    Keane, Brian P.; Rosenthal, Orna; Chun, Nicole H.; Shams, Ladan

    2010-01-01

    Autism involves various perceptual benefits and deficits, but it is unclear if the disorder also involves anomalous audiovisual integration. To address this issue, we compared the performance of high-functioning adults with autism and matched controls on experiments investigating the audiovisual integration of speech, spatiotemporal relations, and…

  8. Validating a Method to Assess Lipreading, Audiovisual Gain, and Integration During Speech Reception With Cochlear-Implanted and Normal-Hearing Subjects Using a Talking Head.

    Science.gov (United States)

    Schreitmüller, Stefan; Frenken, Miriam; Bentz, Lüder; Ortmann, Magdalene; Walger, Martin; Meister, Hartmut

    Watching a talker's mouth is beneficial for speech reception (SR) in many communication settings, especially in noise and when hearing is impaired. Measures for audiovisual (AV) SR can be valuable in the framework of diagnosing or treating hearing disorders. This study addresses the lack of standardized methods in many languages for assessing lipreading, AV gain, and integration. A new method is validated that supplements a German speech audiometric test with visualizations of the synthetic articulation of an avatar that was used, for it is feasible to lip-sync auditory speech in a highly standardized way. Three hypotheses were formed according to the literature on AV SR that used live or filmed talkers. It was tested whether respective effects could be reproduced with synthetic articulation: (1) cochlear implant (CI) users have a higher visual-only SR than normal-hearing (NH) individuals, and younger individuals obtain higher lipreading scores than older persons. (2) Both CI and NH gain from presenting AV over unimodal (auditory or visual) sentences in noise. (3) Both CI and NH listeners efficiently integrate complementary auditory and visual speech features. In a controlled, cross-sectional study with 14 experienced CI users (mean age 47.4) and 14 NH individuals (mean age 46.3, similar broad age distribution), lipreading, AV gain, and integration of a German matrix sentence test were assessed. Visual speech stimuli were synthesized by the articulation of the Talking Head system "MASSY" (Modular Audiovisual Speech Synthesizer), which displayed standardized articulation with respect to the visibility of German phones. In line with the hypotheses and previous literature, CI users had a higher mean visual-only SR than NH individuals (CI, 38%; NH, 12%; p < 0.001). Age was correlated with lipreading such that within each group, younger individuals obtained higher visual-only scores than older persons (rCI = -0.54; p = 0.046; rNH = -0.78; p < 0.001). Both CI and NH

  9. [Intermodal timing cues for audio-visual speech recognition].

    Science.gov (United States)

    Hashimoto, Masahiro; Kumashiro, Masaharu

    2004-06-01

    The purpose of this study was to investigate the limitations of lip-reading advantages for Japanese young adults by desynchronizing visual and auditory information in speech. In the experiment, audio-visual speech stimuli were presented under the six test conditions: audio-alone, and audio-visually with either 0, 60, 120, 240 or 480 ms of audio delay. The stimuli were video recordings of the face of a female Japanese speaker producing long and short Japanese sentences. The intelligibility of the audio-visual stimuli was measured as a function of audio delay in sixteen untrained young subjects. Speech intelligibility under the audio-delay conditions of less than 120 ms was significantly better than that under the audio-alone condition. On the other hand, the delay of 120 ms corresponded to the mean mora duration measured for the audio stimuli. The results implied that audio delays of up to 120 ms would not disrupt the lip-reading advantage, because visual and auditory information in speech seemed to be integrated on a syllabic time scale. Potential applications of this research include noisy workplaces in which a worker must extract relevant speech from competing noise.

  10. Perception of Intersensory Synchrony in Audiovisual Speech: Not that Special

    Science.gov (United States)

    Vroomen, Jean; Stekelenburg, Jeroen J.

    2011-01-01

    Perception of intersensory temporal order is particularly difficult for (continuous) audiovisual speech, as perceivers may find it difficult to notice substantial timing differences between speech sounds and lip movements. Here we tested whether this occurs because audiovisual speech is strongly paired ("unity assumption"). Participants made…

  11. The natural statistics of audiovisual speech.

    Directory of Open Access Journals (Sweden)

    Chandramouli Chandrasekaran

    2009-07-01

    Full Text Available Humans, like other animals, are exposed to a continuous stream of signals, which are dynamic, multimodal, extended, and time varying in nature. This complex input space must be transduced and sampled by our sensory systems and transmitted to the brain where it can guide the selection of appropriate actions. To simplify this process, it's been suggested that the brain exploits statistical regularities in the stimulus space. Tests of this idea have largely been confined to unimodal signals and natural scenes. One important class of multisensory signals for which a quantitative input space characterization is unavailable is human speech. We do not understand what signals our brain has to actively piece together from an audiovisual speech stream to arrive at a percept versus what is already embedded in the signal structure of the stream itself. In essence, we do not have a clear understanding of the natural statistics of audiovisual speech. In the present study, we identified the following major statistical features of audiovisual speech. First, we observed robust correlations and close temporal correspondence between the area of the mouth opening and the acoustic envelope. Second, we found the strongest correlation between the area of the mouth opening and vocal tract resonances. Third, we observed that both area of the mouth opening and the voice envelope are temporally modulated in the 2-7 Hz frequency range. Finally, we show that the timing of mouth movements relative to the onset of the voice is consistently between 100 and 300 ms. We interpret these data in the context of recent neural theories of speech which suggest that speech communication is a reciprocally coupled, multisensory event, whereby the outputs of the signaler are matched to the neural processes of the receiver.
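
    The stimulus measurements summarized above (envelope extraction, mouth-audio correlation, lag analysis) can be approximated with a short script. The sketch below is an illustration of the general approach under stated assumptions rather than the authors' pipeline: it assumes SciPy is available, uses a Hilbert-transform envelope with an arbitrary 10 Hz low-pass cutoff, and resamples the envelope to the video frame rate by nearest-neighbour indexing.

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def acoustic_envelope(audio, fs_audio, cutoff_hz=10.0):
        """Wideband amplitude envelope: Hilbert magnitude, then low-pass filtered."""
        env = np.abs(hilbert(audio))
        b, a = butter(4, cutoff_hz / (fs_audio / 2.0), btype="low")
        return filtfilt(b, a, env)

    def mouth_envelope_correlation(mouth_area, audio, fs_video, fs_audio,
                                   max_lag_s=0.5):
        """Correlate mouth-opening area with the acoustic envelope at a range of
        lags (positive lag = envelope shifted later relative to the mouth)."""
        env = acoustic_envelope(audio, fs_audio)
        # Resample the envelope to one sample per video frame (nearest neighbour).
        frame_idx = np.linspace(0, len(env) - 1, num=len(mouth_area)).astype(int)
        env = env[frame_idx]
        a = (mouth_area - mouth_area.mean()) / mouth_area.std()
        e = (env - env.mean()) / env.std()
        max_lag = int(max_lag_s * fs_video)
        lags = np.arange(-max_lag, max_lag + 1)
        corrs = [np.corrcoef(a[max(0, -k):len(a) - max(0, k)],
                             e[max(0, k):len(e) - max(0, -k)])[0, 1] for k in lags]
        return lags / fs_video, np.array(corrs)
    ```

    The lag at which the correlation peaks gives a rough estimate of the mouth-to-voice timing relationship of the kind reported above.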

  12. Talker Variability in Audiovisual Speech Perception

    Directory of Open Access Journals (Sweden)

    Shannon eHeald

    2014-07-01

    Full Text Available A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker-variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker’s face, speech recognition is improved under adverse listening conditions (e.g., noise or distortion) that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target-word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to the audio-only condition. These results suggest that seeing a talker’s face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred.

  13. Neural correlates of audiovisual speech processing in a second language.

    Science.gov (United States)

    Barrós-Loscertales, Alfonso; Ventura-Campos, Noelia; Visser, Maya; Alsius, Agnès; Pallier, Christophe; Avila Rivera, César; Soto-Faraco, Salvador

    2013-09-01

    Neuroimaging studies of audiovisual speech processing have exclusively addressed listeners' native language (L1). Yet, several behavioural studies now show that AV processing plays an important role in non-native (L2) speech perception. The current fMRI study measured brain activity during auditory, visual, audiovisual congruent and audiovisual incongruent utterances in L1 and L2. BOLD responses to congruent AV speech in the pSTS were stronger than in either unimodal condition in both L1 and L2. Yet no differences in AV processing were expressed according to the language background in this area. Instead, the regions in the bilateral occipital lobe had a stronger congruency effect on the BOLD response (congruent higher than incongruent) in L2 as compared to L1. According to these results, language background differences are predominantly expressed in these unimodal regions, whereas the pSTS is similarly involved in AV integration regardless of language dominance. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. Electrophysiological evidence for differences between fusion and combination illusions in audiovisual speech perception.

    Science.gov (United States)

    Baart, Martijn; Lindborg, Alma; Andersen, Tobias S

    2017-11-01

    Incongruent audiovisual speech stimuli can lead to perceptual illusions such as fusions or combinations. Here, we investigated the underlying audiovisual integration process by measuring ERPs. We observed that visual speech-induced suppression of P2 amplitude (which is generally taken as a measure of audiovisual integration) for fusions was similar to suppression obtained with fully congruent stimuli, whereas P2 suppression for combinations was larger. We argue that these effects arise because the phonetic incongruency is solved differently for both types of stimuli. © 2017 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.

  15. Electrophysiological evidence for differences between fusion and combination illusions in audiovisual speech perception

    DEFF Research Database (Denmark)

    Baart, Martijn; Lindborg, Alma Cornelia; Andersen, Tobias S

    2017-01-01

    Incongruent audiovisual speech stimuli can lead to perceptual illusions such as fusions or combinations. Here, we investigated the underlying audiovisual integration process by measuring ERPs. We observed that visual speech-induced suppression of P2 amplitude (which is generally taken as a measure of audiovisual integration) for fusions was comparable to suppression obtained with fully congruent stimuli, whereas P2 suppression for combinations was larger. We argue that these effects arise because the phonetic incongruency is solved differently for both types of stimuli. This article is protected...

  16. Audiovisual Cues and Perceptual Learning of Spectrally Distorted Speech

    Science.gov (United States)

    Pilling, Michael; Thomas, Sharon

    2011-01-01

    Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…

  17. Audiovisual Temporal Recalibration for Speech in Synchrony Perception and Speech Identification

    Science.gov (United States)

    Asakawa, Kaori; Tanaka, Akihiro; Imai, Hisato

    We investigated whether audiovisual synchrony perception for speech could change after observation of an audiovisual temporal mismatch. Previous studies have revealed that audiovisual synchrony perception is recalibrated after exposure to a constant timing difference between auditory and visual signals in non-speech. In the present study, we examined whether this audiovisual temporal recalibration occurs at the perceptual level even for speech (monosyllables). In Experiment 1, participants performed an audiovisual simultaneity judgment task (i.e., a direct measure of audiovisual synchrony perception) on the speech signal after observation of speech stimuli that had a constant audiovisual lag. The results showed that the “simultaneous” responses (i.e., the proportion of responses for which participants judged the auditory and visual stimuli to be synchronous) at least partly depended on exposure lag. In Experiment 2, we adopted the McGurk identification task (i.e., an indirect measure of audiovisual synchrony perception), using stimuli identical to those of Experiment 1, to exclude the possibility that this modulation of synchrony perception was solely attributable to response strategy. The characteristics of the McGurk effect reported by participants depended on exposure lag. Thus, it was shown that audiovisual synchrony perception for speech could be modulated following exposure to constant lag in both direct and indirect measurements. Our results suggest that temporal recalibration occurs not only in non-speech signals but also in monosyllabic speech at the perceptual level.

  18. A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography.

    Science.gov (United States)

    Ozker, Muge; Schepers, Inga M; Magnotti, John F; Yoshor, Daniel; Beauchamp, Michael S

    2017-06-01

    Human speech can be comprehended using only auditory information from the talker's voice. However, comprehension is improved if the talker's face is visible, especially if the auditory information is degraded as occurs in noisy environments or with hearing loss. We explored the neural substrates of audiovisual speech perception using electrocorticography, direct recording of neural activity using electrodes implanted on the cortical surface. We observed a double dissociation in the responses to audiovisual speech with clear and noisy auditory component within the superior temporal gyrus (STG), a region long known to be important for speech perception. Anterior STG showed greater neural activity to audiovisual speech with clear auditory component, whereas posterior STG showed similar or greater neural activity to audiovisual speech in which the speech was replaced with speech-like noise. A distinct border between the two response patterns was observed, demarcated by a landmark corresponding to the posterior margin of Heschl's gyrus. To further investigate the computational roles of both regions, we considered Bayesian models of multisensory integration, which predict that combining the independent sources of information available from different modalities should reduce variability in the neural responses. We tested this prediction by measuring the variability of the neural responses to single audiovisual words. Posterior STG showed smaller variability than anterior STG during presentation of audiovisual speech with noisy auditory component. Taken together, these results suggest that posterior STG but not anterior STG is important for multisensory integration of noisy auditory and visual speech.
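
    The variance-reduction prediction invoked here is usually formalized with the standard maximum-likelihood cue-combination equations. The display below gives that textbook form (the symbols for the auditory and visual estimates and their variances are generic), not the specific Bayesian model fitted in this study:

    ```latex
    \hat{s}_{AV} = w_A\,\hat{s}_A + w_V\,\hat{s}_V,
    \qquad
    w_A = \frac{1/\sigma_A^{2}}{1/\sigma_A^{2} + 1/\sigma_V^{2}},
    \quad
    w_V = \frac{1/\sigma_V^{2}}{1/\sigma_A^{2} + 1/\sigma_V^{2}},
    \qquad
    \sigma_{AV}^{2} = \frac{\sigma_A^{2}\,\sigma_V^{2}}{\sigma_A^{2} + \sigma_V^{2}}
    \le \min\bigl(\sigma_A^{2},\, \sigma_V^{2}\bigr).
    ```

    Because the combined variance can never exceed the smaller unimodal variance, successful integration should reduce trial-to-trial response variability, which is the pattern reported here for posterior STG.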

  19. The contribution of dynamic visual cues to audiovisual speech perception.

    Science.gov (United States)

    Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador

    2015-08-01

    Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues; two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this aim, we measured word identification performance in noise using unimodal auditory stimuli and audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point light displays achieved via motion capture of the original talker. Point light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech. Copyright © 2015 Elsevier Ltd. All rights reserved.

  20. Cortical Integration of Audio-Visual Information

    Science.gov (United States)

    Vander Wyk, Brent C.; Ramsay, Gordon J.; Hudac, Caitlin M.; Jones, Warren; Lin, David; Klin, Ami; Lee, Su Mei; Pelphrey, Kevin A.

    2013-01-01

    We investigated the neural basis of audio-visual processing in speech and non-speech stimuli. Physically identical auditory stimuli (speech and sinusoidal tones) and visual stimuli (animated circles and ellipses) were used in this fMRI experiment. Relative to unimodal stimuli, each of the multimodal conjunctions showed increased activation in largely non-overlapping areas. The conjunction of Ellipse and Speech, which most resembles naturalistic audiovisual speech, showed higher activation in the right inferior frontal gyrus, fusiform gyri, left posterior superior temporal sulcus, and lateral occipital cortex. The conjunction of Circle and Tone, an arbitrary audio-visual pairing with no speech association, activated middle temporal gyri and lateral occipital cortex. The conjunction of Circle and Speech showed activation in lateral occipital cortex, and the conjunction of Ellipse and Tone did not show increased activation relative to unimodal stimuli. Further analysis revealed that middle temporal regions, although identified as multimodal only in the Circle-Tone condition, were more strongly active to Ellipse-Speech or Circle-Speech, but regions that were identified as multimodal for Ellipse-Speech were always strongest for Ellipse-Speech. Our results suggest that combinations of auditory and visual stimuli may together be processed by different cortical networks, depending on the extent to which speech or non-speech percepts are evoked. PMID:20709442

  1. Audiovisual speech perception development at varying levels of perceptual processing

    OpenAIRE

    Lalonde, Kaylah; Holt, Rachael Frush

    2016-01-01

    This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the le...

  2. Infants' preference for native audiovisual speech dissociated from congruency preference.

    Directory of Open Access Journals (Sweden)

    Kathleen Shaw

    Full Text Available Although infant speech perception is often studied in isolated modalities, infants' experience with speech is largely multimodal (i.e., speech sounds they hear are accompanied by articulating faces). Across two experiments, we tested infants' sensitivity to the relationship between the auditory and visual components of audiovisual speech in their native (English) and non-native (Spanish) language. In Experiment 1, infants' looking times were measured during a preferential looking task in which they saw two simultaneous visual speech streams articulating a story, one in English and the other in Spanish, while they heard either the English or the Spanish version of the story. In Experiment 2, looking times from another group of infants were measured as they watched single displays of congruent and incongruent combinations of English and Spanish audio and visual speech streams. Findings demonstrated an age-related increase in looking towards the native relative to the non-native visual speech stream when accompanied by the corresponding (native) auditory speech. This increase in native language preference did not appear to be driven by a difference in preference for native vs. non-native audiovisual congruence, as we observed no difference in looking times at the audiovisual streams in Experiment 2.

  3. Brain responses to audiovisual speech mismatch in infants are associated with individual differences in looking behaviour.

    Science.gov (United States)

    Kushnerenko, Elena; Tomalski, Przemyslaw; Ballieux, Haiko; Ribeiro, Helena; Potton, Anita; Axelsson, Emma L; Murphy, Elizabeth; Moore, Derek G

    2013-11-01

    Research on audiovisual speech integration has reported high levels of individual variability, especially among young infants. In the present study we tested the hypothesis that this variability results from individual differences in the maturation of audiovisual speech processing during infancy. A developmental shift in selective attention to audiovisual speech has been demonstrated between 6 and 9 months with an increase in the time spent looking to articulating mouths as compared to eyes (Lewkowicz & Hansen-Tift. (2012) Proc. Natl Acad. Sci. USA, 109, 1431-1436; Tomalski et al. (2012) Eur. J. Dev. Psychol., 1-14). In the present study we tested whether these changes in behavioural maturational level are associated with differences in brain responses to audiovisual speech across this age range. We measured high-density event-related potentials (ERPs) in response to videos of audiovisually matching and mismatched syllables /ba/ and /ga/, and subsequently examined visual scanning of the same stimuli with eye-tracking. There were no clear age-specific changes in ERPs, but the amplitude of audiovisual mismatch response (AVMMR) to the combination of visual /ba/ and auditory /ga/ was strongly negatively associated with looking time to the mouth in the same condition. These results have significant implications for our understanding of individual differences in neural signatures of audiovisual speech processing in infants, suggesting that they are not strictly related to chronological age but instead associated with the maturation of looking behaviour, and develop at individual rates in the second half of the first year of life. © 2013 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.

  4. Audiovisual discrimination between speech and laughter: Why and when visual information might help

    NARCIS (Netherlands)

    Petridis, Stavros; Pantic, Maja

    Past research on automatic laughter classification/detection has focused mainly on audio-based approaches. Here we present an audiovisual approach to distinguishing laughter from speech, and we show that integrating the information from audio and video channels may lead to improved performance over

  5. Modulations of 'late' event-related brain potentials in humans by dynamic audiovisual speech stimuli.

    Science.gov (United States)

    Lebib, Riadh; Papo, David; Douiri, Abdel; de Bode, Stella; Gillon Dowens, Margaret; Baudonnière, Pierre-Marie

    2004-11-30

    Lipreading reliably improves speech perception during face-to-face conversation. Within the range of good dubbing, however, adults tolerate some audiovisual (AV) discrepancies, and lipreading can then give rise to confusion. We used event-related brain potentials (ERPs) to study the perceptual strategies governing the intermodal processing of dynamic and bimodal speech stimuli, either congruently dubbed or not. Electrophysiological analyses revealed that non-coherent audiovisual dubbings modulated in amplitude an endogenous ERP component, the N300, which we compared to an 'N400-like effect' reflecting the difficulty of integrating these conflicting pieces of information. This result adds further support for the existence of a cerebral system underlying 'integrative processes' lato sensu. Further studies should take advantage of this 'N400-like effect' with AV speech stimuli to open new perspectives in the domain of psycholinguistics.

  6. Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?

    Directory of Open Access Journals (Sweden)

    Magnus eAlm

    2015-07-01

    Full Text Available Gender and age have been found to affect adults’ audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood of cognitive and sensory decline, which may confound positive effects of age-related AV experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently, both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20-30 years) and middle-aged adults (50-60 years), with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. Contrastingly, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females’ general AV perceptual strategy. Although young females’ speech-reading proficiency may not readily contribute to greater visual influence, between young and middle adulthood recurrent confirmation of the contribution of visual cues induced by speech-reading proficiency may gradually shift females’ AV perceptual strategy towards more visually dominated responses.

  7. Audiovisual speech perception development at varying levels of perceptual processing.

    Science.gov (United States)

    Lalonde, Kaylah; Holt, Rachael Frush

    2016-04-01

    This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the level of perceptual processing required to complete them. Adults and children demonstrated visual speech influence at all levels of perceptual processing. Whereas children demonstrated the same visual speech influence at each level of perceptual processing, adults demonstrated greater visual speech influence on tasks requiring higher levels of perceptual processing. These results support previous research demonstrating multiple mechanisms of AV speech processing (general perceptual and speech-specific mechanisms) with independent maturational time courses. The results suggest that adults rely on both general perceptual mechanisms that apply to all levels of perceptual processing and speech-specific mechanisms that apply when making phonetic decisions and/or accessing the lexicon. Six- to eight-year-old children seem to rely only on general perceptual mechanisms across levels. As expected, developmental differences in AV benefit on this and other recognition tasks likely reflect immature speech-specific mechanisms and phonetic processing in children.

  8. A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech.

    Directory of Open Access Journals (Sweden)

    John F Magnotti

    2017-02-01

    Full Text Available Audiovisual speech integration combines information from auditory speech (talker's voice) and visual speech (talker's mouth movements) to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk Effect, consists of incongruent audiovisual syllables, such as auditory "ba" + visual "ga" (AbaVga), that are integrated to produce a fused percept ("da"). This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba). We describe a simplified model of causal inference in multisensory speech perception (CIMS) that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others.
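
    The general causal-inference scheme that CIMS-style models build on can be written compactly. The formulation below is the standard one with generic symbols, not necessarily the exact parameterization of the CIMS model: the observer weighs the integrated and segregated estimates by the posterior probability of a common cause,

    ```latex
    p(C{=}1 \mid x_A, x_V) =
    \frac{p(x_A, x_V \mid C{=}1)\,\pi}
         {p(x_A, x_V \mid C{=}1)\,\pi + p(x_A, x_V \mid C{=}2)\,(1 - \pi)},
    \qquad
    \hat{s} = p(C{=}1 \mid x_A, x_V)\,\hat{s}_{C=1}
            + \bigl(1 - p(C{=}1 \mid x_A, x_V)\bigr)\,\hat{s}_{C=2},
    ```

    where x_A and x_V denote the noisy auditory and visual representations, pi the prior probability of a common cause, s_{C=1} the fused (reliability-weighted) estimate, and s_{C=2} the segregated estimate. A McGurk-type fusion then corresponds to a high common-cause posterior despite the phonetic incongruence.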

  9. Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

    Directory of Open Access Journals (Sweden)

    Petar S. Aleksic

    2002-11-01

    Full Text Available We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs) supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. Principal component analysis (PCA) was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR) experiments. Both single-stream and multistream hidden Markov models (HMMs) were used to model the ASR system, integrate audio and visual information, and perform relatively large vocabulary (approximately 1000 words) speech recognition experiments. The experiments performed use clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER) by 20% to 23% relative to audio-only speech recognition WERs, at various SNRs (0–30 dB) with additive white Gaussian noise, and by 19% relative to audio-only speech recognition WER under clean audio conditions.
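
    The visual-feature pipeline described above (FAP vectors reduced by PCA, then fused with acoustic features) can be sketched in a few lines. The snippet below is a rough illustration only: it assumes scikit-learn is available, uses random stand-in arrays in place of real FAP and MFCC trajectories, and omits the HMM training that the paper actually performs.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    # Stand-in data: T aligned frames of FAP vectors and acoustic features
    # (the paper's real features and frame rates differ).
    T = 500
    faps = np.random.randn(T, 68)   # hypothetical FAP trajectories
    mfcc = np.random.randn(T, 13)   # hypothetical acoustic features

    # Project the FAPs onto a handful of principal components to shrink the
    # visual feature vector, keeping the projection weights as visual features.
    pca = PCA(n_components=6)
    visual_feats = pca.fit_transform(faps)
    print("retained variance:", round(pca.explained_variance_ratio_.sum(), 3))

    # Single-stream ("early") fusion: concatenate per-frame audio and visual
    # features into one observation vector for a downstream recognizer.
    fused = np.hstack([mfcc, visual_feats])
    print(fused.shape)  # (500, 19)
    ```

    In a multistream variant, the audio and visual features would instead be modelled by separate streams whose log-likelihoods are combined with stream weights.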

  10. Degradation of labial information modifies audiovisual speech perception in cochlear-implanted children.

    Science.gov (United States)

    Huyse, Aurélie; Berthommier, Frédéric; Leybaert, Jacqueline

    2013-01-01

    The aim of the present study was to examine audiovisual speech integration in cochlear-implanted children and in normally hearing children exposed to degraded auditory stimuli. Previous studies have shown that speech perception in cochlear-implanted users is biased toward the visual modality when audition and vision provide conflicting information. Our main question was whether an experimentally designed degradation of the visual speech cue would increase the importance of audition in the response pattern. The impact of auditory proficiency was also investigated. A group of 31 children with cochlear implants and a group of 31 normally hearing children matched for chronological age were recruited. All children with cochlear implants had profound congenital deafness and had used their implants for at least 2 years. Participants had to perform an /aCa/ consonant-identification task in which stimuli were presented randomly in three conditions: auditory only, visual only, and audiovisual (congruent and incongruent McGurk stimuli). In half of the experiment, the visual speech cue was normal; in the other half (visual reduction) a degraded visual signal was presented, aimed at preventing lipreading of good quality. The normally hearing children received a spectrally reduced speech signal (simulating the input delivered by the cochlear implant). First, performance in visual-only and in congruent audiovisual modalities were decreased, showing that the visual reduction technique used here was efficient at degrading lipreading. Second, in the incongruent audiovisual trials, visual reduction led to a major increase in the number of auditory based responses in both groups. Differences between proficient and nonproficient children were found in both groups, with nonproficient children's responses being more visual and less auditory than those of proficient children. Further analysis revealed that differences between visually clear and visually reduced conditions and between

  11. No, there is no 150 ms lead of visual speech on auditory speech, but a range of audiovisual asynchronies varying from small audio lead to large audio lag.

    Directory of Open Access Journals (Sweden)

    Jean-Luc Schwartz

    2014-07-01

    Full Text Available An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech would be typically 150 ms ahead of auditory speech. It happens that the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases, for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call "preparatory gestures". However, when syllables are chained in sequences, as they are typically in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call "comodulatory gestures", which provide auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na) showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction.

  12. Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker's mouth movements and speech.

    Science.gov (United States)

    Shahin, Antoine J; Shen, Stanley; Kerlin, Jess R

    2017-01-01

    We examined the relationship between tolerance for audiovisual onset asynchrony (AVOA) and the spectrotemporal fidelity of the spoken words and the speaker's mouth movements. In two experiments that varied only in the temporal order of the sensory modalities, visual speech leading (exp1) or lagging (exp2) acoustic speech, participants watched intact and blurred videos of a speaker uttering trisyllabic words and nonwords that were noise vocoded with 4, 8, 16, and 32 channels. They judged whether the speaker's mouth movements and the speech sounds were in-sync or out-of-sync. Individuals perceived synchrony (tolerated AVOA) on more trials when the acoustic speech was more speech-like (8 channels and higher vs. 4 channels), and when visual speech was intact rather than blurred (exp1 only). These findings suggest that enhanced spectrotemporal fidelity of the audiovisual (AV) signal prompts the brain to widen the window of integration, promoting the fusion of temporally distant AV percepts.

  13. Neurophysiology underlying influence of stimulus reliability on audiovisual integration.

    Science.gov (United States)

    Shatzer, Hannah; Shen, Stanley; Kerlin, Jess R; Pitt, Mark A; Shahin, Antoine J

    2018-01-24

    We tested the predictions of the dynamic reweighting model (DRM) of audiovisual (AV) speech integration, which posits that spectrotemporally reliable (informative) AV speech stimuli induce a reweighting of processing from low-level to high-level auditory networks. This reweighting decreases sensitivity to acoustic onsets and in turn increases tolerance to AV onset asynchronies (AVOA). EEG was recorded while subjects watched videos of a speaker uttering trisyllabic nonwords that varied in spectrotemporal reliability and asynchrony of the visual and auditory inputs. Subjects judged the stimuli as in-sync or out-of-sync. Results showed that subjects exhibited greater AVOA tolerance for non-blurred than for blurred visual speech and for less degraded than for more degraded acoustic speech. Increased AVOA tolerance was reflected in reduced amplitude of the P1-P2 auditory evoked potentials, a neurophysiological indication of reduced sensitivity to acoustic onsets and successful AV integration. There was also sustained visual alpha band (8-14 Hz) suppression (desynchronization) following acoustic speech onsets for non-blurred vs. blurred visual speech, consistent with continuous engagement of the visual system as the speech unfolds. The current findings suggest that increased spectrotemporal reliability of acoustic and visual speech promotes robust AV integration, partly by suppressing sensitivity to acoustic onsets, in support of the DRM's reweighting mechanism. Increased visual signal reliability also sustains the engagement of the visual system with the auditory system to maintain alignment of information across modalities. © 2018 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.

  14. The effect of combined sensory and semantic components on audio-visual speech perception in older adults

    Directory of Open Access Journals (Sweden)

    Corrina eMaguinness

    2011-12-01

    Full Text Available Previous studies have found that perception in older people benefits from multisensory over uni-sensory information. As normal speech recognition is affected by both the auditory input and the visual lip movements of the speaker, we investigated the efficiency of audio and visual integration in an older population by manipulating the relative reliability of the auditory and visual information in speech. We also investigated the role of the semantic context of the sentence to assess whether audio-visual integration is affected by top-down semantic processing. We presented participants with audio-visual sentences in which the visual component was either blurred or not blurred. We found that there was a greater cost in recall performance for semantically meaningless speech in the audio-visual blur condition compared to the audio-visual no-blur condition, and this effect was specific to the older group. Our findings have implications for understanding how aging affects efficient multisensory integration for the perception of speech and suggest that multisensory inputs may benefit speech perception in older adults when the semantic content of the speech is unpredictable.

  15. ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION

    Directory of Open Access Journals (Sweden)

    D.V. Ivanko

    2016-05-01

    Full Text Available The paper deals with an analytical review covering the latest achievements in the field of audio-visual (AV) fusion (integration) of multimodal information. We discuss the main challenges and report on approaches to address them. One of the most important tasks of AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review we set out the basic principles of AV speech recognition and give a classification of the audio and visual features of speech. Special attention is paid to the systematization of the existing techniques and the AV data fusion methods. In the second part we provide a consolidated list of tasks and applications that use AV fusion, based on our analysis of the research area. We also indicate the methods, techniques, and audio and video features used. We propose a classification of AV integration approaches and discuss the advantages and disadvantages of the different approaches. We draw conclusions and offer our assessment of the future of the field of AV fusion. In further research we plan to implement a system for audio-visual Russian continuous speech recognition using advanced methods of multimodal fusion.
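
    As a concrete (and deliberately simplified) illustration of one family of methods such reviews cover, the snippet below shows decision-level ("late") fusion: per-modality classifier scores are combined with a reliability weight. The class probabilities and the weight are toy values, and nothing here is taken from the paper itself.

    ```python
    import numpy as np

    def late_fusion(log_p_audio, log_p_video, audio_weight=0.7):
        """Decision-level fusion: weighted sum of per-class log scores from
        separate audio and video classifiers; returns the winning class index."""
        lam = audio_weight
        fused = lam * log_p_audio + (1.0 - lam) * log_p_video
        return int(np.argmax(fused))

    # Toy example with three candidate words; in practice the weight would be
    # tuned to the estimated reliability (e.g., SNR) of the audio channel.
    log_p_a = np.log(np.array([0.5, 0.3, 0.2]))
    log_p_v = np.log(np.array([0.2, 0.2, 0.6]))
    print(late_fusion(log_p_a, log_p_v, audio_weight=0.4))
    ```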

  16. Spatial Frequency Requirements and Gaze Strategy in Visual-Only and Audiovisual Speech Perception

    Science.gov (United States)

    Wilson, Amanda H.; Alsius, Agnès; Parè, Martin; Munhall, Kevin G.

    2016-01-01

    Purpose: The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. Method: We presented vowel-consonant-vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent…

  17. Audiovisual Speech Perception and Eye Gaze Behavior of Adults with Asperger Syndrome

    Science.gov (United States)

    Saalasti, Satu; Katsyri, Jari; Tiippana, Kaisa; Laine-Hernandez, Mari; von Wendt, Lennart; Sams, Mikko

    2012-01-01

    Audiovisual speech perception was studied in adults with Asperger syndrome (AS), by utilizing the McGurk effect, in which conflicting visual articulation alters the perception of heard speech. The AS group perceived the audiovisual stimuli differently from age, sex and IQ matched controls. When a voice saying /p/ was presented with a face…

  18. Perception of audio-visual speech synchrony in Spanish-speaking children with and without specific language impairment.

    Science.gov (United States)

    Pons, Ferran; Andreu, Llorenç; Sanz-Torrent, Monica; Buil-Legaz, Lucía; Lewkowicz, David J

    2013-06-01

    Speech perception involves the integration of auditory and visual articulatory information, and thus requires the perception of temporal synchrony between this information. There is evidence that children with specific language impairment (SLI) have difficulty with auditory speech perception, but it is not known if this is also true for the integration of auditory and visual speech. Twenty Spanish-speaking children with SLI, twenty typically developing age-matched Spanish-speaking children, and twenty Spanish-speaking children matched for MLU-w participated in an eye-tracking study to investigate the perception of audiovisual speech synchrony. Results revealed that children with typical language development perceived an audiovisual asynchrony of 666 ms regardless of whether the auditory or visual speech attribute led the other one. Children with SLI only detected the 666 ms asynchrony when the auditory component preceded [corrected] the visual component. None of the groups perceived an audiovisual asynchrony of 366 ms. These results suggest that the speech processing difficulties of children with SLI also extend to integrating the auditory and visual aspects of speech perception.

  19. The organization and reorganization of audiovisual speech perception in the first year of life.

    Science.gov (United States)

    Danielson, D Kyle; Bruderer, Alison G; Kandhadai, Padmapriya; Vatikiotis-Bateson, Eric; Werker, Janet F

    2017-04-01

    The period between six and 12 months is a sensitive period for language learning during which infants undergo auditory perceptual attunement, and recent results indicate that this sensitive period may exist across sensory modalities. We tested infants at three stages of perceptual attunement (six, nine, and 11 months) to determine 1) whether they were sensitive to the congruence between heard and seen speech stimuli in an unfamiliar language, and 2) whether familiarization with congruent audiovisual speech could boost subsequent non-native auditory discrimination. Infants at six- and nine-, but not 11-months, detected audiovisual congruence of non-native syllables. Familiarization to incongruent, but not congruent, audiovisual speech changed auditory discrimination at test for six-month-olds but not nine- or 11-month-olds. These results advance the proposal that speech perception is audiovisual from early in ontogeny, and that the sensitive period for audiovisual speech perception may last somewhat longer than that for auditory perception alone.

  20. Auditory, Visual and Audiovisual Speech Processing Streams in Superior Temporal Sulcus.

    Science.gov (United States)

    Venezia, Jonathan H; Vaden, Kenneth I; Rong, Feng; Maddox, Dale; Saberi, Kourosh; Hickok, Gregory

    2017-01-01

    The human superior temporal sulcus (STS) is responsive to visual and auditory information, including sounds and facial cues during speech recognition. We investigated the functional organization of STS with respect to modality-specific and multimodal speech representations. Twenty younger adult participants were instructed to perform an oddball detection task and were presented with auditory, visual, and audiovisual speech stimuli, as well as auditory and visual nonspeech control stimuli in a block fMRI design. Consistent with a hypothesized anterior-posterior processing gradient in STS, auditory, visual and audiovisual stimuli produced the largest BOLD effects in anterior, posterior and middle STS (mSTS), respectively, based on whole-brain, linear mixed effects and principal component analyses. Notably, the mSTS exhibited preferential responses to multisensory stimulation, as well as speech compared to nonspeech. Within the mid-posterior and mSTS regions, response preferences changed gradually from visual, to multisensory, to auditory moving posterior to anterior. Post hoc analysis of visual regions in the posterior STS revealed that a single subregion bordering the mSTS was insensitive to differences in low-level motion kinematics yet distinguished between visual speech and nonspeech based on multi-voxel activation patterns. These results suggest that auditory and visual speech representations are elaborated gradually within anterior and posterior processing streams, respectively, and may be integrated within the mSTS, which is sensitive to more abstract speech information within and across presentation modalities. The spatial organization of STS is consistent with processing streams that are hypothesized to synthesize perceptual speech representations from sensory signals that provide convergent information from visual and auditory modalities.

  1. Visual-Auditory Integration during Speech Imitation in Autism

    Science.gov (United States)

    Williams, Justin H. G.; Massaro, Dominic W.; Peel, Natalie J.; Bosseler, Alexis; Suddendorf, Thomas

    2004-01-01

    Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional "mirror neuron" systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a "virtual" head (Baldi), delivered speech stimuli for…

  2. Neural initialization of audiovisual integration in prereaders at varying risk for developmental dyslexia.

    Science.gov (United States)

    Karipidis, Iliana I; Pleisch, Georgette; Röthlisberger, Martina; Hofstetter, Christoph; Dornbierer, Dario; Stämpfli, Philipp; Brem, Silvia

    2017-02-01

    Learning letter-speech sound correspondences is a major step in reading acquisition and is severely impaired in children with dyslexia. Up to now, it remains largely unknown how quickly neural networks adopt specific functions during audiovisual integration of linguistic information when prereading children learn letter-speech sound correspondences. Here, we simulated the process of learning letter-speech sound correspondences in 20 prereading children (6.13-7.17 years) at varying risk for dyslexia by training artificial letter-speech sound correspondences within a single experimental session. Subsequently, we simultaneously acquired event-related potentials (ERPs) and functional magnetic resonance imaging (fMRI) scans during implicit audiovisual presentation of trained and untrained pairs. Audiovisual integration of trained pairs correlated with individual learning rates in right superior temporal, left inferior temporal, and bilateral parietal areas and with phonological awareness in left temporal areas. In correspondence, a differential left-lateralized parietooccipitotemporal ERP at 400 ms for trained pairs correlated with learning achievement and familial risk. Finally, a late (650 ms) posterior negativity indicating audiovisual congruency of trained pairs was associated with increased fMRI activation in the left occipital cortex. Taken together, a short training session initializes audiovisual integration in neural systems that are responsible for processing linguistic information in proficient readers. To conclude, the ability to learn grapheme-phoneme correspondences, the familial history of reading disability, and phonological awareness of prereading children account for the degree of audiovisual integration in a distributed brain network. Such findings on emerging linguistic audiovisual integration could allow for distinguishing between children with typical and atypical reading development. Hum Brain Mapp 38:1038-1055, 2017. © 2016 Wiley Periodicals, Inc.

  3. Distributed neural signatures of natural audiovisual speech and music in the human auditory cortex.

    Science.gov (United States)

    Salmi, Juha; Koistinen, Olli-Pekka; Glerean, Enrico; Jylänki, Pasi; Vehtari, Aki; Jääskeläinen, Iiro P; Mäkelä, Sasu; Nummenmaa, Lauri; Nummi-Kuisma, Katarina; Nummi, Ilari; Sams, Mikko

    2017-08-15

    During a conversation or when listening to music, auditory and visual information are combined automatically into audiovisual objects. However, it is still poorly understood how specific types of visual information shape neural processing of sounds in lifelike stimulus environments. Here we applied multi-voxel pattern analysis to investigate how naturally matching visual input modulates supratemporal cortex activity during processing of naturalistic acoustic speech, singing and instrumental music. Bayesian logistic regression classifiers with sparsity-promoting priors were trained to predict whether the stimulus was audiovisual or auditory, and whether it contained piano playing, speech, or singing. Classifier performance was assessed with leave-one-participant-out cross-validation: the model was trained on the remaining 15 participants and tested on the held-out participant. The signature patterns associated with unimodal auditory stimuli encompassed distributed locations mostly in the middle and superior temporal gyrus (STG/MTG). A pattern regression analysis, based on a continuous acoustic model, revealed that activity in some of these MTG and STG areas was associated with acoustic features present in speech and music stimuli. Concurrent visual stimulus modulated activity in bilateral MTG (speech), the lateral aspect of right anterior STG (singing), and bilateral parietal opercular cortex (piano). Our results suggest that specific supratemporal brain areas are involved in processing complex natural speech, singing, and piano playing, and that other brain areas located in anterior (facial speech) and posterior (music-related hand actions) supratemporal cortex are influenced by related visual information. Those anterior and posterior supratemporal areas have been linked to stimulus identification and sensory-motor integration, respectively. Copyright © 2017 Elsevier Inc. All rights reserved.
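    As a rough sketch of this kind of decoding scheme, the example below runs a leave-one-participant-out classification on hypothetical voxel-pattern data. An L1-penalized logistic regression from scikit-learn stands in for the Bayesian sparsity-promoting classifier used in the study; the feature matrix, labels, and group assignments are random placeholders.

    ```python
    # Leave-one-participant-out decoding sketch; not the study's Bayesian classifier.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

    rng = np.random.default_rng(0)
    n_participants, n_trials, n_voxels = 16, 40, 500

    # Hypothetical data: one row of voxel-pattern features per trial.
    X = rng.standard_normal((n_participants * n_trials, n_voxels))
    y = rng.integers(0, 2, size=n_participants * n_trials)    # 0 = auditory, 1 = audiovisual
    groups = np.repeat(np.arange(n_participants), n_trials)   # participant label per trial

    # L1 penalty promotes sparse voxel weights, loosely analogous to a sparsity prior.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    scores = cross_val_score(clf, X, y, cv=LeaveOneGroupOut(), groups=groups)
    print(f"Mean leave-one-participant-out accuracy: {scores.mean():.2f}")
    ```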

  4. Crossmodal and incremental perception of audiovisual cues to emotional speech.

    Science.gov (United States)

    Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc

    2010-01-01

    In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues to emotion from a speaker's face relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests with video clips of emotional utterances collected via a variant of the well-known Velten method. More specifically, we recorded speakers who displayed positive or negative emotions, which were congruent or incongruent with the (emotional) lexical content of the uttered sentence. To address these questions, we conducted two experiments. The first experiment is a perception experiment in which Czech participants, who do not speak Dutch, rate the perceived emotional state of Dutch speakers in a bimodal (audiovisual) or a unimodal (audio- or vision-only) condition. It was found that incongruent emotional speech leads to significantly more extreme perceived emotion scores than congruent emotional speech, where the difference between congruent and incongruent emotional speech is larger for the negative than for the positive conditions. Interestingly, the largest overall differences between congruent and incongruent emotions were found for the audio-only condition, which suggests that posing an incongruent emotion has a particularly strong effect on the spoken realization of emotions. The second experiment uses a gating paradigm to test the recognition speed for various emotional expressions from a speaker's face. In this experiment participants were presented with the same clips as in Experiment I, but this time presented vision-only. The clips were shown in successive segments (gates) of increasing duration. Results show that participants are surprisingly accurate in their recognition of the various emotions, as they already reach high recognition scores in the first gate (after only 160 ms). Interestingly, the recognition scores

  5. Effect of attentional load on audiovisual speech perception: Evidence from ERPs

    Directory of Open Access Journals (Sweden)

    Agnès eAlsius

    2014-07-01

    Full Text Available Seeing articulatory movements influences perception of auditory speech. This is often reflected in a shortened latency of auditory event-related potentials (ERPs) generated in the auditory cortex. The present study addressed whether this early neural correlate of audiovisual interaction is modulated by attention. We recorded ERPs in 15 subjects while they were presented with auditory, visual and audiovisual spoken syllables. Audiovisual stimuli consisted of incongruent auditory and visual components known to elicit a McGurk effect, i.e. a visually driven alteration in the auditory speech percept. In a Dual task condition, participants were asked to identify spoken syllables whilst monitoring a rapid visual stream of pictures for targets, i.e., they had to divide their attention. In a Single task condition, participants identified the syllables without any other tasks, i.e., they were asked to ignore the pictures and focus their attention fully on the spoken syllables. The McGurk effect was weaker in the Dual task than in the Single task condition, indicating an effect of attentional load on audiovisual speech perception. Early auditory ERP components, N1 and P2, peaked earlier to audiovisual stimuli than to auditory stimuli when attention was fully focused on syllables, indicating neurophysiological audiovisual interaction. This latency decrement was reduced when attention was loaded, suggesting that attention influences early neural processing of audiovisual speech. We conclude that reduced attention weakens the interaction between vision and audition in speech.

  6. Effect of attentional load on audiovisual speech perception: evidence from ERPs.

    Science.gov (United States)

    Alsius, Agnès; Möttönen, Riikka; Sams, Mikko E; Soto-Faraco, Salvador; Tiippana, Kaisa

    2014-01-01

    Seeing articulatory movements influences perception of auditory speech. This is often reflected in a shortened latency of auditory event-related potentials (ERPs) generated in the auditory cortex. The present study addressed whether this early neural correlate of audiovisual interaction is modulated by attention. We recorded ERPs in 15 subjects while they were presented with auditory, visual, and audiovisual spoken syllables. Audiovisual stimuli consisted of incongruent auditory and visual components known to elicit a McGurk effect, i.e., a visually driven alteration in the auditory speech percept. In a Dual task condition, participants were asked to identify spoken syllables whilst monitoring a rapid visual stream of pictures for targets, i.e., they had to divide their attention. In a Single task condition, participants identified the syllables without any other tasks, i.e., they were asked to ignore the pictures and focus their attention fully on the spoken syllables. The McGurk effect was weaker in the Dual task than in the Single task condition, indicating an effect of attentional load on audiovisual speech perception. Early auditory ERP components, N1 and P2, peaked earlier to audiovisual stimuli than to auditory stimuli when attention was fully focused on syllables, indicating neurophysiological audiovisual interaction. This latency decrement was reduced when attention was loaded, suggesting that attention influences early neural processing of audiovisual speech. We conclude that reduced attention weakens the interaction between vision and audition in speech.

  7. BILINGUAL MULTIMODAL SYSTEM FOR TEXT-TO-AUDIOVISUAL SPEECH AND SIGN LANGUAGE SYNTHESIS

    Directory of Open Access Journals (Sweden)

    A. A. Karpov

    2014-09-01

    Full Text Available We present a conceptual model, architecture and software of a multimodal system for audio-visual speech and sign language synthesis by the input text. The main components of the developed multimodal synthesis system (signing avatar) are: automatic text processor for input text analysis; simulation 3D model of human's head; computer text-to-speech synthesizer; a system for audio-visual speech synthesis; simulation 3D model of human's hands and upper body; multimodal user interface integrating all the components for generation of audio, visual and signed speech. The proposed system performs automatic translation of input textual information into speech (audio information) and gestures (video information), information fusion and its output in the form of multimedia information. A user can input any grammatically correct text in Russian or Czech languages to the system; it is analyzed by the text processor to detect sentences, words and characters. Then this textual information is converted into symbols of the sign language notation. We apply the international «Hamburg Notation System» (HamNoSys), which describes the main differential features of each manual sign: hand shape, hand orientation, place and type of movement. On their basis the 3D signing avatar displays the elements of the sign language. The virtual 3D model of human's head and upper body has been created using the VRML virtual reality modeling language, and it is controlled by software based on the OpenGL graphics library. The developed multimodal synthesis system is a universal one since it is oriented for both regular users and disabled people (in particular, for the hard-of-hearing and visually impaired), and it serves for multimedia output (by audio and visual modalities) of input textual information.
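    As a minimal, hypothetical sketch of the architecture described above, the pipeline below chains a text-processing stage, a speech/viseme stage, and a sign-notation stage; all class and function names are illustrative placeholders and do not correspond to the actual system.

    ```python
    # Hypothetical outline of a text-to-audiovisual-speech-and-sign pipeline; all names
    # are illustrative placeholders, not the system's real components.
    from dataclasses import dataclass

    @dataclass
    class SynthesisResult:
        audio: bytes          # synthesized speech waveform
        visemes: list[str]    # lip-shape sequence driving the 3D talking head
        hamnosys: list[str]   # sign-notation symbols driving the signing avatar

    def text_to_audiovisual_sign(text: str) -> SynthesisResult:
        tokens = text.split()                         # stand-in for the text processor
        audio = b""                                   # stand-in for the TTS back end
        visemes = [f"viseme({t})" for t in tokens]    # stand-in for visual speech synthesis
        hamnosys = [f"HNS({t})" for t in tokens]      # stand-in for sign-notation conversion
        return SynthesisResult(audio, visemes, hamnosys)

    print(text_to_audiovisual_sign("hello world"))
    ```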

  8. Effects of Audio-Visual Information on the Intelligibility of Alaryngeal Speech

    Science.gov (United States)

    Evitts, Paul M.; Portugal, Lindsay; Van Dine, Ami; Holler, Aline

    2010-01-01

    Background: There is minimal research on the contribution of visual information on speech intelligibility for individuals with a laryngectomy (IWL). Aims: The purpose of this project was to determine the effects of mode of presentation (audio-only, audio-visual) on alaryngeal speech intelligibility. Method: Twenty-three naive listeners were…

  9. Atypical audio-visual speech perception and McGurk effects in children with specific language impairment.

    Science.gov (United States)

    Leybaert, Jacqueline; Macchi, Lucie; Huyse, Aurélie; Champoux, François; Bayard, Clémence; Colin, Cécile; Berthommier, Frédéric

    2014-01-01

    Audiovisual speech perception of children with specific language impairment (SLI) and children with typical language development (TLD) was compared in two experiments using /aCa/ syllables presented in the context of a masking release paradigm. Children had to repeat syllables presented in auditory alone, visual alone (speechreading), audiovisual congruent and incongruent (McGurk) conditions. Stimuli were masked by either stationary (ST) or amplitude modulated (AM) noise. Although children with SLI were less accurate in auditory and audiovisual speech perception, they showed a similar auditory masking release effect to children with TLD. Children with SLI also had fewer correct responses in speechreading than children with TLD, indicating impairment in phonemic processing of visual speech information. In response to McGurk stimuli, children with TLD showed more fusions in AM noise than in ST noise, a consequence of the auditory masking release effect and of the influence of visual information. Children with SLI did not show this effect systematically, suggesting they were less influenced by visual speech. However, when the visual cues were easily identified, the profile of responses to McGurk stimuli was similar in both groups, suggesting that children with SLI do not suffer from an impairment of audiovisual integration. An analysis of percent of information transmitted revealed a deficit in the children with SLI, particularly for the place of articulation feature. Taken together, the data support the hypothesis of an intact peripheral processing of auditory speech information, coupled with a supramodal deficit of phonemic categorization in children with SLI. Clinical implications are discussed.

  10. Deficits in audiovisual speech perception in normal aging emerge at the level of whole-word recognition.

    Science.gov (United States)

    Stevenson, Ryan A; Nelms, Caitlin E; Baum, Sarah H; Zurkovsky, Lilia; Barense, Morgan D; Newhouse, Paul A; Wallace, Mark T

    2015-01-01

    Over the next 2 decades, a dramatic shift in the demographics of society will take place, with a rapid growth in the population of older adults. One of the most common complaints with healthy aging is a decreased ability to successfully perceive speech, particularly in noisy environments. In such noisy environments, the presence of visual speech cues (i.e., lip movements) provides striking benefits for speech perception and comprehension, but previous research suggests that older adults gain less from such audiovisual integration than their younger peers. To determine at what processing level these behavioral differences arise in healthy-aging populations, we administered a speech-in-noise task to younger and older adults. We compared the perceptual benefits of having speech information available in both the auditory and visual modalities and examined both phoneme and whole-word recognition across varying levels of signal-to-noise ratio. For whole-word recognition, older adults relative to younger adults showed greater multisensory gains at intermediate SNRs but reduced benefit at low SNRs. By contrast, at the phoneme level both younger and older adults showed approximately equivalent increases in multisensory gain as signal-to-noise ratio decreased. Collectively, the results provide important insights into both the similarities and differences in how older and younger adults integrate auditory and visual speech cues in noisy environments and help explain some of the conflicting findings in previous studies of multisensory speech perception in healthy aging. These novel findings suggest that audiovisual processing is intact at more elementary levels of speech perception in healthy-aging populations and that deficits begin to emerge only at the more complex word-recognition level of speech signals. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Can you hear me yet? An intracranial investigation of speech and non-speech audiovisual interactions in human cortex.

    Science.gov (United States)

    Rhone, Ariane E; Nourski, Kirill V; Oya, Hiroyuki; Kawasaki, Hiroto; Howard, Matthew A; McMurray, Bob

    In everyday conversation, viewing a talker's face can provide information about the timing and content of an upcoming speech signal, resulting in improved intelligibility. Using electrocorticography, we tested whether human auditory cortex in Heschl's gyrus (HG) and on superior temporal gyrus (STG) and motor cortex on precentral gyrus (PreC) were responsive to visual/gestural information prior to the onset of sound and whether early stages of auditory processing were sensitive to the visual content (speech syllable versus non-speech motion). Event-related band power (ERBP) in the high gamma band was content-specific prior to acoustic onset on STG and PreC, and ERBP in the beta band differed in all three areas. Following sound onset, we found no evidence for content-specificity in HG, evidence for visual specificity in PreC, and specificity for both modalities in STG. These results support models of audio-visual processing in which sensory information is integrated in non-primary cortical areas.

  12. Robust audio-visual speech recognition under noisy audio-video conditions.

    Science.gov (United States)

    Stewart, Darryl; Seymour, Rowan; Pass, Adrian; Ming, Ji

    2014-02-01

    This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances with corruption added in either/both the video and audio streams using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and also compared to any fixed-weighted integration approach, both in clean conditions and when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
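    The general idea of frame-by-frame stream weighting can be sketched as follows. This is not the MWSP estimator itself; the per-frame weight here is a simple reliability heuristic, and the log-likelihood arrays are random placeholders.

    ```python
    # Generic per-frame weighted fusion of audio and video stream log-likelihoods.
    # Not the MWSP estimator; the weighting rule is a simple reliability heuristic.
    import numpy as np

    def combine_streams(audio_loglik, video_loglik):
        """Each input has shape (n_frames, n_states): per-stream state log-likelihoods."""
        # Streams whose frame-wise likelihoods are more peaked across states are
        # treated as more reliable and receive a larger weight for that frame.
        a_rel = audio_loglik.max(axis=1) - audio_loglik.mean(axis=1)
        v_rel = video_loglik.max(axis=1) - video_loglik.mean(axis=1)
        lam = a_rel / (a_rel + v_rel + 1e-9)          # per-frame audio weight in [0, 1]
        return lam[:, None] * audio_loglik + (1.0 - lam)[:, None] * video_loglik

    rng = np.random.default_rng(1)
    audio = rng.standard_normal((100, 10))            # placeholder per-frame log-likelihoods
    video = rng.standard_normal((100, 10))
    print(combine_streams(audio, video).shape)        # (100, 10)
    ```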

  13. Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation.

    Science.gov (United States)

    Banks, Briony; Gowen, Emma; Munro, Kevin J; Adank, Patti

    2015-01-01

    Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants' eye gaze was recorded to verify that they looked at the speaker's face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation.

  14. Audiovisual Speech Perception in Infancy: The Influence of Vowel Identity and Infants' Productive Abilities on Sensitivity to (Mis)Matches between Auditory and Visual Speech Cues

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias

    2016-01-01

    Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…

  15. Selective Audiovisual Semantic Integration Enabled by Feature-Selective Attention.

    Science.gov (United States)

    Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Li, Peijun; Fang, Fang; Sun, Pei

    2016-01-13

    An audiovisual object may contain multiple semantic features, such as the gender and emotional features of the speaker. Feature-selective attention and audiovisual semantic integration are two brain functions involved in the recognition of audiovisual objects. Humans often selectively attend to one or several features while ignoring the other features of an audiovisual object. Meanwhile, the human brain integrates semantic information from the visual and auditory modalities. However, how these two brain functions correlate with each other remains to be elucidated. In this functional magnetic resonance imaging (fMRI) study, we explored the neural mechanism by which feature-selective attention modulates audiovisual semantic integration. During the fMRI experiment, the subjects were presented with visual-only, auditory-only, or audiovisual dynamical facial stimuli and performed several feature-selective attention tasks. Our results revealed that a distribution of areas, including heteromodal areas and brain areas encoding attended features, may be involved in audiovisual semantic integration. Through feature-selective attention, the human brain may selectively integrate audiovisual semantic information from attended features by enhancing functional connectivity and thus regulating information flows from heteromodal areas to brain areas encoding the attended features.

  16. Design and realisation of an audiovisual speech activity detector

    NARCIS (Netherlands)

    Van Bree, K.C.

    2006-01-01

    For many speech telecommunication technologies a robust speech activity detector is important. An audio-only speech detector will give false positives when the interfering signal is speech or has speech characteristics. The video modality is suitable for solving this problem. In this report the approach

  17. Language/Culture Modulates Brain and Gaze Processes in Audiovisual Speech Perception.

    Science.gov (United States)

    Hisanaga, Satoko; Sekiyama, Kaoru; Igasaki, Tomohiko; Murayama, Nobuki

    2016-10-13

    Several behavioural studies have shown that the interplay between voice and face information in audiovisual speech perception is not universal. Native English speakers (ESs) are influenced by visual mouth movement to a greater degree than native Japanese speakers (JSs) when listening to speech. However, the biological basis of these group differences is unknown. Here, we demonstrate the time-varying processes of group differences in terms of event-related brain potentials (ERP) and eye gaze for audiovisual and audio-only speech perception. On a behavioural level, while congruent mouth movement shortened the ESs' response time for speech perception, the opposite effect was observed in JSs. Eye-tracking data revealed a gaze bias to the mouth for the ESs but not the JSs, especially before the audio onset. Additionally, the ERP P2 amplitude indicated that ESs processed multisensory speech more efficiently than auditory-only speech; however, the JSs exhibited the opposite pattern. Taken together, the ESs' early visual attention to the mouth was likely to promote phonetic anticipation, which was not the case for the JSs. These results clearly indicate the impact of language and/or culture on multisensory speech processing, suggesting that linguistic/cultural experiences lead to the development of unique neural systems for audiovisual speech perception.

  18. Text-to-audiovisual speech synthesizer for children with learning disabilities.

    Science.gov (United States)

    Mendi, Engin; Bayrak, Coskun

    2013-01-01

    Learning disabilities affect the ability of children to learn, despite their having normal intelligence. Assistive tools can highly increase functional capabilities of children with learning disorders such as writing, reading, or listening. In this article, we describe a text-to-audiovisual synthesizer that can serve as an assistive tool for such children. The system automatically converts an input text to audiovisual speech, providing synchronization of the head, eye, and lip movements of the three-dimensional face model with appropriate facial expressions and word flow of the text. The proposed system can enhance speech perception and help children having learning deficits to improve their chances of success.

  19. Absent Audiovisual Integration Elicited by Peripheral Stimuli in Parkinson's Disease.

    Science.gov (United States)

    Ren, Yanna; Suzuki, Keisuke; Yang, Weiping; Ren, Yanling; Wu, Fengxia; Yang, Jiajia; Takahashi, Satoshi; Ejima, Yoshimichi; Wu, Jinglong; Hirata, Koichi

    2018-01-01

    The basal ganglia, which have been shown to be a significant multisensory hub, are disordered in Parkinson's disease (PD). This study investigated the audiovisual integration of peripheral stimuli in PD patients with/without sleep disturbances. Thirty-six age-matched normal controls (NC) and 30 PD patients were recruited for an auditory/visual discrimination experiment. The mean response times for each participant were analyzed using repeated measures ANOVA and the race model. The results showed that responses to all stimuli were significantly delayed in PD compared to NC. The response to audiovisual stimuli was significantly faster than that to unimodal stimuli in both NC and PD; however, audiovisual integration was absent in PD, whereas it did occur in NC. Further analysis showed that there was no significant audiovisual integration in PD with/without cognitive impairment or in PD with/without sleep disturbances. Furthermore, audiovisual facilitation was not associated with Hoehn and Yahr stage, disease duration, or the presence of sleep disturbances (all p > 0.05). The current results showed that audiovisual multisensory integration for peripheral stimuli is absent in PD regardless of sleep disturbances, and further suggested that abnormal audiovisual integration might be a potential early manifestation of PD.

  20. Bimodal bilingualism as multisensory training?: Evidence for improved audiovisual speech perception after sign language exposure.

    Science.gov (United States)

    Williams, Joshua T; Darcy, Isabelle; Newman, Sharlene D

    2016-02-15

    The aim of the present study was to characterize effects of learning a sign language on the processing of a spoken language. Specifically, audiovisual phoneme comprehension was assessed before and after 13 weeks of sign language exposure. L2 ASL learners performed this task in the fMRI scanner. Results indicated that L2 American Sign Language (ASL) learners' behavioral classification of the speech sounds improved with time compared to hearing nonsigners. Results also indicated increased activation in the supramarginal gyrus (SMG) after sign language exposure, which suggests concomitant increased phonological processing of speech. A multiple regression analysis indicated that learners' ratings of co-sign speech use and lipreading ability were correlated with SMG activation. This pattern of results indicates that the increased use of mouthing and possibly lipreading during sign language acquisition may concurrently improve audiovisual speech processing in budding hearing bimodal bilinguals. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals

    Science.gov (United States)

    Lidestam, Björn; Rönnberg, Jerker

    2016-01-01

    The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for the EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. PMID:27317667

  2. Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals: Effects of Adding Visual Cues to Auditory Speech Stimuli.

    Science.gov (United States)

    Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

    2016-06-17

    The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for the EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. © The Author(s) 2016.

  3. Self-organizing maps for measuring similarity of audiovisual speech percepts

    DEFF Research Database (Denmark)

    Bothe, Hans-Heinrich

    The goal of this work is to find a way to measure similarity of audiovisual speech percepts. Phoneme-related self-organizing maps (SOM) with a rectangular basis are trained with data material from a (labeled) video film. For the training, a combination of auditory speech features and corresponding....... Dependent on the training data, these other units may also be contextually immediate neighboring units. The poster demonstrates the idea with text material spoken by one individual subject using a set of simple audio-visual features. The data material for the training process consists of 44 labeled...... sentences in German with a balanced phoneme repertoire. As a result it can be stated that (i) the SOM can be trained to map auditory and visual features in a topology-preserving way and (ii) they show strain due to the influence of other audio-visual units. The SOM can be used to measure similarity amongst...
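    The idea can be illustrated with a small rectangular self-organizing map trained on placeholder audio-visual feature vectors; the map size, feature dimensionality, and learning schedule below are arbitrary choices, not the values used in the work.

    ```python
    # Minimal rectangular SOM trained on hypothetical audio-visual feature vectors.
    import numpy as np

    rng = np.random.default_rng(42)
    n_features, grid = 16, (8, 8)                     # e.g. concatenated audio + lip features
    data = rng.standard_normal((1000, n_features))    # placeholder training vectors

    weights = rng.standard_normal((*grid, n_features))
    coords = np.stack(np.meshgrid(*map(np.arange, grid), indexing="ij"), axis=-1)

    for t, x in enumerate(rng.permutation(data)):
        lr = 0.5 * (1 - t / len(data))                # decaying learning rate
        sigma = 3.0 * (1 - t / len(data)) + 0.5       # decaying neighbourhood radius
        bmu = np.unravel_index(np.argmin(((weights - x) ** 2).sum(-1)), grid)
        h = np.exp(-((coords - np.array(bmu)) ** 2).sum(-1) / (2 * sigma ** 2))[..., None]
        weights += lr * h * (x - weights)             # pull the neighbourhood toward the sample

    # Best-matching unit of the first sample: its coordinates on the trained map.
    print(np.unravel_index(np.argmin(((weights - data[0]) ** 2).sum(-1)), grid))
    ```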

  4. On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech

    Directory of Open Access Journals (Sweden)

    Wesley Mattheyses

    2009-01-01

    Full Text Available Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either natural or synthesized speech. However, the perception of mismatches between these two information streams requires experimental exploration since it could degrade the quality of the output. In order to increase the intermodal coherence in synthetic 2D photorealistic speech, we extended the well-known unit selection audio synthesis technique to work with multimodal segments containing original combinations of audio and video. Subjective experiments confirm that the audiovisual signals created by our multimodal synthesis strategy are indeed perceived as being more synchronous than those of systems in which both modes are not intrinsically coherent. Furthermore, it is shown that the degree of coherence between the auditory mode and the visual mode has an influence on the perceived quality of the synthetic visual speech fragment. In addition, the audio quality was found to have only a minor influence on the perceived visual signal's quality.

  5. Brief Report: Arrested Development of Audiovisual Speech Perception in Autism Spectrum Disorders

    Science.gov (United States)

    Stevenson, Ryan A.; Siemann, Justin K.; Woynaroski, Tiffany G.; Schneider, Brittany C.; Eberly, Haley E.; Camarata, Stephen M.; Wallace, Mark T.

    2014-01-01

    Atypical communicative abilities are a core marker of Autism Spectrum Disorders (ASD). A number of studies have shown that, in addition to auditory comprehension differences, individuals with autism frequently show atypical responses to audiovisual speech, suggesting a multisensory contribution to these communicative differences from their…

  6. Auditory Perceptual Learning for Speech Perception Can be Enhanced by Audiovisual Training.

    Science.gov (United States)

    Bernstein, Lynne E; Auer, Edward T; Eberhardt, Silvio P; Jiang, Jintao

    2013-01-01

    Speech perception under audiovisual (AV) conditions is well known to confer benefits to perception such as increased speed and accuracy. Here, we investigated how AV training might benefit or impede auditory perceptual learning of speech degraded by vocoding. In Experiments 1 and 3, participants learned paired associations between vocoded spoken nonsense words and nonsense pictures. In Experiment 1, paired-associates (PA) AV training of one group of participants was compared with audio-only (AO) training of another group. When tested under AO conditions, the AV-trained group was significantly more accurate than the AO-trained group. In addition, pre- and post-training AO forced-choice consonant identification with untrained nonsense words showed that AV-trained participants had learned significantly more than AO participants. The pattern of results pointed to their having learned at the level of the auditory phonetic features of the vocoded stimuli. Experiment 2, a no-training control with testing and re-testing on the AO consonant identification, showed that the controls were as accurate as the AO-trained participants in Experiment 1 but less accurate than the AV-trained participants. In Experiment 3, PA training alternated AV and AO conditions on a list-by-list basis within participants, and training was to criterion (92% correct). PA training with AO stimuli was reliably more effective than training with AV stimuli. We explain these discrepant results in terms of the so-called "reverse hierarchy theory" of perceptual learning and in terms of the diverse multisensory and unisensory processing resources available to speech perception. We propose that early AV speech integration can potentially impede auditory perceptual learning; but visual top-down access to relevant auditory features can promote auditory perceptual learning.

  7. Timing in audiovisual speech perception: A mini review and new psychophysical data.

    Science.gov (United States)

    Venezia, Jonathan H; Thurman, Steven M; Matchin, William; George, Sahara E; Hickok, Gregory

    2016-02-01

    Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (~35 % identification of /apa/ compared to ~5 % in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (~130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content.
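    In spirit, the classification analysis amounts to a reverse-correlation contrast: average the transparency masks from trials on which /apa/ was reported and subtract the average mask from the remaining trials. The sketch below uses randomly generated masks and responses as placeholders.

    ```python
    # Reverse-correlation sketch: contrast masks that accompanied /apa/ responses with
    # masks that did not. Masks and responses here are random placeholders.
    import numpy as np

    rng = np.random.default_rng(7)
    n_trials, n_frames, h, w = 500, 30, 16, 16

    masks = rng.random((n_trials, n_frames, h, w))    # per-trial spatiotemporal transparency masks
    said_apa = rng.random(n_trials) < 0.35            # hypothetical /apa/ "yes" responses

    # Positive values mark frames/pixels whose visibility pushed responses toward /apa/.
    classification_image = masks[said_apa].mean(axis=0) - masks[~said_apa].mean(axis=0)
    print(classification_image.shape)                 # (n_frames, h, w)
    ```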

  8. Timing in Audiovisual Speech Perception: A Mini Review and New Psychophysical Data

    Science.gov (United States)

    Venezia, Jonathan H.; Thurman, Steven M.; Matchin, William; George, Sahara E.; Hickok, Gregory

    2015-01-01

    Recent influential models of audiovisual speech perception suggest that visual speech aids perception by generating predictions about the identity of upcoming speech sounds. These models place stock in the assumption that visual speech leads auditory speech in time. However, it is unclear whether and to what extent temporally-leading visual speech information contributes to perception. Previous studies exploring audiovisual-speech timing have relied upon psychophysical procedures that require artificial manipulation of cross-modal alignment or stimulus duration. We introduce a classification procedure that tracks perceptually-relevant visual speech information in time without requiring such manipulations. Participants were shown videos of a McGurk syllable (auditory /apa/ + visual /aka/ = perceptual /ata/) and asked to perform phoneme identification (/apa/ yes-no). The mouth region of the visual stimulus was overlaid with a dynamic transparency mask that obscured visual speech in some frames but not others randomly across trials. Variability in participants' responses (∼35% identification of /apa/ compared to ∼5% in the absence of the masker) served as the basis for classification analysis. The outcome was a high resolution spatiotemporal map of perceptually-relevant visual features. We produced these maps for McGurk stimuli at different audiovisual temporal offsets (natural timing, 50-ms visual lead, and 100-ms visual lead). Briefly, temporally-leading (∼130 ms) visual information did influence auditory perception. Moreover, several visual features influenced perception of a single speech sound, with the relative influence of each feature depending on both its temporal relation to the auditory signal and its informational content. PMID:26669309

  9. Audiovisual integration facilitates monkeys' short-term memory.

    Science.gov (United States)

    Bigelow, James; Poremba, Amy

    2016-07-01

    Many human behaviors are known to benefit from audiovisual integration, including language and communication, recognizing individuals, social decision making, and memory. Exceptionally little is known about the contributions of audiovisual integration to behavior in other primates. The current experiment investigated whether short-term memory in nonhuman primates is facilitated by the audiovisual presentation format. Three macaque monkeys that had previously learned an auditory delayed matching-to-sample (DMS) task were trained to perform a similar visual task, after which they were tested with a concurrent audiovisual DMS task with equal proportions of auditory, visual, and audiovisual trials. Parallel to outcomes in human studies, accuracy was higher and response times were faster on audiovisual trials than either unisensory trial type. Unexpectedly, two subjects exhibited superior unimodal performance on auditory trials, a finding that contrasts with previous studies, but likely reflects their training history. Our results provide the first demonstration of a bimodal memory advantage in nonhuman primates, lending further validation to their use as a model for understanding audiovisual integration and memory processing in humans.

  10. Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

    Directory of Open Access Journals (Sweden)

    Yue Zhao

    2012-12-01

    Full Text Available Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi-stream Dynamic Bayesian Networks and coupled HMMs are widely used for audio-visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) to perform unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and build an accurate audio-visual speech recognition model without a frame-independence assumption. The experimental results on Tibetan speech data from some real-world environments showed that the proposed DDBN outperforms the state-of-the-art methods in word recognition accuracy.

  11. Gated audiovisual speech identification in silence vs. noise: effects on time and accuracy

    Science.gov (United States)

    Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

    2013-01-01

    This study investigated the degree to which audiovisual presentation (compared to auditory-only presentation) affected isolation point (IPs, the amount of time required for the correct identification of speech stimuli using a gating paradigm) in silence and noise conditions. The study expanded on the findings of Moradi et al. (under revision), using the same stimuli, but presented in an audiovisual instead of an auditory-only manner. The results showed that noise impeded the identification of consonants and words (i.e., delayed IPs and lowered accuracy), but not the identification of final words in sentences. In comparison with the previous study by Moradi et al., it can be concluded that the provision of visual cues expedited IPs and increased the accuracy of speech stimuli identification in both silence and noise. The implication of the results is discussed in terms of models for speech understanding. PMID:23801980

  12. Audio-visual speech perception in prelingually deafened Japanese children following sequential bilateral cochlear implantation.

    Science.gov (United States)

    Yamamoto, Ryosuke; Naito, Yasushi; Tona, Risa; Moroto, Saburo; Tamaya, Rinko; Fujiwara, Keizo; Shinohara, Shogo; Takebayashi, Shinji; Kikuchi, Masahiro; Michida, Tetsuhiko

    2017-11-01

    An effect of audio-visual (AV) integration is observed when the auditory and visual stimuli are incongruent (the McGurk effect). In general, AV integration is helpful especially in subjects wearing hearing aids or cochlear implants (CIs). However, the influence of AV integration on spoken word recognition in individuals with bilateral CIs (Bi-CIs) has not been fully investigated so far. In this study, we investigated AV integration in children with Bi-CIs. The study sample included thirty-one prelingually deafened children who underwent sequential bilateral cochlear implantation. We assessed their responses to congruent and incongruent AV stimuli with three CI-listening modes: only the 1st CI, only the 2nd CI, and Bi-CIs. The responses were assessed in the whole group as well as in two sub-groups: a proficient group (syllable intelligibility ≥80% with the 1st CI) and a non-proficient group (syllable intelligibility below 80% with the 1st CI), and we examined the AV integration effect in each of the three CI-listening modes. AV integration responses were observed in a subset of incongruent AV stimuli, and the patterns observed with the 1st CI and with Bi-CIs were similar. In the proficient group, the responses with the 2nd CI were not significantly different from those with the 1st CI, whereas in the non-proficient group the responses with the 2nd CI were driven by visual stimuli more than those with the 1st CI. Our results suggested that prelingually deafened Japanese children who underwent sequential bilateral cochlear implantation exhibit AV integration abilities, both in monaural and in binaural listening. We also observed a higher influence of visual stimuli on speech perception with the 2nd CI in the non-proficient group, suggesting that Bi-CIs listeners with poorer speech recognition rely more on visual information than the proficient subjects to compensate for poorer auditory input. Nevertheless, poorer quality auditory input with the 2nd CI did not interfere with AV integration in binaural listening.

  13. Audio-Visual Speech Perception: A Developmental ERP Investigation

    Science.gov (United States)

    Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.

    2014-01-01

    Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…

  14. Classifying laughter and speech using audio-visual feature prediction

    NARCIS (Netherlands)

    Petridis, Stavros; Asghar, Ali; Pantic, Maja

    2010-01-01

    In this study, a system that discriminates laughter from speech by modelling the relationship between audio and visual features is presented. The underlying assumption is that this relationship is different between speech and laughter. Neural networks are trained which learn the audio-to-visual and
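    The underlying idea, classifying a clip by how well its audio predicts its visual features, can be sketched as follows; linear regressors stand in for the neural networks mentioned in the abstract, and all features and labels are random placeholders.

    ```python
    # Prediction-error-based laughter/speech discrimination sketch: fit one audio-to-visual
    # predictor per class and assign a clip to the class whose predictor fits it best.
    # Linear regression stands in for the neural networks; all data are placeholders.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(3)
    n_clips, n_audio, n_visual = 200, 12, 6
    audio = rng.standard_normal((n_clips, n_audio))
    visual = rng.standard_normal((n_clips, n_visual))
    labels = rng.integers(0, 2, n_clips)              # 0 = speech, 1 = laughter

    models = {c: LinearRegression().fit(audio[labels == c], visual[labels == c]) for c in (0, 1)}

    def classify(a, v):
        errors = {c: np.mean((m.predict(a[None]) - v) ** 2) for c, m in models.items()}
        return min(errors, key=errors.get)            # class with the smallest prediction error

    print(classify(audio[0], visual[0]))
    ```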

  15. Audio-Visual Speech in Noise Perception in Dyslexia

    Science.gov (United States)

    van Laarhoven, Thijs; Keetels, Mirjam; Schakel, Lemmy; Vroomen, Jean

    2018-01-01

    Individuals with developmental dyslexia (DD) may experience, besides reading problems, other speech-related processing deficits. Here, we examined the influence of visual articulatory information (lip-read speech) at various levels of background noise on auditory word recognition in children and adults with DD. We found that children with a…

  16. Multisensory integration in complete unawareness: evidence from audiovisual congruency priming.

    Science.gov (United States)

    Faivre, Nathan; Mudrik, Liad; Schwartz, Naama; Koch, Christof

    2014-11-01

    Multisensory integration is thought to require conscious perception. Although previous studies have shown that an invisible stimulus could be integrated with an audible one, none have demonstrated integration of two subliminal stimuli of different modalities. Here, pairs of identical or different audiovisual target letters (the sound /b/ with the written letter "b" or "m," respectively) were preceded by pairs of masked identical or different audiovisual prime digits (the sound /6/ with the written digit "6" or "8," respectively). In three experiments, awareness of the audiovisual digit primes was manipulated, such that participants were either unaware of the visual digit, the auditory digit, or both. Priming of the semantic relations between the auditory and visual digits was found in all experiments. Moreover, a further experiment showed that unconscious multisensory integration was not obtained when participants did not undergo prior conscious training of the task. This suggests that following conscious learning, unconscious processing suffices for multisensory integration. © The Author(s) 2014.

  17. Perception of the multisensory coherence of fluent audiovisual speech in infancy: its emergence and the role of experience.

    Science.gov (United States)

    Lewkowicz, David J; Minar, Nicholas J; Tift, Amy H; Brandon, Melissa

    2015-02-01

    To investigate the developmental emergence of the perception of the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8- to 10-, and 12- to 14-month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor 8- to 10-month-old infants exhibited audiovisual matching in that they did not look longer at the matching monologue. In contrast, the 12- to 14-month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, perceived the multisensory coherence of native-language monologues earlier in the test trials than that of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12- to 14-month-olds did not depend on audiovisual synchrony, whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audiovisual synchrony cues are more important in the perception of the multisensory coherence of non-native speech than that of native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing. Copyright © 2014 Elsevier Inc. All rights reserved.

  18. Perception of the Multisensory Coherence of Fluent Audiovisual Speech in Infancy: Its Emergence & the Role of Experience

    Science.gov (United States)

    Lewkowicz, David J.; Minar, Nicholas J.; Tift, Amy H.; Brandon, Melissa

    2014-01-01

    To investigate the developmental emergence of the ability to perceive the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8–10, and 12–14 month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor the 8–10 month-old infants exhibited audio-visual matching in that neither group exhibited greater looking at the matching monologue. In contrast, the 12–14 month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, they perceived the multisensory coherence of native-language monologues earlier in the test trials than of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12–14 month olds did not depend on audio-visual synchrony whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audio-visual synchrony cues are more important in the perception of the multisensory coherence of non-native than native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing. PMID:25462038

  19. Effect of hearing loss on semantic access by auditory and audiovisual speech in children.

    Science.gov (United States)

    Jerger, Susan; Tye-Murray, Nancy; Damian, Markus F; Abdi, Hervé

    2013-01-01

    This research studied whether the mode of input (auditory versus audiovisual) influenced semantic access by speech in children with sensorineural hearing impairment (HI). Participants, 31 children with HI and 62 children with normal hearing (NH), were tested with the authors' new multimodal picture word task. Children were instructed to name pictures displayed on a monitor and ignore auditory or audiovisual speech distractors. The semantic content of the distractors was varied to be related versus unrelated to the pictures (e.g., picture distractor of dog-bear versus dog-cheese, respectively). In children with NH, picture-naming times were slower in the presence of semantically related distractors. This slowing, called semantic interference, is attributed to the meaning-related picture-distractor entries competing for selection and control of the response (the lexical selection by competition hypothesis). Recently, a modification of the lexical selection by competition hypothesis, called the competition threshold (CT) hypothesis, proposed that (1) the competition between the picture-distractor entries is determined by a threshold, and (2) distractors with experimentally reduced fidelity cannot reach the CT. Thus, semantically related distractors with reduced fidelity do not produce the normal interference effect, but instead no effect or semantic facilitation (faster picture naming times for semantically related versus unrelated distractors). Facilitation occurs because the activation level of the semantically related distractor with reduced fidelity (1) is not sufficient to exceed the CT and produce interference but (2) is sufficient to activate its concept, which then strengthens the activation of the picture and facilitates naming. This research investigated whether the proposals of the CT hypothesis generalize to the auditory domain, to the natural degradation of speech due to HI, and to participants who are children. Our multimodal picture word task allowed us

  20. Skill dependent audiovisual integration in the fusiform induces repetition suppression.

    Science.gov (United States)

    McNorgan, Chris; Booth, James R

    2015-02-01

    Learning to read entails mapping existing phonological representations to novel orthographic representations and is thus an ideal context for investigating experience driven audiovisual integration. Because two dominant brain-based theories of reading development hinge on the sensitivity of the visual-object processing stream to phonological information, we were interested in how reading skill relates to audiovisual integration in this area. Thirty-two children between 8 and 13 years of age spanning a range of reading skill participated in a functional magnetic resonance imaging experiment. Participants completed a rhyme judgment task to word pairs presented unimodally (auditory- or visual-only) and cross-modally (auditory followed by visual). Skill-dependent sub-additive audiovisual modulation was found in left fusiform gyrus, extending into the putative visual word form area, and was correlated with behavioral orthographic priming. These results suggest learning to read promotes facilitatory audiovisual integration in the ventral visual-object processing stream and may optimize this region for orthographic processing. Copyright © 2014 Elsevier Inc. All rights reserved.

  1. Temporal Ventriloquism Reveals Intact Audiovisual Temporal Integration in Amblyopia.

    Science.gov (United States)

    Richards, Michael D; Goltz, Herbert C; Wong, Agnes M F

    2018-02-01

    We have shown previously that amblyopia involves impaired detection of asynchrony between auditory and visual events. To distinguish whether this impairment represents a defect in temporal integration or nonintegrative multisensory processing (e.g., cross-modal matching), we used the temporal ventriloquism effect in which visual temporal order judgment (TOJ) is normally enhanced by a lagging auditory click. Participants with amblyopia (n = 9) and normally sighted controls (n = 9) performed a visual TOJ task. Pairs of clicks accompanied the two lights such that the first click preceded the first light, or second click lagged the second light by 100, 200, or 450 ms. Baseline audiovisual synchrony and visual-only conditions also were tested. Within both groups, just noticeable differences for the visual TOJ task were significantly reduced compared with baseline in the 100- and 200-ms click lag conditions. Within the amblyopia group, poorer stereo acuity and poorer visual acuity in the amblyopic eye were significantly associated with greater enhancement in visual TOJ performance in the 200-ms click lag condition. Audiovisual temporal integration is intact in amblyopia, as indicated by perceptual enhancement in the temporal ventriloquism effect. Furthermore, poorer stereo acuity and poorer visual acuity in the amblyopic eye are associated with a widened temporal binding window for the effect. These findings suggest that previously reported abnormalities in audiovisual multisensory processing may result from impaired cross-modal matching rather than a diminished capacity for temporal audiovisual integration.

  2. Skill Dependent Audiovisual Integration in the Fusiform Induces Repetition Suppression

    Science.gov (United States)

    McNorgan, Chris; Booth, James R.

    2015-01-01

    Learning to read entails mapping existing phonological representations to novel orthographic representations and is thus an ideal context for investigating experience driven audiovisual integration. Because two dominant brain-based theories of reading development hinge on the sensitivity of the visual-object processing stream to phonological information, we were interested in how reading skill relates to audiovisual integration in this area. Thirty-two children between 8 and 13 years of age spanning a range of reading skill participated in a functional magnetic resonance imaging experiment. Participants completed a rhyme judgment task to word pairs presented unimodally (auditory- or visual-only) and cross-modally (auditory followed by visual). Skill-dependent sub-additive audiovisual modulation was found in left fusiform gyrus, extending into the putative visual word form area, and was correlated with behavioral orthographic priming. These results suggest learning to read promotes facilitatory audiovisual integration in the ventral visual-object processing stream and may optimize this region for orthographic processing. PMID:25585276

  3. Audio-visual speech timing sensitivity is enhanced in cluttered conditions.

    Directory of Open Access Journals (Sweden)

    Warrick Roseboom

    2011-04-01

    Full Text Available Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.

  4. Selective Attention and Audiovisual Integration: Is Attending to Both Modalities a Prerequisite for Early Integration?

    NARCIS (Netherlands)

    Talsma, D.; Doty, Tracy J.; Woldorff, Marty G.

    2007-01-01

    Interactions between multisensory integration and attention were studied using a combined audiovisual streaming design and a rapid serial visual presentation paradigm. Event-related potentials (ERPs) following audiovisual objects (AV) were compared with the sum of the ERPs following auditory (A) and

  5. Auditory cross-modal reorganization in cochlear implant users indicates audio-visual integration.

    Science.gov (United States)

    Stropahl, Maren; Debener, Stefan

    2017-01-01

    There is clear evidence for cross-modal cortical reorganization in the auditory system of post-lingually deafened cochlear implant (CI) users. A recent report suggests that moderate sensori-neural hearing loss is already sufficient to initiate corresponding cortical changes. To what extent these changes are deprivation-induced or related to sensory recovery is still debated. Moreover, the influence of cross-modal reorganization on CI benefit is also still unclear. While reorganization during deafness may impede speech recovery, reorganization also has beneficial influences on face recognition and lip-reading. As CI users were observed to show differences in multisensory integration, the question arises if cross-modal reorganization is related to audio-visual integration skills. The current electroencephalography study investigated cortical reorganization in experienced post-lingually deafened CI users (n = 18), untreated mild to moderately hearing impaired individuals (n = 18) and normal hearing controls (n = 17). Cross-modal activation of the auditory cortex by means of EEG source localization in response to human faces and audio-visual integration, quantified with the McGurk illusion, were measured. CI users revealed stronger cross-modal activations compared to age-matched normal hearing individuals. Furthermore, CI users showed a relationship between cross-modal activation and audio-visual integration strength. This may further support a beneficial relationship between cross-modal activation and daily-life communication skills that may not be fully captured by laboratory-based speech perception tests. Interestingly, hearing impaired individuals showed behavioral and neurophysiological results that were numerically between the other two groups, and they showed a moderate relationship between cross-modal activation and the degree of hearing loss. This further supports the notion that auditory deprivation evokes a reorganization of the auditory system

  6. Auditory cross-modal reorganization in cochlear implant users indicates audio-visual integration

    Directory of Open Access Journals (Sweden)

    Maren Stropahl

    2017-01-01

    Full Text Available There is clear evidence for cross-modal cortical reorganization in the auditory system of post-lingually deafened cochlear implant (CI) users. A recent report suggests that moderate sensori-neural hearing loss is already sufficient to initiate corresponding cortical changes. To what extent these changes are deprivation-induced or related to sensory recovery is still debated. Moreover, the influence of cross-modal reorganization on CI benefit is also still unclear. While reorganization during deafness may impede speech recovery, reorganization also has beneficial influences on face recognition and lip-reading. As CI users were observed to show differences in multisensory integration, the question arises if cross-modal reorganization is related to audio-visual integration skills. The current electroencephalography study investigated cortical reorganization in experienced post-lingually deafened CI users (n = 18), untreated mild to moderately hearing impaired individuals (n = 18) and normal hearing controls (n = 17). Cross-modal activation of the auditory cortex by means of EEG source localization in response to human faces and audio-visual integration, quantified with the McGurk illusion, were measured. CI users revealed stronger cross-modal activations compared to age-matched normal hearing individuals. Furthermore, CI users showed a relationship between cross-modal activation and audio-visual integration strength. This may further support a beneficial relationship between cross-modal activation and daily-life communication skills that may not be fully captured by laboratory-based speech perception tests. Interestingly, hearing impaired individuals showed behavioral and neurophysiological results that were numerically between the other two groups, and they showed a moderate relationship between cross-modal activation and the degree of hearing loss. This further supports the notion that auditory deprivation evokes a reorganization of the

  7. Context-specific effects of musical expertise on audiovisual integration

    Science.gov (United States)

    Bishop, Laura; Goebl, Werner

    2014-01-01

    Ensemble musicians exchange auditory and visual signals that can facilitate interpersonal synchronization. Musical expertise improves how precisely auditory and visual signals are perceptually integrated and increases sensitivity to asynchrony between them. Whether expertise improves sensitivity to audiovisual asynchrony in all instrumental contexts or only in those using sound-producing gestures that are within an observer's own motor repertoire is unclear. This study tested the hypothesis that musicians are more sensitive to audiovisual asynchrony in performances featuring their own instrument than in performances featuring other instruments. Short clips were extracted from audio-video recordings of clarinet, piano, and violin performances and presented to highly-skilled clarinetists, pianists, and violinists. Clips either maintained the audiovisual synchrony present in the original recording or were modified so that the video led or lagged behind the audio. Participants indicated whether the audio and video channels in each clip were synchronized. The range of asynchronies most often endorsed as synchronized was assessed as a measure of participants' sensitivities to audiovisual asynchrony. A positive relationship was observed between musical training and sensitivity, with data pooled across stimuli. While participants across expertise groups detected asynchronies most readily in piano stimuli and least readily in violin stimuli, pianists showed significantly better performance for piano stimuli than for either clarinet or violin. These findings suggest that, to an extent, the effects of expertise on audiovisual integration can be instrument-specific; however, the nature of the sound-producing gestures that are observed has a substantial effect on how readily asynchrony is detected as well. PMID:25324819

  8. Audiovisual Perception of Noise Vocoded Speech in Dyslexic and Non-Dyslexic Adults: The Role of Low-Frequency Visual Modulations

    Science.gov (United States)

    Megnin-Viggars, Odette; Goswami, Usha

    2013-01-01

    Visual speech inputs can enhance auditory speech information, particularly in noisy or degraded conditions. The natural statistics of audiovisual speech highlight the temporal correspondence between visual and auditory prosody, with lip, jaw, cheek and head movements conveying information about the speech envelope. Low-frequency spatial and…

  9. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Koji Iwano

    2007-03-01

    Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.
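    As a rough illustration of the multistream HMM technique described above, the sketch below combines per-state log-likelihoods from an audio stream and a visual stream using exponent (stream) weights. The weight value, state count, and feature labels are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def multistream_loglik(log_lik_audio, log_lik_video, audio_weight=0.7):
    """Fuse per-state log-likelihoods from audio and visual streams.

    In a multistream HMM the streams are treated as conditionally
    independent given the state, and each stream's log-likelihood is
    scaled by a stream weight before summation.
    """
    video_weight = 1.0 - audio_weight
    return audio_weight * log_lik_audio + video_weight * log_lik_video

# Hypothetical log-likelihoods of one observation under three HMM states.
log_a = np.array([-12.3, -9.8, -15.1])   # audio stream (e.g., MFCC-based)
log_v = np.array([-8.7, -11.2, -9.9])    # visual stream (e.g., lip features)
print(multistream_loglik(log_a, log_v))  # fused scores fed to Viterbi decoding
```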

  10. The effect of varying talker identity and listening conditions on gaze behavior during audiovisual speech perception.

    Science.gov (United States)

    Buchan, Julie N; Paré, Martin; Munhall, Kevin G

    2008-11-25

    During face-to-face conversation the face provides auditory and visual linguistic information, and also conveys information about the identity of the speaker. This study investigated behavioral strategies involved in gathering visual information while watching talking faces. The effects of varying talker identity and varying the intelligibility of speech (by adding acoustic noise) on gaze behavior were measured with an eyetracker. Varying the intelligibility of the speech by adding noise had a noticeable effect on the location and duration of fixations. When noise was present subjects adopted a vantage point that was more centralized on the face by reducing the frequency of the fixations on the eyes and mouth and lengthening the duration of their gaze fixations on the nose and mouth. Varying talker identity resulted in a more modest change in gaze behavior that was modulated by the intelligibility of the speech. Although subjects generally used similar strategies to extract visual information in both talker variability conditions, when noise was absent there were more fixations on the mouth when viewing a different talker every trial as opposed to the same talker every trial. These findings provide a useful baseline for studies examining gaze behavior during audiovisual speech perception and perception of dynamic faces.

  11. Conditioning Influences Audio-Visual Integration by Increasing Sound Saliency

    Directory of Open Access Journals (Sweden)

    Fabrizio Leo

    2011-10-01

    Full Text Available We investigated the effect of prior conditioning of an auditory stimulus on audiovisual integration in a series of four psychophysical experiments. The experiments factorially manipulated the conditioning procedure (picture vs monetary conditioning) and multisensory paradigm (2AFC visual detection vs redundant target paradigm). In the conditioning sessions, subjects were presented with three pure tones (= conditioned stimulus, CS) that were paired with neutral, positive, or negative unconditioned stimuli (US; monetary: +50 euro cents, -50 cents, 0 cents; pictures: highly pleasant, unpleasant, and neutral IAPS). In a 2AFC visual selective attention paradigm, detection of near-threshold Gabors was improved by concurrent sounds that had previously been paired with a positive (monetary) or negative (picture) outcome relative to neutral sounds. In the redundant target paradigm, sounds previously paired with positive (monetary) or negative (picture) outcomes increased response speed to both auditory and audiovisual targets similarly. Importantly, prior conditioning did not increase the multisensory response facilitation (i.e., (A + V)/2 - AV) or the race model violation. Collectively, our results suggest that prior conditioning primarily increases the saliency of the auditory stimulus per se rather than influencing audiovisual integration directly. In turn, conditioned sounds are rendered more potent for increasing response accuracy or speed in detection of visual targets.
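    The facilitation index and race model test mentioned above can be made concrete with a short sketch. The reaction times below are simulated purely for illustration; the functions implement the mean-RT facilitation measure (A + V)/2 - AV and one standard formulation of Miller's race model inequality, not necessarily the exact analysis pipeline used in the study.

```python
import numpy as np

def facilitation(rt_a, rt_v, rt_av):
    """Mean multisensory response facilitation: (A + V)/2 - AV, in ms."""
    return (np.mean(rt_a) + np.mean(rt_v)) / 2.0 - np.mean(rt_av)

def race_model_violation(rt_a, rt_v, rt_av, t_grid):
    """Miller's race model inequality: positive values mean F_AV(t)
    exceeds F_A(t) + F_V(t) at latency t, i.e., a violation."""
    cdf = lambda rts, t: np.mean(np.asarray(rts)[:, None] <= t, axis=0)
    bound = np.minimum(cdf(rt_a, t_grid) + cdf(rt_v, t_grid), 1.0)
    return np.maximum(cdf(rt_av, t_grid) - bound, 0.0)

# Simulated reaction times (ms); not data from the experiments.
rng = np.random.default_rng(0)
rt_a, rt_v = rng.normal(420, 50, 200), rng.normal(440, 55, 200)
rt_av = rng.normal(390, 45, 200)
t = np.arange(250, 700, 10)
print(facilitation(rt_a, rt_v, rt_av))            # positive = faster bimodal RTs
print(race_model_violation(rt_a, rt_v, rt_av, t).max())
```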

  12. Dissociating Cortical Activity during Processing of Native and Non-Native Audiovisual Speech from Early to Late Infancy

    Directory of Open Access Journals (Sweden)

    Eswen Fava

    2014-08-01

    Full Text Available Initially, infants are capable of discriminating phonetic contrasts across the world’s languages. Starting between seven and ten months of age, they gradually lose this ability through a process of perceptual narrowing. Although traditionally investigated with isolated speech sounds, such narrowing occurs in a variety of perceptual domains (e.g., faces, visual speech). Thus far, tracking the developmental trajectory of this tuning process has been focused primarily on auditory speech alone, and generally using isolated sounds. But infants learn from speech produced by people talking to them, meaning they learn from a complex audiovisual signal. Here, we use near-infrared spectroscopy to measure blood concentration changes in the bilateral temporal cortices of infants in three different age groups: 3-to-6 months, 7-to-10 months, and 11-to-14 months. Critically, all three groups of infants were tested with continuous audiovisual speech in both their native and another, unfamiliar language. We found that at each age range, infants showed different patterns of cortical activity in response to the native and non-native stimuli. Infants in the youngest group showed bilateral cortical activity that was greater overall in response to non-native relative to native speech; the oldest group showed left lateralized activity in response to native relative to non-native speech. These results highlight perceptual tuning as a dynamic process that happens across modalities and at different levels of stimulus complexity.

  13. How can audiovisual pathways enhance the temporal resolution of time-compressed speech in blind subjects?

    Directory of Open Access Journals (Sweden)

    Ingo eHertrich

    2013-08-01

    Full Text Available In blind people, the visual channel cannot assist face-to-face communication via lipreading or visual prosody. Nevertheless, the visual system may enhance the evaluation of auditory information due to its cross-links to (1) the auditory system, (2) supramodal representations, and (3) frontal action-related areas. Apart from feedback or top-down support of, for example, the processing of spatial or phonological representations, experimental data have shown that the visual system can impact auditory perception at more basic computational stages such as temporal resolution. For example, blind as compared to sighted subjects are more resistant against backward masking, and this ability appears to be associated with activity in visual cortex. Regarding the comprehension of continuous speech, blind subjects can learn to use accelerated text-to-speech systems for "reading" texts at ultra-fast speaking rates (> 16 syllables/s), exceeding by far the normal range of 6 syllables/s. An fMRI study has shown that this ability, among other brain regions, significantly covaries with BOLD responses in bilateral pulvinar, right visual cortex, and left supplementary motor area. Furthermore, magnetoencephalographic (MEG) measurements revealed a particular component in right occipital cortex phase-locked to the syllable onsets of accelerated speech. In sighted people, the "bottleneck" for understanding time-compressed speech seems related to a demand for buffering phonological material and is, presumably, linked to frontal brain structures. On the other hand, the neurophysiological correlates of functions overcoming this bottleneck seem to depend upon early visual cortex activity. The present Hypothesis and Theory paper outlines a model that aims at binding these data together, based on early cross-modal pathways that are already known from various audiovisual experiments considering cross-modal adjustments in space, time, and object recognition.

  14. The role of emotion in dynamic audiovisual integration of faces and voices.

    Science.gov (United States)

    Kokinous, Jenny; Kotz, Sonja A; Tavano, Alessandro; Schröger, Erich

    2015-05-01

    We used human electroencephalogram to study early audiovisual integration of dynamic angry and neutral expressions. An auditory-only condition served as a baseline for the interpretation of integration effects. In the audiovisual conditions, the validity of visual information was manipulated using facial expressions that were either emotionally congruent or incongruent with the vocal expressions. First, we report an N1 suppression effect for angry compared with neutral vocalizations in the auditory-only condition. Second, we confirm early integration of congruent visual and auditory information as indexed by a suppression of the auditory N1 and P2 components in the audiovisual compared with the auditory-only condition. Third, audiovisual N1 suppression was modulated by audiovisual congruency in interaction with emotion: for neutral vocalizations, there was N1 suppression in both the congruent and the incongruent audiovisual conditions. For angry vocalizations, there was N1 suppression only in the congruent but not in the incongruent condition. Extending previous findings of dynamic audiovisual integration, the current results suggest that audiovisual N1 suppression is congruency- and emotion-specific and indicate that dynamic emotional expressions compared with non-emotional expressions are preferentially processed in early audiovisual integration. © The Author (2014). Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  15. Read My Lips: Brain Dynamics Associated with Audiovisual Integration and Deviance Detection.

    Science.gov (United States)

    Tse, Chun-Yu; Gratton, Gabriele; Garnsey, Susan M; Novak, Michael A; Fabiani, Monica

    2015-09-01

    Information from different modalities is initially processed in different brain areas, yet real-world perception often requires the integration of multisensory signals into a single percept. An example is the McGurk effect, in which people viewing a speaker whose lip movements do not match the utterance perceive the spoken sounds incorrectly, hearing them as more similar to those signaled by the visual rather than the auditory input. This indicates that audiovisual integration is important for generating the phoneme percept. Here we asked when and where the audiovisual integration process occurs, providing spatial and temporal boundaries for the processes generating phoneme perception. Specifically, we wanted to separate audiovisual integration from other processes, such as simple deviance detection. Building on previous work employing ERPs, we used an oddball paradigm in which task-irrelevant audiovisually deviant stimuli were embedded in strings of non-deviant stimuli. We also recorded the event-related optical signal, an imaging method combining spatial and temporal resolution, to investigate the time course and neuroanatomical substrate of audiovisual integration. We found that audiovisual deviants elicit a short duration response in the middle/superior temporal gyrus, whereas audiovisual integration elicits a more extended response involving also inferior frontal and occipital regions. Interactions between audiovisual integration and deviance detection processes were observed in the posterior/superior temporal gyrus. These data suggest that dynamic interactions between inferior frontal cortex and sensory regions play a significant role in multimodal integration.

  16. Greater BOLD variability in older compared with younger adults during audiovisual speech perception.

    Directory of Open Access Journals (Sweden)

    Sarah H Baum

    Full Text Available Older adults exhibit decreased performance and increased trial-to-trial variability on a range of cognitive tasks, including speech perception. We used blood oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) to search for neural correlates of these behavioral phenomena. We compared brain responses to simple speech stimuli (audiovisual syllables) in 24 healthy older adults (53 to 70 years old) and 14 younger adults (23 to 39 years old) using two independent analysis strategies: region-of-interest (ROI) and voxel-wise whole-brain analysis. While mean response amplitudes were moderately greater in younger adults, older adults had much greater within-subject variability. The greatly increased variability in older adults was observed for both individual voxels in the whole-brain analysis and for ROIs in the left superior temporal sulcus, the left auditory cortex, and the left visual cortex. Increased variability in older adults could not be attributed to differences in head movements between the groups. Increased neural variability may be related to the performance declines and increased behavioral variability that occur with aging.

  17. Effects of noise and audiovisual cues on speech processing in adults with and without ADHD.

    Science.gov (United States)

    Michalek, Anne M P; Watson, Silvana M; Ash, Ivan; Ringleb, Stacie; Raymer, Anastasia

    2014-03-01

    This study examined the interplay among internal (e.g., attention, working memory abilities) and external (e.g., background noise, visual information) factors in individuals with and without ADHD. A 2 × 2 × 6 mixed design with correlational analyses was used to compare participant results on a standardized listening-in-noise sentence repetition task (QuickSIN; Killion et al., 2004), presented in an auditory and an audiovisual condition as the signal-to-noise ratio (SNR) varied from 25 to 0 dB, and to determine individual differences in working memory capacity and short-term recall. Participants were thirty-eight young adults without ADHD and twenty-five young adults with ADHD. Diagnosis, modality, and signal-to-noise ratio all affected the ability to process speech in noise. The interaction between the diagnosis of ADHD, the presence of visual cues, and the level of noise had an effect on a person's ability to process speech in noise. Conclusion: Young adults with ADHD benefited less from visual information during noise than young adults without ADHD, an effect influenced by working memory abilities.

  18. Contribution of Prosody in Audio-Visual Integration to Emotional Perception of Virtual Characters

    Directory of Open Access Journals (Sweden)

    Ekaterina Volkova

    2011-10-01

    Full Text Available Recent technology provides us with realistic looking virtual characters. Motion capture and elaborate mathematical models supply data for natural looking, controllable facial and bodily animations. With the help of computational linguistics and artificial intelligence, we can automatically assign emotional categories to appropriate stretches of text for a simulation of those social scenarios where verbal communication is important. All this makes virtual characters a valuable tool for creation of versatile stimuli for research on the integration of emotion information from different modalities. We conducted an audio-visual experiment to investigate the differential contributions of emotional speech and facial expressions on emotion identification. We used recorded and synthesized speech as well as dynamic virtual faces, all enhanced for seven emotional categories. The participants were asked to recognize the prevalent emotion of paired faces and audio. Results showed that when the voice was recorded, the vocalized emotion influenced participants' emotion identification more than the facial expression. However, when the voice was synthesized, facial expression influenced participants' emotion identification more than vocalized emotion. Additionally, individuals did worse on identifying either the facial expression or vocalized emotion when the voice was synthesized. Our experimental method can help to determine how to improve synthesized emotional speech.

  19. Audiovisual Integration Delayed by Stimulus Onset Asynchrony Between Auditory and Visual Stimuli in Older Adults.

    Science.gov (United States)

    Ren, Yanna; Yang, Weiping; Nakahashi, Kohei; Takahashi, Satoshi; Wu, Jinglong

    2017-02-01

    Although neuronal studies have shown that audiovisual integration is regulated by temporal factors, there is still little knowledge about the impact of temporal factors on audiovisual integration in older adults. To clarify how stimulus onset asynchrony (SOA) between auditory and visual stimuli modulates age-related audiovisual integration, 20 younger adults (21-24 years) and 20 older adults (61-80 years) were instructed to perform an auditory or visual stimuli discrimination experiment. The results showed that in younger adults, audiovisual integration was altered from an enhancement (AV, A ± 50 V) to a depression (A ± 150 V). In older adults, the alteration pattern was similar to that for younger adults with the expansion of SOA; however, older adults showed significantly delayed onset for the time-window-of-integration and peak latency in all conditions, which further demonstrated that audiovisual integration was delayed more severely with the expansion of SOA, especially in the peak latency for V-preceded-A conditions in older adults. Our study suggested that audiovisual facilitative integration occurs only within a certain SOA range (e.g., -50 to 50 ms) in both younger and older adults. Moreover, our results confirm that the response for older adults was slowed and provided empirical evidence that integration ability is much more sensitive to the temporal alignment of audiovisual stimuli in older adults.

  20. Common variation in the autism risk gene CNTNAP2, brain structural connectivity and multisensory speech integration.

    Science.gov (United States)

    Ross, Lars A; Del Bene, Victor A; Molholm, Sophie; Jae Woo, Young; Andrade, Gizely N; Abrahams, Brett S; Foxe, John J

    2017-11-01

    Three lines of evidence motivated this study. 1) CNTNAP2 variation is associated with autism risk and speech-language development. 2) CNTNAP2 variations are associated with differences in white matter (WM) tracts comprising the speech-language circuitry. 3) Children with autism show impairment in multisensory speech perception. Here, we asked whether an autism risk-associated CNTNAP2 single nucleotide polymorphism in neurotypical adults was associated with multisensory speech perception performance, and whether such a genotype-phenotype association was mediated through white matter tract integrity in speech-language circuitry. Risk genotype at rs7794745 was associated with decreased benefit from visual speech and lower fractional anisotropy (FA) in several WM tracts (right precentral gyrus, left anterior corona radiata, right retrolenticular internal capsule). These structural connectivity differences were found to mediate the effect of genotype on audiovisual speech perception, shedding light on possible pathogenic pathways in autism and biological sources of inter-individual variation in audiovisual speech processing in neurotypicals. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. Audiovisual speech perception at various presentation levels in Mandarin-speaking adults with cochlear implants.

    Directory of Open Access Journals (Sweden)

    Shu-Yu Liu

    Full Text Available (1) To evaluate the recognition of words, phonemes and lexical tones in audiovisual (AV) and auditory-only (AO) modes in Mandarin-speaking adults with cochlear implants (CIs); (2) to understand the effect of presentation levels on AV speech perception; (3) to learn the effect of hearing experience on AV speech perception. Thirteen deaf adults (age = 29.1±13.5 years; 8 male, 5 female) who had used CIs for >6 months and 10 normal-hearing (NH) adults participated in this study. Seven of them were prelingually deaf, and 6 postlingually deaf. The Mandarin Monosyllablic Word Recognition Test was used to assess recognition of words, phonemes and lexical tones in AV and AO conditions at 3 presentation levels: speech detection threshold (SDT), speech recognition threshold (SRT) and 10 dB SL (re: SRT). The prelingual group had better phoneme recognition in the AV mode than in the AO mode at SDT and SRT (both p = 0.016), and so did the NH group at SDT (p = 0.004). Mode difference was not noted in the postlingual group. None of the groups had significantly different tone recognition in the 2 modes. The prelingual and postlingual groups had significantly better phoneme and tone recognition than the NH one at SDT in the AO mode (p = 0.016 and p = 0.002 for phonemes; p = 0.001 and p < 0.001 for tones) but were outperformed by the NH group at 10 dB SL (re: SRT) in both modes (both p < 0.001 for phonemes; p < 0.001 and p = 0.002 for tones). The recognition scores had a significant correlation with group with age and sex controlled (p < 0.001). Visual input may help prelingually deaf implantees to recognize phonemes but may not augment Mandarin tone recognition. The effect of presentation level seems minimal on CI users' AV perception. This indicates special considerations in developing audiological assessment protocols and rehabilitation strategies for implantees who speak tonal languages.

  2. Audiovisual Temporal Perception in Aging: The Role of Multisensory Integration and Age-Related Sensory Loss.

    Science.gov (United States)

    Brooks, Cassandra J; Chan, Yu Man; Anderson, Andrew J; McKendrick, Allison M

    2018-01-01

    Within each sensory modality, age-related deficits in temporal perception contribute to the difficulties older adults experience when performing everyday tasks. Since perceptual experience is inherently multisensory, older adults also face the added challenge of appropriately integrating or segregating the auditory and visual cues present in our dynamic environment into coherent representations of distinct objects. As such, many studies have investigated how older adults perform when integrating temporal information across audition and vision. This review covers both direct judgments about temporal information (the sound-induced flash illusion, temporal order, perceived synchrony, and temporal rate discrimination) and judgments regarding stimuli containing temporal information (the audiovisual bounce effect and speech perception). Although an age-related increase in integration has been demonstrated on a variety of tasks, research specifically investigating the ability of older adults to integrate temporal auditory and visual cues has produced disparate results. In this short review, we explore what factors could underlie these divergent findings. We conclude that both task-specific differences and age-related sensory loss play a role in the reported disparity in age-related effects on the integration of auditory and visual temporal information.

  3. Audiovisual Temporal Perception in Aging: The Role of Multisensory Integration and Age-Related Sensory Loss

    Science.gov (United States)

    Brooks, Cassandra J.; Chan, Yu Man; Anderson, Andrew J.; McKendrick, Allison M.

    2018-01-01

    Within each sensory modality, age-related deficits in temporal perception contribute to the difficulties older adults experience when performing everyday tasks. Since perceptual experience is inherently multisensory, older adults also face the added challenge of appropriately integrating or segregating the auditory and visual cues present in our dynamic environment into coherent representations of distinct objects. As such, many studies have investigated how older adults perform when integrating temporal information across audition and vision. This review covers both direct judgments about temporal information (the sound-induced flash illusion, temporal order, perceived synchrony, and temporal rate discrimination) and judgments regarding stimuli containing temporal information (the audiovisual bounce effect and speech perception). Although an age-related increase in integration has been demonstrated on a variety of tasks, research specifically investigating the ability of older adults to integrate temporal auditory and visual cues has produced disparate results. In this short review, we explore what factors could underlie these divergent findings. We conclude that both task-specific differences and age-related sensory loss play a role in the reported disparity in age-related effects on the integration of auditory and visual temporal information. PMID:29867415

  4. Aging Effect on Audiovisual Integrative Processing in Spatial Discrimination Task

    Directory of Open Access Journals (Sweden)

    Zhi Zou

    2017-11-01

    Full Text Available Multisensory integration is an essential process that people employ daily, from conversing in social gatherings to navigating the nearby environment. The aim of this study was to investigate the impact of aging on modulating multisensory integrative processes using event-related potentials (ERPs), and the validity of the study was improved by including “noise” in the contrast conditions. Older and younger participants were involved in perceiving visual and/or auditory stimuli that contained spatial information. The participants responded by indicating the spatial direction (far vs. near and left vs. right) conveyed in the stimuli using different wrist movements. Electroencephalograms (EEGs) were captured in each task trial, along with the accuracy and reaction time of the participants’ motor responses. Older participants showed a greater extent of behavioral improvements in the multisensory (as opposed to unisensory) condition compared to their younger counterparts. Older participants were found to have fronto-centrally distributed super-additive P2, which was not the case for the younger participants. The P2 amplitude difference between the multisensory condition and the sum of the unisensory conditions was found to correlate significantly with performance on spatial discrimination. The results indicated that the age-related effect modulated the integrative process in the perceptual and feedback stages, particularly the evaluation of auditory stimuli. Audiovisual (AV) integration may also serve a functional role during spatial-discrimination processes to compensate for the compromised attention function caused by aging.

  5. Neural Correlates of Audiovisual Integration of Semantic Category Information

    Science.gov (United States)

    Hu, Zhonghua; Zhang, Ruiling; Zhang, Qinglin; Liu, Qiang; Li, Hong

    2012-01-01

    Previous studies have found a late frontal-central audiovisual interaction during the time period about 150-220 ms post-stimulus. However, it is unclear to which process this audiovisual interaction is related: to the processing of acoustical features or to the classification of stimuli? To investigate this question, event-related potentials were recorded…

  6. Audiovisual alignment of co-speech gestures to speech supports word learning in 2-year-olds.

    Science.gov (United States)

    Jesse, Alexandra; Johnson, Elizabeth K

    2016-05-01

    Analyses of caregiver-child communication suggest that an adult tends to highlight objects in a child's visual scene by moving them in a manner that is temporally aligned with the adult's speech productions. Here, we used the looking-while-listening paradigm to examine whether 25-month-olds use audiovisual temporal alignment to disambiguate and learn novel word-referent mappings in a difficult word-learning task. Videos of two equally interesting and animated novel objects were simultaneously presented to children, but the movement of only one of the objects was aligned with an accompanying object-labeling audio track. No social cues (e.g., pointing, eye gaze, touch) were available to the children because the speaker was edited out of the videos. Immediately afterward, toddlers were presented with still images of the two objects and asked to look at one or the other. Toddlers looked reliably longer to the labeled object, demonstrating their acquisition of the novel word-referent mapping. A control condition showed that children's performance was not solely due to the single unambiguous labeling that had occurred at experiment onset. We conclude that the temporal link between a speaker's utterances and the motion they imposed on the referent object helps toddlers to deduce a speaker's intended reference in a difficult word-learning scenario. In combination with our previous work, these findings suggest that intersensory redundancy is a source of information used by language users of all ages. That is, intersensory redundancy is not just a word-learning tool used by young infants. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. Integration of auditory and visual speech information

    NARCIS (Netherlands)

    Hall, M.; Smeele, P.M.T.; Kuhl, P.K.

    1998-01-01

    The integration of auditory and visual speech is observed when modes specify different places of articulation. Influences of auditory variation on integration were examined using consonant identification, plus quality and similarity ratings. Auditory identification predicted auditory-visual

  8. Causal inference and temporal predictions in audiovisual perception of speech and music.

    Science.gov (United States)

    Noppeney, Uta; Lee, Hwee Ling

    2018-03-31

    To form a coherent percept of the environment, the brain must integrate sensory signals emanating from a common source but segregate those from different sources. Temporal regularities are prominent cues for multisensory integration, particularly for speech and music perception. In line with models of predictive coding, we suggest that the brain adapts an internal model to the statistical regularities in its environment. This internal model enables cross-sensory and sensorimotor temporal predictions as a mechanism to arbitrate between integration and segregation of signals from different senses. © 2018 New York Academy of Sciences.

  9. The Influence of Selective and Divided Attention on Audiovisual Integration in Children.

    Science.gov (United States)

    Yang, Weiping; Ren, Yanna; Yang, Dan Ou; Yuan, Xue; Wu, Jinglong

    2016-01-24

    This article aims to investigate whether there is a difference in audiovisual integration in school-aged children (aged 6 to 13 years; mean age = 9.9 years) between the selective attention condition and divided attention condition. We designed a visual and/or auditory detection task that included three blocks (divided attention, visual-selective attention, and auditory-selective attention). The results showed that the response to bimodal audiovisual stimuli was faster than to unimodal auditory or visual stimuli under both divided attention and auditory-selective attention conditions. However, in the visual-selective attention condition, no significant difference was found between the unimodal visual and bimodal audiovisual stimuli in response speed. Moreover, audiovisual behavioral facilitation effects were compared between divided attention and selective attention (auditory or visual attention). In doing so, we found that audiovisual behavioral facilitation was significantly different between divided attention and selective attention. The results indicated that audiovisual integration was stronger in the divided attention condition than in the selective attention condition in children. Our findings objectively support the notion that attention can modulate audiovisual integration in school-aged children. Our study might offer a new perspective for identifying children with conditions that are associated with sustained attention deficit, such as attention-deficit hyperactivity disorder. © The Author(s) 2016.

  10. Modelling audiovisual integration of affect from videos and music.

    Science.gov (United States)

    Gao, Chuanji; Wedell, Douglas H; Kim, Jongwan; Weber, Christine E; Shinkareva, Svetlana V

    2018-05-01

    Two experiments examined how affective values from visual and auditory modalities are integrated. Experiment 1 paired music and videos drawn from three levels of valence while holding arousal constant. Experiment 2 included a parallel combination of three levels of arousal while holding valence constant. In each experiment, participants rated their affective states after unimodal and multimodal presentations. Experiment 1 revealed a congruency effect in which stimulus combinations of the same extreme valence resulted in more extreme state ratings than component stimuli presented in isolation. An interaction between music and video valence reflected the greater influence of negative affect. Video valence was found to have a significantly greater effect on combined ratings than music valence. The pattern of data was explained by a five parameter differential weight averaging model that attributed greater weight to the visual modality and increased weight with decreasing values of valence. Experiment 2 revealed a congruency effect only for high arousal combinations and no interaction effects. This pattern was explained by a three parameter constant weight averaging model with greater weight for the auditory modality and a very low arousal value for the initial state. These results demonstrate key differences in audiovisual integration between valence and arousal.
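    The averaging models described above follow the general information-integration form in which the rated affect is a weighted average of an initial (resting) state and the two modality values. A minimal sketch of that form is given below; the parameter names and numbers are illustrative assumptions, not the fitted values reported for either experiment.

```python
def averaging_model(video_value, music_value, w_video, w_music,
                    initial_value=0.0, w_initial=1.0):
    """Weighted-averaging integration of affective values across modalities.

    The rated affect is a weighted average of an initial state and the
    video and music values; differential-weight variants let the weights
    depend on the stimulus values themselves.
    """
    numerator = (w_initial * initial_value
                 + w_video * video_value
                 + w_music * music_value)
    denominator = w_initial + w_video + w_music
    return numerator / denominator

# Example: a strongly negative video paired with mildly positive music,
# with the visual modality weighted more heavily (hypothetical weights).
print(averaging_model(video_value=-2.0, music_value=1.0,
                      w_video=2.5, w_music=1.0))
```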

  11. Spectral integration in speech and non-speech sounds

    Science.gov (United States)

    Jacewicz, Ewa

    2005-04-01

    Spectral integration (or formant averaging) was proposed in vowel perception research to account for the observation that a reduction of the intensity of one of two closely spaced formants (as in /u/) produced a predictable shift in vowel quality [Delattre et al., Word 8, 195-210 (1952)]. A related observation was reported in psychoacoustics, indicating that when the components of a two-tone periodic complex differ in amplitude and frequency, its perceived pitch is shifted toward that of the more intense tone [Helmholtz, App. XIV (1875/1948)]. Subsequent research in both fields focused on the frequency interval that separates these two spectral components, in an attempt to determine the size of the bandwidth for spectral integration to occur. This talk will review the accumulated evidence for and against spectral integration within the hypothesized limit of 3.5 Bark for static and dynamic signals in speech perception and psychoacoustics. Based on similarities in the processing of speech and non-speech sounds, it is suggested that spectral integration may reflect a general property of the auditory system. A larger frequency bandwidth, possibly close to 3.5 Bark, may be utilized in integrating acoustic information, including speech, complex signals, or sound quality of a violin.
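    The 3.5-Bark criterion discussed above can be checked with a standard Hz-to-Bark conversion. The sketch below uses the Zwicker and Terhardt (1980) approximation, which may differ slightly from the conversion assumed in the original studies; the example frequencies are illustrative rather than taken from the talk.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker & Terhardt (1980) approximation of the Bark scale."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def within_integration_band(f1_hz, f2_hz, limit_bark=3.5):
    """Do two spectral components fall within the hypothesized 3.5-Bark
    limit for spectral integration (formant averaging)?"""
    return abs(hz_to_bark(f1_hz) - hz_to_bark(f2_hz)) <= limit_bark

# Closely spaced components in the F2-F3 region versus widely spaced ones.
print(within_integration_band(2000.0, 2500.0))  # True: candidates for integration
print(within_integration_band(500.0, 2500.0))   # False: well outside 3.5 Bark
```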

  12. Attenuated audiovisual integration in middle-aged adults in a discrimination task.

    Science.gov (United States)

    Yang, Weiping; Ren, Yanna

    2018-02-01

    Numerous studies have focused on the diversity of audiovisual integration between younger and older adults. However, consecutive trends in audiovisual integration throughout life are still unclear. In the present study, to clarify audiovisual integration characteristics in middle-aged adults, we instructed younger and middle-aged adults to conduct an auditory/visual stimuli discrimination experiment. Randomized streams of unimodal auditory (A), unimodal visual (V) or audiovisual stimuli were presented on the left or right hemispace of the central fixation point, and subjects were instructed to respond to the target stimuli rapidly and accurately. Our results demonstrated that the responses of middle-aged adults to all unimodal and bimodal stimuli were significantly slower than those of younger adults. Audiovisual integration was markedly delayed (onset time 360 ms) and weaker (peak 3.97%) in middle-aged adults than in younger adults (onset time 260 ms, peak 11.86%). The results suggested that audiovisual integration was attenuated in middle-aged adults and further confirmed age-related decline in information processing.

  13. Detecting Functional Connectivity During Audiovisual Integration with MEG: A Comparison of Connectivity Metrics.

    Science.gov (United States)

    Ard, Tyler; Carver, Frederick W; Holroyd, Tom; Horwitz, Barry; Coppola, Richard

    2015-08-01

    In typical magnetoencephalography and/or electroencephalography functional connectivity analysis, researchers select one of several methods that measure a relationship between regions to determine connectivity, such as coherence, power correlations, and others. However, it is largely unknown if some are more suited than others for various types of investigations. In this study, the authors investigate seven connectivity metrics to evaluate which, if any, are sensitive to audiovisual integration by contrasting connectivity when tracking an audiovisual object versus connectivity when tracking a visual object uncorrelated with the auditory stimulus. The authors are able to assess the metrics' performances at detecting audiovisual integration by investigating connectivity between auditory and visual areas. Critically, the authors perform their investigation on a whole-cortex all-to-all mapping, avoiding confounds introduced in seed selection. The authors find that amplitude-based connectivity measures in the beta band detect strong connections between visual and auditory areas during audiovisual integration, specifically between V4/V5 and auditory cortices in the right hemisphere. Conversely, phase-based connectivity measures in the beta band as well as phase and power measures in alpha, gamma, and theta do not show connectivity between audiovisual areas. The authors postulate that while beta power correlations detect audiovisual integration in the current experimental context, it may not always be the best measure to detect connectivity. Instead, it is likely that the brain utilizes a variety of mechanisms in neuronal communication that may produce differential types of temporal relationships.
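    Two of the metric families contrasted above, phase-sensitive coherence and amplitude (power-envelope) correlation, can be sketched on simulated sensor time series. The sampling rate, filter settings, and "auditory"/"visual" labels below are illustrative assumptions, not the authors' MEG pipeline.

```python
import numpy as np
from scipy.signal import butter, coherence, filtfilt, hilbert

fs = 600.0                                       # illustrative sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(1)
shared = np.sin(2 * np.pi * 20 * t)              # common 20 Hz (beta) drive
x = shared + 0.5 * rng.standard_normal(t.size)   # "auditory" sensor
y = shared + 0.5 * rng.standard_normal(t.size)   # "visual" sensor

# Phase-sensitive metric: magnitude-squared coherence in the beta band.
f, cxy = coherence(x, y, fs=fs, nperseg=1024)
beta = (f >= 13) & (f <= 30)
print("mean beta coherence:", cxy[beta].mean())

# Amplitude-based metric: correlation of beta-band power envelopes.
b, a = butter(4, [13 / (fs / 2), 30 / (fs / 2)], btype="band")
env_x = np.abs(hilbert(filtfilt(b, a, x)))
env_y = np.abs(hilbert(filtfilt(b, a, y)))
print("beta envelope correlation:", np.corrcoef(env_x, env_y)[0, 1])
```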

  14. Optimal Audiovisual Integration in the Ventriloquism Effect But Pervasive Deficits in Unisensory Spatial Localization in Amblyopia.

    Science.gov (United States)

    Richards, Michael D; Goltz, Herbert C; Wong, Agnes M F

    2018-01-01

    Classically understood as a deficit in spatial vision, amblyopia is increasingly recognized to also impair audiovisual multisensory processing. Studies to date, however, have not determined whether the audiovisual abnormalities reflect a failure of multisensory integration, or an optimal strategy in the face of unisensory impairment. We use the ventriloquism effect and the maximum-likelihood estimation (MLE) model of optimal integration to investigate integration of audiovisual spatial information in amblyopia. Participants with unilateral amblyopia (n = 14; mean age 28.8 years; 7 anisometropic, 3 strabismic, 4 mixed mechanism) and visually normal controls (n = 16, mean age 29.2 years) localized brief unimodal auditory, unimodal visual, and bimodal (audiovisual) stimuli during binocular viewing using a location discrimination task. A subset of bimodal trials involved the ventriloquism effect, an illusion in which auditory and visual stimuli originating from different locations are perceived as originating from a single location. Localization precision and bias were determined by psychometric curve fitting, and the observed parameters were compared with predictions from the MLE model. Spatial localization precision was significantly reduced in the amblyopia group compared with the control group for unimodal visual, unimodal auditory, and bimodal stimuli. Analyses of localization precision and bias for bimodal stimuli showed no significant deviations from the MLE model in either the amblyopia group or the control group. Despite pervasive deficits in localization precision for visual, auditory, and audiovisual stimuli, audiovisual integration remains intact and optimal in unilateral amblyopia.
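    The MLE predictions referred to above follow the standard reliability-weighted integration formulas: the bimodal variance is the product of the unimodal variances divided by their sum, and each cue is weighted by its relative reliability. The sketch below computes these predictions; the numerical values are illustrative, not the study's fitted estimates.

```python
def mle_prediction(sigma_a, sigma_v, loc_a=None, loc_v=None):
    """Maximum-likelihood (reliability-weighted) integration predictions.

    sigma_a, sigma_v: unimodal localization SDs (e.g., from psychometric fits).
    Returns the predicted bimodal SD, the weight on vision, and (if unimodal
    location estimates are given) the predicted bimodal location, which is
    what the ventriloquism effect probes when the cues conflict.
    """
    var_a, var_v = sigma_a ** 2, sigma_v ** 2
    w_v = var_a / (var_a + var_v)                     # weight on vision
    sigma_av = (var_a * var_v / (var_a + var_v)) ** 0.5
    loc_av = None
    if loc_a is not None and loc_v is not None:
        loc_av = w_v * loc_v + (1.0 - w_v) * loc_a
    return sigma_av, w_v, loc_av

# Illustrative values in degrees of visual angle; not the study's data.
print(mle_prediction(sigma_a=6.0, sigma_v=2.0, loc_a=4.0, loc_v=0.0))
```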

  15. Crossmodal deficit in dyslexic children: practice affects the neural timing of letter-speech sound integration

    Directory of Open Access Journals (Sweden)

    Gojko eŽarić

    2015-06-01

    Full Text Available A failure to build solid letter-speech sound associations may contribute to reading impairments in developmental dyslexia. Whether this reduced neural integration of letters and speech sounds changes over time within individual children and how this relates to behavioral gains in reading skills remains unknown. In this research, we examined changes in event-related potential (ERP) measures of letter-speech sound integration over a 6-month period during which 9-year-old dyslexic readers (n=17) followed a training in letter-speech sound coupling next to their regular reading curriculum. We presented the Dutch spoken vowels /a/ and /o/ as standard and deviant stimuli in one auditory and two audiovisual oddball conditions. In one audiovisual condition (AV0), the letter ‘a’ was presented simultaneously with the vowels, while in the other (AV200) it preceded vowel onset by 200 ms. Prior to the training (T1), dyslexic readers showed the expected pattern of typical auditory mismatch responses, together with the absence of letter-speech sound effects in a late negativity (LN) window. After the training (T2), our results showed earlier (and enhanced) crossmodal effects in the LN window. Most interestingly, earlier LN latency at T2 was significantly related to higher behavioral accuracy in letter-speech sound coupling. On a more general level, the timing of the earlier mismatch negativity (MMN) in the simultaneous condition (AV0) measured at T1 significantly related to reading fluency at both T1 and T2 as well as to reading gains. Our findings suggest that the reduced neural integration of letters and speech sounds in dyslexic children may show moderate improvement with reading instruction and training and that behavioral improvements relate especially to individual differences in the timing of this neural integration.

  16. Does hearing aid use affect audiovisual integration in mild hearing impairment?

    Science.gov (United States)

    Gieseler, Anja; Tahden, Maike A S; Thiel, Christiane M; Colonius, Hans

    2018-04-01

    There is converging evidence for altered audiovisual integration abilities in hearing-impaired individuals and those with profound hearing loss who are provided with cochlear implants, compared to normal-hearing adults. Still, little is known on the effects of hearing aid use on audiovisual integration in mild hearing loss, although this constitutes one of the most prevalent conditions in the elderly and, yet, often remains untreated in its early stages. This study investigated differences in the strength of audiovisual integration between elderly hearing aid users and those with the same degree of mild hearing loss who were not using hearing aids, the non-users, by measuring their susceptibility to the sound-induced flash illusion. We also explored the corresponding window of integration by varying the stimulus onset asynchronies. To examine general group differences that are not attributable to specific hearing aid settings but rather reflect overall changes associated with habitual hearing aid use, the group of hearing aid users was tested unaided while individually controlling for audibility. We found greater audiovisual integration together with a wider window of integration in hearing aid users compared to their age-matched untreated peers. Signal detection analyses indicate that a change in perceptual sensitivity as well as in bias may underlie the observed effects. Our results and comparisons with other studies in normal-hearing older adults suggest that both mild hearing impairment and hearing aid use seem to affect audiovisual integration, possibly in the sense that hearing aid use may reverse the effects of hearing loss on audiovisual integration. We suggest that these findings may be particularly important for auditory rehabilitation and call for a longitudinal study.
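    The signal detection analysis mentioned above separates perceptual sensitivity from response bias. A minimal sketch is given below, assuming a framing of the sound-induced flash illusion in which "two flashes" reports on real two-flash trials count as hits and "two flashes" reports on one-flash illusion trials count as false alarms; the rates are made up for illustration.

```python
from scipy.stats import norm

def dprime_criterion(hit_rate, fa_rate):
    """Sensitivity (d') and criterion (c) from hit and false-alarm rates."""
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)

# Hypothetical rates: hearing aid users and non-users would each yield a
# (d', c) pair, and group differences would indicate whether sensitivity,
# bias, or both drive differences in illusion susceptibility.
print(dprime_criterion(hit_rate=0.85, fa_rate=0.40))
```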

  17. Absent Audiovisual Integration Elicited by Peripheral Stimuli in Parkinson’s Disease

    Directory of Open Access Journals (Sweden)

    Yanna Ren

    2018-01-01

    Full Text Available The basal ganglia, which have been shown to be a significant multisensory hub, are disordered in Parkinson’s disease (PD). This study investigated the audiovisual integration of peripheral stimuli in PD patients with/without sleep disturbances. Thirty-six age-matched normal controls (NC) and 30 PD patients were recruited for an auditory/visual discrimination experiment. The mean response times for each participant were analyzed using repeated measures ANOVA and the race model. The results showed that the response to all stimuli was significantly delayed for PD compared to NC. The current results showed that audiovisual multisensory integration for peripheral stimuli is absent in PD regardless of sleep disturbances and further suggested that the abnormal audiovisual integration might be a potential early manifestation of PD.

  18. The effect of a concurrent working memory task and temporal offsets on the integration of auditory and visual speech information.

    Science.gov (United States)

    Buchan, Julie N; Munhall, Kevin G

    2012-01-01

    Audiovisual speech perception is an everyday occurrence of multisensory integration. Conflicting visual speech information can influence the perception of acoustic speech (namely the McGurk effect), and auditory and visual speech are integrated over a rather wide range of temporal offsets. This research examined whether the addition of a concurrent cognitive load task would affect audiovisual integration in a McGurk speech task and whether the cognitive load task would cause more interference at increasing offsets. The amount of integration was measured by the proportion of responses in incongruent trials that did not correspond to the audio (McGurk response). An eye-tracker was also used to examine whether the amount of temporal offset and the presence of a concurrent cognitive load task would influence gaze behavior. Results from this experiment show a very modest but statistically significant decrease in the number of McGurk responses when subjects also perform a cognitive load task; this effect is relatively constant across the various temporal offsets. Participants' gaze behavior was also influenced by the addition of a cognitive load task. Gaze was less centralized on the face, less time was spent looking at the mouth, and more time was spent looking at the eyes when a concurrent cognitive load task was added to the speech task.

  19. Integration of speech and gesture in aphasia.

    Science.gov (United States)

    Cocks, Naomi; Byrne, Suzanne; Pritchard, Madeleine; Morgan, Gary; Dipper, Lucy

    2018-02-07

    Information from speech and gesture is often integrated to comprehend a message. This integration process requires the appropriate allocation of cognitive resources to both the gesture and speech modalities. People with aphasia are likely to find integration of gesture and speech difficult. This is due to a reduction in cognitive resources, a difficulty with resource allocation, or a combination of the two. Despite it being likely that people who have aphasia will have difficulty with integration, empirical evidence describing this difficulty is limited. Such a difficulty was found in a single case study by Cocks et al. in 2009, and is replicated here with a greater number of participants. The aim of this study was to determine whether individuals with aphasia have difficulties understanding messages in which they have to integrate speech and gesture. Thirty-one participants with aphasia (PWA) and 30 control participants watched videos of an actor communicating a message in three different conditions: verbal only, gesture only, and verbal and gesture message combined. The message related to an action in which the name of the action (e.g., 'eat') was provided verbally and the manner of the action (e.g., hands in a position as though eating a burger) was provided gesturally. Participants then selected a picture that 'best matched' the message conveyed from a choice of four pictures which represented a gesture match only (G match), a verbal match only (V match), an integrated verbal-gesture match (Target) and an unrelated foil (UR). To determine the gain that participants obtained from integrating gesture and speech, a measure of multimodal gain (MMG) was calculated. The PWA were less able to integrate gesture and speech than the control participants and had significantly lower MMG scores. When the PWA had difficulty integrating, they more frequently selected the verbal match. The findings suggest that people with aphasia can have difficulty integrating speech and gesture in order to obtain
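
    The abstract does not spell out how the multimodal gain (MMG) measure is computed, so the sketch below is only a hypothetical illustration, assuming gain is defined as the accuracy advantage of the combined verbal-plus-gesture condition over the better of the two unimodal conditions; the published measure may be defined differently.

```python
def multimodal_gain(acc_combined, acc_verbal, acc_gesture):
    """Hypothetical multimodal gain: accuracy benefit of the combined
    verbal + gesture condition over the better single modality.
    (Assumed definition; the published MMG formula may differ.)"""
    return acc_combined - max(acc_verbal, acc_gesture)

# Example: a control participant benefits from integration,
# a participant with aphasia does not.
print(round(multimodal_gain(0.90, 0.70, 0.65), 2))  # 0.2
print(round(multimodal_gain(0.68, 0.70, 0.60), 2))  # -0.02
```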

  20. A Comparison of the Development of Audiovisual Integration in Children with Autism Spectrum Disorders and Typically Developing Children

    Science.gov (United States)

    Taylor, Natalie; Isaac, Claire; Milne, Elizabeth

    2010-01-01

    This study aimed to investigate the development of audiovisual integration in children with Autism Spectrum Disorder (ASD). Audiovisual integration was measured using the McGurk effect in children with ASD aged 7-16 years and typically developing children (control group) matched approximately for age, sex, nonverbal ability and verbal ability.…

  1. Sequential literacy: teaching cinema in the audiovisual era (Alfasecuencialización: la enseñanza del cine en la era del audiovisual)

    Directory of Open Access Journals (Sweden)

    José Antonio Palao Errando

    2007-10-01

    Full Text Available In the so-called «information society», film studies have been diluted into a pragmatic and technological approach to audiovisual discourse, just as the enjoyment of cinema itself has been caught in the net of the DVD and hypertext. Cinema itself reacts to this through complex narrative structures that distance it from standard audiovisual discourse. The role of film studies, and of their university teaching, should be to reintroduce the subject excluded from informational knowledge by means of the interpretation of the filmic text.

  2. Effects of Sound Frequency on Audiovisual Integration: An Event-Related Potential Study.

    Science.gov (United States)

    Yang, Weiping; Yang, Jingjing; Gao, Yulin; Tang, Xiaoyu; Ren, Yanna; Takahashi, Satoshi; Wu, Jinglong

    2015-01-01

    A combination of signals across modalities can facilitate sensory perception. The audiovisual facilitative effect strongly depends on the features of the stimulus. Here, we investigated how sound frequency, one of the basic features of an auditory signal, modulates audiovisual integration. In this study, the task of the participant was to respond to a visual target stimulus by pressing a key while ignoring auditory stimuli, comprising tones of different frequencies (0.5, 1, 2.5 and 5 kHz). A significant facilitation of reaction times was obtained following audiovisual stimulation, irrespective of whether the task-irrelevant sounds were low or high frequency. Using event-related potentials (ERPs), audiovisual integration was found over the occipital area for 0.5 kHz auditory stimuli from 190-210 ms, for 1 kHz stimuli from 170-200 ms, for 2.5 kHz stimuli from 140-200 ms, and for 5 kHz stimuli from 100-200 ms. These findings suggest that a higher frequency sound signal paired with visual stimuli might be processed or integrated earlier, even though the auditory stimuli were task-irrelevant information. Furthermore, audiovisual integration in late latency (300-340 ms) ERPs with fronto-central topography was found for auditory stimuli of lower frequencies (0.5, 1 and 2.5 kHz). Our results confirmed that audiovisual integration is affected by the frequency of an auditory stimulus. Taken together, the neurophysiological results provide unique insight into how the brain processes a visual signal combined with auditory stimuli of different frequencies.
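
    ERP indices of audiovisual integration in time windows like those reported above are commonly derived from an additive-model comparison of the audiovisual response against the sum of the unisensory responses. The sketch below is a minimal, generic illustration of that comparison on simulated arrays (electrode selection, baseline correction and statistics omitted); it is not the authors' exact pipeline.

```python
import numpy as np

def av_integration_effect(erp_av, erp_a, erp_v, times, window):
    """Mean amplitude of the AV - (A + V) difference wave in a time window.

    erp_* : arrays of shape (n_channels, n_times), trial-averaged ERPs.
    window: (start, end) in seconds, e.g. (0.19, 0.21) for 190-210 ms.
    """
    diff = erp_av - (erp_a + erp_v)
    mask = (times >= window[0]) & (times <= window[1])
    return diff[:, mask].mean(axis=1)  # one value per channel

# Illustrative data: 64 channels, 700 samples spanning -0.2 to 0.5 s
times = np.linspace(-0.2, 0.5, 700)
rng = np.random.default_rng(1)
erp_a, erp_v, erp_av = (rng.normal(0, 1e-6, (64, 700)) for _ in range(3))
print(av_integration_effect(erp_av, erp_a, erp_v, times, (0.19, 0.21)).shape)
```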

  3. Crossmodal integration enhances neural representation of task-relevant features in audiovisual face perception.

    Science.gov (United States)

    Li, Yuanqing; Long, Jinyi; Huang, Biao; Yu, Tianyou; Wu, Wei; Liu, Yongjian; Liang, Changhong; Sun, Pei

    2015-02-01

    Previous studies have shown that audiovisual integration improves identification performance and enhances neural activity in heteromodal brain areas, for example, the posterior superior temporal sulcus/middle temporal gyrus (pSTS/MTG). Furthermore, it has also been demonstrated that attention plays an important role in crossmodal integration. In this study, we considered crossmodal integration in audiovisual facial perception and explored its effect on the neural representation of features. The audiovisual stimuli in the experiment consisted of facial movie clips that could be classified into 2 gender categories (male vs. female) or 2 emotion categories (crying vs. laughing). The visual/auditory-only stimuli were created from these movie clips by removing the auditory/visual contents. The subjects needed to make a judgment about the gender/emotion category for each movie clip in the audiovisual, visual-only, or auditory-only stimulus condition as functional magnetic resonance imaging (fMRI) signals were recorded. The neural representation of the gender/emotion feature was assessed using the decoding accuracy and the brain pattern-related reproducibility indices, obtained by a multivariate pattern analysis method from the fMRI data. In comparison to the visual-only and auditory-only stimulus conditions, we found that audiovisual integration enhanced the neural representation of task-relevant features and that feature-selective attention might play a modulatory role in audiovisual integration.
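
    The decoding-accuracy part of the multivariate pattern analysis described above can be illustrated with a generic cross-validated classifier: decode the gender/emotion label from voxel patterns and compare accuracy across stimulus conditions. The sketch below uses scikit-learn on simulated data and is only a schematic stand-in for the authors' method (the reproducibility indices are not covered).

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def decoding_accuracy(X, y, folds=5):
    """Cross-validated accuracy of decoding a binary feature (e.g. gender)
    from fMRI activation patterns (trials x voxels)."""
    clf = make_pipeline(StandardScaler(), LinearSVC())
    return cross_val_score(clf, X, y, cv=folds).mean()

# Simulated pSTS/MTG-like patterns: 80 trials x 500 voxels per condition,
# with a stronger label-related signal in the audiovisual condition.
rng = np.random.default_rng(2)
y = np.repeat([0, 1], 40)                                # male vs. female clips
signal = np.outer(y - 0.5, rng.normal(0, 1, 500))
X_audiovisual = signal * 0.8 + rng.normal(0, 1, (80, 500))
X_visual_only = signal * 0.4 + rng.normal(0, 1, (80, 500))
print(decoding_accuracy(X_audiovisual, y), decoding_accuracy(X_visual_only, y))
```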

  4. Looking Behavior and Audiovisual Speech Understanding in Children With Normal Hearing and Children With Mild Bilateral or Unilateral Hearing Loss.

    Science.gov (United States)

    Lewis, Dawna E; Smith, Nicholas A; Spalding, Jody L; Valente, Daniel L

    Visual information from talkers facilitates speech intelligibility for listeners when audibility is challenged by environmental noise and hearing loss. Less is known about how listeners actively process and attend to visual information from different talkers in complex multi-talker environments. This study tracked looking behavior in children with normal hearing (NH), mild bilateral hearing loss (MBHL), and unilateral hearing loss (UHL) in a complex multi-talker environment to examine the extent to which children look at talkers and whether looking patterns relate to performance on a speech-understanding task. It was hypothesized that performance would decrease as perceptual complexity increased and that children with hearing loss would perform more poorly than their peers with NH. Children with MBHL or UHL were expected to demonstrate greater attention to individual talkers during multi-talker exchanges, indicating that they were more likely to attempt to use visual information from talkers to assist in speech understanding in adverse acoustics. It also was of interest to examine whether MBHL, versus UHL, would differentially affect performance and looking behavior. Eighteen children with NH, eight children with MBHL, and 10 children with UHL participated (8-12 years). They followed audiovisual instructions for placing objects on a mat under three conditions: a single talker providing instructions via a video monitor, four possible talkers alternately providing instructions on separate monitors in front of the listener, and the same four talkers providing both target and nontarget information. Multi-talker background noise was presented at a 5 dB signal-to-noise ratio during testing. An eye tracker monitored looking behavior while children performed the experimental task. Behavioral task performance was higher for children with NH than for either group of children with hearing loss. There were no differences in performance between children with UHL and children

  5. "Audio-visuel Integre" et Communication(s) ("Integrated Audiovisual" and Communication)

    Science.gov (United States)

    Moirand, Sophie

    1974-01-01

    This article examines the usefulness of the audiovisual method in teaching communication competence, and calls for research in audiovisual methods as well as in communication theory for improvement in these areas. (Text is in French.) (AM)

  6. Audiovisual integration increases the intentional step synchronization of side-by-side walkers.

    Science.gov (United States)

    Noy, Dominic; Mouta, Sandra; Lamas, Joao; Basso, Daniel; Silva, Carlos; Santos, Jorge A

    2017-12-01

    When people walk side-by-side, they often synchronize their steps. To achieve this, individuals might cross-modally match audiovisual signals from the movements of the partner and kinesthetic, cutaneous, visual and auditory signals from their own movements. Because signals from different sensory systems are processed with noise and asynchronously, the challenge of the CNS is to derive the best estimate based on this conflicting information. This is currently thought to be done by a mechanism operating as a Maximum Likelihood Estimator (MLE). The present work investigated whether audiovisual signals from the partner are integrated according to MLE in order to synchronize steps during walking. Three experiments were conducted in which the sensory cues from a walking partner were virtually simulated. In Experiment 1, seven participants were instructed to synchronize with human-sized Point Light Walkers and/or footstep sounds. Results revealed the highest synchronization performance with auditory and audiovisual cues. This was quantified by the time to achieve synchronization and by synchronization variability. However, this auditory dominance effect might have been due to artifacts of the setup. Therefore, in Experiment 2, human-sized virtual mannequins were implemented. Also, audiovisual stimuli were rendered in real-time and thus were synchronous and co-localized. All four participants synchronized best with audiovisual cues. For three of the four participants, results point toward optimal integration consistent with the MLE model. Experiment 3 yielded performance decrements for all three participants when the cues were incongruent. Overall, these findings suggest that individuals might optimally integrate audiovisual cues to synchronize steps during side-by-side walking.
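
    The MLE account referenced above makes quantitative predictions: the integrated estimate weights each cue by its reliability (inverse variance), and the integrated variance falls below either unimodal variance. The sketch below states those predictions directly; the numbers are illustrative, not measurements from the study.

```python
def mle_combination(est_a, var_a, est_v, var_v):
    """Reliability-weighted (maximum likelihood) combination of two cues.

        w_a = (1/var_a) / (1/var_a + 1/var_v)
        est = w_a * est_a + (1 - w_a) * est_v
        var = (var_a * var_v) / (var_a + var_v)   # < min(var_a, var_v)
    """
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_v)
    est = w_a * est_a + (1.0 - w_a) * est_v
    var = (var_a * var_v) / (var_a + var_v)
    return est, var, w_a

# Illustrative step-timing cues: the auditory footstep signal is more reliable
# than the visual one, so it receives the larger weight.
print(mle_combination(est_a=0.0, var_a=40.0, est_v=20.0, var_v=90.0))
```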

  7. Audiovisual laughter detection based on temporal features

    NARCIS (Netherlands)

    Petridis, Stavros; Nijholt, Antinus; Nijholt, A.; Pantic, M.; Pantic, Maja; Poel, Mannes; Poel, M.; Hondorp, G.H.W.

    2008-01-01

    Previous research on automatic laughter detection has mainly been focused on audio-based detection. In this study we present an audiovisual approach to distinguishing laughter from speech based on temporal features and we show that the integration of audio and visual information leads to improved

  8. The contribution of perceptual factors and training on varying audiovisual integration capacity.

    Science.gov (United States)

    Wilbiks, Jonathan M P; Dyson, Benjamin J

    2018-06-01

    The suggestion that the capacity of audiovisual integration has an upper limit of 1 was challenged in 4 experiments using perceptual factors and training to enhance the binding of auditory and visual information. Participants were required to note a number of specific visual dot locations that changed in polarity when a critical auditory stimulus was presented, under relatively fast (200-ms stimulus onset asynchrony [SOA]) and slow (700-ms SOA) rates of presentation. In Experiment 1, transient cross-modal congruency between the brightness of polarity change and pitch of the auditory tone was manipulated. In Experiment 2, sustained chunking was enabled on certain trials by connecting varying dot locations with vertices. In Experiment 3, training was employed to determine if capacity would increase through repeated experience with an intermediate presentation rate (450 ms). Estimates of audiovisual integration capacity (K) were larger than 1 during cross-modal congruency at slow presentation rates (Experiment 1), during perceptual chunking at slow and fast presentation rates (Experiment 2), and during an intermediate presentation rate after training (Experiment 3). Finally, Experiment 4 showed a linear increase in K using SOAs ranging from 100 to 600 ms, suggestive of quantitative rather than qualitative changes in the mechanisms of audiovisual integration as a function of presentation rate. The data compromise the suggestion that the capacity of audiovisual integration is limited to 1 and suggest that the ability to bind sounds to sights is contingent on individual and environmental factors.
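
    The abstract reports capacity estimates (K) without giving the estimator, so the sketch below is only an assumption-laden illustration using a Cowan-style formula, K = set size x (hit rate - false alarm rate), with hypothetical trial coding; the published estimator may differ.

```python
def capacity_k(set_size, hits, misses, false_alarms, correct_rejections):
    """Cowan-style capacity estimate K = N * (H - FA).

    Assumed coding: a 'hit' is correctly reporting that a probed dot location
    changed polarity at the critical tone; a 'false alarm' is reporting a
    change for an unchanged location."""
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    return set_size * (hit_rate - fa_rate)

# Example: set size 4 at the slow (700-ms SOA) presentation rate
print(capacity_k(4, hits=70, misses=30, false_alarms=20, correct_rejections=80))  # 2.0
```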

  9. Visemic Processing in Audiovisual Discrimination of Natural Speech: A Simultaneous fMRI-EEG Study

    Science.gov (United States)

    Dubois, Cyril; Otzenberger, Helene; Gounot, Daniel; Sock, Rudolph; Metz-Lutz, Marie-Noelle

    2012-01-01

    In a noisy environment, visual perception of articulatory movements improves natural speech intelligibility. Parallel to phonemic processing based on auditory signal, visemic processing constitutes a counterpart based on "visemes", the distinctive visual units of speech. Aiming at investigating the neural substrates of visemic processing in a…

  10. INTEGRATING MACHINE TRANSLATION AND SPEECH SYNTHESIS COMPONENT FOR ENGLISH TO DRAVIDIAN LANGUAGE SPEECH TO SPEECH TRANSLATION SYSTEM

    Directory of Open Access Journals (Sweden)

    J. SANGEETHA

    2015-02-01

    Full Text Available This paper describes an interface between the machine translation and speech synthesis components of an English-to-Tamil speech-to-speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation, and text-to-speech synthesis. Many approaches to combining speech recognition and machine translation have been proposed, but the speech synthesis component has not yet received the same attention. In this paper, we focus on the integration of machine translation and speech synthesis, and report a subjective evaluation investigating the impact of the speech synthesis, the machine translation, and the integration of the two components. We implement a hybrid machine translation approach (a combination of rule-based and statistical machine translation) and a concatenative, syllable-based speech synthesis technique. To retain the naturalness and intelligibility of the synthesized speech, Auto Associative Neural Network (AANN) prosody prediction is used in this work. The results of this system investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.
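
    The three-module pipeline described above (recognition, translation, synthesis) amounts to a composition of stages. The sketch below is purely illustrative: the stage callables are hypothetical stand-ins, not the authors' ASR, hybrid MT, or AANN-based synthesis components.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SpeechToSpeechTranslator:
    """English speech -> Tamil speech, as a composition of three stages.

    Each stage is injected as a callable so any concrete ASR engine, hybrid
    (rule-based + statistical) MT system, or syllable-based TTS could be
    plugged in without changing the pipeline itself."""
    recognize: Callable[[bytes], str]        # English audio -> English text
    translate: Callable[[str], str]          # English text  -> Tamil text
    synthesize: Callable[[str], bytes]       # Tamil text    -> Tamil audio

    def run(self, english_audio: bytes) -> bytes:
        english_text = self.recognize(english_audio)
        tamil_text = self.translate(english_text)
        return self.synthesize(tamil_text)

# Toy stand-ins so the pipeline is runnable end to end.
toy = SpeechToSpeechTranslator(
    recognize=lambda audio: "hello world",
    translate=lambda text: "வணக்கம் உலகம்",        # placeholder translation
    synthesize=lambda text: text.encode("utf-8"),   # placeholder "waveform"
)
print(toy.run(b"\x00\x01"))
```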

  11. Audio-visual speech perception in infants and toddlers with Down syndrome, fragile X syndrome, and Williams syndrome.

    Science.gov (United States)

    D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette

    2016-08-01

    Typically-developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on Chronological and Mental Age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development.

  12. Comparison for younger and older adults: Stimulus temporal asynchrony modulates audiovisual integration.

    Science.gov (United States)

    Ren, Yanna; Ren, Yanling; Yang, Weiping; Tang, Xiaoyu; Wu, Fengxia; Wu, Qiong; Takahashi, Satoshi; Ejima, Yoshimichi; Wu, Jinglong

    2018-02-01

    Recent research has shown that the magnitudes of responses to multisensory information are highly dependent on the stimulus structure. The temporal proximity of multiple signal inputs is a critical determinant for cross-modal integration. Here, we investigated the influence that temporal asynchrony has on audiovisual integration in both younger and older adults using event-related potentials (ERPs). Our results showed that in the simultaneous audiovisual condition, early integration was similar for the younger and older groups, except for the earliest integration (80-110 ms), which occurred in the occipital region for older adults but was absent in younger adults. Additionally, late integration was delayed in older adults (280-300 ms) compared to younger adults (210-240 ms). In audition-leading vision conditions, the earliest integration (80-110 ms) was absent in younger adults but did occur in older adults. Additionally, after increasing the temporal disparity from 50 ms to 100 ms, late integration was delayed in both younger (from 230-290 ms to 280-300 ms) and older (from 210-240 ms to 280-300 ms) adults. In the audition-lagging vision conditions, integration only occurred in the A100V condition for younger adults and in the A50V condition for older adults. The current results suggested that the audiovisual temporal integration pattern differed between the audition-leading and audition-lagging vision conditions and further revealed the varying effect of temporal asynchrony on audiovisual integration in younger and older adults.

  13. Neural entrainment to rhythmically-presented auditory, visual and audio-visual speech in children

    Directory of Open Access Journals (Sweden)

    Alan James Power

    2012-07-01

    Full Text Available Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal ‘samples’ of information from the speech stream at different rates, phase-resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (‘phase locking’). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase-locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically-developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable ba, presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a talking head). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the ba stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a ba in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling
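
    Entrainment of the kind described above is commonly quantified as inter-trial phase coherence at the stimulation rate (here, the 2 Hz delta rate). The sketch below is a generic illustration on simulated single-channel epochs; it is not the authors' analysis pipeline.

```python
import numpy as np

def itpc_at_frequency(trials, sfreq, freq):
    """Inter-trial phase coherence at one frequency.

    trials: array (n_trials, n_samples) of single-channel EEG epochs.
    Returns a value in [0, 1]; 1 means perfectly consistent phase across trials.
    """
    n_samples = trials.shape[1]
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sfreq)
    bin_idx = np.argmin(np.abs(freqs - freq))
    spectrum = np.fft.rfft(trials, axis=1)[:, bin_idx]
    phases = spectrum / np.abs(spectrum)          # unit-length phase vectors
    return np.abs(phases.mean())

# Simulated 2 Hz entrainment: phase-consistent oscillation plus noise
sfreq = 250
t = np.arange(0, 4, 1 / sfreq)
rng = np.random.default_rng(3)
trials = np.sin(2 * np.pi * 2 * t) + rng.normal(0, 1.5, (60, t.size))
print(round(itpc_at_frequency(trials, sfreq, freq=2.0), 3))
```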

  14. The Efficacy of Short-term Gated Audiovisual Speech Training for Improving Auditory Sentence Identification in Noise in Elderly Hearing Aid Users

    Science.gov (United States)

    Moradi, Shahram; Wahlin, Anna; Hällgren, Mathias; Rönnberg, Jerker; Lidestam, Björn

    2017-01-01

    This study aimed to examine the efficacy and maintenance of short-term (one-session) gated audiovisual speech training for improving auditory sentence identification in noise in experienced elderly hearing-aid users. Twenty-five hearing aid users (16 men and 9 women), with an average age of 70.8 years, were randomly divided into an experimental (audiovisual training, n = 14) and a control (auditory training, n = 11) group. Participants underwent gated speech identification tasks comprising Swedish consonants and words presented at 65 dB sound pressure level with a 0 dB signal-to-noise ratio (steady-state broadband noise), in audiovisual or auditory-only training conditions. The Hearing-in-Noise Test was employed to measure participants’ auditory sentence identification in noise before the training (pre-test), promptly after training (post-test), and 1 month after training (one-month follow-up). The results showed that audiovisual training improved auditory sentence identification in noise promptly after the training (post-test vs. pre-test scores); furthermore, this improvement was maintained 1 month after the training (one-month follow-up vs. pre-test scores). Such improvement was not observed in the control group, either promptly after the training or at the one-month follow-up. However, neither a significant between-groups difference nor a group-by-session interaction was observed. Conclusion: Audiovisual training may be considered in aural rehabilitation of hearing aid users to improve listening capabilities in noisy conditions. However, the lack of a significant between-groups effect (audiovisual vs. auditory) or an interaction between group and session calls for further research.

  15. Reduced neural integration of letters and speech sounds in dyslexic children scales with individual differences in reading fluency.

    Directory of Open Access Journals (Sweden)

    Gojko Žarić

    Full Text Available The acquisition of letter-speech sound associations is one of the basic requirements for fluent reading acquisition and its failure may contribute to reading difficulties in developmental dyslexia. Here we investigated event-related potential (ERP) measures of letter-speech sound integration in 9-year-old typical and dyslexic readers and specifically test their relation to individual differences in reading fluency. We employed an audiovisual oddball paradigm in typical readers (n = 20), dysfluent (n = 18) and severely dysfluent (n = 18) dyslexic children. In one auditory and two audiovisual conditions the Dutch spoken vowels /a/ and /o/ were presented as standard and deviant stimuli. In audiovisual blocks, the letter 'a' was presented either simultaneously (AV0), or 200 ms before (AV200) vowel sound onset. Across the three children groups, vowel deviancy in auditory blocks elicited comparable mismatch negativity (MMN) and late negativity (LN) responses. In typical readers, both audiovisual conditions (AV0 and AV200) led to enhanced MMN and LN amplitudes. In both dyslexic groups, the audiovisual LN effects were mildly reduced. Most interestingly, individual differences in reading fluency were correlated with MMN latency in the AV0 condition. A further analysis revealed that this effect was driven by a short-lived MMN effect encompassing only the N1 window in severely dysfluent dyslexics versus a longer MMN effect encompassing both the N1 and P2 windows in the other two groups. Our results confirm and extend previous findings in dyslexic children by demonstrating a deficient pattern of letter-speech sound integration depending on the level of reading dysfluency. These findings underscore the importance of considering individual differences across the entire spectrum of reading skills in addition to group differences between typical and dyslexic readers.

  16. Audiovisual integration in hemianopia: A neurocomputational account based on cortico-collicular interaction.

    Science.gov (United States)

    Magosso, Elisa; Bertini, Caterina; Cuppini, Cristiano; Ursino, Mauro

    2016-10-01

    Hemianopic patients retain some abilities to integrate audiovisual stimuli in the blind hemifield, showing both modulation of visual perception by auditory stimuli and modulation of auditory perception by visual stimuli. Indeed, conscious detection of a visual target in the blind hemifield can be improved by a spatially coincident auditory stimulus (auditory enhancement of visual detection), while a visual stimulus in the blind hemifield can improve localization of a spatially coincident auditory stimulus (visual enhancement of auditory localization). To gain more insight into the neural mechanisms underlying these two perceptual phenomena, we propose a neural network model including areas of neurons representing the retina, primary visual cortex (V1), extrastriate visual cortex, auditory cortex and the Superior Colliculus (SC). The visual and auditory modalities in the network interact via both direct cortical-cortical connections and subcortical-cortical connections involving the SC; the latter, in particular, integrates visual and auditory information and projects back to the cortices. Hemianopic patients were simulated by unilaterally lesioning V1, and preserving spared islands of V1 tissue within the lesion, to analyze the role of residual V1 neurons in mediating audiovisual integration. The network is able to reproduce the audiovisual phenomena in hemianopic patients, linking perceptions to neural activations, and disentangles the individual contribution of specific neural circuits and areas via sensitivity analyses. The study suggests i) a common key role of SC-cortical connections in mediating the two audiovisual phenomena; ii) a different role of visual cortices in the two phenomena: auditory enhancement of conscious visual detection being conditional on surviving V1 islands, while visual enhancement of auditory localization persisting even after complete V1 damage. The present study may contribute to advance understanding of the audiovisual dialogue
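
    As a toy flavor of the kind of mechanism modeled above (far simpler than the published cortico-collicular network), the sketch below passes summed visual and auditory inputs through a sigmoid unit, which already yields superadditive responses to weak cross-modal pairs. All parameters are invented for illustration and do not come from the study.

```python
import numpy as np

def sigmoid(x, slope=1.5, threshold=5.0):
    """Static nonlinearity of the toy unit (illustrative parameters)."""
    return 1.0 / (1.0 + np.exp(-slope * (x - threshold)))

def sc_unit_response(visual_input, auditory_input, w_v=1.0, w_a=1.0):
    """Steady-state response of a toy multisensory (SC-like) unit."""
    return sigmoid(w_v * visual_input + w_a * auditory_input)

# Weak unisensory inputs barely drive the unit, but their combination crosses
# the sigmoid threshold: a superadditive (multisensory enhancement) response.
v, a = 3.0, 3.0
print(sc_unit_response(v, 0.0), sc_unit_response(0.0, a), sc_unit_response(v, a))
```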

  17. Immediate integration of prosodic information from speech and visual information from pictures in the absence of focused attention: a mismatch negativity study.

    Science.gov (United States)

    Li, X; Yang, Y; Ren, G

    2009-06-16

    Language is often perceived together with visual information. Recent experimental evidence indicates that, during spoken language comprehension, the brain can immediately integrate visual information with semantic or syntactic information from speech. Here we used the mismatch negativity to further investigate whether prosodic information from speech can be immediately integrated into a visual scene context, and especially the time course and automaticity of this integration process. Sixteen Chinese native speakers participated in the study. The materials included Chinese spoken sentences and picture pairs. In the audiovisual situation, relative to the concomitant pictures, the spoken sentence was appropriately accented in the standard stimuli, but inappropriately accented in the two kinds of deviant stimuli. In the purely auditory situation, the speech sentences were presented without pictures. It was found that the deviants evoked mismatch responses in both audiovisual and purely auditory situations; the mismatch negativity in the purely auditory situation peaked at the same time as, but was weaker than, that evoked by the same deviant speech sounds in the audiovisual situation. This pattern of results suggested immediate integration of prosodic information from speech and visual information from pictures in the absence of focused attention.

  18. Audio-Visual Integration Modifies Emotional Judgment in Music

    Directory of Open Access Journals (Sweden)

    Shen-Yuan Su

    2011-10-01

    Full Text Available The conventional view that perceived emotion in music is derived mainly from auditory signals has led to neglect of the contribution of visual image. In this study, we manipulated mode (major vs. minor) and examined the influence of a video image on emotional judgment in music. Melodies in either major or minor mode were controlled for tempo and rhythm and played to the participants. We found that Taiwanese participants, like Westerners, judged major melodies as expressing positive, and minor melodies negative, emotions. The major or minor melodies were then paired with video images of the singers, which were either emotionally congruent or incongruent with their modes. Results showed that participants perceived stronger positive or negative emotions with congruent audio-visual stimuli. Compared to listening to music alone, stronger emotions were perceived when an emotionally congruent video image was added and weaker emotions were perceived when an incongruent image was added. We therefore demonstrate that mode is important to perceive the emotional valence in music and that treating musical art as a purely auditory event might lose the enhanced emotional strength perceived in music, since going to a concert may lead to stronger perceived emotion than listening to the CD at home.

  19. Audiovisual integration in depth: multisensory binding and gain as a function of distance.

    Science.gov (United States)

    Noel, Jean-Paul; Modi, Kahan; Wallace, Mark T; Van der Stoep, Nathan

    2018-07-01

    The integration of information across sensory modalities is dependent on the spatiotemporal characteristics of the stimuli that are paired. Despite large variation in the distance over which events occur in our environment, relatively little is known regarding how stimulus-observer distance affects multisensory integration. Prior work has suggested that exteroceptive stimuli are integrated over larger temporal intervals in near relative to far space, and that larger multisensory facilitations are evident in far relative to near space. Here, we sought to examine the interrelationship between these previously established distance-related features of multisensory processing. Participants performed an audiovisual simultaneity judgment and redundant target task in near and far space, while audiovisual stimuli were presented at a range of temporal delays (i.e., stimulus onset asynchronies). In line with the previous findings, temporal acuity was poorer in near relative to far space. Furthermore, reaction time to asynchronously presented audiovisual targets suggested a temporal window for fast detection-a range of stimuli asynchronies that was also larger in near as compared to far space. However, the range of reaction times over which multisensory response enhancement was observed was limited to a restricted range of relatively small (i.e., 150 ms) asynchronies, and did not differ significantly between near and far space. Furthermore, for synchronous presentations, these distance-related (i.e., near vs. far) modulations in temporal acuity and multisensory gain correlated negatively at an individual subject level. Thus, the findings support the conclusion that multisensory temporal binding and gain are asymmetrically modulated as a function of distance from the observer, and specifies that this relationship is specific for temporally synchronous audiovisual stimulus presentations.

  20. Audiovisual perception in amblyopia: A review and synthesis.

    Science.gov (United States)

    Richards, Michael D; Goltz, Herbert C; Wong, Agnes M F

    2018-05-17

    Amblyopia is a common developmental sensory disorder that has been extensively and systematically investigated as a unisensory visual impairment. However, its effects are increasingly recognized to extend beyond vision to the multisensory domain. Indeed, amblyopia is associated with altered cross-modal interactions in audiovisual temporal perception, audiovisual spatial perception, and audiovisual speech perception. Furthermore, although the visual impairment in amblyopia is typically unilateral, the multisensory abnormalities tend to persist even when viewing with both eyes. Knowledge of the extent and mechanisms of the audiovisual impairments in amblyopia, however, remains in its infancy. This work aims to review our current understanding of audiovisual processing and integration deficits in amblyopia, and considers the possible mechanisms underlying these abnormalities.

  1. Reliability-Weighted Integration of Audiovisual Signals Can Be Modulated by Top-down Attention

    Science.gov (United States)

    Noppeney, Uta

    2018-01-01

    Behaviorally, it is well established that human observers integrate signals near-optimally weighted in proportion to their reliabilities as predicted by maximum likelihood estimation. Yet, despite abundant behavioral evidence, it is unclear how the human brain accomplishes this feat. In a spatial ventriloquist paradigm, participants were presented with auditory, visual, and audiovisual signals and reported the location of the auditory or the visual signal. Combining psychophysics, multivariate functional MRI (fMRI) decoding, and models of maximum likelihood estimation (MLE), we characterized the computational operations underlying audiovisual integration at distinct cortical levels. We estimated observers’ behavioral weights by fitting psychometric functions to participants’ localization responses. Likewise, we estimated the neural weights by fitting neurometric functions to spatial locations decoded from regional fMRI activation patterns. Our results demonstrate that low-level auditory and visual areas encode predominantly the spatial location of the signal component of a region’s preferred auditory (or visual) modality. By contrast, intraparietal sulcus forms spatial representations by integrating auditory and visual signals weighted by their reliabilities. Critically, the neural and behavioral weights and the variance of the spatial representations depended not only on the sensory reliabilities as predicted by the MLE model but also on participants’ modality-specific attention and report (i.e., visual vs. auditory). These results suggest that audiovisual integration is not exclusively determined by bottom-up sensory reliabilities. Instead, modality-specific attention and report can flexibly modulate how intraparietal sulcus integrates sensory signals into spatial representations to guide behavioral responses (e.g., localization and orienting). PMID:29527567
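
    A common way to obtain behavioral weights of the kind described above is to regress reported locations in audiovisual trials onto the true auditory and visual locations and compare the recovered auditory weight with the MLE prediction from unisensory reliabilities. The sketch below is a schematic illustration on simulated ventriloquist data, not the authors' psychometric/neurometric fitting procedure.

```python
import numpy as np

def estimated_auditory_weight(loc_a, loc_v, reported):
    """Least-squares estimate of w_A in: reported = w_A*loc_A + (1 - w_A)*loc_V."""
    x = loc_a - loc_v
    y = reported - loc_v
    return float(np.dot(x, y) / np.dot(x, x))

def mle_auditory_weight(sigma_a, sigma_v):
    """Auditory weight predicted by MLE from unisensory localization noise."""
    return (1 / sigma_a**2) / (1 / sigma_a**2 + 1 / sigma_v**2)

# Simulated trials: vision is the more reliable cue here, so the recovered
# auditory weight should be small and close to the MLE prediction.
rng = np.random.default_rng(4)
sigma_a, sigma_v = 8.0, 2.0
loc_a = rng.choice([-10.0, -5.0, 5.0, 10.0], 500)
loc_v = rng.choice([-10.0, -5.0, 5.0, 10.0], 500)
w_pred = mle_auditory_weight(sigma_a, sigma_v)
reported = w_pred * loc_a + (1 - w_pred) * loc_v + rng.normal(0, 2.0, 500)
print(round(estimated_auditory_weight(loc_a, loc_v, reported), 2), round(w_pred, 2))
```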

  2. Multisensory integration of speech sounds with letters vs. visual speech : only visual speech induces the mismatch negativity

    NARCIS (Netherlands)

    Stekelenburg, J.J.; Keetels, M.N.; Vroomen, J.H.M.

    2018-01-01

    Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect.

  3. An audiovisual emotion recognition system

    Science.gov (United States)

    Han, Yi; Wang, Guoyin; Yang, Yong; He, Kun

    2007-12-01

    Human emotions can be expressed through many biological signals; speech and facial expression are two of them. Both are regarded as emotional information that plays an important role in human-computer interaction. Based on our previous studies on emotion recognition, an audiovisual emotion recognition system is developed and presented in this paper. The system is designed for real-time use and is supported by several integrated modules: speech enhancement for eliminating noise, rapid face detection for locating the face in the background image, example-based shape learning for facial feature alignment, and an optical-flow-based tracking algorithm for facial feature tracking. It is known that irrelevant features and high dimensionality of the data can hurt the performance of a classifier, and rough set-based feature selection is a good method for dimension reduction. Accordingly, 13 of 37 speech features and 10 of 33 facial features are selected to represent emotional information, and 52 audiovisual features are selected to account for the synchronization when speech and video are fused together. The experimental results demonstrate that this system performs well in real-time use and has a high recognition rate. Our results also suggest that fused multimodal recognition will become the trend in emotion recognition in the future.

  4. Spatio-temporal distribution of brain activity associated with audio-visually congruent and incongruent speech and the McGurk Effect.

    Science.gov (United States)

    Pratt, Hillel; Bleich, Naomi; Mittelman, Nomi

    2015-11-01

    Spatio-temporal distributions of cortical activity to audio-visual presentations of meaningless vowel-consonant-vowels and the effects of audio-visual congruence/incongruence, with emphasis on the McGurk effect, were studied. The McGurk effect occurs when a clearly audible syllable with one consonant is presented simultaneously with a visual presentation of a face articulating a syllable with a different consonant, and the resulting percept is a syllable with a consonant other than the auditorily presented one. Twenty subjects listened to pairs of audio-visually congruent or incongruent utterances and indicated whether pair members were the same or not. Source current densities of event-related potentials to the first utterance in the pair were estimated and effects of stimulus-response combinations, brain area, hemisphere, and clarity of visual articulation were assessed. Auditory cortex, superior parietal cortex, and middle temporal cortex were the most consistently involved areas across experimental conditions. Early effects also involved visual cortex. Clarity of visual articulation impacted activity in secondary visual cortex and Wernicke's area. McGurk perception was associated with decreased activity in primary and secondary auditory cortices and Wernicke's area before 100 msec, increased activity around 100 msec, which decreased again around 180 msec. Activity in Broca's area was unaffected by McGurk perception and was only increased to congruent audio-visual stimuli 30-70 msec following consonant onset. The results suggest left hemisphere prominence in the effects of stimulus and response conditions on eight brain areas involved in dynamically distributed parallel processing of audio-visual integration. Initially (30-70 msec) subcortical contributions to auditory cortex, superior parietal cortex, and middle temporal cortex occur. During 100-140 msec, peristriate visual influences and Wernicke's area join in the processing. Resolution of incongruent audio-visual inputs is then

  5. Gone in a Flash: Manipulation of Audiovisual Temporal Integration Using Transcranial Magnetic Stimulation

    Directory of Open Access Journals (Sweden)

    Roy eHamilton

    2013-09-01

    Full Text Available While converging evidence implicates the right inferior parietal lobule in audiovisual integration, its role has not been fully elucidated by direct manipulation of cortical activity. Replicating and extending an experiment initially reported by Kamke, Vieth, Cottrell, and Mattingley (2012), we employed the sound-induced flash illusion, in which a single visual flash, when accompanied by two auditory tones, is misperceived as multiple flashes (Wilson, 1987; Shams et al., 2000). Slow repetitive (1 Hz) TMS administered to the right angular gyrus, but not the right supramarginal gyrus, induced a transient decrease in the Peak Perceived Flashes (PPF), reflecting reduced susceptibility to the illusion. This finding independently confirms that perturbation of networks involved in multisensory integration can result in a more veridical representation of asynchronous auditory and visual events and that cross-modal integration is an active process in which the objective is the identification of a meaningful constellation of inputs, at times at the expense of accuracy.

  6. PRACTICING SPEECH THERAPY INTERVENTION FOR SOCIAL INTEGRATION OF CHILDREN WITH SPEECH DISORDERS

    Directory of Open Access Journals (Sweden)

    Martin Ofelia POPESCU

    2016-11-01

    Full Text Available The article presents a concise speech-correction intervention program for dyslalia, combined with the development of intrapersonal, interpersonal, and social integration capacities in children with speech disorders. The program's main objectives are: increasing the potential for individual social integration by correcting speech disorders in conjunction with intra- and interpersonal capacities, and increasing the potential of children and community groups for social integration by optimizing the socio-relational context of children with speech disorders. The program included 60 children/students with dyslalia speech disorders (monomorphic and polymorphic dyslalia) from 11 educational institutions - 6 kindergartens and 5 schools/secondary schools - affiliated with the inter-school logopedic centre (CLI) of Targu Jiu city and areas of Gorj district. The program was implemented under the assumption that a therapeutic-formative intervention to correct speech disorders and facilitate social integration would, in combination with the correction of pronunciation disorders, lead to better social integration of children with speech disorders. The results confirm the hypothesis and provide evidence of the intervention program's efficiency.

  7. Effects of auditory stimuli in the horizontal plane on audiovisual integration: an event-related potential study.

    Science.gov (United States)

    Yang, Weiping; Li, Qi; Ochi, Tatsuya; Yang, Jingjing; Gao, Yulin; Tang, Xiaoyu; Takahashi, Satoshi; Wu, Jinglong

    2013-01-01

    This article aims to investigate whether auditory stimuli in the horizontal plane, particularly those originating from behind the participant, affect audiovisual integration by using behavioral and event-related potential (ERP) measurements. In this study, visual stimuli were presented directly in front of the participants, auditory stimuli were presented at one location in an equidistant horizontal plane at the front (0°, the fixation point), right (90°), back (180°), or left (270°) of the participants, and audiovisual stimuli that include both visual stimuli and auditory stimuli originating from one of the four locations were simultaneously presented. These stimuli were presented randomly with equal probability; during this time, participants were asked to attend to the visual stimulus and respond promptly only to visual target stimuli (a unimodal visual target stimulus and the visual target of the audiovisual stimulus). A significant facilitation of reaction times and hit rates was obtained following audiovisual stimulation, irrespective of whether the auditory stimuli were presented in the front or back of the participant. However, no significant interactions were found between visual stimuli and auditory stimuli from the right or left. Two main ERP components related to audiovisual integration were found: first, auditory stimuli from the front location produced an ERP reaction over the right temporal area and right occipital area at approximately 160-200 milliseconds; second, auditory stimuli from the back produced a reaction over the parietal and occipital areas at approximately 360-400 milliseconds. Our results confirmed that audiovisual integration was also elicited even when auditory stimuli were presented behind the participant, but no integration occurred when auditory stimuli were presented in the right or left spaces, suggesting that the human brain might be more sensitive to information received from behind than from either side.

  8. 'When birds of a feather flock together': synesthetic correspondences modulate audiovisual integration in non-synesthetes.

    Directory of Open Access Journals (Sweden)

    Cesare Valerio Parise

    Full Text Available BACKGROUND: Synesthesia is a condition in which the stimulation of one sense elicits an additional experience, often in a different (i.e., unstimulated) sense. Although only a small proportion of the population is synesthetic, there is growing evidence to suggest that neurocognitively-normal individuals also experience some form of synesthetic association between the stimuli presented to different sensory modalities (i.e., between auditory pitch and visual size, where lower frequency tones are associated with large objects and higher frequency tones with small objects). While previous research has highlighted crossmodal interactions between synesthetically corresponding dimensions, the possible role of synesthetic associations in multisensory integration has not been considered previously. METHODOLOGY: Here we investigate the effects of synesthetic associations by presenting pairs of asynchronous or spatially discrepant visual and auditory stimuli that were either synesthetically matched or mismatched. In a series of three psychophysical experiments, participants reported the relative temporal order of presentation or the relative spatial locations of the two stimuli. PRINCIPAL FINDINGS: The reliability of non-synesthetic participants' estimates of both audiovisual temporal asynchrony and spatial discrepancy was lower for pairs of synesthetically matched as compared to synesthetically mismatched audiovisual stimuli. CONCLUSIONS: Recent studies of multisensory integration have shown that the reduced reliability of perceptual estimates regarding intersensory conflicts constitutes the marker of a stronger coupling between the unisensory signals. Our results therefore indicate a stronger coupling of synesthetically matched vs. mismatched stimuli and provide the first psychophysical evidence that synesthetic congruency can promote multisensory integration. Synesthetic crossmodal correspondences therefore appear to play a crucial (if unacknowledged

  9. The Dynamics and Neural Correlates of Audio-Visual Integration Capacity as Determined by Temporal Unpredictability, Proactive Interference, and SOA.

    Directory of Open Access Journals (Sweden)

    Jonathan M P Wilbiks

    Full Text Available Over 5 experiments, we challenge the idea that the capacity of audio-visual integration need be fixed at 1 item. We observe that the conditions under which audio-visual integration is most likely to exceed 1 occur when stimulus change operates at a slow rather than fast rate of presentation and when the task is of intermediate difficulty such as when low levels of proactive interference (3 rather than 8 interfering visual presentations) are combined with the temporal unpredictability of the critical frame (Experiment 2), or, high levels of proactive interference are combined with the temporal predictability of the critical frame (Experiment 4). Neural data suggest that capacity might also be determined by the quality of perceptual information entering working memory. Experiment 5 supported the proposition that audio-visual integration was at play during the previous experiments. The data are consistent with the dynamic nature usually associated with cross-modal binding, and while audio-visual integration capacity likely cannot exceed uni-modal capacity estimates, performance may be better than being able to associate only one visual stimulus with one auditory stimulus.

  10. The Dynamics and Neural Correlates of Audio-Visual Integration Capacity as Determined by Temporal Unpredictability, Proactive Interference, and SOA.

    Science.gov (United States)

    Wilbiks, Jonathan M P; Dyson, Benjamin J

    2016-01-01

    Over 5 experiments, we challenge the idea that the capacity of audio-visual integration need be fixed at 1 item. We observe that the conditions under which audio-visual integration is most likely to exceed 1 occur when stimulus change operates at a slow rather than fast rate of presentation and when the task is of intermediate difficulty such as when low levels of proactive interference (3 rather than 8 interfering visual presentations) are combined with the temporal unpredictability of the critical frame (Experiment 2), or, high levels of proactive interference are combined with the temporal predictability of the critical frame (Experiment 4). Neural data suggest that capacity might also be determined by the quality of perceptual information entering working memory. Experiment 5 supported the proposition that audio-visual integration was at play during the previous experiments. The data are consistent with the dynamic nature usually associated with cross-modal binding, and while audio-visual integration capacity likely cannot exceed uni-modal capacity estimates, performance may be better than being able to associate only one visual stimulus with one auditory stimulus.

  11. The duration of uncertain times: audiovisual information about intervals is integrated in a statistically optimal fashion.

    Directory of Open Access Journals (Sweden)

    Jess Hartcher-O'Brien

    Full Text Available Often multisensory information is integrated in a statistically optimal fashion where each sensory source is weighted according to its precision. This integration scheme is statistically optimal because it theoretically results in unbiased perceptual estimates with the highest precision possible. There is a current lack of consensus about how the nervous system processes multiple sensory cues to elapsed time. In order to shed light upon this, we adopt a computational approach to pinpoint the integration strategy underlying duration estimation of audio/visual stimuli. One of the assumptions of our computational approach is that the multisensory signals redundantly specify the same stimulus property. Our results clearly show that despite claims to the contrary, perceived duration is the result of an optimal weighting process, similar to that adopted for estimates of space. That is, participants weight the audio and visual information to arrive at the most precise, single duration estimate possible. The work also disentangles how different integration strategies - i.e. considering the time of onset/offset of signals - might alter the final estimate. As such we provide the first concrete evidence of an optimal integration strategy in human duration estimates.

  12. Aging and Spectro-Temporal Integration of Speech

    Directory of Open Access Journals (Sweden)

    John H. Grose

    2016-10-01

    Full Text Available The purpose of this study was to determine the effects of age on the spectro-temporal integration of speech. The hypothesis was that the integration of speech fragments distributed over frequency, time, and ear of presentation is reduced in older listeners—even for those with good audiometric hearing. Younger, middle-aged, and older listeners (10 per group) with good audiometric hearing participated. They were each tested under seven conditions that encompassed combinations of spectral, temporal, and binaural integration. Sentences were filtered into two bands centered at 500 Hz and 2500 Hz, with criterion bandwidth tailored for each participant. In some conditions, the speech bands were individually square wave interrupted at a rate of 10 Hz. Configurations of uninterrupted, synchronously interrupted, and asynchronously interrupted frequency bands were constructed that constituted speech fragments distributed across frequency, time, and ear of presentation. The over-arching finding was that, for most configurations, performance was not differentially affected by listener age. Although speech intelligibility varied across condition, there was no evidence of performance deficits in older listeners in any condition. This study indicates that age, per se, does not necessarily undermine the ability to integrate fragments of speech dispersed across frequency and time.
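
    Stimulus construction of the kind described above (two speech bands, optionally gated by a 10 Hz square wave) can be sketched with standard signal-processing tools. The sketch below uses scipy and is a simplified illustration assuming fixed band edges rather than the individually tailored criterion bandwidths; the waveform is a noise stand-in for a sentence recording.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, square

def band(signal, sfreq, low, high, order=4):
    """Band-pass filter one speech band (zero-phase Butterworth)."""
    sos = butter(order, [low, high], btype="bandpass", fs=sfreq, output="sos")
    return sosfiltfilt(sos, signal)

def interrupt(signal, sfreq, rate=10.0, duty=0.5):
    """Square-wave interruption: gate the band on/off at `rate` Hz."""
    t = np.arange(signal.size) / sfreq
    gate = (square(2 * np.pi * rate * t, duty=duty) + 1) / 2   # 0/1 gate
    return signal * gate

# Illustrative 2-second "sentence" at 16 kHz
sfreq = 16000
speech = np.random.default_rng(5).normal(0, 0.1, 2 * sfreq)
low_band = band(speech, sfreq, 400, 600)                        # band near 500 Hz
high_band = interrupt(band(speech, sfreq, 2000, 3000), sfreq)   # interrupted band near 2500 Hz
print(low_band.shape, high_band.shape)
```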

  13. Audiovisual Capture with Ambiguous Audiovisual Stimuli

    Directory of Open Access Journals (Sweden)

    Jean-Michel Hupé

    2011-10-01

    Full Text Available Audiovisual capture happens when information across modalities gets fused into a coherent percept. Ambiguous multi-modal stimuli have the potential to be powerful tools to observe such effects. We used such stimuli made of temporally synchronized and spatially co-localized visual flashes and auditory tones. The flashes produced bistable apparent motion and the tones produced ambiguous streaming. We measured strong interferences between perceptual decisions in each modality, a case of audiovisual capture. However, does this mean that audiovisual capture occurs before bistable decision? We argue that this is not the case, as the interference had a slow temporal dynamics and was modulated by audiovisual congruence, suggestive of high-level factors such as attention or intention. We propose a framework to integrate bistability and audiovisual capture, which distinguishes between “what” competes and “how” it competes (Hupé et al., 2008). The audiovisual interactions may be the result of contextual influences on neural representations (“what” competes), quite independent from the causal mechanisms of perceptual switches (“how” it competes). This framework predicts that audiovisual capture can bias bistability especially if modalities are congruent (Sato et al., 2007), but that it is fundamentally distinct in nature from the bistable competition mechanism.

  14. Cholinergic Potentiation and Audiovisual Repetition-Imitation Therapy Improve Speech Production and Communication Deficits in a Person with Crossed Aphasia by Inducing Structural Plasticity in White Matter Tracts.

    Science.gov (United States)

    Berthier, Marcelo L; De-Torres, Irene; Paredes-Pacheco, José; Roé-Vellvé, Núria; Thurnhofer-Hemsi, Karl; Torres-Prioris, María J; Alfaro, Francisco; Moreno-Torres, Ignacio; López-Barroso, Diana; Dávila, Guadalupe

    2017-01-01

    Donepezil (DP), a cognitive-enhancing drug targeting the cholinergic system, combined with massed sentence repetition training augmented and speeded up recovery of speech production deficits in patients with chronic conduction aphasia and extensive left hemisphere infarctions (Berthier et al., 2014). Nevertheless, a still unsettled question is whether such improvements correlate with restorative structural changes in gray matter and white matter pathways mediating speech production. In the present study, we used pharmacological magnetic resonance imaging to study treatment-induced brain changes in gray matter and white matter tracts in a right-handed male with chronic conduction aphasia and a right subcortical lesion (crossed aphasia). A single-patient, open-label multiple-baseline design incorporating two different treatments and two post-treatment evaluations was used. The patient received an initial dose of DP (5 mg/day) which was maintained during 4 weeks and then titrated up to 10 mg/day and administered alone (without aphasia therapy) during 8 weeks (Endpoint 1). Thereafter, the drug was combined with an audiovisual repetition-imitation therapy (Look-Listen-Repeat, LLR) during 3 months (Endpoint 2). Language evaluations, diffusion weighted imaging (DWI), and voxel-based morphometry (VBM) were performed at baseline and at both endpoints in JAM and once in 21 healthy control males. Treatment with DP alone and combined with LLR therapy induced marked improvement in aphasia and communication deficits as well as in selected measures of connected speech production, and phrase repetition. The obtained gains in speech production remained well-above baseline scores even 4 months after ending combined therapy. Longitudinal DWI showed structural plasticity in the right frontal aslant tract and direct segment of the arcuate fasciculus with both interventions. VBM revealed no structural changes in other white matter tracts nor in cortical areas linked by these tracts. In

  15. Audiovisual integration of emotional signals from music improvisation does not depend on temporal correspondence.

    Science.gov (United States)

    Petrini, Karin; McAleer, Phil; Pollick, Frank

    2010-04-06

    In the present study we applied a paradigm often used in face-voice affect perception to solo music improvisation to examine how the emotional valence of sound and gesture are integrated when perceiving an emotion. Three brief excerpts expressing emotion produced by a drummer and three by a saxophonist were selected. From these bimodal congruent displays the audio-only, visual-only, and audiovisually incongruent conditions (obtained by combining the two signals both within and between instruments) were derived. In Experiment 1 twenty musical novices judged the perceived emotion and rated the strength of each emotion. The results indicate that sound dominated the visual signal in the perception of affective expression, though this was more evident for the saxophone. In Experiment 2 a further sixteen musical novices were asked to either pay attention to the musicians' movements or to the sound when judging the perceived emotions. The results showed no effect of visual information when judging the sound. On the contrary, when judging the emotional content of the visual information, a worsening in performance was obtained for the incongruent condition that combined different emotional auditory and visual information for the same instrument. The effect of emotionally discordant information thus became evident only when the auditory and visual signals belonged to the same categorical event despite their temporal mismatch. This suggests that the integration of emotional information may be reinforced by its semantic attributes but might be independent from temporal features. Copyright 2010 Elsevier B.V. All rights reserved.

  16. Semantics and the multisensory brain: how meaning modulates processes of audio-visual integration.

    Science.gov (United States)

    Doehrmann, Oliver; Naumer, Marcus J

    2008-11-25

    By using meaningful stimuli, multisensory research has recently started to investigate the impact of stimulus content on crossmodal integration. Variations in this respect have often been termed "semantic". In this paper we review work addressing the questions of which tasks show an influence of semantic factors and which cortical networks are most likely to mediate these effects. More specifically, the focus of this paper will be on the processing of object stimuli presented in the auditory and visual sensory modalities. Furthermore, we will investigate which cortical regions are particularly responsive to experimental variations of content by comparing semantically matching ("congruent") and mismatching ("incongruent") experimental conditions. In this context, recent neuroimaging studies point toward a possible functional differentiation of temporal and frontal cortical regions, with the former being more responsive to semantically congruent and the latter to semantically incongruent audio-visual (AV) stimulation. To account for these differential effects, we suggest in the final section of this paper a possible synthesis of these data on semantic modulation of AV integration with findings from neuroimaging studies and theoretical accounts of semantic memory.

  17. Integrated Phoneme Subspace Method for Speech Feature Extraction

    Directory of Open Access Journals (Sweden)

    Park Hyunsin

    2009-01-01

    Full Text Available Speech feature extraction has been a key focus in robust speech recognition research. In this work, we discuss data-driven linear feature transformations applied to feature vectors in the logarithmic mel-frequency filter bank domain. Transformations are based on principal component analysis (PCA), independent component analysis (ICA), and linear discriminant analysis (LDA). Furthermore, this paper introduces a new feature extraction technique that collects the correlation information among phoneme subspaces and reconstructs feature space for representing phonemic information efficiently. The proposed speech feature vector is generated by projecting an observed vector onto an integrated phoneme subspace (IPS) based on PCA or ICA. The performance of the new feature was evaluated for isolated word speech recognition. The proposed method provided higher recognition accuracy than conventional methods in clean and reverberant environments.
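
    A minimal sketch of the PCA variant of the integrated phoneme subspace idea is given below. Function names and the choice of four components per phoneme are assumptions; the method described in the paper also exploits ICA and correlations among the phoneme subspaces.

```python
import numpy as np
from sklearn.decomposition import PCA

def build_ips_basis(logmel_by_phoneme, n_components=4):
    """Learn one PCA basis per phoneme class from log-mel filterbank frames and
    stack them into an integrated phoneme subspace (IPS) basis.

    logmel_by_phoneme maps a phoneme label to an (n_frames, n_mel) array; the
    number of components per phoneme is an assumed choice.
    """
    bases = [PCA(n_components=n_components).fit(frames).components_
             for frames in logmel_by_phoneme.values()]
    return np.vstack(bases)          # shape: (n_phonemes * n_components, n_mel)

def ips_features(logmel_frames, ips_basis):
    """Project log-mel frames onto the IPS basis to obtain the new feature vectors."""
    return logmel_frames @ ips_basis.T
```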

  18. Content congruency and its interplay with temporal synchrony modulate integration between rhythmic audiovisual streams

    Directory of Open Access Journals (Sweden)

    Yi-Huang eSu

    2014-12-01

    Full Text Available Both lower-level stimulus factors (e.g., temporal proximity) and higher-level cognitive factors (e.g., content congruency) are known to influence multisensory integration. The former can direct attention in a converging manner, and the latter can indicate whether information from the two modalities belongs together. The present research investigated whether and how these two factors interacted in the perception of rhythmic, audiovisual streams derived from a human movement scenario. Congruency here was based on sensorimotor correspondence pertaining to rhythm perception. Participants attended to bimodal stimuli consisting of a humanlike figure moving regularly to a sequence of auditory beat, and detected a possible auditory temporal deviant. The figure moved either downwards (congruently) or upwards (incongruently) to the downbeat, while in both situations the movement was either synchronous with the beat, or lagging behind it. Greater cross-modal binding was expected to hinder deviant detection. Results revealed poorer detection for congruent than for incongruent streams, suggesting stronger integration in the former. False alarms increased in asynchronous stimuli only for congruent streams, indicating greater tendency for deviant report due to visual capture of asynchronous auditory events. In addition, a greater increase in perceived synchrony was associated with a greater reduction in false alarms for congruent streams, while the pattern was reversed for incongruent ones. These results demonstrate that content congruency as a top-down factor not only promotes integration, but also modulates bottom-up effects of synchrony. Results are also discussed regarding how theories of integration and attentional entrainment may be combined in the context of rhythmic multisensory stimuli.

  19. Content congruency and its interplay with temporal synchrony modulate integration between rhythmic audiovisual streams.

    Science.gov (United States)

    Su, Yi-Huang

    2014-01-01

    Both lower-level stimulus factors (e.g., temporal proximity) and higher-level cognitive factors (e.g., content congruency) are known to influence multisensory integration. The former can direct attention in a converging manner, and the latter can indicate whether information from the two modalities belongs together. The present research investigated whether and how these two factors interacted in the perception of rhythmic, audiovisual (AV) streams derived from a human movement scenario. Congruency here was based on sensorimotor correspondence pertaining to rhythm perception. Participants attended to bimodal stimuli consisting of a humanlike figure moving regularly to a sequence of auditory beat, and detected a possible auditory temporal deviant. The figure moved either downwards (congruently) or upwards (incongruently) to the downbeat, while in both situations the movement was either synchronous with the beat, or lagging behind it. Greater cross-modal binding was expected to hinder deviant detection. Results revealed poorer detection for congruent than for incongruent streams, suggesting stronger integration in the former. False alarms increased in asynchronous stimuli only for congruent streams, indicating greater tendency for deviant report due to visual capture of asynchronous auditory events. In addition, a greater increase in perceived synchrony was associated with a greater reduction in false alarms for congruent streams, while the pattern was reversed for incongruent ones. These results demonstrate that content congruency as a top-down factor not only promotes integration, but also modulates bottom-up effects of synchrony. Results are also discussed regarding how theories of integration and attentional entrainment may be combined in the context of rhythmic multisensory stimuli.

  20. Patients with hippocampal amnesia successfully integrate gesture and speech.

    Science.gov (United States)

    Hilverman, Caitlin; Clough, Sharice; Duff, Melissa C; Cook, Susan Wagner

    2018-06-19

    During conversation, people integrate information from co-speech hand gestures with information in spoken language. For example, after hearing the sentence, "A piece of the log flew up and hit Carl in the face" while viewing a gesture directed at the nose, people tend to later report that the log hit Carl in the nose (information only in gesture) rather than in the face (information in speech). The cognitive and neural mechanisms that support the integration of gesture with speech are unclear. One possibility is that the hippocampus - known for its role in relational memory and information integration - is necessary for integrating gesture and speech. To test this possibility, we examined how patients with hippocampal amnesia and healthy and brain-damaged comparison participants express information from gesture in a narrative retelling task. Participants watched videos of an experimenter telling narratives that included hand gestures that contained supplementary information. Participants were asked to retell the narratives and their spoken retellings were assessed for the presence of information from gesture. For features that had been accompanied by supplementary gesture, patients with amnesia retold fewer of these features overall and produced fewer retellings that matched the speech from the narrative. Yet their retellings included features that contained information that had been present uniquely in gesture in amounts that were not reliably different from comparison groups. Thus, a functioning hippocampus is not necessary for gesture-speech integration over short timescales. Providing unique information in gesture may enhance communication for individuals with declarative memory impairment, possibly via non-declarative memory mechanisms. Copyright © 2018. Published by Elsevier Ltd.

  1. Toward Speech and Nonverbal Behaviors Integration for Humanoid Robot

    Directory of Open Access Journals (Sweden)

    Wei Wang

    2012-09-01

    Full Text Available It is essential to integrate speech and nonverbal behaviors for a humanoid robot in human-robot interaction. This paper presents an approach that uses a multi-objective genetic algorithm to match speech and behaviors automatically. Firstly, based on the humanoid robot's emotion status, we construct a hierarchical structure linking voice characteristics and nonverbal behaviors. Secondly, the behaviors corresponding to the speech are matched and integrated into an action sequence by the genetic algorithm, so the robot can speak and perform emotional behaviors consistently. Our approach draws on relevant knowledge from psychology and research on nonverbal communication. Experimental results indicate that our ultimate goal, an affective robot that acts and speaks with its partners vividly and fluently, can be achieved.
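
    To make the matching step concrete, the following toy sketch evolves one behavior label per utterance with a plain genetic algorithm. The affinity matrix, the single-objective fitness, and all hyperparameters are assumptions made for illustration; the approach described above uses a multi-objective formulation.

```python
import numpy as np

def evolve_behavior_sequence(affinity, n_generations=200, pop_size=40,
                             mutation_rate=0.1, seed=0):
    """Toy genetic algorithm assigning one nonverbal behavior to each utterance.

    affinity[i, j] is an assumed score for pairing utterance i with behavior j
    (for instance derived from shared emotion labels); the single-objective
    fitness here collapses the multi-objective formulation of the paper.
    """
    rng = np.random.default_rng(seed)
    n_utt, n_beh = affinity.shape
    pop = rng.integers(n_beh, size=(pop_size, n_utt))
    for _ in range(n_generations):
        fitness = affinity[np.arange(n_utt), pop].sum(axis=1)
        # Tournament selection: keep the better individual of random pairs.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((fitness[a] >= fitness[b])[:, None], pop[a], pop[b])
        # One-point crossover between each parent and its neighbour.
        cut = rng.integers(1, n_utt, size=(pop_size, 1))
        mask = np.arange(n_utt)[None, :] < cut
        children = np.where(mask, parents, np.roll(parents, 1, axis=0))
        # Random mutation of individual behavior assignments.
        mutate = rng.random(children.shape) < mutation_rate
        children[mutate] = rng.integers(n_beh, size=int(mutate.sum()))
        pop = children
    fitness = affinity[np.arange(n_utt), pop].sum(axis=1)
    return pop[np.argmax(fitness)]
```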

  2. Non-fluent speech following stroke is caused by impaired efference copy.

    Science.gov (United States)

    Feenaughty, Lynda; Basilakos, Alexandra; Bonilha, Leonardo; den Ouden, Dirk-Bart; Rorden, Chris; Stark, Brielle; Fridriksson, Julius

    2017-09-01

    Efference copy is a cognitive mechanism argued to be critical for initiating and monitoring speech; however, the extent to which breakdown of efference copy mechanisms impacts speech production is unclear. This study examined the best mechanistic predictors of non-fluent speech among 88 stroke survivors. Objective speech fluency measures were subjected to a principal component analysis (PCA). The primary PCA factor was then entered into a multiple stepwise linear regression analysis as the dependent variable, with a set of independent mechanistic variables. Participants' ability to mimic audio-visual speech ("speech entrainment response") was the best independent predictor of non-fluent speech. We suggest that this "speech entrainment" factor reflects integrity of internal monitoring (i.e., efference copy) of speech production, which affects speech initiation and maintenance. Results support models of normal speech production and suggest that therapy focused on speech initiation and maintenance may improve speech fluency for individuals with chronic non-fluent aphasia post stroke.
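
    The analysis pipeline (a PCA factor of fluency measures regressed on mechanistic predictors) can be illustrated as follows. This is a simplified stand-in that ranks predictors by single-predictor R² rather than performing true stepwise selection; the function and its inputs are assumptions, not the authors' code or data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def rank_mechanistic_predictors(fluency_measures, predictors):
    """Reduce objective fluency measures to a single PCA factor, then rank
    candidate predictors (e.g., a speech-entrainment score) by single-predictor R^2.

    Inputs are (n_participants, n_variables) arrays; this ranking is a
    simplified stand-in for the stepwise regression reported in the study.
    """
    fluency_factor = PCA(n_components=1).fit_transform(fluency_measures).ravel()
    r2 = np.array([
        LinearRegression()
        .fit(predictors[:, [j]], fluency_factor)
        .score(predictors[:, [j]], fluency_factor)
        for j in range(predictors.shape[1])
    ])
    order = np.argsort(r2)[::-1]
    return order, r2[order]
```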

  3. Audiovisual functional magnetic resonance imaging adaptation reveals multisensory integration effects in object-related sensory cortices.

    Science.gov (United States)

    Doehrmann, Oliver; Weigelt, Sarah; Altmann, Christian F; Kaiser, Jochen; Naumer, Marcus J

    2010-03-03

    Information integration across different sensory modalities contributes to object recognition, the generation of associations and long-term memory representations. Here, we used functional magnetic resonance imaging adaptation to investigate the presence of sensory integrative effects at cortical levels as early as nonprimary auditory and extrastriate visual cortices, which are implicated in intermediate stages of object processing. Stimulation consisted of an adapting audiovisual stimulus S(1) and a subsequent stimulus S(2) from the same basic-level category (e.g., cat). The stimuli were carefully balanced with respect to stimulus complexity and semantic congruency and presented in four experimental conditions: (1) the same image and vocalization for S(1) and S(2), (2) the same image and a different vocalization, (3) different images and the same vocalization, or (4) different images and vocalizations. This two-by-two factorial design allowed us to assess the contributions of auditory and visual stimulus repetitions and changes in a statistically orthogonal manner. Responses in visual regions of right fusiform gyrus and right lateral occipital cortex were reduced for repeated visual stimuli (repetition suppression). Surprisingly, left lateral occipital cortex showed stronger responses to repeated auditory stimuli (repetition enhancement). Similarly, auditory regions of interest of the right middle superior temporal gyrus and sulcus exhibited repetition suppression to auditory repetitions and repetition enhancement to visual repetitions. Our findings of crossmodal repetition-related effects in cortices of the respective other sensory modality add to the emerging view that in human subjects sensory integrative mechanisms operate on earlier cortical processing levels than previously assumed.

  4. Gesture-speech integration in children with specific language impairment.

    Science.gov (United States)

    Mainela-Arnold, Elina; Alibali, Martha W; Hostetter, Autumn B; Evans, Julia L

    2014-11-01

    Previous research suggests that speakers are especially likely to produce manual communicative gestures when they have relative ease in thinking about the spatial elements of what they are describing, paired with relative difficulty organizing those elements into appropriate spoken language. Children with specific language impairment (SLI) exhibit poor expressive language abilities together with within-normal-range nonverbal IQs. This study investigated whether weak spoken language abilities in children with SLI influence their reliance on gestures to express information. We hypothesized that these children would rely on communicative gestures to express information more often than their age-matched typically developing (TD) peers, and that they would sometimes express information in gestures that they do not express in the accompanying speech. Participants were 15 children with SLI (aged 5;6-10;0) and 18 age-matched TD controls. Children viewed a wordless cartoon and retold the story to a listener unfamiliar with the story. Children's gestures were identified and coded for meaning using a previously established system. Speech-gesture combinations were coded as redundant if the information conveyed in speech and gesture was the same, and non-redundant if the information conveyed in speech was different from the information conveyed in gesture. Children with SLI produced more gestures than children in the TD group; however, the likelihood that speech-gesture combinations were non-redundant did not differ significantly across the SLI and TD groups. In both groups, younger children were significantly more likely to produce non-redundant speech-gesture combinations than older children. The gesture-speech integration system functions similarly in children with SLI and TD, but children with SLI rely more on gesture to help formulate, conceptualize or express the messages they want to convey. This provides motivation for future research examining whether interventions

  5. Severe Speech Sound Disorders: An Integrated Multimodal Intervention

    Science.gov (United States)

    King, Amie M.; Hengst, Julie A.; DeThorne, Laura S.

    2013-01-01

    Purpose: This study introduces an integrated multimodal intervention (IMI) and examines its effectiveness for the treatment of persistent and severe speech sound disorders (SSD) in young children. The IMI is an activity-based intervention that focuses simultaneously on increasing the "quantity" of a child's meaningful productions of target words…

  6. Integrating speech in time depends on temporal expectancies and attention.

    Science.gov (United States)

    Scharinger, Mathias; Steinberg, Johanna; Tavano, Alessandro

    2017-08-01

    Sensory information that unfolds in time, such as in speech perception, relies on efficient chunking mechanisms in order to yield optimally-sized units for further processing. Whether or not two successive acoustic events receive a one-unit or a two-unit interpretation seems to depend on the fit between their temporal extent and a stipulated temporal window of integration. However, there is ongoing debate on how flexible this temporal window of integration should be, especially for the processing of speech sounds. Furthermore, there is no direct evidence of whether attention may modulate the temporal constraints on the integration window. For this reason, we here examine how different word durations, which lead to different temporal separations of sound onsets, interact with attention. In an Electroencephalography (EEG) study, participants actively and passively listened to words where word-final consonants were occasionally omitted. Words had either a natural duration or were artificially prolonged in order to increase the separation of speech sound onsets. Omission responses to incomplete speech input, originating in left temporal cortex, decreased when the critical speech sound was separated from previous sounds by more than 250 msec, i.e., when the separation was larger than the stipulated temporal window of integration (125-150 msec). Attention, on the other hand, only increased omission responses for stimuli with natural durations. We complemented the event-related potential (ERP) analyses by a frequency-domain analysis on the stimulus presentation rate. Notably, the power at the stimulation frequency showed the same duration and attention effects as the omission responses. We interpret these findings against the background of existing research on temporal integration windows and further suggest that our findings may be accounted for within the framework of predictive coding. Copyright © 2017 Elsevier Ltd. All rights reserved.
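
    The frequency-domain analysis mentioned above amounts to reading out spectral power at the stimulus presentation rate. A minimal sketch for a single epoch, under assumed input conventions:

```python
import numpy as np

def power_at_stimulation_rate(eeg_epoch, fs, stim_rate_hz):
    """Spectral power of a single EEG epoch at the stimulus presentation rate.

    Real pipelines average over trials and channels and normalise by
    neighbouring frequency bins; this keeps only the core readout step.
    """
    spectrum = np.abs(np.fft.rfft(eeg_epoch)) ** 2
    freqs = np.fft.rfftfreq(len(eeg_epoch), d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - stim_rate_hz))]
```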

  7. Cholinergic Potentiation and Audiovisual Repetition-Imitation Therapy Improve Speech Production and Communication Deficits in a Person with Crossed Aphasia by Inducing Structural Plasticity in White Matter Tracts

    Directory of Open Access Journals (Sweden)

    Marcelo L. Berthier

    2017-06-01

    Full Text Available Donepezil (DP), a cognitive-enhancing drug targeting the cholinergic system, combined with massed sentence repetition training augmented and speeded up recovery of speech production deficits in patients with chronic conduction aphasia and extensive left hemisphere infarctions (Berthier et al., 2014). Nevertheless, a still unsettled question is whether such improvements correlate with restorative structural changes in gray matter and white matter pathways mediating speech production. In the present study, we used pharmacological magnetic resonance imaging to study treatment-induced brain changes in gray matter and white matter tracts in a right-handed male with chronic conduction aphasia and a right subcortical lesion (crossed aphasia). A single-patient, open-label multiple-baseline design incorporating two different treatments and two post-treatment evaluations was used. The patient received an initial dose of DP (5 mg/day) which was maintained during 4 weeks and then titrated up to 10 mg/day and administered alone (without aphasia therapy) during 8 weeks (Endpoint 1). Thereafter, the drug was combined with an audiovisual repetition-imitation therapy (Look-Listen-Repeat, LLR) during 3 months (Endpoint 2). Language evaluations, diffusion weighted imaging (DWI), and voxel-based morphometry (VBM) were performed at baseline and at both endpoints in JAM and once in 21 healthy control males. Treatment with DP alone and combined with LLR therapy induced marked improvement in aphasia and communication deficits as well as in selected measures of connected speech production, and phrase repetition. The obtained gains in speech production remained well-above baseline scores even 4 months after ending combined therapy. Longitudinal DWI showed structural plasticity in the right frontal aslant tract and direct segment of the arcuate fasciculus with both interventions. VBM revealed no structural changes in other white matter tracts nor in cortical areas linked by these

  8. Basic to Applied Research: The Benefits of Audio-Visual Speech Perception Research in Teaching Foreign Languages

    Science.gov (United States)

    Erdener, Dogu

    2016-01-01

    Traditionally, second language (L2) instruction has emphasised auditory-based instruction methods. However, this approach is restrictive in the sense that speech perception by humans is not just an auditory phenomenon but a multimodal one, and specifically, a visual one as well. In the past decade, experimental studies have shown that the…

  9. Gesture and Speech Integration: An Exploratory Study of a Man with Aphasia

    Science.gov (United States)

    Cocks, Naomi; Sautin, Laetitia; Kita, Sotaro; Morgan, Gary; Zlotowitz, Sally

    2009-01-01

    Background: In order to comprehend fully a speaker's intention in everyday communication, information is integrated from multiple sources, including gesture and speech. There are no published studies that have explored the impact of aphasia on iconic co-speech gesture and speech integration. Aims: To explore the impact of aphasia on co-speech…

  10. Integration of asynchronous knowledge sources in a novel speech recognition framework

    OpenAIRE

    Van hamme, Hugo

    2008-01-01

    Van hamme H., ''Integration of asynchronous knowledge sources in a novel speech recognition framework'', Proceedings ITRW on speech analysis and processing for knowledge discovery, 4 pp., June 2008, Aalborg, Denmark.

  11. Early and late beta-band power reflect audiovisual perception in the McGurk illusion.

    Science.gov (United States)

    Roa Romero, Yadira; Senkowski, Daniel; Keil, Julian

    2015-04-01

    The McGurk illusion is a prominent example of audiovisual speech perception and the influence that visual stimuli can have on auditory perception. In this illusion, a visual speech stimulus influences the perception of an incongruent auditory stimulus, resulting in a fused novel percept. In this high-density electroencephalography (EEG) study, we were interested in the neural signatures of the subjective percept of the McGurk illusion as a phenomenon of speech-specific multisensory integration. Therefore, we examined the role of cortical oscillations and event-related responses in the perception of congruent and incongruent audiovisual speech. We compared the cortical activity elicited by objectively congruent syllables with incongruent audiovisual stimuli. Importantly, the latter elicited a subjectively congruent percept: the McGurk illusion. We found that early event-related responses (N1) to audiovisual stimuli were reduced during the perception of the McGurk illusion compared with congruent stimuli. Most interestingly, our study showed a stronger poststimulus suppression of beta-band power (13-30 Hz) at short (0-500 ms) and long (500-800 ms) latencies during the perception of the McGurk illusion compared with congruent stimuli. Our study demonstrates that auditory perception is influenced by visual context and that the subsequent formation of a McGurk illusion requires stronger audiovisual integration even at early processing stages. Our results provide evidence that beta-band suppression at early stages reflects stronger stimulus processing in the McGurk illusion. Moreover, stronger late beta-band suppression in the McGurk illusion indicates the resolution of incongruent physical audiovisual input and the formation of a coherent, illusory multisensory percept. Copyright © 2015 the American Physiological Society.

  12. Audiovisual sentence recognition not predicted by susceptibility to the McGurk effect.

    Science.gov (United States)

    Van Engen, Kristin J; Xie, Zilong; Chandrasekaran, Bharath

    2017-02-01

    In noisy situations, visual information plays a critical role in the success of speech communication: listeners are better able to understand speech when they can see the speaker. Visual influence on auditory speech perception is also observed in the McGurk effect, in which discrepant visual information alters listeners' auditory perception of a spoken syllable. When hearing /ba/ while seeing a person saying /ga/, for example, listeners may report hearing /da/. Because these two phenomena have been assumed to arise from a common integration mechanism, the McGurk effect has often been used as a measure of audiovisual integration in speech perception. In this study, we test whether this assumed relationship exists within individual listeners. We measured participants' susceptibility to the McGurk illusion as well as their ability to identify sentences in noise across a range of signal-to-noise ratios in audio-only and audiovisual modalities. Our results do not show a relationship between listeners' McGurk susceptibility and their ability to use visual cues to understand spoken sentences in noise, suggesting that McGurk susceptibility may not be a valid measure of audiovisual integration in everyday speech processing.

  13. Audiovisual segregation in cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Simon Landry

    Full Text Available It has traditionally been assumed that cochlear implant users de facto perform atypically in audiovisual tasks. However, a recent study that combined an auditory task with visual distractors suggests that only those cochlear implant users that are not proficient at recognizing speech sounds might show abnormal audiovisual interactions. The present study aims at reinforcing this notion by investigating the audiovisual segregation abilities of cochlear implant users in a visual task with auditory distractors. Speechreading was assessed in two groups of cochlear implant users (proficient and non-proficient at sound recognition), as well as in normal controls. A visual speech recognition task (i.e. speechreading) was administered either in silence or in combination with three types of auditory distractors: (i) noise, (ii) reverse speech sound, and (iii) non-altered speech sound. Cochlear implant users proficient at speech recognition performed like normal controls in all conditions, whereas non-proficient users showed significantly different audiovisual segregation patterns in both speech conditions. These results confirm that normal-like audiovisual segregation is possible in highly skilled cochlear implant users and, consequently, that proficient and non-proficient CI users cannot be lumped into a single group. This important feature must be taken into account in further studies of audiovisual interactions in cochlear implant users.

  14. Iconic Gestures for Robot Avatars, Recognition and Integration with Speech

    Science.gov (United States)

    Bremner, Paul; Leonards, Ute

    2016-01-01

    Co-verbal gestures are an important part of human communication, improving its efficiency and efficacy for information conveyance. One possible means by which such multi-modal communication might be realized remotely is through the use of a tele-operated humanoid robot avatar. Such avatars have been previously shown to enhance social presence and operator salience. We present a motion tracking based tele-operation system for the NAO robot platform that allows direct transmission of speech and gestures produced by the operator. To assess the capabilities of this system for transmitting multi-modal communication, we have conducted a user study that investigated if robot-produced iconic gestures are comprehensible, and are integrated with speech. Robot performed gesture outcomes were compared directly to those for gestures produced by a human actor, using a within participant experimental design. We show that iconic gestures produced by a tele-operated robot are understood by participants when presented alone, almost as well as when produced by a human. More importantly, we show that gestures are integrated with speech when presented as part of a multi-modal communication equally well for human and robot performances. PMID:26925010

  15. Iconic Gestures for Robot Avatars, Recognition and Integration with Speech

    Directory of Open Access Journals (Sweden)

    Paul Adam Bremner

    2016-02-01

    Full Text Available Co-verbal gestures are an important part of human communication, improving its efficiency and efficacy for information conveyance. One possible means by which such multi-modal communication might be realised remotely is through the use of a tele-operated humanoid robot avatar. Such avatars have been previously shown to enhance social presence and operator salience. We present a motion tracking based tele-operation system for the NAO robot platform that allows direct transmission of speech and gestures produced by the operator. To assess the capabilities of this system for transmitting multi-modal communication, we have conducted a user study that investigated if robot-produced iconic gestures are comprehensible, and are integrated with speech. Robot performed gesture outcomes were compared directly to those for gestures produced by a human actor, using a within participant experimental design. We show that iconic gestures produced by a tele-operated robot are understood by participants when presented alone, almost as well as when produced by a human. More importantly, we show that gestures are integrated with speech when presented as part of a multi-modal communication equally well for human and robot performances.

  16. The Role of Temporal Disparity on Audiovisual Integration in Low-Vision Individuals.

    Science.gov (United States)

    Targher, Stefano; Micciolo, Rocco; Occelli, Valeria; Zampini, Massimiliano

    2017-12-01

    Recent findings have shown that sounds improve visual detection in low vision individuals when the audiovisual pairs of stimuli are presented simultaneously and from the same spatial position. The present study aims to investigate the temporal aspects of the audiovisual enhancement effect previously reported. Low vision participants were asked to detect the presence of a visual stimulus (yes/no task) presented either alone or together with an auditory stimulus at different stimulus onset asynchronies (SOAs). In the first experiment, the sound was presented either simultaneously or before the visual stimulus (i.e., SOAs 0, 100, 250, 400 ms). The results show that the presence of a task-irrelevant auditory stimulus produced a significant visual detection enhancement in all the conditions. In the second experiment, the sound was either synchronized with, or randomly preceded/lagged behind the visual stimulus (i.e., SOAs 0, ± 250, ± 400 ms). The visual detection enhancement was reduced in magnitude and limited only to the synchronous condition and to the condition in which the sound stimulus was presented 250 ms before the visual stimulus. Taken together, the evidence of the present study seems to suggest that audiovisual interaction in low vision individuals is highly modulated by top-down mechanisms.

  17. Perceived synchrony for realistic and dynamic audiovisual events.

    Science.gov (United States)

    Eg, Ragnhild; Behne, Dawn M

    2015-01-01

    In well-controlled laboratory experiments, researchers have found that humans can perceive delays between auditory and visual signals as short as 20 ms. Conversely, other experiments have shown that humans can tolerate audiovisual asynchrony that exceeds 200 ms. This seeming contradiction in human temporal sensitivity can be attributed to a number of factors such as experimental approaches and precedence of the asynchronous signals, along with the nature, duration, location, complexity and repetitiveness of the audiovisual stimuli, and even individual differences. In order to better understand how temporal integration of audiovisual events occurs in the real world, we need to close the gap between the experimental setting and the complex setting of everyday life. With this work, we aimed to contribute one brick to the bridge that will close this gap. We compared perceived synchrony for long-running and eventful audiovisual sequences to shorter sequences that contain a single audiovisual event, for three types of content: action, music, and speech. The resulting windows of temporal integration showed that participants were better at detecting asynchrony for the longer stimuli, possibly because the long-running sequences contain multiple corresponding events that offer audiovisual timing cues. Moreover, the points of subjective simultaneity differ between content types, suggesting that the nature of a visual scene could influence the temporal perception of events. An expected outcome from this type of experiment was the rich variation among participants' distributions and the derived points of subjective simultaneity. Hence, the designs of similar experiments call for more participants than traditional psychophysical studies. Heeding this caution, we conclude that existing theories on multisensory perception are ready to be tested on more natural and representative stimuli.
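
    The windows of temporal integration and points of subjective simultaneity reported above are typically derived by fitting a curve to synchrony judgments across audiovisual offsets. The Gaussian fit below is one common choice, offered as an assumed illustration rather than the authors' exact procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_synchrony_window(soas_ms, prop_synchronous):
    """Fit a Gaussian to the proportion of 'synchronous' responses across SOAs.

    The fitted centre is taken as the point of subjective simultaneity (PSS) and
    the width as a summary of the temporal integration window.
    """
    def gauss(soa, amp, pss, width):
        return amp * np.exp(-((soa - pss) ** 2) / (2.0 * width ** 2))

    (amp, pss, width), _ = curve_fit(gauss, soas_ms, prop_synchronous,
                                     p0=[1.0, 0.0, 100.0])
    return pss, abs(width)
```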

  18. Assessing the effect of physical differences in the articulation of consonants and vowels on audiovisual temporal perception

    Science.gov (United States)

    Vatakis, Argiro; Maragos, Petros; Rodomagoulakis, Isidoros; Spence, Charles

    2012-01-01

    We investigated how the physical differences associated with the articulation of speech affect the temporal aspects of audiovisual speech perception. Video clips of consonants and vowels uttered by three different speakers were presented. The video clips were analyzed using an auditory-visual signal saliency model in order to compare signal saliency and behavioral data. Participants made temporal order judgments (TOJs) regarding which speech-stream (auditory or visual) had been presented first. The sensitivity of participants' TOJs and the point of subjective simultaneity (PSS) were analyzed as a function of the place, manner of articulation, and voicing for consonants, and the height/backness of the tongue and lip-roundedness for vowels. We expected that in the case of the place of articulation and roundedness, where the visual-speech signal is more salient, temporal perception of speech would be modulated by the visual-speech signal. No such effect was expected for the manner of articulation or height. The results demonstrate that for place and manner of articulation, participants' temporal percept was affected (although not always significantly) by highly-salient speech-signals with the visual-signals requiring smaller visual-leads at the PSS. This was not the case when height was evaluated. These findings suggest that in the case of audiovisual speech perception, a highly salient visual-speech signal may lead to higher probabilities regarding the identity of the auditory-signal that modulate the temporal window of multisensory integration of the speech-stimulus. PMID:23060756

  19. Audiovisual Script Writing.

    Science.gov (United States)

    Parker, Norton S.

    In audiovisual writing the writer must first learn to think in terms of moving visual presentation. The writer must research his script, organize it, and adapt it to a limited running time. By use of a pleasant-sounding narrator and well-written narration, the visual and narrative can be successfully integrated. There are two types of script…

  20. Neural Correlates of Temporal Complexity and Synchrony during Audiovisual Correspondence Detection.

    Science.gov (United States)

    Baumann, Oliver; Vromen, Joyce M G; Cheung, Allen; McFadyen, Jessica; Ren, Yudan; Guo, Christine C

    2018-01-01

    We often perceive real-life objects as multisensory cues through space and time. A key challenge for audiovisual integration is to match neural signals that not only originate from different sensory modalities but also that typically reach the observer at slightly different times. In humans, complex, unpredictable audiovisual streams lead to higher levels of perceptual coherence than predictable, rhythmic streams. In addition, perceptual coherence for complex signals seems less affected by increased asynchrony between visual and auditory modalities than for simple signals. Here, we used functional magnetic resonance imaging to determine the human neural correlates of audiovisual signals with different levels of temporal complexity and synchrony. Our study demonstrated that greater perceptual asynchrony and lower signal complexity impaired performance in an audiovisual coherence-matching task. Differences in asynchrony and complexity were also underpinned by a partially different set of brain regions. In particular, our results suggest that, while regions in the dorsolateral prefrontal cortex (DLPFC) were modulated by differences in memory load due to stimulus asynchrony, areas traditionally thought to be involved in speech production and recognition, such as the inferior frontal and superior temporal cortex, were modulated by the temporal complexity of the audiovisual signals. Our results, therefore, indicate specific processing roles for different subregions of the fronto-temporal cortex during audiovisual coherence detection.

  1. Multisensory integration: the case of a time window of gesture-speech integration.

    Science.gov (United States)

    Obermeier, Christian; Gunter, Thomas C

    2015-02-01

    This experiment investigates the integration of gesture and speech from a multisensory perspective. In a disambiguation paradigm, participants were presented with short videos of an actress uttering sentences like "She was impressed by the BALL, because the GAME/DANCE...." The ambiguous noun (BALL) was accompanied by an iconic gesture fragment containing information to disambiguate the noun toward its dominant or subordinate meaning. We used four different temporal alignments between noun and gesture fragment: the identification point (IP) of the noun was either prior to (+120 msec), synchronous with (0 msec), or lagging behind the end of the gesture fragment (-200 and -600 msec). ERPs triggered to the IP of the noun showed significant differences for the integration of dominant and subordinate gesture fragments in the -200, 0, and +120 msec conditions. The outcome of this integration was revealed at the target words. These data suggest a time window for direct semantic gesture-speech integration ranging from at least -200 up to +120 msec. Although the -600 msec condition did not show any signs of direct integration at the homonym, significant disambiguation was found at the target word. An explorative analysis suggested that gesture information was directly integrated at the verb, indicating that there are multiple positions in a sentence where direct gesture-speech integration takes place. Ultimately, this would implicate that in natural communication, where a gesture lasts for some time, several aspects of that gesture will have their specific and possibly distinct impact on different positions in an utterance.

  2. Orthographic dependency in the neural correlates of reading: Evidence from audiovisual integration in English readers

    NARCIS (Netherlands)

    Holloway, I.; van Atteveldt, N.M.; Blomert, L.; Ansari, D.

    2015-01-01

    Reading skills are indispensible in modern technological societies. In transparent alphabetic orthographies, such as Dutch, reading skills build on associations between letters and speech sounds (LS pairs). Previously, we showed that the superior temporal cortex (STC) of Dutch readers is sensitive

  3. The development of audiovisual multisensory integration across childhood and early adolescence: a high-density electrical mapping study.

    Science.gov (United States)

    Brandwein, Alice B; Foxe, John J; Russo, Natalie N; Altschuler, Ted S; Gomes, Hilary; Molholm, Sophie

    2011-05-01

    The integration of multisensory information is essential to forming meaningful representations of the environment. Adults benefit from related multisensory stimuli but the extent to which the ability to optimally integrate multisensory inputs for functional purposes is present in children has not been extensively examined. Using a cross-sectional approach, high-density electrical mapping of event-related potentials (ERPs) was combined with behavioral measures to characterize neurodevelopmental changes in basic audiovisual (AV) integration from middle childhood through early adulthood. The data indicated a gradual fine-tuning of multisensory facilitation of performance on an AV simple reaction time task (as indexed by race model violation), which reaches mature levels by about 14 years of age. They also revealed a systematic relationship between age and the brain processes underlying multisensory integration (MSI) in the time frame of the auditory N1 ERP component (∼ 120 ms). A significant positive correlation between behavioral and neurophysiological measures of MSI suggested that the underlying brain processes contributed to the fine-tuning of multisensory facilitation of behavior that was observed over middle childhood. These findings are consistent with protracted plasticity in a dynamic system and provide a starting point from which future studies can begin to examine the developmental course of multisensory processing in clinical populations.
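
    Race-model violation, the behavioral index of multisensory facilitation used in this study, can be computed from reaction-time distributions as sketched below (the quantile grid and function names are assumptions; the authors' exact implementation may differ).

```python
import numpy as np

def race_model_violation(rt_audio, rt_visual, rt_av,
                         quantiles=np.arange(0.05, 1.0, 0.05)):
    """Compare the audiovisual RT distribution with Miller's race-model bound.

    The race model predicts F_AV(t) <= F_A(t) + F_V(t); positive return values
    indicate violation of that bound, i.e., multisensory facilitation beyond
    statistical summation of the unimodal responses.
    """
    t = np.quantile(rt_av, quantiles)      # evaluation time points
    def cdf(rts):
        return np.searchsorted(np.sort(rts), t, side="right") / len(rts)
    bound = np.minimum(cdf(rt_audio) + cdf(rt_visual), 1.0)
    return cdf(rt_av) - bound              # > 0 means violation
```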

  4. Speech recognition by means of a three-integrated-circuit set

    Energy Technology Data Exchange (ETDEWEB)

    Zoicas, A.

    1983-11-03

    The author uses pattern recognition methods for detecting word boundaries, and monitors incoming speech at 12 millisecond intervals. Frequency is divided into eight bands and analysis is achieved in an analogue interface integrated circuit, a pipeline digital processor and a control integrated circuit. Applications are suggested, including speech input to personal computers. 3 references.

  5. Audiovisual Interaction

    DEFF Research Database (Denmark)

    Karandreas, Theodoros-Alexandros

    in a manner that allowed the subjective audiovisual evaluation of loudspeakers under controlled conditions. Additionally, unimodal audio and visual evaluations were used as a baseline for comparison. The same procedure was applied in the investigation of the validity of less than optimal stimuli presentations...

  6. Audiovisual Review

    Science.gov (United States)

    Physiology Teacher, 1976

    1976-01-01

    Lists and reviews recent audiovisual materials in areas of medical, dental, nursing and allied health, and veterinary medicine; undergraduate, and high school studies. Each is classified as to level, type of instruction, usefulness, and source of availability. Topics include respiration, renal physiology, muscle mechanics, anatomy, evolution,…

  7. fMR-adaptation indicates selectivity to audiovisual content congruency in distributed clusters in human superior temporal cortex.

    Science.gov (United States)

    van Atteveldt, Nienke M; Blau, Vera C; Blomert, Leo; Goebel, Rainer

    2010-02-02

    Efficient multisensory integration is of vital importance for adequate interaction with the environment. In addition to basic binding cues like temporal and spatial coherence, meaningful multisensory information is also bound together by content-based associations. Many functional Magnetic Resonance Imaging (fMRI) studies propose the (posterior) superior temporal cortex (STC) as the key structure for integrating meaningful multisensory information. However, a still unanswered question is how superior temporal cortex encodes content-based associations, especially in light of inconsistent results from studies comparing brain activation to semantically matching (congruent) versus nonmatching (incongruent) multisensory inputs. Here, we used fMR-adaptation (fMR-A) in order to circumvent potential problems with standard fMRI approaches, including spatial averaging and amplitude saturation confounds. We presented repetitions of audiovisual stimuli (letter-speech sound pairs) and manipulated the associative relation between the auditory and visual inputs (congruent/incongruent pairs). We predicted that if multisensory neuronal populations exist in STC and encode audiovisual content relatedness, adaptation should be affected by the manipulated audiovisual relation. The results revealed an occipital-temporal network that adapted independently of the audiovisual relation. Interestingly, several smaller clusters distributed over superior temporal cortex within that network, adapted stronger to congruent than to incongruent audiovisual repetitions, indicating sensitivity to content congruency. These results suggest that the revealed clusters contain multisensory neuronal populations that encode content relatedness by selectively responding to congruent audiovisual inputs, since unisensory neuronal populations are assumed to be insensitive to the audiovisual relation. These findings extend our previously revealed mechanism for the integration of letters and speech sounds and

  8. fMR-adaptation indicates selectivity to audiovisual content congruency in distributed clusters in human superior temporal cortex

    Directory of Open Access Journals (Sweden)

    Blomert Leo

    2010-02-01

    Full Text Available Abstract Background Efficient multisensory integration is of vital importance for adequate interaction with the environment. In addition to basic binding cues like temporal and spatial coherence, meaningful multisensory information is also bound together by content-based associations. Many functional Magnetic Resonance Imaging (fMRI) studies propose the (posterior) superior temporal cortex (STC) as the key structure for integrating meaningful multisensory information. However, a still unanswered question is how superior temporal cortex encodes content-based associations, especially in light of inconsistent results from studies comparing brain activation to semantically matching (congruent) versus nonmatching (incongruent) multisensory inputs. Here, we used fMR-adaptation (fMR-A) in order to circumvent potential problems with standard fMRI approaches, including spatial averaging and amplitude saturation confounds. We presented repetitions of audiovisual stimuli (letter-speech sound pairs) and manipulated the associative relation between the auditory and visual inputs (congruent/incongruent pairs). We predicted that if multisensory neuronal populations exist in STC and encode audiovisual content relatedness, adaptation should be affected by the manipulated audiovisual relation. Results The results revealed an occipital-temporal network that adapted independently of the audiovisual relation. Interestingly, several smaller clusters distributed over superior temporal cortex within that network, adapted stronger to congruent than to incongruent audiovisual repetitions, indicating sensitivity to content congruency. Conclusions These results suggest that the revealed clusters contain multisensory neuronal populations that encode content relatedness by selectively responding to congruent audiovisual inputs, since unisensory neuronal populations are assumed to be insensitive to the audiovisual relation. These findings extend our previously revealed mechanism for

  9. Integrating speech technology to meet crew station design requirements

    Science.gov (United States)

    Simpson, Carol A.; Ruth, John C.; Moore, Carolyn A.

    The last two years have seen improvements in speech generation and speech recognition technology that make speech I/O for crew station controls and displays viable for operational systems. These improvements include increased robustness of algorithm performance in high levels of background noise, increased vocabulary size, improved performance in the connected speech mode, and less speaker dependence. This improved capability makes possible far more sophisticated user interface design than was possible with earlier technology. Engineering, linguistic, and human factors design issues are discussed in the context of current voice I/O technology performance.

  10. Effectiveness of an Integrated Phonological Awareness Approach for Children with Childhood Apraxia of Speech (CAS)

    Science.gov (United States)

    McNeill, Brigid C.; Gillon, Gail T.; Dodd, Barbara

    2009-01-01

    This study investigated the effectiveness of an integrated phonological awareness approach for children with childhood apraxia of speech (CAS). Change in speech, phonological awareness, letter knowledge, word decoding, and spelling skills were examined. A controlled multiple single-subject design was employed. Twelve children aged 4-7 years with…

  11. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    Science.gov (United States)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It can enjoy a fast rate of data/text entry, small overall size, and can be lightweight. In addition, this design will free the hands and eyes of a suited crewmember. The system components and steps include beam forming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks when faced with constraints in computational resources.
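
    The multichannel front end described above can be illustrated with the simplest beamformer, a frequency-domain delay-and-sum. The sketch below assumes known microphone positions and talker direction; the actual spacesuit system adds adaptive noise reduction and model adaptation on top of this kind of channel alignment.

```python
import numpy as np

def delay_and_sum(mic_signals, fs, mic_positions, source_direction, c=343.0):
    """Frequency-domain delay-and-sum beamformer (a minimal sketch).

    mic_signals: (n_mics, n_samples) array; mic_positions: (n_mics, 3) in metres;
    source_direction: unit vector from the array toward the talker (assumed known).
    FFT-based shifts are circular, which is acceptable for short frames.
    """
    n_mics, n_samples = mic_signals.shape
    lags = -(mic_positions @ source_direction) / c   # arrival lag per microphone (s)
    lags -= lags.min()                               # advance each channel by its lag
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(mic_signals, axis=1)
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * lags[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)
```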

  12. [Virtual audiovisual talking heads: articulatory data and models--applications].

    Science.gov (United States)

    Badin, P; Elisei, F; Bailly, G; Savariaux, C; Serrurier, A; Tarabalka, Y

    2007-01-01

    In the framework of experimental phonetics, our approach to the study of speech production is based on the measurement, the analysis and the modeling of orofacial articulators such as the jaw, the face and the lips, the tongue or the velum. Therefore, we present in this article experimental techniques that allow characterising the shape and movement of speech articulators (static and dynamic MRI, computed tomodensitometry, electromagnetic articulography, video recording). We then describe the linear models of the various organs that we can elaborate from speaker-specific articulatory data. We show that these models, that exhibit a good geometrical resolution, can be controlled from articulatory data with a good temporal resolution and can thus permit the reconstruction of high quality animation of the articulators. These models, that we have integrated in a virtual talking head, can produce augmented audiovisual speech. In this framework, we have assessed the natural tongue reading capabilities of human subjects by means of audiovisual perception tests. We conclude by suggesting a number of other applications of talking heads.

  13. A Longitudinal Assessment of Early Childhood Education with Integrated Speech Therapy for Children with Significant Language Impairment in Germany

    Science.gov (United States)

    Ullrich, Dieter; Ullrich, Katja; Marten, Magret

    2014-01-01

    Background: In Lower Saxony, Germany, pre-school children with language- and speech-deficits have the opportunity to access kindergartens with integrated language-/speech therapy prior to attending primary school, both regular or with integrated speech therapy. It is unknown whether these early childhood education treatments are helpful and…

  14. Summarizing Audiovisual Contents of a Video Program

    Science.gov (United States)

    Gong, Yihong

    2003-12-01

    In this paper, we focus on video programs that are intended to disseminate information and knowledge, such as news, documentaries, and seminars, and present an audiovisual summarization system that summarizes the audio and visual contents of the given video separately and then integrates the two summaries with a partial alignment. The audio summary is created by selecting spoken sentences that best present the main content of the audio speech, while the visual summary is created by eliminating duplicates/redundancies and preserving visually rich contents in the image stream. The alignment operation aims to synchronize each spoken sentence in the audio summary with its corresponding speaker's face and to preserve the rich content in the visual summary. A bipartite-graph-based audiovisual alignment algorithm is developed to efficiently find the best alignment solution that satisfies these alignment requirements. With the proposed system, we strive to produce a video summary that: (1) provides a natural visual and audio content overview, and (2) maximizes the coverage for both audio and visual contents of the original video without having to sacrifice either of them.
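
    The bipartite alignment step can be illustrated with a standard assignment solver: audio-summary sentences are paired with visual-summary shots by maximizing a face-match score penalized by temporal distance. The scoring terms and the use of the Hungarian algorithm below are illustrative assumptions rather than the paper's exact formulation.

        # Toy bipartite audiovisual alignment using the Hungarian algorithm.
        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def align(sentence_times, shot_times, face_match):
            """
            sentence_times: (S, 2) start/end of each sentence kept in the audio summary
            shot_times:     (V, 2) start/end of each shot kept in the visual summary
            face_match:     (S, V) score that shot v shows the speaker of sentence s
            Returns (sentence, shot) pairs maximizing the total score.
            """
            t_s = sentence_times.mean(axis=1)[:, None]
            t_v = shot_times.mean(axis=1)[None, :]
            # Negative score becomes a cost; temporally distant pairs are penalized to keep sync.
            cost = -(face_match - 0.01 * np.abs(t_s - t_v))
            rows, cols = linear_sum_assignment(cost)
            return list(zip(rows, cols))

        pairs = align(np.array([[0, 4], [10, 15]]),
                      np.array([[1, 5], [11, 14], [20, 25]]),
                      np.random.rand(2, 3))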

  15. Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs.

    Science.gov (United States)

    Ten Oever, Sanne; Sack, Alexander T; Wheat, Katherine L; Bien, Nina; van Atteveldt, Nienke

    2013-01-01

    Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We revealed that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, that seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception.

  16. Functional neuroanatomy of gesture-speech integration in children varies with individual differences in gesture processing.

    Science.gov (United States)

    Demir-Lira, Özlem Ece; Asaridou, Salomi S; Raja Beharelle, Anjali; Holt, Anna E; Goldin-Meadow, Susan; Small, Steven L

    2018-03-08

    Gesture is an integral part of children's communicative repertoire. However, little is known about the neurobiology of speech and gesture integration in the developing brain. We investigated how 8- to 10-year-old children processed gesture that was essential to understanding a set of narratives. We asked whether the functional neuroanatomy of gesture-speech integration varies as a function of (1) the content of speech, and/or (2) individual differences in how gesture is processed. When gestures provided missing information not present in the speech (i.e., disambiguating gesture; e.g., "pet" + flapping palms = bird), the presence of gesture led to increased activity in inferior frontal gyri, the right middle temporal gyrus, and the left superior temporal gyrus, compared to when gesture provided redundant information (i.e., reinforcing gesture; e.g., "bird" + flapping palms = bird). This pattern of activation was found only in children who were able to successfully integrate gesture and speech behaviorally, as indicated by their performance on post-test story comprehension questions. Children who did not glean meaning from gesture did not show differential activation across the two conditions. Our results suggest that the brain activation pattern for gesture-speech integration in children overlaps with-but is broader than-the pattern in adults performing the same task. Overall, our results provide a possible neurobiological mechanism that could underlie children's increasing ability to integrate gesture and speech over childhood, and account for individual differences in that integration. © 2018 John Wiley & Sons Ltd.

  17. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    Energy Technology Data Exchange (ETDEWEB)

    Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C., E-mail: calexandre@ien.gov.b, E-mail: mol@ien.gov.b, E-mail: cmnap@ien.gov.b, E-mail: mag@ien.gov.b [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil); Nomiya, Diogo V., E-mail: diogonomiya@gmail.co [Universidade Federal do Rio de Janeiro (UFRJ), RJ (Brazil)

    2009-07-01

    This work reports the implementation of a man-system interface based on automatic speech recognition and its integration into a virtual nuclear power plant control desk. The latter is intended to reproduce a real control desk using virtual reality technology, for operator training and ergonomic evaluation purposes. An automatic speech recognition system was developed to serve as a new interface with users, replacing the computer keyboard and mouse. Users can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface developed is based on a well-known signal processing technique named cepstral analysis and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)
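
    As a rough sketch of the two ingredients named in the record, the code below computes real-cepstrum features for one speech frame and passes them through a small one-hidden-layer network that scores a fixed set of spoken commands. Frame length, number of coefficients and network shape are illustrative assumptions, not the system's actual configuration.

        # Cepstral features plus a tiny neural-network classifier (illustrative).
        import numpy as np

        def cepstrum(frame, n_coef=13):
            """Real cepstrum of one windowed speech frame."""
            spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
            log_mag = np.log(spectrum + 1e-10)
            return np.fft.irfft(log_mag)[:n_coef]       # keep low-quefrency coefficients

        def mlp_forward(x, W1, b1, W2, b2):
            """One hidden layer mapping a feature vector to command probabilities."""
            h = np.tanh(x @ W1 + b1)
            scores = h @ W2 + b2
            return np.exp(scores) / np.exp(scores).sum()  # softmax over commands

        rng = np.random.default_rng(0)
        frame = rng.standard_normal(400)                 # 25 ms frame at 16 kHz
        feat = cepstrum(frame)
        W1, b1 = rng.standard_normal((13, 32)), np.zeros(32)
        W2, b2 = rng.standard_normal((32, 8)), np.zeros(8)
        print(mlp_forward(feat, W1, b1, W2, b2))         # probabilities for 8 spoken commands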

  18. Man-system interface based on automatic speech recognition: integration to a virtual control desk

    International Nuclear Information System (INIS)

    Jorge, Carlos Alexandre F.; Mol, Antonio Carlos A.; Pereira, Claudio M.N.A.; Aghina, Mauricio Alves C.; Nomiya, Diogo V.

    2009-01-01

    This work reports the implementation of a man-system interface based on automatic speech recognition and its integration into a virtual nuclear power plant control desk. The latter is intended to reproduce a real control desk using virtual reality technology, for operator training and ergonomic evaluation purposes. An automatic speech recognition system was developed to serve as a new interface with users, replacing the computer keyboard and mouse. Users can operate this virtual control desk in front of a computer monitor or a projection screen through spoken commands. The automatic speech recognition interface developed is based on a well-known signal processing technique named cepstral analysis and on artificial neural networks. The speech recognition interface is described, along with its integration with the virtual control desk, and results are presented. (author)

  19. An integrated audio-visual impact tool for wind turbine installations

    International Nuclear Information System (INIS)

    Lymberopoulos, N.; Belessis, M.; Wood, M.; Voutsinas, S.

    1996-01-01

    An integrated software tool was developed for the design of wind parks that takes into account their visual and audio impact. The application is built on a powerful hardware platform and is fully operated through a graphical user interface. The topography, the wind turbines and the daylight conditions are rendered digitally. The wind park can be animated in real time and the user can take virtual walks through it while the set-up of the park is altered interactively. In parallel, the wind speed levels on the terrain, the emitted noise intensity, the annual energy output and the cash flow can be estimated at any stage of the session and prompt the user to rearrange the layout. The tool has been used to visually simulate existing wind parks at St. Breok, UK, and on Andros Island, Greece. The results lead to the conclusion that such a tool can assist in the public acceptance and licensing procedures for wind parks. (author)
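
    For illustration only, the kind of noise estimate such a tool produces can be approximated with a simple hemispherical-spreading model that sums the contributions of all turbines at a receptor point; the tool's actual propagation model and source levels are not given in the record, so the numbers below are assumptions.

        # Total sound pressure level at a receptor from several turbines.
        import numpy as np

        def noise_level(turbine_xy, receptor_xy, sound_power_db=104.0, absorption=0.005):
            """Approximate A-weighted level (dB) at receptor_xy; coordinates in metres."""
            d = np.linalg.norm(np.asarray(turbine_xy) - np.asarray(receptor_xy), axis=1)
            # Hemispherical spreading plus a small atmospheric-absorption term (dB/m).
            lp = sound_power_db - 10 * np.log10(2 * np.pi * d**2) - absorption * d
            return 10 * np.log10(np.sum(10 ** (lp / 10)))   # energetic sum over turbines

        turbines = [(0, 0), (300, 0), (600, 0)]
        print(round(noise_level(turbines, (300, 500)), 1), "dB(A) at the receptor")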

  20. Multisensory speech perception without the left superior temporal sulcus.

    Science.gov (United States)

    Baum, Sarah H; Martin, Randi C; Hamilton, A Cris; Beauchamp, Michael S

    2012-09-01

    Converging evidence suggests that the left superior temporal sulcus (STS) is a critical site for multisensory integration of auditory and visual information during speech perception. We report a patient, SJ, who suffered a stroke that damaged the left temporo-parietal area, resulting in mild anomic aphasia. Structural MRI showed complete destruction of the left middle and posterior STS, as well as damage to adjacent areas in the temporal and parietal lobes. Surprisingly, SJ demonstrated preserved multisensory integration measured with two independent tests. First, she perceived the McGurk effect, an illusion that requires integration of auditory and visual speech. Second, her perception of morphed audiovisual speech with ambiguous auditory or visual information was significantly influenced by the opposing modality. To understand the neural basis for this preserved multisensory integration, blood-oxygen level dependent functional magnetic resonance imaging (BOLD fMRI) was used to examine brain responses to audiovisual speech in SJ and 23 healthy age-matched controls. In controls, bilateral STS activity was observed. In SJ, no activity was observed in the damaged left STS, but in the right STS more cortex was active in SJ than in any of the normal controls. Further, the amplitude of the BOLD response in the right STS to McGurk stimuli was significantly greater in SJ than in controls. The simplest explanation of these results is a reorganization of SJ's cortical language networks such that the right STS now subserves multisensory integration of speech. Copyright © 2012 Elsevier Inc. All rights reserved.

  1. Interaction and Representational Integration: Evidence from Speech Errors

    Science.gov (United States)

    Goldrick, Matthew; Baker, H. Ross; Murphy, Amanda; Baese-Berk, Melissa

    2011-01-01

    We examine the mechanisms that support interaction between lexical, phonological and phonetic processes during language production. Studies of the phonetics of speech errors have provided evidence that partially activated lexical and phonological representations influence phonetic processing. We examine how these interactive effects are modulated…

  2. Semantic Congruency in Audiovisual Integration as Revealed by the Continuous Flash Suppression Paradigm

    Directory of Open Access Journals (Sweden)

    Yung-Hao Yang

    2011-10-01

    Full Text Available Despite several demonstrations of a crossmodal semantic-congruency effect, it remains controversial whether it is a genuine perceptual phenomenon or whether it actually results from post-perceptual response bias such as decisions or strategies (de Gelder and Bertelson, 2003). Here we combined invisible stimuli with sounds to exclude the participants' awareness of the relation between visual and auditory stimuli. We rendered the visual events invisible by adopting the continuous flash suppression paradigm (Tsuchiya and Koch, 2005), in which dynamic high-contrast visual patches are presented to one eye to suppress a target presented to the other eye. The semantic congruency between visual and auditory stimuli was manipulated and participants had to detect any part of the visual target. The results showed that the time needed to detect the visual target (i.e., the release from suppression) was shorter when it was accompanied by a semantically congruent sound than by an incongruent one. This study therefore demonstrates genuine multisensory integration at the semantic level. Furthermore, it also extends findings from previous studies with neglect blindsight patients (e.g., de Gelder, Pourtois, and Weiskrantz, 2002) to normal participants, based on their unawareness of the relation between visual and auditory information.

  3. Historia audiovisual para una sociedad audiovisual [Audiovisual history for an audiovisual society]

    Directory of Open Access Journals (Sweden)

    Julio Montero Díaz

    2013-04-01

    Full Text Available This article analyzes the possibilities of presenting an audiovisual history in a society in which audiovisual media have progressively gained greater prominence. We analyze specific cases of films and historical documentaries and we assess the difficulties faced by historians in understanding the keys of audiovisual language, and by filmmakers in understanding and incorporating history into their productions. We conclude that it would not be possible to disseminate history in the western world without audiovisual resources circulated through various types of screens (cinema, television, computer, mobile phone, video games).

  4. Interaction and representational integration: Evidence from speech errors

    OpenAIRE

    Goldrick, Matthew; Baker, H. Ross; Murphy, Amanda; Baese-Berk, Melissa

    2011-01-01

    We examine the mechanisms that support interaction between lexical, phonological and phonetic processes during language production. Studies of the phonetics of speech errors have provided evidence that partially activated lexical and phonological representations influence phonetic processing. We examine how these interactive effects are modulated by lexical frequency. Previous research has demonstrated that during lexical access, the processing of high frequency words is facilitated; in contr...

  5. Susceptibility to a multisensory speech illusion in older persons is driven by perceptual processes

    Directory of Open Access Journals (Sweden)

    Annalisa Setti

    2013-09-01

    Full Text Available Recent studies suggest that multisensory integration is enhanced in older adults but it is not known whether this enhancement is solely driven by perceptual processes or affected by cognitive processes. Using the ‘McGurk illusion’, in Experiment 1 we found that audio-visual integration of incongruent audio-visual words was higher in older adults than in younger adults, although the recognition of either audio- or visual-only presented words was the same across groups. In Experiment 2 we tested recall of sentences within which an incongruent audio-visual speech word was embedded. The overall semantic meaning of the sentence was compatible with either one of the unisensory components of the target word and/or with the illusory percept. Older participants recalled more illusory audio-visual words in sentences than younger adults, however, there was no differential effect of word compatibility on recall for the two groups. Our findings suggest that the relatively high susceptibility to the audio-visual speech illusion in older participants is due more to perceptual than cognitive processing.

  6. Transcranial Magnetic Stimulation over Left Inferior Frontal and Posterior Temporal Cortex Disrupts Gesture-Speech Integration.

    Science.gov (United States)

    Zhao, Wanying; Riggs, Kevin; Schindler, Igor; Holle, Henning

    2018-02-21

    Language and action naturally occur together in the form of cospeech gestures, and there is now convincing evidence that listeners display a strong tendency to integrate semantic information from both domains during comprehension. A contentious question, however, has been which brain areas are causally involved in this integration process. In previous neuroimaging studies, left inferior frontal gyrus (IFG) and posterior middle temporal gyrus (pMTG) have emerged as candidate areas; however, it is currently not clear whether these areas are causally or merely epiphenomenally involved in gesture-speech integration. In the present series of experiments, we directly tested for a potential critical role of IFG and pMTG by observing the effect of disrupting activity in these areas using transcranial magnetic stimulation in a mixed gender sample of healthy human volunteers. The outcome measure was performance on a Stroop-like gesture task (Kelly et al., 2010a), which provides a behavioral index of gesture-speech integration. Our results provide clear evidence that disrupting activity in IFG and pMTG selectively impairs gesture-speech integration, suggesting that both areas are causally involved in the process. These findings are consistent with the idea that these areas play a joint role in gesture-speech integration, with IFG regulating strategic semantic access via top-down signals acting upon temporal storage areas. SIGNIFICANCE STATEMENT Previous neuroimaging studies suggest an involvement of inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech integration, but findings have been mixed and due to methodological constraints did not allow inferences of causality. By adopting a virtual lesion approach involving transcranial magnetic stimulation, the present study provides clear evidence that both areas are causally involved in combining semantic information arising from gesture and speech. These findings support the view that, rather than being

  7. Suppression of the µ rhythm during speech and non-speech discrimination revealed by independent component analysis: implications for sensorimotor integration in speech processing.

    Science.gov (United States)

    Bowers, Andrew; Saltuklaroglu, Tim; Harkrider, Ashley; Cuellar, Megan

    2013-01-01

    Constructivist theories propose that articulatory hypotheses about incoming phonetic targets may function to enhance perception by limiting the possibilities for sensory analysis. To provide evidence for this proposal, it is necessary to map ongoing, high-temporal-resolution changes in sensorimotor activity (i.e., the sensorimotor µ rhythm) to accurate speech and non-speech discrimination performance (i.e., correct trials). Sixteen participants (15 female and 1 male) were asked to passively listen to or actively identify speech and tone-sweeps in a two-alternative forced-choice discrimination task while the electroencephalograph (EEG) was recorded from 32 channels. The stimuli were presented at signal-to-noise ratios (SNRs) at which discrimination accuracy was high (i.e., 80-100%) and at low SNRs producing discrimination performance at chance. EEG data were decomposed using independent component analysis and clustered across participants using principal component methods in EEGLAB. ICA revealed left and right sensorimotor µ components for 14/16 and 13/16 participants respectively that were identified on the basis of scalp topography, spectral peaks, and localization to the precentral and postcentral gyri. Time-frequency analysis of left and right lateralized µ component clusters revealed significant (pFDR < .05) event-related desynchronization for correct speech discrimination trials relative to chance trials following stimulus offset. Findings are consistent with constructivist, internal model theories proposing that early forward motor models generate predictions about likely phonemic units that are then synthesized with incoming sensory cues during active as opposed to passive processing. Future directions and possible translational value for clinical populations in which sensorimotor integration may play a functional role are discussed.

  8. Integrating Information from Speech and Physiological Signals to Achieve Emotional Sensitivity

    DEFF Research Database (Denmark)

    Kim, Jonghwa; André, Elisabeth; Rehm, Matthias

    2005-01-01

    Recently, there has been a significant amount of work on the recognition of emotions from speech and biosignals. Most approaches to emotion recognition so far concentrate on a single modality and do not take advantage of the fact that an integrated multimodal analysis may help to resolve...

  9. Audiovisual quality assessment and prediction for videotelephony

    CERN Document Server

    Belmudez, Benjamin

    2015-01-01

    The work presented in this book focuses on modeling audiovisual quality as perceived by the users of IP-based solutions for video communication like videotelephony. It also extends the current framework for the parametric prediction of audiovisual call quality. The book addresses several aspects related to the quality perception of entire video calls, namely, the quality estimation of the single audio and video modalities in an interactive context, the audiovisual quality integration of these modalities and the temporal pooling of short sample-based quality scores to account for the perceptual quality impact of time-varying degradations.

  10. Speech Enhancement of Mobile Devices Based on the Integration of a Dual Microphone Array and a Background Noise Elimination Algorithm.

    Science.gov (United States)

    Chen, Yung-Yue

    2018-05-08

    Mobile devices are often used in our daily lives for speech and communication. The speech quality of mobile devices is always degraded by the environmental noise surrounding mobile device users, and an effective background noise reduction solution cannot easily be developed for this speech enhancement problem. For these reasons, a methodology is systematically proposed to eliminate the effects of background noise on the speech communication of mobile devices. This methodology integrates a dual microphone array with a background noise elimination algorithm. The proposed background noise elimination algorithm includes a whitening process, a speech modelling method and an H₂ estimator. Due to the adoption of the dual microphone array, a low-cost design can be obtained for the speech enhancement of mobile devices. Practical tests have shown that the proposed method is immune to random background noise, and noiseless speech can be obtained after this denoising process.
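
    The record does not reproduce the whitening/speech-model/H₂-estimator chain, so the sketch below substitutes a much simpler two-microphone scheme for illustration: the aligned microphones are averaged and a spectral-subtraction gain is applied using a noise estimate taken from a speech-free lead-in. It is a stand-in for the idea of dual-microphone noise elimination, not the authors' algorithm.

        # Simplified dual-microphone enhancement: average + spectral subtraction.
        import numpy as np

        def enhance(mic1, mic2, fs, frame=512, noise_frames=10):
            x = 0.5 * (mic1 + mic2)                      # assume mics already time-aligned
            hop = frame // 2
            window = np.hanning(frame)
            # Noise magnitude spectrum estimated from the first (speech-free) frames.
            noise = np.mean([np.abs(np.fft.rfft(window * x[i*hop:i*hop+frame]))
                             for i in range(noise_frames)], axis=0)
            out = np.zeros_like(x)
            for i in range((len(x) - frame) // hop):
                seg = window * x[i*hop:i*hop+frame]
                spec = np.fft.rfft(seg)
                mag = np.maximum(np.abs(spec) - noise, 0.05 * np.abs(spec))  # subtract with a floor
                out[i*hop:i*hop+frame] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)))
            return out

        fs = 16000
        t = np.arange(fs) / fs
        clean = np.concatenate([np.zeros(fs // 2), np.sin(2 * np.pi * 220 * t)])  # silence, then a tone
        m1 = clean + 0.3 * np.random.randn(len(clean))
        m2 = clean + 0.3 * np.random.randn(len(clean))
        denoised = enhance(m1, m2, fs)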

  11. Speech Enhancement of Mobile Devices Based on the Integration of a Dual Microphone Array and a Background Noise Elimination Algorithm

    Directory of Open Access Journals (Sweden)

    Yung-Yue Chen

    2018-05-01

    Full Text Available Mobile devices are often used in our daily lives for speech and communication. The speech quality of mobile devices is always degraded by the environmental noise surrounding mobile device users, and an effective background noise reduction solution cannot easily be developed for this speech enhancement problem. For these reasons, a methodology is systematically proposed to eliminate the effects of background noise on the speech communication of mobile devices. This methodology integrates a dual microphone array with a background noise elimination algorithm. The proposed background noise elimination algorithm includes a whitening process, a speech modelling method and an H2 estimator. Due to the adoption of the dual microphone array, a low-cost design can be obtained for the speech enhancement of mobile devices. Practical tests have shown that the proposed method is immune to random background noise, and noiseless speech can be obtained after this denoising process.

  12. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English

    Science.gov (United States)

    Russo, Frank A.

    2018-01-01

    The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced consisting of 24 professional actors, vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. The set of 7356 recordings were each rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intrarater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are made freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976. PMID:29768426

  13. Multisensory and Modality Specific Processing of Visual Speech in Different Regions of the Premotor Cortex

    Directory of Open Access Journals (Sweden)

    Daniel Callan

    2014-05-01

    Full Text Available Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex has been shown to be active during both observation and execution of action (‘Mirror System’ properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker’s articulating face and heard her voice), visual only (only saw the speaker’s articulating face), and audio only (only heard the speaker’s voice) conditions with varying audio signal-to-noise ratios in order to determine the regions of the premotor cortex involved with multisensory and modality-specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli to control for task difficulty and differences in intelligibility. The results of the fMRI analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and premotor cortex. The left ventral inferior premotor cortex showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas more superior and dorsal regions of the premotor cortex are involved with mapping unimodal (in this case visual) sensory features of the speech signal with

  14. Speech misperception: speaking and seeing interfere differently with hearing.

    Directory of Open Access Journals (Sweden)

    Takemi Mochida

    Full Text Available Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] was degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But it was not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.

  15. Integrating Automatic Speech Recognition and Machine Translation for Better Translation Outputs

    DEFF Research Database (Denmark)

    Liyanapathirana, Jeevanthi

    translations, combining machine translation with computer assisted translation has drawn attention in current research. This combines two prospects: the opportunity of ensuring high quality translation along with a significant performance gain. Automatic Speech Recognition (ASR) is another important area......, which caters important functionalities in language processing and natural language understanding tasks. In this work we integrate automatic speech recognition and machine translation in parallel. We aim to avoid manual typing of possible translations as dictating the translation would take less time...... to the n-best list rescoring, we also use word graphs with the expectation of arriving at a tighter integration of ASR and MT models. Integration methods include constraining ASR models using language and translation models of MT, and vice versa. We currently develop and experiment different methods...
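
    One of the integration strategies mentioned, n-best list rescoring, can be illustrated as a log-linear combination of recognizer and translation-side scores; the weights and score definitions below are illustrative assumptions.

        # Re-rank ASR hypotheses with a score from the machine-translation side.
        from dataclasses import dataclass

        @dataclass
        class Hypothesis:
            text: str
            asr_score: float      # log-probability from the speech recognizer
            mt_score: float       # log-probability under the translation/language models

        def rescore(nbest, asr_weight=1.0, mt_weight=0.8):
            """Log-linear combination of scores; best hypothesis first."""
            return sorted(nbest,
                          key=lambda h: asr_weight * h.asr_score + mt_weight * h.mt_score,
                          reverse=True)

        nbest = [Hypothesis("recognise the speech", -12.1, -20.5),
                 Hypothesis("wreck a nice beach",   -11.8, -35.2)]
        print(rescore(nbest)[0].text)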

  16. An analysis of machine translation and speech synthesis in speech-to-speech translation system

    OpenAIRE

    Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.

    2011-01-01

    This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...

  17. What Iconic Gesture Fragments Reveal about Gesture-Speech Integration: When Synchrony Is Lost, Memory Can Help

    Science.gov (United States)

    Obermeier, Christian; Holle, Henning; Gunter, Thomas C.

    2011-01-01

    The present series of experiments explores several issues related to gesture-speech integration and synchrony during sentence processing. To be able to more precisely manipulate gesture-speech synchrony, we used gesture fragments instead of complete gestures, thereby avoiding the usual long temporal overlap of gestures with their coexpressive…

  18. Sensory integration dysfunction affects efficacy of speech therapy on children with functional articulation disorders

    Directory of Open Access Journals (Sweden)

    Tung LC

    2013-01-01

    Full Text Available Li-Chen Tung,1,# Chin-Kai Lin,2,# Ching-Lin Hsieh,3,4 Ching-Chi Chen,1 Chin-Tsan Huang,1 Chun-Hou Wang5,6 1Department of Physical Medicine and Rehabilitation, Chi Mei Medical Center, Tainan; 2Program of Early Intervention, Department of Early Childhood Education, National Taichung University of Education, Taichung; 3School of Occupational Therapy, College of Medicine, National Taiwan University, Taipei; 4Department of Physical Medicine and Rehabilitation, National Taiwan University Hospital, Taipei; 5School of Physical Therapy, College of Medical Science and Technology, Chung Shan Medical University, Taichung; 6Physical Therapy Room, Chung Shan Medical University Hospital, Taichung, Taiwan. #These authors contributed equally. Background: Articulation disorders in young children are due to defects occurring at a certain stage in sensory and motor development. Some children with functional articulation disorders may also have sensory integration dysfunction (SID). We hypothesized that speech therapy would be less efficacious in children with SID than in those without SID. Hence, the purpose of this study was to compare the efficacy of speech therapy in two groups of children with functional articulation disorders: those without and those with SID. Method: A total of 30 young children with functional articulation disorders were divided into two groups, the no-SID group (15 children) and the SID group (15 children). The number of pronunciation mistakes was evaluated before and after speech therapy. Results: There were no statistically significant differences in age, sex, sibling order, education of parents, and pretest number of mistakes in pronunciation between the two groups (P > 0.05). The mean and standard deviation in the pre- and posttest number of mistakes in pronunciation were 10.5 ± 3.2 and 3.3 ± 3.3 in the no-SID group, and 10.1 ± 2.9 and 6.9 ± 3.5 in the SID group, respectively. Results showed great changes after speech therapy treatment (F

  19. Audiovisual materials are effective for enhancing the correction of articulation disorders in children with cleft palate.

    Science.gov (United States)

    Pamplona, María Del Carmen; Ysunza, Pablo Antonio; Morales, Santiago

    2017-02-01

    Children with cleft palate frequently show speech disorders known as compensatory articulation. Compensatory articulation requires a prolonged period of speech intervention that should include reinforcement at home. However, relatives frequently do not know how to work with their children at home. To study whether the use of audiovisual materials especially designed for complementing speech pathology treatment in children with compensatory articulation can be effective for stimulating articulation practice at home and consequently enhancing speech normalization in children with cleft palate. Eighty-two patients with compensatory articulation were studied. Patients were randomly divided into two groups. Both groups received speech pathology treatment aimed to correct articulation placement. In addition, patients from the active group received a set of audiovisual materials to be used at home. Parents were instructed about strategies and ideas about how to use the materials with their children. Severity of compensatory articulation was compared at the onset and at the end of the speech intervention. After the speech therapy period, the group of patients using audiovisual materials at home demonstrated significantly greater improvement in articulation, as compared with the patients receiving speech pathology treatment on-site without audiovisual supporting materials. The results of this study suggest that audiovisual materials especially designed for practicing adequate articulation placement at home can be effective for reinforcing and enhancing speech pathology treatment of patients with cleft palate and compensatory articulation. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  20. Comparing the influence of spectro-temporal integration in computational speech segregation

    DEFF Research Database (Denmark)

    Bentsen, Thomas; May, Tobias; Kressner, Abigail Anne

    2016-01-01

    The goal of computational speech segregation systems is to automatically segregate a target speaker from interfering maskers. Typically, these systems include a feature extraction stage in the front-end and a classification stage in the back-end. A spectro-temporal integration strategy can be applied either in the front-end, using the so-called delta features, or in the back-end, using a second classifier that exploits the posterior probability of speech from the first classifier across a spectro-temporal window. This study systematically analyzes the influence of such stages on segregation ... metric that comprehensively predicts computational segregation performance and correlates well with intelligibility. The outcome of this study could help to identify the most effective spectro-temporal integration strategy for computational segregation systems.
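
    The front-end variant of spectro-temporal integration, delta features, can be illustrated with the standard regression-based computation below; the window width and the choice of base features are assumptions, not necessarily those used in the study.

        # Append delta (local temporal slope) features to a static feature matrix.
        import numpy as np

        def delta(features, width=2):
            """features: (n_frames, n_dims), e.g. log ratemap or log-mel frames."""
            n_frames, _ = features.shape
            padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
            denom = 2 * sum(k * k for k in range(1, width + 1))
            out = np.zeros_like(features, dtype=float)
            for k in range(1, width + 1):
                out += k * (padded[width + k : width + k + n_frames]
                            - padded[width - k : width - k + n_frames])
            return out / denom

        frames = np.random.rand(100, 32)                  # stand-in for the static features
        augmented = np.hstack([frames, delta(frames)])    # static + delta for the classifier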

  1. Monkey Lipsmacking Develops Like the Human Speech Rhythm

    Science.gov (United States)

    Morrill, Ryan J.; Paukner, Annika; Ferrari, Pier F.; Ghazanfar, Asif A.

    2012-01-01

    Across all languages studied to date, audiovisual speech exhibits a consistent rhythmic structure. This rhythm is critical to speech perception. Some have suggested that the speech rhythm evolved "de novo" in humans. An alternative account--the one we explored here--is that the rhythm of speech evolved through the modification of rhythmic facial…

  2. The development of multisensory speech perception continues into the late childhood years.

    Science.gov (United States)

    Ross, Lars A; Molholm, Sophie; Blanco, Daniella; Gomez-Ramirez, Manuel; Saint-Amour, Dave; Foxe, John J

    2011-06-01

    Observing a speaker's articulations substantially improves the intelligibility of spoken speech, especially under noisy listening conditions. This multisensory integration of speech inputs is crucial to effective communication. Appropriate development of this ability has major implications for children in classroom and social settings, and deficits in it have been linked to a number of neurodevelopmental disorders, especially autism. It is clear from structural imaging studies that there is a prolonged maturational course within regions of the perisylvian cortex that persists into late childhood, and these regions have been firmly established as being crucial to speech and language functions. Given this protracted maturational timeframe, we reasoned that multisensory speech processing might well show a similarly protracted developmental course. Previous work in adults has shown that audiovisual enhancement in word recognition is most apparent within a restricted range of signal-to-noise ratios (SNRs). Here, we investigated when these properties emerge during childhood by testing multisensory speech recognition abilities in typically developing children aged between 5 and 14 years, and comparing them with those of adults. By parametrically varying SNRs, we found that children benefited significantly less from observing visual articulations, displaying considerably less audiovisual enhancement. The findings suggest that improvement in the ability to recognize speech-in-noise and in audiovisual integration during speech perception continues quite late into the childhood years. The implication is that a considerable amount of multisensory learning remains to be achieved during the later schooling years, and that explicit efforts to accommodate this learning may well be warranted. European Journal of Neuroscience © 2011 Federation of European Neuroscience Societies and Blackwell Publishing Ltd. No claim to original US government works.

  3. Development of Sensitivity to Audiovisual Temporal Asynchrony during Midchildhood

    Science.gov (United States)

    Kaganovich, Natalya

    2016-01-01

    Temporal proximity is one of the key factors determining whether events in different modalities are integrated into a unified percept. Sensitivity to audiovisual temporal asynchrony has been studied in adults in great detail. However, how such sensitivity matures during childhood is poorly understood. We examined perception of audiovisual temporal…

  4. Audiovisual Simultaneity Judgment and Rapid Recalibration throughout the Lifespan.

    Science.gov (United States)

    Noel, Jean-Paul; De Niear, Matthew; Van der Burg, Erik; Wallace, Mark T

    2016-01-01

    Multisensory interactions are well established to convey an array of perceptual and behavioral benefits. One of the key features of multisensory interactions is the temporal structure of the stimuli combined. In an effort to better characterize how temporal factors influence multisensory interactions across the lifespan, we examined audiovisual simultaneity judgment and the degree of rapid recalibration to paired audiovisual stimuli (Flash-Beep and Speech) in a sample of 220 participants ranging from 7 to 86 years of age. Results demonstrate a surprisingly protracted developmental time-course for both audiovisual simultaneity judgment and rapid recalibration, with neither reaching maturity until well into adolescence. Interestingly, correlational analyses revealed that audiovisual simultaneity judgments (i.e., the size of the audiovisual temporal window of simultaneity) and rapid recalibration significantly co-varied as a function of age. Together, our results represent the most complete description of age-related changes in audiovisual simultaneity judgments to date, as well as being the first to describe changes in the degree of rapid recalibration as a function of age. We propose that the developmental time-course of rapid recalibration scaffolds the maturation of more durable audiovisual temporal representations.

  5. Audiovisual sentence repetition as a clinical criterion for auditory development in Persian-language children with hearing loss.

    Science.gov (United States)

    Oryadi-Zanjani, Mohammad Majid; Vahab, Maryam; Rahimi, Zahra; Mayahi, Anis

    2017-02-01

    It is important for clinicians such as speech-language pathologists and audiologists to develop more efficient procedures to assess the development of auditory, speech and language skills in children using a hearing aid and/or cochlear implant compared to their peers with normal hearing. The aim of the study was therefore to compare the performance of 5-to-7-year-old Persian-language children with and without hearing loss in visual-only, auditory-only, and audiovisual presentations of a sentence repetition task. The research was administered as a cross-sectional study. The sample comprised 92 Persian 5-to-7-year-old children: 60 with normal hearing and 32 with hearing loss. The children with hearing loss were recruited from the Soroush rehabilitation center for Persian-language children with hearing loss in Shiraz, Iran, through a consecutive sampling method. All these children had a unilateral cochlear implant or bilateral hearing aids. The assessment tool was the Sentence Repetition Test. The study included three computer-based experiments: visual-only, auditory-only, and audiovisual. The scores were compared within and among the three groups through statistical tests at α = 0.05. The sentence repetition scores for the V-only, A-only, and AV presentations were significantly different in the three groups; in other words, the highest to lowest scores belonged respectively to the audiovisual, auditory-only, and visual-only formats in the children with normal hearing (P < 0.05). Visual-only scores were not correlated with audiovisual sentence repetition scores in all the 5-to-7-year-old children (r = 0.179, n = 92, P = 0.088), but audiovisual sentence repetition scores were found to be strongly correlated with auditory-only scores in all the 5-to-7-year-old children (r = 0.943, n = 92, P = 0.000). According to the study's findings, audiovisual integration occurs in 5-to-7-year-old Persian children using a hearing aid or cochlear implant during sentence repetition, similar to their peers with normal hearing.

  6. Digital audiovisual archives

    CERN Document Server

    Stockinger, Peter

    2013-01-01

    Today, huge quantities of digital audiovisual resources are already available - everywhere and at any time - through Web portals, online archives and libraries, and video blogs. One central question with respect to this huge amount of audiovisual data is how it can be used in specific (social, pedagogical, etc.) contexts and what its potential interest is for target groups (communities, professionals, students, researchers, etc.). This book examines the question of the (creative) exploitation of digital audiovisual archives from a theoretical, methodological, technical and practical

  7. Effects of congruent and incongruent visual cues on speech perception and brain activity in cochlear implant users.

    Science.gov (United States)

    Song, Jae-Jin; Lee, Hyo-Jeong; Kang, Hyejin; Lee, Dong Soo; Chang, Sun O; Oh, Seung Ha

    2015-03-01

    While deafness-induced plasticity has been investigated in the visual and auditory domains, not much is known about language processing in audiovisual multimodal environments for patients with restored hearing via cochlear implant (CI) devices. Here, we examined the effect of agreeing or conflicting visual inputs on auditory processing in deaf patients equipped with degraded artificial hearing. Ten post-lingually deafened CI users with good performance, along with matched control subjects, underwent H2(15)O positron emission tomography scans while carrying out a behavioral task requiring the extraction of speech information from unimodal auditory stimuli, bimodal audiovisual congruent stimuli, and incongruent stimuli. Regardless of congruency, the control subjects demonstrated activation of the auditory and visual sensory cortices, as well as the superior temporal sulcus, the classical multisensory integration area, indicating a bottom-up multisensory processing strategy. Compared to CI users, the control subjects exhibited activation of the right ventral premotor-supramarginal pathway. In contrast, CI users activated primarily the visual cortices more in the congruent audiovisual condition than in the null condition. In addition, compared to controls, CI users displayed an activation focus in the right amygdala for congruent audiovisual stimuli. The most notable difference between the two groups was an activation focus in the left inferior frontal gyrus in CI users confronted with incongruent audiovisual stimuli, suggesting top-down cognitive modulation for audiovisual conflict. Correlation analysis revealed that good speech performance was positively correlated with right amygdala activity for the congruent condition, but negatively correlated with bilateral visual cortices regardless of congruency. Taken together, these results suggest that for multimodal inputs, cochlear implant users are more vision-reliant when processing congruent stimuli and are disturbed

  8. Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.

    Science.gov (United States)

    Gebru, Israel D; Ba, Sileye; Li, Xiaofei; Horaud, Radu

    2018-05-01

    Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes in a principled way speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset, which contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the-art diarization algorithms.
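
    As a toy stand-in for the speech-to-person association step, the sketch below assigns each speech frame to the tracked person whose azimuth best matches the frame's estimated direction of arrival, with a Viterbi pass that discourages implausibly fast speaker switches. The paper's Bayesian spatiotemporal model is considerably richer than this.

        # Frame-wise speech-to-person assignment with temporal smoothing.
        import numpy as np

        def associate(frame_doas, person_azimuths, switch_penalty=2.0):
            """Return one person index per speech frame."""
            F, P = len(frame_doas), len(person_azimuths)
            # Local evidence: angular distance between frame DOA and each person.
            cost = np.abs(np.subtract.outer(frame_doas, person_azimuths))  # (F, P)
            best = cost[0].copy()
            back = np.zeros((F, P), dtype=int)
            for t in range(1, F):
                trans = best[:, None] + switch_penalty * (1 - np.eye(P))   # staying is free
                back[t] = trans.argmin(axis=0)
                best = trans.min(axis=0) + cost[t]
            path = np.zeros(F, dtype=int)
            path[-1] = best.argmin()
            for t in range(F - 1, 0, -1):
                path[t - 1] = back[t, path[t]]
            return path

        doas = np.array([10.0, 12.0, 11.0, 55.0, 54.0, 53.0])   # degrees, per frame
        people = np.array([8.0, 50.0])                          # tracked azimuths from video
        print(associate(doas, people))                          # -> [0 0 0 1 1 1]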

  9. Multisensory Integration in Cochlear Implant Recipients.

    Science.gov (United States)

    Stevenson, Ryan A; Sheffield, Sterling W; Butera, Iliza M; Gifford, René H; Wallace, Mark T

    Speech perception is inherently a multisensory process involving integration of auditory and visual cues. Multisensory integration in cochlear implant (CI) recipients is a unique circumstance in that the integration occurs after auditory deprivation and the provision of hearing via the CI. Despite the clear importance of multisensory cues for perception, in general, and for speech intelligibility, specifically, the topic of multisensory perceptual benefits in CI users has only recently begun to emerge as an area of inquiry. We review the research that has been conducted on multisensory integration in CI users to date and suggest a number of areas needing further research. The overall pattern of results indicates that many CI recipients show at least some perceptual gain that can be attributable to multisensory integration. The extent of this gain, however, varies based on a number of factors, including age of implantation and specific task being assessed (e.g., stimulus detection, phoneme perception, word recognition). Although both children and adults with CIs obtain audiovisual benefits for phoneme, word, and sentence stimuli, neither group shows demonstrable gain for suprasegmental feature perception. Additionally, only early-implanted children and the highest performing adults obtain audiovisual integration benefits similar to individuals with normal hearing. Increasing age of implantation in children is associated with poorer gains resultant from audiovisual integration, suggesting a sensitive period in development for the brain networks that subserve these integrative functions, as well as length of auditory experience. This finding highlights the need for early detection of and intervention for hearing loss, not only in terms of auditory perception, but also in terms of the behavioral and perceptual benefits of audiovisual processing. Importantly, patterns of auditory, visual, and audiovisual responses suggest that underlying integrative processes may be

  10. What Information Is Necessary for Speech Categorization? Harnessing Variability in the Speech Signal by Integrating Cues Computed Relative to Expectations

    Science.gov (United States)

    McMurray, Bob; Jongman, Allard

    2011-01-01

    Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, the type of information subserving this mapping. This is crucial in speech perception where the signal is variable and context dependent. This study assessed the…

  11. Speech understanding in noise with integrated in-ear and muff-style hearing protection systems

    Directory of Open Access Journals (Sweden)

    Sharon M Abel

    2011-01-01

    Full Text Available Integrated hearing protection systems are designed to enhance free field and radio communications during military operations while protecting against the damaging effects of high-level noise exposure. A study was conducted to compare the effect of increasing the radio volume on the intelligibility of speech over the radios of two candidate systems, in-ear and muff-style, in 85-dBA speech babble noise presented free field. Twenty normal-hearing, English-fluent subjects, half male and half female, were tested in same gender pairs. Alternating as talker and listener, their task was to discriminate consonant-vowel-consonant syllables that contrasted either the initial or final consonant. Percent correct consonant discrimination increased with increases in the radio volume. At the highest volume, subjects achieved 79% with the in-ear device but only 69% with the muff-style device, averaged across the gender of listener/talker pairs and consonant position. Although there was no main effect of gender, female listener/talkers showed a 10% advantage for the final consonant and male listener/talkers showed a 1% advantage for the initial consonant. These results indicate that normal hearing users can achieve reasonably high radio communication scores with integrated in-ear hearing protection in moderately high-level noise that provides both energetic and informational masking. The adequacy of the range of available radio volumes for users with hearing loss has yet to be determined.

  12. Joint evaluation of communication quality and user experience in an audio-visual virtual reality meeting

    DEFF Research Database (Denmark)

    Møller, Anders Kalsgaard; Hoffmann, Pablo F.; Carrozzino, Marcello

    2013-01-01

    The state-of-the-art speech intelligibility tests are created with the purpose of evaluating acoustic communication devices and not for evaluating audio-visual virtual reality systems. This paper present a novel method to evaluate a communication situation based on both the speech intelligibility...

  13. Quantifying temporal ventriloquism in audiovisual synchrony perception

    NARCIS (Netherlands)

    Kuling, I.A.; Kohlrausch, A.G.; Juola, J.F.

    2013-01-01

    The integration of visual and auditory inputs in the human brain works properly only if the components are perceived in close temporal proximity. In the present study, we quantified cross-modal interactions in the human brain for audiovisual stimuli with temporal asynchronies, using a paradigm from

  14. Sensitivity to audio-visual synchrony and its relation to language abilities in children with and without ASD.

    Science.gov (United States)

    Righi, Giulia; Tenenbaum, Elena J; McCormick, Carolyn; Blossom, Megan; Amso, Dima; Sheinkopf, Stephen J

    2018-04-01

    Autism Spectrum Disorder (ASD) is often accompanied by deficits in speech and language processing. Speech processing relies heavily on the integration of auditory and visual information, and it has been suggested that the ability to detect correspondence between auditory and visual signals helps to lay the foundation for successful language development. The goal of the present study was to examine whether young children with ASD show reduced sensitivity to temporal asynchronies in a speech processing task when compared to typically developing controls, and to examine how this sensitivity might relate to language proficiency. Using automated eye tracking methods, we found that children with ASD failed to demonstrate sensitivity to asynchronies of 0.3s, 0.6s, or 1.0s between a video of a woman speaking and the corresponding audio track. In contrast, typically developing children who were language-matched to the ASD group, were sensitive to both 0.6s and 1.0s asynchronies. We also demonstrated that individual differences in sensitivity to audiovisual asynchronies and individual differences in orientation to relevant facial features were both correlated with scores on a standardized measure of language abilities. Results are discussed in the context of attention to visual language and audio-visual processing as potential precursors to language impairment in ASD. Autism Res 2018, 11: 645-653. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. Speech processing relies heavily on the integration of auditory and visual information, and it has been suggested that the ability to detect correspondence between auditory and visual signals helps to lay the foundation for successful language development. The goal of the present study was to explore whether children with ASD process audio-visual synchrony in ways comparable to their typically developing peers, and the relationship between preference for synchrony and language ability. Results showed that

  15. Integrating Music Therapy Services and Speech-Language Therapy Services for Children with Severe Communication Impairments: A Co-Treatment Model

    Science.gov (United States)

    Geist, Kamile; McCarthy, John; Rodgers-Smith, Amy; Porter, Jessica

    2008-01-01

    The literature offers little documentation of how music therapy can be integrated with speech-language therapy services for children with communication delay. In this article, a collaborative model with procedures, experiences, and communication outcomes of integrating music therapy with the existing speech-language services is given. Using…

  16. The benefit obtained from visually displayed text from an automatic speech recognizer during listening to speech presented in noise

    NARCIS (Netherlands)

    Zekveld, A.A.; Kramer, S.E.; Kessens, J.M.; Vlaming, M.S.M.G.; Houtgast, T.

    2008-01-01

    OBJECTIVES: The aim of this study was to evaluate the benefit that listeners obtain from visually presented output from an automatic speech recognition (ASR) system during listening to speech in noise. DESIGN: Auditory-alone and audiovisual speech reception thresholds (SRTs) were measured. The SRT

  17. Temporal dynamics of sensorimotor integration in speech perception and production: Independent component analysis of EEG data

    Directory of Open Access Journals (Sweden)

    David eJenson

    2014-07-01

    Full Text Available Activity in premotor and sensorimotor cortices is found in speech production and some perception tasks. Yet, how sensorimotor integration supports these functions is unclear due to a lack of data examining the timing of activity from these regions. Beta (~20 Hz) and alpha (~10 Hz) spectral power within the EEG µ rhythm are considered indices of motor and somatosensory activity, respectively. In the current study, perception conditions required discrimination (same/different) of syllable pairs (/ba/ and /da/) in quiet and noisy conditions. Production conditions required covert and overt syllable productions and overt word production. Independent component analysis was performed on EEG data obtained during these conditions to (1) identify clusters of µ components common to all conditions and (2) examine real-time event-related spectral perturbations (ERSP) within alpha and beta bands. Seventeen and 15 out of 20 participants produced left and right µ-components, respectively, localized to precentral gyri. Discrimination conditions were characterized by significant (pFDR < .05) early alpha event-related synchronization (ERS) prior to and during stimulus presentation and later alpha event-related desynchronization (ERD) following stimulus offset. Beta ERD began early and gained strength across time. Differences were found between quiet and noisy discrimination conditions. Both overt syllable and word productions yielded similar alpha/beta ERD that began prior to production and was strongest during muscle activity. Findings during covert production were weaker than during overt production. One explanation for these findings is that µ-beta ERD indexes early predictive coding (e.g., internal modeling) and/or overt and covert attentional/motor processes. µ-alpha ERS may index inhibitory input to the premotor cortex from sensory regions prior to and during discrimination, while µ-alpha ERD may index re-afferent sensory feedback during speech rehearsal and production.
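
    The ERSP measure referred to in this record can be illustrated with a minimal sketch: average time-frequency power across trials and express it in dB relative to a pre-stimulus baseline. The array shapes, window lengths, and baseline interval below are illustrative assumptions, not parameters taken from the study.

```python
import numpy as np
from scipy.signal import spectrogram

def ersp_db(epochs, fs, tmin, baseline=(-0.5, 0.0)):
    """Event-related spectral perturbation (ERSP) in dB for one channel.

    epochs   : (n_trials, n_samples) array of single-trial EEG
    fs       : sampling rate in Hz
    tmin     : time (s) of the first sample relative to stimulus onset
    baseline : (start, end) window in seconds relative to stimulus onset
    """
    trial_power = []
    for trial in epochs:
        freqs, times, sxx = spectrogram(trial, fs=fs,
                                        nperseg=int(0.5 * fs),
                                        noverlap=int(0.45 * fs))
        trial_power.append(sxx)
    power = np.mean(trial_power, axis=0)               # (n_freqs, n_times)
    times = times + tmin                               # re-reference to stimulus onset
    in_base = (times >= baseline[0]) & (times < baseline[1])
    base = power[:, in_base].mean(axis=1, keepdims=True)
    return freqs, times, 10.0 * np.log10(power / base)  # dB change from baseline

# Band summaries analogous to the mu-alpha (~10 Hz) and mu-beta (~20 Hz) indices:
# alpha_ersp = ersp[(freqs >= 8) & (freqs <= 13)].mean(axis=0), etc.
```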

  18. Experienced speech-language pathologists' responses to ethical dilemmas: an integrated approach to ethical reasoning.

    Science.gov (United States)

    Kenny, Belinda; Lincoln, Michelle; Balandin, Susan

    2010-05-01

    To investigate the approaches of experienced speech-language pathologists (SLPs) to ethical reasoning and the processes they use to resolve ethical dilemmas. Ten experienced SLPs participated in in-depth interviews. A narrative approach was used to guide participants' descriptions of how they resolved ethical dilemmas. Individual narrative transcriptions were analyzed by using the participant's words to develop an ethical story that described and interpreted their responses to dilemmas. Key concepts from individual stories were then coded into group themes to reflect participants' reasoning processes. Five major themes reflected participants' approaches to ethical reasoning: (a) focusing on the well-being of the client, (b) fulfilling professional roles and responsibilities, (c) attending to professional relationships, (d) managing resources, and (e) integrating personal and professional values. SLPs demonstrated a range of ethical reasoning processes: applying bioethical principles, casuistry, and narrative reasoning when managing ethical dilemmas in the workplace. The results indicate that experienced SLPs adopted an integrated approach to ethical reasoning. They supported clients' rights to make health care choices. Bioethical principles, casuistry, and narrative reasoning provided useful frameworks for facilitating health professionals' application of codes of ethics to complex professional practice issues.

  19. Audiovisual integration of stimulus transients

    DEFF Research Database (Denmark)

    Andersen, Tobias; Mamassian, Pascal

    2008-01-01

    A change in sound intensity can facilitate luminance change detection. We found that this effect did not depend on whether sound intensity and luminance increased or decreased. In contrast, luminance identification was strongly influenced by the congruence of luminance and sound intensity change ...

  20. The Neural Basis of Speech Perception through Lipreading and Manual Cues: Evidence from Deaf Native Users of Cued Speech

    Science.gov (United States)

    Aparicio, Mario; Peigneux, Philippe; Charlier, Brigitte; Balériaux, Danielle; Kavec, Martin; Leybaert, Jacqueline

    2017-01-01

    We present here the first neuroimaging data for perception of Cued Speech (CS) by deaf adults who are native users of CS. CS is a visual mode of communicating a spoken language through a set of manual cues which accompany lipreading and disambiguate it. With CS, sublexical units of the oral language are conveyed clearly and completely through the visual modality without requiring hearing. The comparison of neural processing of CS in deaf individuals with processing of audiovisual (AV) speech in normally hearing individuals represents a unique opportunity to explore the similarities and differences in neural processing of an oral language delivered in a visuo-manual vs. an AV modality. The study included deaf adult participants who were early CS users and native hearing users of French who process speech audiovisually. Words were presented in an event-related fMRI design. Three conditions were presented to each group of participants. The deaf participants saw CS words (manual + lipread), words presented as manual cues alone, and words presented to be lipread without manual cues. The hearing group saw AV spoken words, audio-alone and lipread-alone. Three findings are highlighted. First, the middle and superior temporal gyrus (excluding Heschl’s gyrus) and left inferior frontal gyrus pars triangularis constituted a common, amodal neural basis for AV and CS perception. Second, integration was inferred in posterior parts of superior temporal sulcus for audio and lipread information in AV speech, but in the occipito-temporal junction, including MT/V5, for the manual cues and lipreading in CS. Third, the perception of manual cues showed a much greater overlap with the regions activated by CS (manual + lipreading) than lipreading alone did. This supports the notion that manual cues play a larger role than lipreading for CS processing. The present study contributes to a better understanding of the role of manual cues as support of visual speech perception in the framework

  1. A randomized controlled trial on the beneficial effects of training letter-speech sound integration on reading fluency in children with dyslexia

    NARCIS (Netherlands)

    Fraga González, G.; Žarić, G.; Tijms, J.; Bonte, M.; Blomert, L.; van der Molen, M.W.

    2015-01-01

    A recent account of dyslexia assumes that a failure to develop automated letter-speech sound integration might be responsible for the observed lack of reading fluency. This study uses a pre-test-training-post-test design to evaluate the effects of a training program based on letter-speech sound

  2. Elevated audiovisual temporal interaction in patients with migraine without aura

    Science.gov (United States)

    2014-01-01

    Background Photophobia and phonophobia are the most prominent symptoms in patients with migraine without aura. Hypersensitivity to visual stimuli can lead to greater hypersensitivity to auditory stimuli, which suggests that the interaction between visual and auditory stimuli may play an important role in the pathogenesis of migraine. However, audiovisual temporal interactions in migraine have not been well studied. Therefore, our aim was to examine auditory and visual interactions in migraine. Methods In this study, visual, auditory, and audiovisual stimuli with different temporal intervals between the visual and auditory stimuli were randomly presented to the left or right hemispace. During this time, the participants were asked to respond promptly to target stimuli. We used cumulative distribution functions to analyze the response times as a measure of audiovisual integration. Results Our results showed that audiovisual integration was significantly elevated in the migraineurs compared with the normal controls, whereas audiovisual suppression was weaker in the migraineurs compared with the normal controls (p < 0.05). Conclusions Our findings further objectively support the notion that migraineurs without aura are hypersensitive to external visual and auditory stimuli. Our study offers a new quantitative and objective method to evaluate hypersensitivity to audio-visual stimuli in patients with migraine. PMID:24961903
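
    Response-time CDFs of this kind are commonly compared against Miller's race-model bound to quantify audiovisual integration. The sketch below, using made-up response-time samples, shows one conventional way such an analysis is set up; it is not the study's exact procedure.

```python
import numpy as np

def empirical_cdf(rts, t_grid):
    """P(RT <= t) estimated from a sample of response times (in seconds)."""
    rts = np.sort(np.asarray(rts))
    return np.searchsorted(rts, t_grid, side="right") / rts.size

# Hypothetical response-time samples for unimodal and bimodal target trials.
rng = np.random.default_rng(0)
rt_a = rng.normal(0.42, 0.06, 200)    # auditory-only
rt_v = rng.normal(0.45, 0.06, 200)    # visual-only
rt_av = rng.normal(0.36, 0.05, 200)   # audiovisual

t_grid = np.linspace(0.2, 0.7, 101)
cdf_a, cdf_v, cdf_av = (empirical_cdf(rt, t_grid) for rt in (rt_a, rt_v, rt_av))

# Miller's race-model bound: CDF_AV(t) <= CDF_A(t) + CDF_V(t).
# Positive values indicate integration beyond mere statistical facilitation.
violation = cdf_av - np.minimum(cdf_a + cdf_v, 1.0)
print(f"Maximum race-model violation: {violation.max():.3f}")
```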

  3. The development of co-speech gesture and its semantic integration with speech in 6- to 12-year-old children with autism spectrum disorders.

    Science.gov (United States)

    So, Wing-Chee; Wong, Miranda Kit-Yi; Lui, Ming; Yip, Virginia

    2015-11-01

    Previous work leaves open the question of whether children with autism spectrum disorders aged 6-12 years have delay in producing gestures compared to their typically developing peers. This study examined gestural production among school-aged children in a naturalistic context and how their gestures are semantically related to the accompanying speech. Delay in gestural production was found in children with autism spectrum disorders through their middle to late childhood. Compared to their typically developing counterparts, children with autism spectrum disorders gestured less often and used fewer types of gestures, in particular markers, which carry culture-specific meaning. Typically developing children's gestural production was related to language and cognitive skills, but among children with autism spectrum disorders, gestural production was more strongly related to the severity of socio-communicative impairment. Gesture impairment also included the failure to integrate speech with gesture: in particular, supplementary gestures are absent in children with autism spectrum disorders. The findings extend our understanding of gestural production in school-aged children with autism spectrum disorders during spontaneous interaction. The results can help guide new therapies for gestural production for children with autism spectrum disorders in middle and late childhood. © The Author(s) 2014.

  4. Audiovisual perceptual learning with multiple speakers.

    Science.gov (United States)

    Mitchel, Aaron D; Gerfen, Chip; Weiss, Daniel J

    2016-05-01

    One challenge for speech perception is between-speaker variability in the acoustic parameters of speech. For example, the same phoneme (e.g. the vowel in "cat") may have substantially different acoustic properties when produced by two different speakers and yet the listener must be able to interpret these disparate stimuli as equivalent. Perceptual tuning, the use of contextual information to adjust phonemic representations, may be one mechanism that helps listeners overcome obstacles they face due to this variability during speech perception. Here we test whether visual contextual cues to speaker identity may facilitate the formation and maintenance of distributional representations for individual speakers, allowing listeners to adjust phoneme boundaries in a speaker-specific manner. We familiarized participants to an audiovisual continuum between /aba/ and /ada/. During familiarization, the "B-face" mouthed /aba/ when an ambiguous token was played, while the "D-face" mouthed /ada/. At test, the same ambiguous token was more likely to be identified as /aba/ when paired with a stilled image of the "B-face" than with an image of the "D-face." This was not the case in the control condition when the two faces were paired equally with the ambiguous token. Together, these results suggest that listeners may form speaker-specific phonemic representations using facial identity cues.

  5. Venezuela: Nueva Experiencia Audiovisual

    Directory of Open Access Journals (Sweden)

    Revista Chasqui

    2015-01-01

    Full Text Available In 1986, the Universidad Simón Bolívar (USB) created the Fundación para el Desarrollo del Arte Audiovisual (ARTEVISION). Its general objective is the promotion and sale of services and products for television, radio, film, design, and photography of high artistic and technical quality, all without neglecting the theoretical and academic aspects of these disciplines.

  6. The influence of age, hearing, and working memory on the speech comprehension benefit derived from an automatic speech recognition system.

    Science.gov (United States)

    Zekveld, Adriana A; Kramer, Sophia E; Kessens, Judith M; Vlaming, Marcel S M G; Houtgast, Tammo

    2009-04-01

    The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that speech comprehension in noise by young listeners with normal hearing improves when presenting partly incorrect, automatically generated subtitles. The current study focused on the effects of age, hearing loss, visual working memory capacity, and linguistic skills on the benefit obtained from automatically generated subtitles during listening to speech in noise. In order to investigate the effects of age and hearing loss, three groups of participants were included: 22 young persons with normal hearing (YNH, mean age = 21 years), 22 middle-aged adults with normal hearing (MA-NH, mean age = 55 years) and 30 middle-aged adults with hearing impairment (MA-HI, mean age = 57 years). The benefit from automatic subtitling was measured by Speech Reception Threshold (SRT) tests (Plomp & Mimpen, 1979). Both unimodal auditory and bimodal audiovisual SRT tests were performed. In the audiovisual tests, the subtitles were presented simultaneously with the speech, whereas in the auditory test, only speech was presented. The difference between the auditory and audiovisual SRT was defined as the audiovisual benefit. Participants additionally rated the listening effort. We examined the influences of ASR accuracy level and text delay on the audiovisual benefit and the listening effort using a repeated measures General Linear Model analysis. In a correlation analysis, we evaluated the relationships between age, auditory SRT, visual working memory capacity and the audiovisual benefit and listening effort. The automatically generated subtitles improved speech comprehension in noise for all ASR accuracies and delays covered by the current study. Higher ASR accuracy levels resulted in more benefit obtained
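
    As a concrete illustration of the benefit measure defined in this record (the difference between the auditory and audiovisual SRT), with made-up values rather than data from the study and assuming the conventional sign (auditory minus audiovisual, so that a positive value indicates benefit):

```python
# Hypothetical speech reception thresholds (SRTs), in dB signal-to-noise ratio;
# lower values mean speech was understood in poorer listening conditions.
srt_auditory_only = -3.5   # speech alone
srt_audiovisual = -7.0     # speech plus automatically generated subtitles

# Audiovisual benefit: auditory SRT minus audiovisual SRT, so a positive
# value means the subtitles helped (sentences understood at a lower SNR).
audiovisual_benefit = srt_auditory_only - srt_audiovisual   # 3.5 dB
print(f"Audiovisual benefit: {audiovisual_benefit:.1f} dB")
```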

  7. Audio-visual temporal recalibration can be constrained by content cues regardless of spatial overlap

    Directory of Open Access Journals (Sweden)

    Warrick eRoseboom

    2013-04-01

    Full Text Available It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated, and opposing, estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated if this was necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex (audio visual speech; Experiment 1 and simple stimuli (high and low pitch audio matched with either vertically or horizontally oriented Gabors; Experiment 2 we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap.

  8. AN EXPERIMENTAL EVALUATION OF AUDIO-VISUAL METHODS--CHANGING ATTITUDES TOWARD EDUCATION.

    Science.gov (United States)

    LOWELL, EDGAR L.; AND OTHERS

    Audiovisual programs for parents of deaf children were developed and evaluated. Eighteen sound films and accompanying records presented information on hearing, lipreading and speech, and attempted to change parental attitudes toward children and spouses. Two versions of the films and records were narrated by (1) "stars" who were…

  9. ON INTEGRATED COURSE “SOCIAL AND SPEECH COMMUNICATIONS” FOR STUDENTS OF ART HIGHER EDUCATIONAL ESTABLISHMENT

    Directory of Open Access Journals (Sweden)

    Elena Nicolaevna Klemenova

    2013-11-01

    Full Text Available The article describes the experience of teaching the course “Social and Speech Communication”. Through the training, students are expected to master a range of means for effective communication, grounded in linguistic communication and its bearer, the language personality; to gain knowledge about the complex processes of information exchange; to discover the psychological peculiarities of verbal and non-verbal communication; and to learn how to communicate in order to solve professional and personal problems. Fluent command of all kinds of speech activity, correct and intelligent communication in various spheres and settings, and linguistic analysis of speech events, including their aesthetic value, together represent the unity of the systemic and individual approaches in the humanities training of future architects, designers and managers. DOI: http://dx.doi.org/10.12731/2218-7405-2013-7-43

  10. Speech Entrainment Compensates for Broca's Area Damage

    Science.gov (United States)

    Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris

    2015-01-01

    Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response to SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. Number of different words per minute was calculated as a speech output measure for each task, with the difference between SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage in order to predict lesion locations associated with the fluency-inducing response to speech entrainment. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during speech entrainment versus spontaneous speech. A similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed that damage to the inferior frontal gyrus predicted this response. Results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of speech entrainment to improve speech production and may help select patients for speech entrainment treatment. PMID:25989443
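
    The voxel-wise lesion-symptom mapping step described here amounts to a mass-univariate comparison of the behavioural measure between patients with and without damage at each voxel. Below is a deliberately simplified sketch of that logic (binary lesion masks, a plain t-test, and none of the permutation or multiple-comparison corrections a real VLSM analysis would require); the array names are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind

def vlsm_tmap(lesions, scores, min_patients=5):
    """Toy voxel-wise lesion-symptom map.

    lesions : (n_patients, n_voxels) binary lesion masks (1 = voxel damaged)
    scores  : (n_patients,) behavioural measure (e.g. fluency improvement)

    Returns one t-value per voxel comparing scores of lesioned vs. spared
    patients (NaN where too few patients are lesioned to run the test).
    """
    n_patients, n_voxels = lesions.shape
    tmap = np.full(n_voxels, np.nan)
    for v in range(n_voxels):
        damaged = lesions[:, v].astype(bool)
        if damaged.sum() >= min_patients and (~damaged).sum() >= min_patients:
            tmap[v], _ = ttest_ind(scores[damaged], scores[~damaged])
    return tmap
```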

  11. Effects of Early Bilingual Experience with a Tone and a Non-Tone Language on Speech-Music Integration.

    Directory of Open Access Journals (Sweden)

    Salomi S Asaridou

    Full Text Available We investigated music and language processing in a group of early bilinguals who spoke a tone language and a non-tone language (Cantonese and Dutch). We assessed online speech-music processing interactions, that is, interactions that occur when speech and music are processed simultaneously in songs, with a speeded classification task. In this task, participants judged sung pseudowords either musically (based on the direction of the musical interval) or phonologically (based on the identity of the sung vowel). We also assessed longer-term effects of linguistic experience on musical ability, that is, the influence of extensive prior experience with language when processing music. These effects were assessed with a task in which participants had to learn to identify musical intervals and with four pitch-perception tasks. Our hypothesis was that due to their experience in two different languages using lexical versus intonational tone, the early Cantonese-Dutch bilinguals would outperform the Dutch control participants. In online processing, the Cantonese-Dutch bilinguals processed speech and music more holistically than controls. This effect seems to be driven by experience with a tone language, in which integration of segmental and pitch information is fundamental. Regarding longer-term effects of linguistic experience, we found no evidence for a bilingual advantage in either the music-interval learning task or the pitch-perception tasks. Together, these results suggest that being a Cantonese-Dutch bilingual does not have any measurable longer-term effects on pitch and music processing, but does have consequences for how speech and music are processed jointly.

  12. Computationally Efficient Clustering of Audio-Visual Meeting Data

    Science.gov (United States)

    Hung, Hayley; Friedland, Gerald; Yeo, Chuohao

    This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors, comprising a limited number of cameras and microphones. We first demonstrate computationally efficient algorithms that can identify who spoke and when, a problem in speech processing known as speaker diarization. We also extract visual activity features efficiently from MPEG4 video by taking advantage of the processing that was already done for video compression. Then, we present a method of associating the audio-visual data together so that the content of each participant can be managed individually. The methods presented in this article can be used as a principal component that enables many higher-level semantic analysis tasks needed in search, retrieval, and navigation.
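
    Speaker diarization ("who spoke when") as described in this record is far more involved than can be shown here, but the toy sketch below conveys the basic shape of such a pipeline: short-term spectral features are extracted and clustered into speaker labels. The feature choice (raw log spectra), the clustering method (k-means), and all parameters are simplifying assumptions, not the authors' computationally efficient algorithms.

```python
import numpy as np
from scipy.signal import spectrogram
from sklearn.cluster import KMeans

def toy_diarization(audio, fs, n_speakers=2, win_s=0.5):
    """Assign each analysis window to one of n_speakers clusters based on
    its log-spectral shape -- a crude stand-in for real diarization features."""
    freqs, times, sxx = spectrogram(audio, fs=fs,
                                    nperseg=int(win_s * fs), noverlap=0)
    features = np.log(sxx + 1e-10).T        # one log-spectrum per window
    labels = KMeans(n_clusters=n_speakers, n_init=10,
                    random_state=0).fit_predict(features)
    return times, labels                    # window centre times, speaker labels
```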

  13. Integrating Speech-Language Pathology Services in Palliative End-of-Life Care

    Science.gov (United States)

    Pollens, Robin D.

    2012-01-01

    Clinical speech-language pathologists (SLPs) may receive referrals to consult with teams serving patients who have a severe and/or terminal disease. Palliative care focuses on the prevention or relief of suffering to maximize quality of life for these patients and their families. This article describes how the role of the SLP in palliative care…

  14. Experienced Speech-Language Pathologists' Responses to Ethical Dilemmas: An Integrated Approach to Ethical Reasoning

    Science.gov (United States)

    Kenny, Belinda; Lincoln, Michelle; Balandin, Susan

    2010-01-01

    Purpose: To investigate the approaches of experienced speech-language pathologists (SLPs) to ethical reasoning and the processes they use to resolve ethical dilemmas. Method: Ten experienced SLPs participated in in-depth interviews. A narrative approach was used to guide participants' descriptions of how they resolved ethical dilemmas. Individual…

  15. Benefits to Speech Perception in Noise From the Binaural Integration of Electric and Acoustic Signals in Simulated Unilateral Deafness.

    Science.gov (United States)

    Ma, Ning; Morris, Saffron; Kitterick, Pádraig Thomas

    2016-01-01

    This study used vocoder simulations with normal-hearing (NH) listeners to (1) measure their ability to integrate speech information from an NH ear and a simulated cochlear implant (CI), and (2) investigate whether binaural integration is disrupted by a mismatch in the delivery of spectral information between the ears arising from a misalignment in the mapping of frequency to place. Eight NH volunteers participated in the study and listened to sentences embedded in background noise via headphones. Stimuli presented to the left ear were unprocessed. Stimuli presented to the right ear (referred to as the CI-simulation ear) were processed using an eight-channel noise vocoder with one of the three processing strategies. An Ideal strategy simulated a frequency-to-place map across all channels that matched the delivery of spectral information between the ears. A Realistic strategy created a misalignment in the mapping of frequency to place in the CI-simulation ear where the size of the mismatch between the ears varied across channels. Finally, a Shifted strategy imposed a similar degree of misalignment in all channels, resulting in consistent mismatch between the ears across frequency. The ability to report key words in sentences was assessed under monaural and binaural listening conditions and at signal to noise ratios (SNRs) established by estimating speech-reception thresholds in each ear alone. The SNRs ensured that the monaural performance of the left ear never exceeded that of the CI-simulation ear. The advantages of binaural integration were calculated by comparing binaural performance with monaural performance using the CI-simulation ear alone. Thus, these advantages reflected the additional use of the experimentally constrained left ear and were not attributable to better-ear listening. Binaural performance was as accurate as, or more accurate than, monaural performance with the CI-simulation ear alone. When both ears supported a similar level of monaural
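
    A compact sketch of an n-channel noise vocoder of the general kind used in such simulations: the signal is analysed into log-spaced bands, each band's envelope is extracted, and the envelopes re-modulate band-limited noise carriers. The `shift_oct` parameter crudely mimics a frequency-to-place mismatch by shifting the carrier bands upward; the filter orders, band edges, and envelope cutoff are illustrative assumptions, not the study's 'Ideal', 'Realistic', or 'Shifted' maps.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocoder(x, fs, n_channels=8, f_lo=100.0, f_hi=7000.0, shift_oct=0.0):
    """Eight-channel (by default) noise vocoder sketch.

    Analyses x into log-spaced bands, extracts each band's envelope, and
    re-synthesises with band-limited noise carriers. shift_oct moves the
    carrier bands upward (in octaves) to mimic a frequency-to-place mismatch.
    Assumes fs is at least 16 kHz so the top band stays below Nyquist.
    """
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    env_lpf = butter(2, 50.0, btype="low", fs=fs, output="sos")  # envelope filter
    rng = np.random.default_rng(0)
    out = np.zeros(len(x), dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        analysis = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        envelope = sosfiltfilt(env_lpf, np.abs(sosfiltfilt(analysis, x)))
        c_lo = lo * 2.0 ** shift_oct
        c_hi = min(hi * 2.0 ** shift_oct, 0.45 * fs)
        carrier_band = butter(4, [c_lo, c_hi], btype="band", fs=fs, output="sos")
        carrier = sosfiltfilt(carrier_band, rng.standard_normal(len(x)))
        out += envelope * carrier
    return out / np.max(np.abs(out))
```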

  16. Age-related audiovisual interactions in the superior colliculus of the rat.

    Science.gov (United States)

    Costa, M; Piché, M; Lepore, F; Guillemot, J-P

    2016-04-21

    It is well established that multisensory integration is a functional characteristic of the superior colliculus that disambiguates external stimuli and therefore reduces the reaction times toward simple audiovisual targets in space. However, in a condition where a complex audiovisual stimulus is used, such as the optical flow in the presence of modulated audio signals, little is known about the processing of the multisensory integration in the superior colliculus. Furthermore, since visual and auditory deficits constitute hallmark signs during aging, we sought to gain some insight into whether audiovisual processes in the superior colliculus are altered with age. Extracellular single-unit recordings were conducted in the superior colliculus of anesthetized Sprague-Dawley adult (10-12 months) and aged (21-22 months) rats. Looming circular concentric sinusoidal (CCS) gratings were presented alone and in the presence of sinusoidally amplitude modulated white noise. In both groups of rats, two different audiovisual response interactions were encountered in the spatial domain: superadditive and suppressive. In contrast, additive audiovisual interactions were found only in adult rats. Hence, superior colliculus audiovisual interactions were more numerous in adult rats (38%) than in aged rats (8%). These results suggest that intersensory interactions in the superior colliculus play an essential role in space processing toward audiovisual moving objects during self-motion. Moreover, aging has a deleterious effect on complex audiovisual interactions. Copyright © 2016 IBRO. Published by Elsevier Ltd. All rights reserved.
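
    Classifications of multisensory interactions as enhancing or suppressive are conventionally based on an index that compares the bimodal response with the best unisensory response; a minimal sketch of that standard index (not necessarily the authors' exact criterion) is shown below with made-up spike counts.

```python
def multisensory_enhancement(av_response, a_response, v_response):
    """Percent change of the audiovisual response relative to the best
    unisensory response; positive values indicate enhancement (superadditive
    if AV also exceeds A + V), negative values indicate suppression."""
    best_unisensory = max(a_response, v_response)
    return 100.0 * (av_response - best_unisensory) / best_unisensory

# Made-up spike counts: AV = 30, A = 12, V = 18  ->  ~66.7% enhancement
print(multisensory_enhancement(30, 12, 18))
```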

  17. Decoding Speech With Integrated Hybrid Signals Recorded From the Human Ventral Motor Cortex

    Directory of Open Access Journals (Sweden)

    Kenji Ibayashi

    2018-04-01

    Full Text Available Restoration of speech communication for locked-in patients by means of brain computer interfaces (BCIs) is currently an important area of active research. Among the neural signals obtained from intracranial recordings, single/multi-unit activity (SUA/MUA), local field potential (LFP), and electrocorticography (ECoG) are good candidates for an input signal for BCIs. However, the question of which signal or which combination of the three signal modalities is best suited for decoding speech production remains unverified. In order to record SUA, LFP, and ECoG simultaneously from a highly localized area of human ventral sensorimotor cortex (vSMC), we fabricated a 7 by 13 mm electrode containing sparsely arranged microneedle contacts and conventional macro contacts. We determined which signal modality is the most capable of decoding speech production, and tested if the combination of these signals could improve the decoding accuracy of spoken phonemes. Feature vectors were constructed from spike frequency obtained from SUAs and event-related spectral perturbation derived from ECoG and LFP signals, then input to the decoder. The results showed that the decoding accuracy for five spoken vowels was highest when features from multiple signals were combined and optimized for each subject, and reached 59% when averaged across all six subjects. This result suggests that multi-scale signals convey complementary information for speech articulation. The current study demonstrated that simultaneous recording of multi-scale neuronal activities could raise decoding accuracy even though the recording area is limited to a small portion of cortex, which is advantageous for future implementation of speech-assisting BCIs.
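
    The feature-fusion decoding step described here can be sketched schematically: per-trial spike-rate features and spectral-perturbation features are concatenated and fed to a cross-validated classifier. The array names, the classifier choice (LDA in a standard scikit-learn pipeline), and the five-fold cross-validation are illustrative assumptions rather than the decoder actually used in the study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def decode_vowels(sua_rates, ecog_ersp, lfp_ersp, labels):
    """Estimate cross-validated accuracy for the five spoken vowels from
    concatenated multi-scale features (each array is trials x features)."""
    X = np.hstack([sua_rates, ecog_ersp, lfp_ersp])   # feature fusion across scales
    clf = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
    return cross_val_score(clf, X, labels, cv=5).mean()
```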

  18. Text as a Supplement to Speech in Young and Older Adults.

    Science.gov (United States)

    Krull, Vidya; Humes, Larry E

    2016-01-01

    The purpose of this experiment was to quantify the contribution of visual text to auditory speech recognition in background noise. Specifically, the authors tested the hypothesis that partially accurate visual text from an automatic speech recognizer could be used successfully to supplement speech understanding in difficult listening conditions in older adults, with normal or impaired hearing. The working hypotheses were based on what is known regarding audiovisual speech perception in the elderly from speechreading literature. We hypothesized that (1) combining auditory and visual text information will result in improved recognition accuracy compared with auditory or visual text information alone, (2) benefit from supplementing speech with visual text (auditory and visual enhancement) in young adults will be greater than that in older adults, and (3) individual differences in performance on perceptual measures would be associated with cognitive abilities. Fifteen young adults with normal hearing, 15 older adults with normal hearing, and 15 older adults with hearing loss participated in this study. All participants completed sentence recognition tasks in auditory-only, text-only, and combined auditory-text conditions. The auditory sentence stimuli were spectrally shaped to restore audibility for the older participants with impaired hearing. All participants also completed various cognitive measures, including measures of working memory, processing speed, verbal comprehension, perceptual and cognitive speed, processing efficiency, inhibition, and the ability to form wholes from parts. Group effects were examined for each of the perceptual and cognitive measures. Audiovisual benefit was calculated relative to performance in the auditory-only and text-only conditions. Finally, the relationship between perceptual measures and other independent measures was examined using principal-component factor analyses, followed by regression analyses. Both young and older adults

  19. Top-Down Modulation of Auditory-Motor Integration during Speech Production: The Role of Working Memory.

    Science.gov (United States)

    Guo, Zhiqiang; Wu, Xiuqin; Li, Weifeng; Jones, Jeffery A; Yan, Nan; Sheft, Stanley; Liu, Peng; Liu, Hanjun

    2017-10-25

    Although working memory (WM) is considered as an emergent property of the speech perception and production systems, the role of WM in sensorimotor integration during speech processing is largely unknown. We conducted two event-related potential experiments with female and male young adults to investigate the contribution of WM to the neurobehavioural processing of altered auditory feedback during vocal production. A delayed match-to-sample task that required participants to indicate whether the pitch feedback perturbations they heard during vocalizations in test and sample sequences matched, elicited significantly larger vocal compensations, larger N1 responses in the left middle and superior temporal gyrus, and smaller P2 responses in the left middle and superior temporal gyrus, inferior parietal lobule, somatosensory cortex, right inferior frontal gyrus, and insula compared with a control task that did not require memory retention of the sequence of pitch perturbations. On the other hand, participants who underwent extensive auditory WM training produced suppressed vocal compensations that were correlated with improved auditory WM capacity, and enhanced P2 responses in the left middle frontal gyrus, inferior parietal lobule, right inferior frontal gyrus, and insula that were predicted by pretraining auditory WM capacity. These findings indicate that WM can enhance the perception of voice auditory feedback errors while inhibiting compensatory vocal behavior to prevent voice control from being excessively influenced by auditory feedback. This study provides the first evidence that auditory-motor integration for voice control can be modulated by top-down influences arising from WM, rather than modulated exclusively by bottom-up and automatic processes. SIGNIFICANCE STATEMENT One outstanding question that remains unsolved in speech motor control is how the mismatch between predicted and actual voice auditory feedback is detected and corrected. The present study

  20. Speech Perception as a Multimodal Phenomenon

    OpenAIRE

    Rosenblum, Lawrence D.

    2008-01-01

    Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal s...

  1. Enhancing audiovisual experience with haptic feedback: a survey on HAV.

    Science.gov (United States)

    Danieau, F; Lecuyer, A; Guillotel, P; Fleureau, J; Mollet, N; Christie, M

    2013-01-01

    Haptic technology has been widely employed in applications ranging from teleoperation and medical simulation to art and design, including entertainment, flight simulation, and virtual reality. Today there is a growing interest among researchers in integrating haptic feedback into audiovisual systems. A new medium emerges from this effort: haptic-audiovisual (HAV) content. This paper presents the techniques, formalisms, and key results pertinent to this medium. We first review the three main stages of the HAV workflow: the production, distribution, and rendering of haptic effects. We then highlight the pressing necessity for evaluation techniques in this context and discuss the key challenges in the field. By building on existing technologies and tackling the specific challenges of the enhancement of audiovisual experience with haptics, we believe the field presents exciting research perspectives whose financial and societal stakes are significant.

  2. Integration of literacy into speech-language therapy: a descriptive analysis of treatment practices.

    Science.gov (United States)

    Tambyraja, Sherine R; Schmitt, Mary Beth; Justice, Laura M; Logan, Jessica A R; Schwarz, Sadie

    2014-01-01

    The purpose of the present study was: (a) to examine the extent to which speech-language therapy provided to children with language disorders in the schools targets code-based literacy skills (e.g., alphabet knowledge and phonological awareness) during business-as-usual treatment sessions, and (b) to determine whether literacy-focused therapy time was associated with factors specific to children and/or speech-language pathologists (SLPs). Participants were 151 kindergarten and first-grade children and 40 SLPs. Video-recorded therapy sessions were coded to determine the amount of time that addressed literacy. Assessments of children's literacy skills were administered as well as questionnaires regarding characteristics of SLPs (e.g., service delivery, professional development). Results showed that time spent addressing code-related literacy across therapy sessions was variable. Significant predictors included SLP years of experience, therapy location, and therapy session duration, such that children receiving services from SLPs with more years of experience, and/or who utilized the classroom for therapy, received more literacy-focused time. Additionally, children in longer therapy sessions received more therapy time on literacy skills. There is considerable variability in the extent to which children received literacy-focused time in therapy; however, SLP-level factors predict time spent in literacy more than child-level factors. Further research is needed to understand the nature of literacy-focused therapy in the public schools. Readers will be able to: (a) define code-based literacy skills, (b) discuss the role that speech-language pathologists have in fostering children's literacy development, and (c) identify key factors that may currently influence the inclusion of literacy targets in school-based speech-language therapy. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. The integration of prosodic speech in high functioning autism: a preliminary FMRI study.

    Directory of Open Access Journals (Sweden)

    Isabelle Hesling

    2010-07-01

    Full Text Available Autism is a neurodevelopmental disorder characterized by a specific triad of symptoms such as abnormalities in social interaction, abnormalities in communication and restricted activities and interests. While verbal autistic subjects may present a correct mastery of the formal aspects of speech, they have difficulties in prosody (the music of speech), leading to communication disorders. A few behavioural studies have revealed a prosodic impairment in children with autism, and among the few fMRI studies aiming at assessing the neural network involved in language, none has specifically studied prosodic speech. The aim of the present study was to characterize specific prosodic components such as linguistic prosody (intonation, rhythm and emphasis) and emotional prosody and to correlate them with the neural network underlying them. We used a behavioural test (Profiling Elements of the Prosodic System, PEPS) and fMRI to characterize prosodic deficits and investigate the neural network underlying prosodic processing. Results revealed the existence of a link between perceptive and productive prosodic deficits for some prosodic components (rhythm, emphasis and affect) in HFA and also revealed that the neural network involved in prosodic speech perception exhibits abnormal activation in the left SMG as compared to controls (activation positively correlated with intonation and emphasis) and an absence of deactivation patterns in regions involved in the default mode. These prosodic impairments may result not only from abnormal activation patterns but also from an inability to adequately use the strategy of default-network inhibition; both mechanisms have to be considered when accounting for decreased task performance in High Functioning Autism.

  4. Boosting pitch encoding with audiovisual interactions in congenital amusia.

    Science.gov (United States)

    Albouy, Philippe; Lévêque, Yohana; Hyde, Krista L; Bouchet, Patrick; Tillmann, Barbara; Caclin, Anne

    2015-01-01

    The combination of information across senses can enhance perception, as revealed for example by decreased reaction times or improved stimulus detection. Interestingly, these facilitatory effects have been shown to be maximal when responses to unisensory modalities are weak. The present study investigated whether audiovisual facilitation can be observed in congenital amusia, a music-specific disorder primarily ascribed to impairments of pitch processing. Amusic individuals and their matched controls performed two tasks. In Task 1, they were required to detect auditory, visual, or audiovisual stimuli as rapidly as possible. In Task 2, they were required to detect as accurately and as rapidly as possible a pitch change within an otherwise monotonic 5-tone sequence that was presented either only auditorily (A condition), or simultaneously with a temporally congruent, but otherwise uninformative visual stimulus (AV condition). Results of Task 1 showed that amusics exhibit typical auditory and visual detection, and typical audiovisual integration capacities: both amusics and controls exhibited shorter response times for audiovisual stimuli than for either auditory stimuli or visual stimuli. Results of Task 2 revealed that both groups benefited from simultaneous uninformative visual stimuli to detect pitch changes: accuracy was higher and response times shorter in the AV condition than in the A condition. The audiovisual improvements of response times were observed for different pitch interval sizes depending on the group. These results suggest that both typical listeners and amusic individuals can benefit from multisensory integration to improve their pitch processing abilities and that this benefit varies as a function of task difficulty. These findings constitute the first step towards the perspective to exploit multisensory paradigms to reduce pitch-related deficits in congenital amusia, notably by suggesting that audiovisual paradigms are effective in an appropriate

  5. Audio-Visual and Meaningful Semantic Context Enhancements in Older and Younger Adults.

    Directory of Open Access Journals (Sweden)

    Kirsten E Smayda

    Full Text Available Speech perception is critical to everyday life. Oftentimes noise can degrade a speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18-35) and thirty-three older adults (ages 60-90) to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audio-visual conditions. We hypothesized that young adults would outperform older adults across SNRs, modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audio-visual relative to audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts. However, we predicted that older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audiovisual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and meaningful benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audio-visual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger

  6. Efficient visual search from synchronized auditory signals requires transient audiovisual events.

    Directory of Open Access Journals (Sweden)

    Erik Van der Burg

    Full Text Available BACKGROUND: A prevailing view is that audiovisual integration requires temporally coincident signals. However, a recent study failed to find any evidence for audiovisual integration in visual search even when using synchronized audiovisual events. An important question is what information is critical to observe audiovisual integration. METHODOLOGY/PRINCIPAL FINDINGS: Here we demonstrate that temporal coincidence (i.e., synchrony) of auditory and visual components can trigger audiovisual interaction in cluttered displays and consequently produce very fast and efficient target identification. In visual search experiments, subjects found a modulating visual target vastly more efficiently when it was paired with a synchronous auditory signal. By manipulating the kind of temporal modulation (sine wave vs. square wave vs. difference wave; harmonic sine-wave synthesis; gradient of onset/offset ramps) we show that abrupt visual events are required for this search efficiency to occur, and that sinusoidal audiovisual modulations do not support efficient search. CONCLUSIONS/SIGNIFICANCE: Thus, audiovisual temporal alignment will only lead to benefits in visual search if the changes in the component signals are both synchronized and transient. We propose that transient signals are necessary in synchrony-driven binding to avoid spurious interactions with unrelated signals when these occur close together in time.
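
    The contrast between smooth and transient temporal modulation that drives the effect described here can be sketched directly: the same modulation rate applied as a sinusoid produces no abrupt events, whereas a square wave produces synchronized transients in both the auditory and the visual signal. Sample rate, carrier frequency, and modulation rate below are arbitrary examples, not the study's stimulus parameters.

```python
import numpy as np

fs = 44100          # audio sample rate (Hz)
duration = 2.0      # seconds
f_mod = 1.5         # modulation rate (Hz)
t = np.arange(int(fs * duration)) / fs

sine_mod = 0.5 * (1.0 + np.sin(2 * np.pi * f_mod * t))           # smooth modulation
square_mod = (np.sin(2 * np.pi * f_mod * t) > 0).astype(float)   # abrupt on/off modulation

carrier = np.sin(2 * np.pi * 500.0 * t)        # 500-Hz tone
am_tone_smooth = sine_mod * carrier            # no sharp audiovisual transients
am_tone_transient = square_mod * carrier       # synchronized abrupt events
luminance_transient = square_mod               # visual target follows the same square wave
```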

  7. Plantilla 1: El documento audiovisual: elementos importantes

    OpenAIRE

    Alemany, Dolores

    2011-01-01

    The concept of the audiovisual document and of audiovisual documentation, examining in depth the distinction between documentation of moving images (with the possible incorporation of sound) and the concept of audiovisual documentation as proposed by Jorge Caldera. Differentiation between audiovisual documents, audiovisual works and audiovisual heritage according to Félix del Valle.

  8. Multisensory speech perception in autism spectrum disorder: From phoneme to whole-word perception.

    Science.gov (United States)

    Stevenson, Ryan A; Baum, Sarah H; Segers, Magali; Ferber, Susanne; Barense, Morgan D; Wallace, Mark T

    2017-07-01

    Speech perception in noisy environments is boosted when a listener can see the speaker's mouth and integrate the auditory and visual speech information. Autistic children have a diminished capacity to integrate sensory information across modalities, which contributes to core symptoms of autism, such as impairments in social communication. We investigated the abilities of autistic and typically-developing (TD) children to integrate auditory and visual speech stimuli in various signal-to-noise ratios (SNR). Measurements of both whole-word and phoneme recognition were recorded. At the level of whole-word recognition, autistic children exhibited reduced performance in both the auditory and audiovisual modalities. Importantly, autistic children showed reduced behavioral benefit from multisensory integration with whole-word recognition, specifically at low SNRs. At the level of phoneme recognition, autistic children exhibited reduced performance relative to their TD peers in auditory, visual, and audiovisual modalities. However, and in contrast to their performance at the level of whole-word recognition, both autistic and TD children showed benefits from multisensory integration for phoneme recognition. In accordance with the principle of inverse effectiveness, both groups exhibited greater benefit at low SNRs relative to high SNRs. Thus, while autistic children showed typical multisensory benefits during phoneme recognition, these benefits did not translate to typical multisensory benefit of whole-word recognition in noisy environments. We hypothesize that sensory impairments in autistic children raise the SNR threshold needed to extract meaningful information from a given sensory input, resulting in subsequent failure to exhibit behavioral benefits from additional sensory information at the level of whole-word recognition. Autism Res 2017, 10: 1280-1290. © 2017 International Society for Autism Research, Wiley Periodicals, Inc.

  9. Dissociating verbal and nonverbal audiovisual object processing.

    Science.gov (United States)

    Hocking, Julia; Price, Cathy J

    2009-02-01

    This fMRI study investigates how audiovisual integration differs for verbal stimuli that can be matched at a phonological level and nonverbal stimuli that can be matched at a semantic level. Subjects were presented simultaneously with one visual and one auditory stimulus and were instructed to decide whether these stimuli referred to the same object or not. Verbal stimuli were simultaneously presented spoken and written object names, and nonverbal stimuli were photographs of objects simultaneously presented with naturally occurring object sounds. Stimulus differences were controlled by including two further conditions that paired photographs of objects with spoken words and object sounds with written words. Verbal matching, relative to all other conditions, increased activation in a region of the left superior temporal sulcus that has previously been associated with phonological processing. Nonverbal matching, relative to all other conditions, increased activation in a right fusiform region that has previously been associated with structural and conceptual object processing. Thus, we demonstrate how brain activation for audiovisual integration depends on the verbal content of the stimuli, even when stimulus and task processing differences are controlled.

  10. Being First Matters: Topographical Representational Similarity Analysis of ERP Signals Reveals Separate Networks for Audiovisual Temporal Binding Depending on the Leading Sense.

    Science.gov (United States)

    Cecere, Roberto; Gross, Joachim; Willis, Ashleigh; Thut, Gregor

    2017-05-24

    inputs in one modality enhance stimulus processing in another modality. Our research demonstrates that evaluating synchrony of auditory-leading (AV) versus visual-leading (VA) audiovisual stimulus pairs is characterized by two distinct patterns of brain activity. This suggests that audiovisual integration is not a unitary process and that different binding mechanisms are recruited in the brain based on the leading sense. These mechanisms may be relevant for supporting different classes of multisensory operations, for example, auditory enhancement of visual attention (AV) and visual enhancement of auditory speech (VA). Copyright © 2017 Cecere et al.

  11. Contributions of speech-language therapy to the integration of individuals with Down syndrome in the workplace.

    Science.gov (United States)

    Barbosa, Talita Maria Monteiro Farias; Lima, Ivonaldo Leidson Barbosa; Alves, Giorvan Ânderson Dos Santos; Delgado, Isabelle Cahino

    2018-03-01

    To analyze the contributions of speech-language therapy in the integration of young individuals with Down syndrome (DS) into the workplace, with reference to their professionalization. A questionnaire was distributed to eight undergraduate students (tutors) who participated in a project with individuals with DS, five mothers of individuals with DS, and five employees from the institution in which the present study was conducted. The questionnaire assessed the communication, memory, behavior, social interaction, autonomy and independence of the participants with DS, called "trainees". The trainees were employed in one of five routine work sectors at the university that conducted the present study. The data collected in this descriptive and cross-sectional study were analyzed quantitatively and qualitatively. The Research Ethics Committee of the affiliated institute approved the project. Mothers and tutors rated the trainees' language skills as "good". However, their ratings differed from those of the participating employees. After the trainees with DS were placed in a work environment, significant changes were observed in their communication and autonomy. There was no improvement in the trainees' independence, but after training noticeable changes were observed in their social behavior and autonomy. Speech-language therapy during vocational training led to positive changes in the social behavior of individuals with DS, as evidenced by an increase in their autonomy and communication.

  12. Auditory-visual speech integration by prelinguistic infants: perception of an emergent consonant in the McGurk effect.

    Science.gov (United States)

    Burnham, Denis; Dodd, Barbara

    2004-12-01

    The McGurk effect, in which auditory [ba] dubbed onto [ga] lip movements is perceived as "da" or "tha," was employed in a real-time task to investigate auditory-visual speech perception in prelingual infants. Experiments 1A and 1B established the validity of real-time dubbing for producing the effect. In Experiment 2, 4 1/2-month-olds were tested in a habituation-test paradigm, in which an auditory-visual stimulus was presented contingent upon visual fixation of a live face. The experimental group was habituated to a McGurk stimulus (auditory [ba] visual [ga]), and the control group to matching auditory-visual [ba]. Each group was then presented with three auditory-only test trials, [ba], [da], and [ða] (as in then). Visual-fixation durations in test trials showed that the experimental group treated the emergent percept in the McGurk effect, [da] or [ða], as familiar (even though they had not heard these sounds previously) and [ba] as novel. For control group infants [da] and [ða] were no more familiar than [ba]. These results are consistent with infants' perception of the McGurk effect, and support the conclusion that prelinguistic infants integrate auditory and visual speech information. Copyright 2004 Wiley Periodicals, Inc.

  13. Subjective Evaluation of Audiovisual Signals

    Directory of Open Access Journals (Sweden)

    F. Fikejz

    2010-01-01

    Full Text Available This paper deals with subjective evaluation of audiovisual signals, with emphasis on the interaction between acoustic and visual quality. The subjective test is realized by a simple rating method. The audiovisual signal used in this test is a combination of images compressed by the JPEG compression codec and sound samples compressed by MPEG-1 Layer III. Images and sounds have various contents. It simulates a real situation when the subject listens to compressed music and watches compressed pictures without access to the original, i.e. uncompressed, signals.

  14. Severe Multisensory Speech Integration Deficits in High-Functioning School-Aged Children with Autism Spectrum Disorder (ASD) and Their Resolution During Early Adolescence

    Science.gov (United States)

    Foxe, John J.; Molholm, Sophie; Del Bene, Victor A.; Frey, Hans-Peter; Russo, Natalie N.; Blanco, Daniella; Saint-Amour, Dave; Ross, Lars A.

    2015-01-01

    Under noisy listening conditions, visualizing a speaker's articulations substantially improves speech intelligibility. This multisensory speech integration ability is crucial to effective communication, and the appropriate development of this capacity greatly impacts a child's ability to successfully navigate educational and social settings. Research shows that multisensory integration abilities continue developing late into childhood. The primary aim here was to track the development of these abilities in children with autism, since multisensory deficits are increasingly recognized as a component of the autism spectrum disorder (ASD) phenotype. The abilities of high-functioning ASD children (n = 84) to integrate seen and heard speech were assessed cross-sectionally, while environmental noise levels were systematically manipulated, comparing them with age-matched neurotypical children (n = 142). Severe integration deficits were uncovered in ASD, which were increasingly pronounced as background noise increased. These deficits were evident in school-aged ASD children (5–12 year olds), but were fully ameliorated in ASD children entering adolescence (13–15 year olds). The severity of multisensory deficits uncovered has important implications for educators and clinicians working in ASD. We consider the observation that the multisensory speech system recovers substantially in adolescence as an indication that it is likely amenable to intervention during earlier childhood, with potentially profound implications for the development of social communication abilities in ASD children. PMID:23985136
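
    The multisensory benefit in studies like this one is typically summarized as the gain in word-recognition accuracy when the talker's face is visible, computed separately at each signal-to-noise ratio. The sketch below illustrates that computation; all numbers are hypothetical placeholders, not data from the study.

    ```python
    # Hypothetical illustration of the audiovisual (AV) gain measure used in
    # speech-in-noise studies: accuracy with the talker's face visible minus
    # auditory-only accuracy, at each signal-to-noise ratio (SNR).
    import numpy as np

    snr_db = np.array([-12, -9, -6, -3, 0])                  # noise levels (illustrative)
    audio_only = np.array([0.10, 0.25, 0.45, 0.65, 0.80])    # proportion of words correct
    audiovisual = np.array([0.35, 0.55, 0.70, 0.82, 0.90])

    av_gain = audiovisual - audio_only                       # raw multisensory benefit per SNR
    for snr, gain in zip(snr_db, av_gain):
        print(f"SNR {int(snr):+d} dB: AV gain = {gain:.2f}") # gain is largest at the poorest SNRs
    ```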

  15. Speech Problems

    Science.gov (United States)

    KidsHealth / For Teens / Speech Problems. ... a person's ability to speak clearly. Some Common Speech and Language Disorders: Stuttering is a problem that ...

  16. Behavioural evidence for separate mechanisms of audiovisual temporal binding as a function of leading sensory modality.

    Science.gov (United States)

    Cecere, Roberto; Gross, Joachim; Thut, Gregor

    2016-06-01

    The ability to integrate auditory and visual information is critical for effective perception and interaction with the environment, and is thought to be abnormal in some clinical populations. Several studies have investigated the time window over which audiovisual events are integrated, also called the temporal binding window, and revealed asymmetries depending on the order of audiovisual input (i.e. the leading sense). When judging audiovisual simultaneity, the binding window appears narrower and non-malleable for auditory-leading stimulus pairs and wider and trainable for visual-leading pairs. Here we specifically examined the level of independence of binding mechanisms when auditory-before-visual vs. visual-before-auditory input is bound. Three groups of healthy participants practiced audiovisual simultaneity detection with feedback, selectively training on auditory-leading stimulus pairs (group 1), visual-leading stimulus pairs (group 2) or both (group 3). Subsequently, we tested for learning transfer (crossover) from trained stimulus pairs to non-trained pairs with opposite audiovisual input. Our data confirmed the known asymmetry in size and trainability for auditory-visual vs. visual-auditory binding windows. More importantly, practicing one type of audiovisual integration (e.g. auditory-visual) did not affect the other type (e.g. visual-auditory), even if trainable by within-condition practice. Together, these results provide crucial evidence that audiovisual temporal binding mechanisms for auditory-leading vs. visual-leading stimulus pairs are independent, possibly tapping into different circuits for audiovisual integration due to engagement of different multisensory sampling mechanisms depending on leading sense. Our results have implications for informing the study of multisensory interactions in healthy participants and clinical populations with dysfunctional multisensory integration. © 2016 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.

  17. Search in audiovisual broadcast archives

    NARCIS (Netherlands)

    Huurnink, B.

    2010-01-01

    Documentary makers, journalists, news editors, and other media professionals routinely require previously recorded audiovisual material for new productions. For example, a news editor might wish to reuse footage from overseas services for the evening news, or a documentary maker describing the

  18. Sistema audiovisual para reconocimiento de comandos / Audiovisual system for recognition of commands

    Directory of Open Access Journals (Sweden)

    Alexander Ceballos

    2011-08-01

    Full Text Available We present the development of an automatic audiovisual speech recognition system focused on the recognition of commands. The audio signal was represented by Mel cepstral coefficients and their first two temporal derivatives. To characterize the video, high-level visual features were tracked automatically throughout the sequence. Automatic initialization of the algorithm used color transformations and active contour models based on Gradient Vector Flow ("GVF snakes") on the lip region, whereas visual tracking used similarity measures across neighborhoods and morphological constraints defined in the MPEG-4 standard. We first present the design of the automatic speech recognition system using only audio information (ASR), based on Hidden Markov Models (HMMs) and an isolated-word approach; we then present the design of the systems using only video features (VSR) and using combined audio and video features (AVSR). Finally, the results of the three systems are compared on an in-house database in Spanish and French, and the influence of acoustic noise is analyzed, showing that the AVSR system is more robust than ASR and VSR.
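
    As a rough illustration of the audio front end described above (Mel cepstral coefficients plus their first two temporal derivatives), the sketch below uses librosa; the file name, sampling rate and number of coefficients are illustrative assumptions, not the parameters used in the paper.

    ```python
    # Per-frame MFCCs stacked with their first and second temporal derivatives,
    # the standard audio representation for HMM-based isolated-word recognition.
    import numpy as np
    import librosa

    def audio_features(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
        """Return one feature vector per frame: [MFCC, delta, delta-delta]."""
        y, sr = librosa.load(wav_path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, n_frames)
        d1 = librosa.feature.delta(mfcc, order=1)               # first derivative (delta)
        d2 = librosa.feature.delta(mfcc, order=2)               # second derivative (delta-delta)
        return np.vstack([mfcc, d1, d2]).T                      # (n_frames, 3 * n_mfcc)

    # Example (hypothetical file name):
    # features = audio_features("command_001.wav")
    ```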

  19. Temporal processing of audiovisual stimuli is enhanced in musicians: evidence from magnetoencephalography (MEG).

    Directory of Open Access Journals (Sweden)

    Yao Lu

    Full Text Available Numerous studies have demonstrated that the structural and functional differences between professional musicians and non-musicians are not only found within a single modality, but also with regard to multisensory integration. In this study we have combined psychophysical and neurophysiological measurements to investigate the processing of non-musical, synchronous or various levels of asynchronous audiovisual events. We hypothesize that long-term multisensory experience alters temporal audiovisual processing already at a non-musical stage. Behaviorally, musicians scored significantly better than non-musicians in judging whether the auditory and visual stimuli were synchronous or asynchronous. At the neural level, the statistical analysis for the audiovisual asynchronous response revealed three clusters of activations including the ACC and the SFG and two bilaterally located activations in IFG and STG in both groups. Musicians, in comparison to the non-musicians, responded to synchronous audiovisual events with enhanced neuronal activity in a broad left posterior temporal region that covers the STG, the insula and the Postcentral Gyrus. Musicians also showed significantly greater activation in the left Cerebellum, when confronted with an audiovisual asynchrony. Taken together, our MEG results form a strong indication that long-term musical training alters the basic audiovisual temporal processing already in an early stage (directly after the auditory N1 wave), while the psychophysical results indicate that musical training may also provide behavioral benefits in the accuracy of the estimates regarding the timing of audiovisual events.

  20. Temporal processing of audiovisual stimuli is enhanced in musicians: evidence from magnetoencephalography (MEG).

    Science.gov (United States)

    Lu, Yao; Paraskevopoulos, Evangelos; Herholz, Sibylle C; Kuchenbuch, Anja; Pantev, Christo

    2014-01-01

    Numerous studies have demonstrated that the structural and functional differences between professional musicians and non-musicians are not only found within a single modality, but also with regard to multisensory integration. In this study we have combined psychophysical and neurophysiological measurements to investigate the processing of non-musical, synchronous or various levels of asynchronous audiovisual events. We hypothesize that long-term multisensory experience alters temporal audiovisual processing already at a non-musical stage. Behaviorally, musicians scored significantly better than non-musicians in judging whether the auditory and visual stimuli were synchronous or asynchronous. At the neural level, the statistical analysis for the audiovisual asynchronous response revealed three clusters of activations including the ACC and the SFG and two bilaterally located activations in IFG and STG in both groups. Musicians, in comparison to the non-musicians, responded to synchronous audiovisual events with enhanced neuronal activity in a broad left posterior temporal region that covers the STG, the insula and the Postcentral Gyrus. Musicians also showed significantly greater activation in the left Cerebellum, when confronted with an audiovisual asynchrony. Taken together, our MEG results form a strong indication that long-term musical training alters the basic audiovisual temporal processing already in an early stage (directly after the auditory N1 wave), while the psychophysical results indicate that musical training may also provide behavioral benefits in the accuracy of the estimates regarding the timing of audiovisual events.

  1. Impact of a PACS/RIS-integrated speech recognition system on radiology reporting time and report availability

    International Nuclear Information System (INIS)

    Trumm, C.G.; Glaser, C.; Paasche, V.; Kuettner, B.; Francke, M.; Nissen-Meyer, S.; Reiser, M.; Crispin, A.; Popp, P.

    2006-01-01

    Purpose: Quantification of the impact of a PACS/RIS-integrated speech recognition system (SRS) on the time expenditure for radiology reporting and on hospital-wide report availability (RA) in a university institution. Material and Methods: In a prospective pilot study, the following parameters were assessed for 669 radiographic examinations (CR): 1. the time requirement per report dictation (TED: dictation time in seconds divided by the number of images per examination and the number of words per report), using either PACS with tape-based dictation (TD: analog dictation device/minicassette/transcription) or PACS/RIS with speech recognition (RR: remote recognition/transcription; OR: online recognition/self-correction by the radiologist), and 2. the report turnaround time (RTT), defined as the interval from the entry of the first image into the PACS to the availability of the RIS/HIS report. Two equal time periods were chosen retrospectively from the RIS database: 11/2002-2/2003 (TD only) and 11/2003-2/2004 (RR or OR with the SRS only). The mid-term (≥24 h, in 24 h intervals) and short-term (<24 h, in 1 h intervals) RA after examination completion were calculated for all modalities together and for CR, CT, MR and XA/DS separately. The relative increase in mid-term RA (RIMRA: relative to the total number of examinations in each time period) and the increase in short-term RA (ISRA: ratio of available reports during the 1st to 24th hour) were calculated. Results: Prospectively, there was a significant difference between TD/RR/OR (n=151/257/261) regarding mean TED (0.44/0.54/0.62 s per word and image) and mean RTT (10.47/6.65/1.27 h), respectively. Retrospectively, 37 898/39 680 reports were computed from the RIS database for the periods 11/2002-2/2003 and 11/2003-2/2004. For CR/CT there was a shift of the short-term RA to the first 6 hours after examination completion (mean cumulative RA 20% higher) with a more than three-fold increase in the total number of available
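
    The two reporting metrics defined above can be made concrete with a small sketch; the function names, field names and example values are hypothetical stand-ins for actual RIS/PACS records.

    ```python
    # Sketch of the two reporting metrics defined above (TED and RTT);
    # all values below are hypothetical examples.
    from datetime import datetime

    def ted_seconds_per_word_and_image(dictation_time_s: float,
                                       n_images: int, n_words: int) -> float:
        """Time expenditure per dictation, normalized per word and per image."""
        return dictation_time_s / (n_images * n_words)

    def report_turnaround_hours(first_image_in_pacs: datetime,
                                report_available_in_ris: datetime) -> float:
        """Interval from the first image entering the PACS to the available RIS/HIS report."""
        return (report_available_in_ris - first_image_in_pacs).total_seconds() / 3600.0

    # A 90 s dictation covering 2 images and 120 words gives
    # TED = 90 / (2 * 120) = 0.375 s per word and image.
    ted = ted_seconds_per_word_and_image(90.0, n_images=2, n_words=120)
    rtt = report_turnaround_hours(datetime(2004, 1, 5, 8, 0), datetime(2004, 1, 5, 9, 16))
    print(f"TED = {ted:.3f} s, RTT = {rtt:.2f} h")
    ```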

  2. Speech entrainment enables patients with Broca’s aphasia to produce fluent speech

    Science.gov (United States)

    Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

    2012-01-01

    A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production

  3. Segmentation of the Speaker's Face Region with Audiovisual Correlation

    Science.gov (United States)

    Liu, Yuyu; Sato, Yoichi

    The ability to find the speaker's face region in a video is useful for various applications. In this work, we develop a novel technique to find this region within different time windows, which is robust against changes of view, scale, and background. The main thrust of our technique is to integrate audiovisual correlation analysis into a video segmentation framework. We analyze the audiovisual correlation locally by computing quadratic mutual information between our audiovisual features. The computation of quadratic mutual information is based on the probability density functions estimated by kernel density estimation with adaptive kernel bandwidth. The results of this audiovisual correlation analysis are incorporated into graph cut-based video segmentation to obtain a globally optimal extraction of the speaker's face region. The setting of any heuristic threshold in this segmentation is avoided by learning the correlation distributions of speaker and background by expectation maximization. Experimental results demonstrate that our method can detect the speaker's face region accurately and robustly for different views, scales, and backgrounds.
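
    For readers unfamiliar with the correlation measure mentioned above, the following is a minimal sketch of Euclidean-distance quadratic mutual information between two one-dimensional feature streams, estimated with Gaussian kernels. The fixed kernel bandwidths stand in for the adaptive bandwidth selection used by the authors, and the suggested features (per-frame audio energy vs. a visual motion measure) are an assumption for illustration.

    ```python
    # Euclidean-distance quadratic mutual information (QMI) between two 1-D
    # feature streams, estimated with Gaussian kernels (information-theoretic
    # learning style plug-in estimator). Bandwidths are fixed for simplicity.
    import numpy as np

    def _pairwise_gauss(x: np.ndarray, sigma: float) -> np.ndarray:
        """Gaussian kernel on all pairwise differences of a 1-D feature stream."""
        d = x[:, None] - x[None, :]
        # kernel of a *difference* of two samples: variance doubles (bandwidth sigma * sqrt(2))
        return np.exp(-d ** 2 / (4.0 * sigma ** 2)) / (2.0 * sigma * np.sqrt(np.pi))

    def quadratic_mutual_information(a: np.ndarray, v: np.ndarray,
                                     sigma_a: float = 1.0, sigma_v: float = 1.0) -> float:
        """QMI_ED = V_joint + V_marginal - 2 * V_cross (>= 0; larger = more dependent)."""
        Ka = _pairwise_gauss(a, sigma_a)
        Kv = _pairwise_gauss(v, sigma_v)
        v_joint = np.mean(Ka * Kv)                            # integral of squared joint density
        v_marg = np.mean(Ka) * np.mean(Kv)                    # integral of squared product of marginals
        v_cross = np.mean(Ka.mean(axis=1) * Kv.mean(axis=1))  # cross term
        return float(v_joint + v_marg - 2.0 * v_cross)

    # Example: per-frame audio energy should yield a larger QMI with the true
    # speaker's lip-motion signal than with background motion.
    ```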

  4. Learning to Match Auditory and Visual Speech Cues: Social Influences on Acquisition of Phonological Categories

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Grossmann, Tobias

    2015-01-01

    Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential…

  5. Evolution of non-speech sound memory in postlingual deafness: implications for cochlear implant rehabilitation.

    Science.gov (United States)

    Lazard, D S; Giraud, A L; Truy, E; Lee, H J

    2011-07-01

    Neurofunctional patterns assessed before or after cochlear implantation (CI) are informative markers of implantation outcome. Because phonological memory reorganization in post-lingual deafness is predictive of the outcome, we investigated, using a cross-sectional approach, whether memory of non-speech sounds (NSS) produced by animals or objects (i.e. non-human sounds) is also reorganized, and how this relates to speech perception after CI. We used an fMRI auditory imagery task in which sounds were evoked by pictures of noisy items for post-lingual deaf candidates for CI and for normal-hearing subjects. When deaf subjects imagined sounds, the left inferior frontal gyrus, the right posterior temporal gyrus and the right amygdala were less activated compared to controls. Activity levels in these regions decreased with duration of auditory deprivation, indicating declining NSS representations. Whole brain correlations with duration of auditory deprivation and with speech scores after CI showed an activity decline in dorsal, fronto-parietal, cortical regions, and an activity increase in ventral cortical regions, the right anterior temporal pole and the hippocampal gyrus. Both dorsal and ventral reorganizations predicted poor speech perception outcome after CI. These results suggest that post-CI speech perception relies, at least partially, on the integrity of a neural system used for processing NSS that is based on audio-visual and articulatory mapping processes. When this neural system is reorganized, post-lingual deaf subjects resort to inefficient semantic- and memory-based strategies. These results complement those of other studies on speech processing, suggesting that both speech and NSS representations need to be maintained during deafness to ensure the success of CI. Copyright © 2011 Elsevier Ltd. All rights reserved.

  6. Online integration of information from speech and gesture: Insights from event related potentials

    NARCIS (Netherlands)

    Özyürek, A.; Willems, R.M.; Kita, S.; Hagoort, P.

    2007-01-01

    During language comprehension, listeners use the global semantic representation from previous sentence or discourse context to immediately integrate the meaning of each upcoming word into the unfolding message-level representation. Here we investigate whether communicative gestures that often

  7. Speech Compression

    Directory of Open Access Journals (Sweden)

    Jerry D. Gibson

    2016-06-01

    Full Text Available Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.
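
    Since the linear prediction model is named above as the basis of most speech coding standards, here is a minimal sketch of LPC analysis for a single frame using the autocorrelation method and the Levinson-Durbin recursion; the frame length, model order and test signal are illustrative choices, not values from any particular standard.

    ```python
    # Minimal linear-prediction (LPC) analysis for one speech frame:
    # autocorrelation method solved with the Levinson-Durbin recursion.
    import numpy as np

    def lpc_coefficients(frame: np.ndarray, order: int = 10) -> np.ndarray:
        """Return predictor coefficients a[1..order] with s[n] ~= sum_k a[k] * s[n-k]."""
        n = len(frame)
        # autocorrelation at lags 0..order (frame assumed pre-windowed and non-silent)
        r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
        a = np.zeros(order)
        err = r[0]
        for i in range(order):
            # reflection coefficient for model order i + 1
            k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
            a_prev = a[:i].copy()
            a[:i] = a_prev - k * a_prev[::-1]
            a[i] = k
            err *= 1.0 - k * k
        return a

    # Illustration: the prediction residual carries far less energy than the
    # signal itself, which is exactly what a predictive speech coder exploits.
    rng = np.random.default_rng(1)
    s = np.sin(2 * np.pi * 200 * np.arange(400) / 8000) + 0.05 * rng.standard_normal(400)
    order = 10
    a = lpc_coefficients(s * np.hamming(400), order=order)
    pred = np.zeros_like(s)
    for nn in range(order, len(s)):
        pred[nn] = a @ s[nn - order:nn][::-1]       # predict from the previous `order` samples
    residual = s - pred
    print(np.var(residual[order:]) / np.var(s[order:]))  # residual-to-signal energy ratio << 1
    ```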

  8. Copyright for audiovisual work and analysis of websites offering audiovisual works

    OpenAIRE

    Chrastecká, Nicolle

    2014-01-01

    This Bachelor's thesis deals with the matter of audiovisual piracy. It discusses the question of audiovisual piracy being caused not by the wrong interpretation of law but by the lack of competitiveness among websites with legal audiovisual content. This thesis questions the quality of legal interpretation in the matter of audiovisual piracy and focuses on its sufficiency. It analyses the responsibility of website providers, providers of the illegal content, the responsibility of illegal cont...

  9. Tackling the complexity in speech

    DEFF Research Database (Denmark)

    section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore..., we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about... the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations...

  10. Learning sparse generative models of audiovisual signals

    OpenAIRE

    Monaci, Gianluca; Sommer, Friedrich T.; Vandergheynst, Pierre

    2008-01-01

    This paper presents a novel framework to learn sparse representations for audiovisual signals. An audiovisual signal is modeled as a sparse sum of audiovisual kernels. The kernels are bimodal functions made of synchronous audio and video components that can be positioned independently and arbitrarily in space and time. We design an algorithm capable of learning sets of such audiovisual, synchronous, shift-invariant functions by alternatingly solving a coding and a learning problem...

  11. Net neutrality and audiovisual services

    OpenAIRE

    van Eijk, N.; Nikoltchev, S.

    2011-01-01

    Net neutrality is high on the European agenda. New regulations for the communication sector provide a legal framework for net neutrality and need to be implemented on both a European and a national level. The key element is not just about blocking or slowing down traffic across communication networks: the control over the distribution of audiovisual services constitutes a vital part of the problem. In this contribution, the phenomenon of net neutrality is described first. Next, the European a...

  12. Electrophysiological correlates of predictive coding of auditory location in the perception of natural audiovisual events.

    Science.gov (United States)

    Stekelenburg, Jeroen J; Vroomen, Jean

    2012-01-01

    In many natural audiovisual events (e.g., a clap of the two hands), the visual signal precedes the sound and thus allows observers to predict when, where, and which sound will occur. Previous studies have reported that there are distinct neural correlates of temporal (when) versus phonetic/semantic (which) content on audiovisual integration. Here we examined the effect of visual prediction of auditory location (where) in audiovisual biological motion stimuli by varying the spatial congruency between the auditory and visual parts. Visual stimuli were presented centrally, whereas auditory stimuli were presented either centrally or at 90° azimuth. Typical sub-additive amplitude reductions (AV - V audiovisual interaction was also found at 40-60 ms (P50) in the spatially congruent condition, while no effect of congruency was found on the suppression of the P2. This indicates that visual prediction of auditory location can be coded very early in auditory processing.

  13. Audiovisual spoken word recognition as a clinical criterion for sensory aids efficiency in Persian-language children with hearing loss.

    Science.gov (United States)

    Oryadi-Zanjani, Mohammad Majid; Vahab, Maryam; Bazrafkan, Mozhdeh; Haghjoo, Asghar

    2015-12-01

    The aim of this study was to examine the role of audiovisual speech recognition as a clinical criterion of cochlear implant or hearing aid efficiency in Persian-language children with severe-to-profound hearing loss. This research was administered as a cross-sectional study. The sample consisted of 60 Persian-speaking 5-7-year-old children. The assessment tool was one of the subtests of the Persian version of the Test of Language Development-Primary 3. The study included two experiments: auditory-only and audiovisual presentation conditions. The test was a closed-set task including 30 words that were orally presented by a speech-language pathologist. The scores for audiovisual word perception were significantly higher than for the auditory-only condition in the children with normal hearing (P... audiovisual presentation conditions (P>0.05). Audiovisual spoken word recognition can be applied as a clinical criterion to assess children with severe to profound hearing loss in order to find out whether a cochlear implant or hearing aid has been efficient for them; i.e. if a child with hearing impairment who uses a CI or HA obtains higher scores for audiovisual spoken word recognition than for the auditory-only condition, his/her auditory skills have developed appropriately due to an effective CI or HA, one of the main factors of auditory habilitation. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  14. Quality models for audiovisual streaming

    Science.gov (United States)

    Thang, Truong Cong; Kim, Young Suk; Kim, Cheon Seog; Ro, Yong Man

    2006-01-01

    Quality is an essential factor in multimedia communication, especially in compression and adaptation. Quality metrics can be divided into three categories: within-modality quality, cross-modality quality, and multi-modality quality. Most research has so far focused on within-modality quality. Moreover, quality is normally considered only from the perceptual perspective. In practice, content may be drastically adapted, even converted to another modality. In this case, we should consider quality from the semantic perspective as well. In this work, we investigate multi-modality quality from the semantic perspective. To model semantic quality, we apply the concept of the "conceptual graph", which consists of semantic nodes and relations between the nodes. As a typical multi-modality example, we focus on an audiovisual streaming service. Specifically, we evaluate the amount of information conveyed by audiovisual content in which both the video and audio channels may be strongly degraded, and the audio may even be converted to text. In the experiments, we also consider a perceptual quality model of audiovisual content, so as to see how it differs from the semantic quality model.
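
    As a toy illustration of the conceptual-graph idea described above, semantic quality can be scored by how many concept nodes and relations of the original content survive adaptation (for example, when video is dropped or audio is converted to text). The graph contents and weights below are invented for illustration and are not the authors' model.

    ```python
    # Invented example: nodes are concepts, relations are (subject, predicate, object).
    original = {
        "nodes": {"reporter", "flood", "city", "rescue_team"},
        "relations": {("reporter", "reports", "flood"), ("rescue_team", "operates_in", "city")},
    }
    adapted = {   # e.g., video dropped and audio converted to text during adaptation
        "nodes": {"reporter", "flood", "city"},
        "relations": {("reporter", "reports", "flood")},
    }

    def semantic_quality(orig: dict, adapt: dict, w_nodes: float = 0.5, w_rels: float = 0.5) -> float:
        """Fraction of original concepts and relations preserved, combined with weights."""
        node_recall = len(orig["nodes"] & adapt["nodes"]) / len(orig["nodes"])
        rel_recall = len(orig["relations"] & adapt["relations"]) / len(orig["relations"])
        return w_nodes * node_recall + w_rels * rel_recall

    print(semantic_quality(original, adapted))   # 0.5 * 0.75 + 0.5 * 0.5 = 0.625
    ```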

  15. Applications in accessibility of text-to-speech synthesis for South African languages: Initial system integration and user engagement

    CSIR Research Space (South Africa)

    Schlünz, Georg I

    2017-09-01

    Full Text Available Text-to-speech synthesis can give a voice to people with little or no functional speech to speak out loud. Screen readers and accessible e-books allow a print-disabled (visually-impaired, partially-sighted or dyslexic) individual to read text material by listening to audio versions. Text-to-speech synthesis...

  16. The influence of phonetic dimensions on aphasic speech perception

    NARCIS (Netherlands)

    de Kok, D.A.; Jonkers, R.; Bastiaanse, Y.R.M.

    2010-01-01

    Individuals with aphasia have more problems detecting small differences between speech sounds than larger ones. This paper reports how phonemic processing is impaired and how this is influenced by speechreading. A non-word discrimination task was carried out with 'audiovisual', 'auditory only' and

  17. Visual Information Can Hinder Working Memory Processing of Speech

    Science.gov (United States)

    Mishra, Sushmit; Lunner, Thomas; Stenfelt, Stefan; Ronnberg, Jerker; Rudner, Mary

    2013-01-01

    Purpose: The purpose of the present study was to evaluate the new Cognitive Spare Capacity Test (CSCT), which measures aspects of working memory capacity for heard speech in the audiovisual and auditory-only modalities of presentation. Method: In Experiment 1, 20 young adults with normal hearing performed the CSCT and an independent battery of…

  18. Audiovisual Temporal Processing and Synchrony Perception in the Rat.

    Science.gov (United States)

    Schormans, Ashley L; Scott, Kaela E; Vo, Albert M Q; Tyker, Anna; Typlt, Marei; Stolzberg, Daniel; Allman, Brian L

    2016-01-01

    Extensive research on humans has improved our understanding of how the brain integrates information from our different senses, and has begun to uncover the brain regions and large-scale neural activity that contributes to an observer's ability to perceive the relative timing of auditory and visual stimuli. In the present study, we developed the first behavioral tasks to assess the perception of audiovisual temporal synchrony in rats. Modeled after the parameters used in human studies, separate groups of rats were trained to perform: (1) a simultaneity judgment task in which they reported whether audiovisual stimuli at various stimulus onset asynchronies (SOAs) were presented simultaneously or not; and (2) a temporal order judgment task in which they reported whether they perceived the auditory or visual stimulus to have been presented first. Furthermore, using in vivo electrophysiological recordings in the lateral extrastriate visual (V2L) cortex of anesthetized rats, we performed the first investigation of how neurons in the rat multisensory cortex integrate audiovisual stimuli presented at different SOAs. As predicted, rats ( n = 7) trained to perform the simultaneity judgment task could accurately (~80%) identify synchronous vs. asynchronous (200 ms SOA) trials. Moreover, the rats judged trials at 10 ms SOA to be synchronous, whereas the majority (~70%) of trials at 100 ms SOA were perceived to be asynchronous. During the temporal order judgment task, rats ( n = 7) perceived the synchronous audiovisual stimuli to be "visual first" for ~52% of the trials, and calculation of the smallest timing interval between the auditory and visual stimuli that could be detected in each rat (i.e., the just noticeable difference (JND)) ranged from 77 ms to 122 ms. Neurons in the rat V2L cortex were sensitive to the timing of audiovisual stimuli, such that spiking activity was greatest during trials when the visual stimulus preceded the auditory by 20-40 ms. Ultimately, given
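
    The JND reported above is the kind of quantity that is usually derived by fitting a psychometric function to temporal-order-judgment data. The sketch below fits a cumulative Gaussian to the proportion of "visual first" responses across SOAs; the data points are made-up illustrative values, not the rats' results.

    ```python
    # Fit a cumulative Gaussian psychometric function to temporal-order-judgment
    # data and derive the point of subjective simultaneity (PSS) and a JND.
    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.stats import norm

    # Made-up proportions of "visual first" responses at each SOA (ms);
    # negative SOA = auditory stimulus leading.
    soa_ms = np.array([-200, -100, -40, -10, 0, 10, 40, 100, 200], dtype=float)
    p_visual_first = np.array([0.05, 0.15, 0.35, 0.45, 0.52, 0.60, 0.70, 0.85, 0.95])

    def cum_gauss(x, pss, sigma):
        return norm.cdf(x, loc=pss, scale=sigma)

    (pss, sigma), _ = curve_fit(cum_gauss, soa_ms, p_visual_first, p0=[0.0, 80.0])

    # One common JND definition: half the 25%-75% interquartile range of the fit.
    jnd = sigma * norm.ppf(0.75)
    print(f"PSS = {pss:.1f} ms, JND = {jnd:.1f} ms")
    ```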

  19. Auditory and audio-visual processing in patients with cochlear, auditory brainstem, and auditory midbrain implants: An EEG study.

    Science.gov (United States)

    Schierholz, Irina; Finke, Mareike; Kral, Andrej; Büchner, Andreas; Rach, Stefan; Lenarz, Thomas; Dengler, Reinhard; Sandmann, Pascale

    2017-04-01

    There is substantial variability in speech recognition ability across patients with cochlear implants (CIs), auditory brainstem implants (ABIs), and auditory midbrain implants (AMIs). To better understand how this variability is related to central processing differences, the current electroencephalography (EEG) study compared hearing abilities and auditory-cortex activation in patients with electrical stimulation at different sites of the auditory pathway. Three different groups of patients with auditory implants (Hannover Medical School; ABI: n = 6, CI: n = 6; AMI: n = 2) performed a speeded response task and a speech recognition test with auditory, visual, and audio-visual stimuli. Behavioral performance and cortical processing of auditory and audio-visual stimuli were compared between groups. ABI and AMI patients showed prolonged response times to auditory and audio-visual stimuli compared with normal-hearing (NH) listeners and CI patients. This was confirmed by prolonged N1 latencies and reduced N1 amplitudes in ABI and AMI patients. However, patients with central auditory implants showed a remarkable gain in performance when visual and auditory input was combined, in both speech and non-speech conditions, which was reflected by a strong visual modulation of auditory-cortex activation in these individuals. In sum, the results suggest that the behavioral improvement for audio-visual conditions in central auditory implant patients is based on enhanced audio-visual interactions in the auditory cortex. These findings may have important implications for the optimization of electrical stimulation and rehabilitation strategies in patients with central auditory prostheses. Hum Brain Mapp 38:2206-2225, 2017. © 2017 Wiley Periodicals, Inc.

  20. Speech Matters

    DEFF Research Database (Denmark)

    Hasse Jørgensen, Stina

    2011-01-01

    About Speech Matters, the Greek curator Katarina Gregos' exhibition at the Danish Pavilion, Venice Biennale 2011.

  1. Fusion for Audio-Visual Laughter Detection

    NARCIS (Netherlands)

    Reuderink, B.

    2007-01-01

    Laughter is a highly variable signal, and can express a spectrum of emotions. This makes the automatic detection of laughter a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is performed

  2. Decreased BOLD responses in audiovisual processing

    NARCIS (Netherlands)

    Wiersinga-Post, Esther; Tomaskovic, Sonja; Slabu, Lavinia; Renken, Remco; de Smit, Femke; Duifhuis, Hendrikus

    2010-01-01

    Audiovisual processing was studied in a functional magnetic resonance imaging study using the McGurk effect. Perceptual responses and the brain activity patterns were measured as a function of audiovisual delay. In several cortical and subcortical brain areas, BOLD responses correlated negatively

  3. Audiovisual signs and information science: an evaluation

    Directory of Open Access Journals (Sweden)

    Jalver Bethônico

    2006-12-01

    Full Text Available This work evaluates the relationship between Information Science and audiovisual signs, pointing out conceptual limitations, the difficulties imposed by the verbal foundation of knowledge, their limited use within libraries, and paths toward a more consistent analysis of audiovisual media, supported by the semiotics of Charles Peirce.

  4. Speech-to-Speech Relay Service

    Science.gov (United States)

    Consumer Guide: Speech-to-Speech Relay Service. Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...

  5. Cinco discursos da digitalidade audiovisual

    Directory of Open Access Journals (Sweden)

    Gerbase, Carlos

    2001-01-01

    Full Text Available Michel Foucault teaches that all systematic discourse - including discourse that claims to be "neutral" or "a disinterested, objective view of what happens" - is in fact a mechanism for articulating knowledge and, in turn, for forming power. The appearance of new technologies, especially digital ones, in the field of audiovisual production provokes an avalanche of statements by filmmakers, essays by academics, and predictions by media demiurges.

  6. Compliments in Audiovisual Translation – issues in character identity

    Directory of Open Access Journals (Sweden)

    Isabel Fernandes Silva

    2011-12-01

    Full Text Available Over the last decades, audiovisual translation has gained increased significance in Translation Studies and as an interdisciplinary subject within other fields (media, cinema studies, etc.). Although many articles have been published on communicative aspects of translation such as politeness, only recently have scholars taken an interest in the translation of compliments. This study will focus on both these areas from a multimodal and pragmatic perspective, emphasizing the links between these fields and how this multidisciplinary approach will evidence the polysemiotic nature of the translation process. In audiovisual translation both text and image are at play; therefore, the translation of speech produced by the characters may either omit information (because it is provided by visual-gestural signs) or emphasize it. A selection was made of the compliments present in the film What Women Want, our focus being on subtitles which did not successfully convey the compliment expressed in the source text, as well as on the reasons for this, namely differences in register, Culture Specific Items and repetitions. These differences lead to a different portrayal/identity/perception of the main character in the English version (original soundtrack) and in the subtitled versions in Portuguese and Italian.

  7. Sex differences in multisensory speech processing in both typically developing children and those on the autism spectrum.

    Directory of Open Access Journals (Sweden)

    Lars A. Ross

    2015-05-01

    Full Text Available Background: Previous work has revealed sizeable deficits in the abilities of children with an autism spectrum disorder (ASD) to integrate auditory and visual speech signals, with clear implications for social communication in this population. There is a strong male preponderance in ASD, with approximately four affected males for every female. The presence of sex differences in ASD symptoms suggests a sexual dimorphism in the ASD phenotype, and raises the question of whether this dimorphism extends to ASD traits in the neurotypical population. Here, we investigated possible sexual dimorphism in multisensory speech integration in both ASD and neurotypical individuals. Methods: We assessed whether males and females differed in their ability to benefit from visual speech when target words were presented under varying signal-to-noise ratios, in samples of neurotypical children and adults, and in children diagnosed with an ASD. Results: In typically developing (TD) children and children with ASD, females (n = 47 and n = 15, respectively) were significantly superior in their ability to recognize words under audiovisual listening conditions compared to males (n = 55 and n = 58, respectively). This sex difference was absent in our sample of neurotypical adults (n = 28 females; n = 28 males). Conclusions: We propose that the development of audiovisual integration is delayed in male relative to female children, a delay that is also observed in ASD. In neurotypicals, these sex differences disappear in early adulthood when females approach their performance maximum and males catch up. Our findings underline the importance of considering sex differences in the search for autism endophenotypes and strongly encourage increased efforts to study the underrepresented population of females within ASD.

  8. Apraxia of Speech

    Science.gov (United States)

    Health Info » Voice, Speech, and Language » Apraxia of Speech. What is apraxia of speech? Apraxia of speech (AOS), also known as acquired ...

  9. A Randomized Controlled Trial on The Beneficial Effects of Training Letter-Speech Sound Integration on Reading Fluency in Children with Dyslexia.

    Directory of Open Access Journals (Sweden)

    Gorka Fraga González

    Full Text Available A recent account of dyslexia assumes that a failure to develop automated letter-speech sound integration might be responsible for the observed lack of reading fluency. This study uses a pre-test-training-post-test design to evaluate the effects of a training program based on letter-speech sound associations, with a special focus on gains in reading fluency. A sample of 44 children with dyslexia and 23 typical readers, aged 8 to 9, was recruited. Children with dyslexia were randomly allocated to either the training program group (n = 23) or a waiting-list control group (n = 21). The training intensively focused on letter-speech sound mapping and consisted of 34 individual sessions of 45 minutes over a five-month period. The children with dyslexia showed substantial reading gains for the main word reading and spelling measures after training, improving at a faster rate than typical readers and waiting-list controls. The results are interpreted within the conceptual framework assuming a multisensory integration deficit as the most proximal cause of dysfluent reading in dyslexia. ISRCTN register: ISRCTN12783279.

  10. Audiovisual Rehabilitation in Hemianopia: A Model-Based Theoretical Investigation.

    Science.gov (United States)

    Magosso, Elisa; Cuppini, Cristiano; Bertini, Caterina

    2017-01-01

    Hemianopic patients exhibit improved visual detection in the blind field when audiovisual stimuli are presented in spatiotemporal coincidence. Beyond this "online" multisensory improvement, there is evidence of long-lasting, "offline" effects induced by audiovisual training: patients show improved visual detection and orientation after being trained to detect and saccade toward visual targets presented in spatiotemporal proximity with auditory stimuli. These effects are ascribed to the Superior Colliculus (SC), which is spared in these patients and plays a pivotal role in audiovisual integration and oculomotor behavior. Recently, we developed a neural network model of audiovisual cortico-collicular loops, including interconnected areas representing the retina, striate and extrastriate visual cortices, auditory cortex, and SC. The network simulated a unilateral V1 lesion with possible spared tissue and reproduced the "online" effects. Here, we extend the previous network to shed light on the circuits, plastic mechanisms, and synaptic reorganization that can mediate the training effects and functionally implement visual rehabilitation. The network is enriched with the oculomotor SC-brainstem route and Hebbian mechanisms of synaptic plasticity, and is used to test different training paradigms (audiovisual/visual stimulation in eye-movements/fixed-eyes conditions) on simulated patients. The results predict different training effects and associate them with synaptic changes in specific circuits. Thanks to the SC multisensory enhancement, audiovisual training is able to effectively strengthen the retina-SC route, which in turn can foster reinforcement of the SC-brainstem route (this occurs only in the eye-movements condition) and of the SC-extrastriate route (this occurs in the presence of spared V1 tissue, regardless of eye condition). The retina-SC-brainstem circuit may mediate compensatory effects: the model assumes that reinforcement of this circuit can translate visual
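
    As a generic illustration of the Hebbian plasticity invoked above, the sketch below strengthens a feedforward weight matrix (standing in for, e.g., the retina-to-SC projection) whenever pre- and post-synaptic activity coincide during simulated audiovisual training trials. The rule, learning rate and toy network are assumptions for illustration, not the authors' equations.

    ```python
    # Generic Hebbian strengthening of a feedforward projection during paired
    # audiovisual "training trials"; parameters and network size are illustrative.
    import numpy as np

    def hebbian_step(W: np.ndarray, pre: np.ndarray, post: np.ndarray,
                     lr: float = 0.01, w_max: float = 1.0) -> np.ndarray:
        """Strengthen W[i, j] in proportion to post[i] * pre[j]; keep weights bounded."""
        return np.clip(W + lr * np.outer(post, pre), 0.0, w_max)

    rng = np.random.default_rng(0)
    n = 20
    W = rng.uniform(0.0, 0.1, size=(n, n))                           # post x pre (e.g., SC x retina)
    visual_input = np.exp(-0.5 * ((np.arange(n) - 10) / 2.0) ** 2)   # localized retinal activity

    for _ in range(100):                                             # repeated training trials
        # SC response: feedforward visual drive plus a crude multisensory boost
        # from a spatially coincident auditory cue
        sc_response = W @ visual_input + 0.5 * visual_input
        W = hebbian_step(W, pre=visual_input, post=sc_response)

    # After training, the weights driven by the stimulated retinal locations have
    # grown toward w_max, i.e. the retina-SC route has been strengthened.
    ```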

  11. 29 CFR 2.13 - Audiovisual coverage prohibited.

    Science.gov (United States)

    2010-07-01

    Office of the Secretary of Labor, General Regulations, Audiovisual Coverage of Administrative Hearings, 29 CFR § 2.13 (Audiovisual coverage prohibited): The Department shall not permit audiovisual coverage of the...

  12. Introductory speeches

    International Nuclear Information System (INIS)

    2001-01-01

    This CD is a multimedia presentation of the safety upgrading programme of the Bohunice V1 NPP. This chapter consists of an introductory commentary and four introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; (4) Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering

  13. Perception of co-speech gestures in aphasic patients: a visual exploration study during the observation of dyadic conversations.

    Science.gov (United States)

    Preisig, Basil C; Eggenberger, Noëmi; Zito, Giuseppe; Vanbellingen, Tim; Schumacher, Rahel; Hopfner, Simone; Nyffeler, Thomas; Gutbrod, Klemens; Annoni, Jean-Marie; Bohlhalter, Stephan; Müri, René M

    2015-03-01

    Co-speech gestures are part of nonverbal communication during conversations. They either support the verbal message or provide the interlocutor with additional information. Furthermore, as nonverbal cues they prompt the cooperative process of turn taking. In the present study, we investigated the influence of co-speech gestures on the perception of dyadic dialogue in aphasic patients. In particular, we analysed the impact of co-speech gestures on gaze direction (towards speaker or listener) and fixation of body parts. We hypothesized that aphasic patients, who are restricted in verbal comprehension, adapt their visual exploration strategies. Sixteen aphasic patients and 23 healthy control subjects participated in the study. Visual exploration behaviour was measured by means of a contact-free infrared eye-tracker while subjects were watching videos depicting spontaneous dialogues between two individuals. Cumulative fixation duration and mean fixation duration were calculated for the factors co-speech gesture (present and absent), gaze direction (to the speaker or to the listener), and region of interest (ROI), including hands, face, and body. Both aphasic patients and healthy controls mainly fixated the speaker's face. We found a significant co-speech gesture × ROI interaction, indicating that the presence of a co-speech gesture encouraged subjects to look at the speaker. Further, there was a significant gaze direction × ROI × group interaction revealing that aphasic patients showed reduced cumulative fixation duration on the speaker's face compared to healthy controls. Co-speech gestures guide the observer's attention towards the speaker, the source of semantic input. It is discussed whether an underlying semantic processing deficit or a deficit in integrating audio-visual information may cause aphasic patients to explore the speaker's face less. Copyright © 2014 Elsevier Ltd. All rights reserved.
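
    The two eye-tracking measures analyzed above reduce to simple per-ROI aggregates over a list of fixations, as in the sketch below; the fixation records and ROI labels are hypothetical, and real eye-tracker exports differ in format.

    ```python
    # Cumulative and mean fixation duration per region of interest (ROI),
    # computed from a list of (roi, duration) fixation records.
    from collections import defaultdict

    # (region of interest, fixation duration in ms) -- hypothetical placeholder data
    fixations = [
        ("speaker_face", 320), ("speaker_hands", 150),
        ("speaker_face", 410), ("listener_face", 200),
    ]

    totals = defaultdict(float)
    counts = defaultdict(int)
    for roi, duration_ms in fixations:
        totals[roi] += duration_ms
        counts[roi] += 1

    cumulative_fixation_duration = dict(totals)                                   # summed per ROI
    mean_fixation_duration = {roi: totals[roi] / counts[roi] for roi in totals}   # average per ROI
    print(cumulative_fixation_duration, mean_fixation_duration)
    ```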

  14. Sustainable models of audiovisual commons

    Directory of Open Access Journals (Sweden)

    Mayo Fuster Morell

    2013-03-01

    Full Text Available This paper addresses an emerging phenomenon characterized by continuous change and experimentation: the collaborative commons creation of audiovisual content online. The analysis focuses on models of sustainability of collaborative online creation, paying particular attention to the use of different forms of advertising. This article is an excerpt from a larger investigation whose units of analysis are Online Creation Communities that take the Catalan territory as their central node of activity. From the 22 selected cases, the methodology combines quantitative analysis, through a questionnaire delivered to all cases, with qualitative analysis, through face-to-face interviews conducted in 8 of the cases studied. The research, whose conclusions we summarize in this article, leads us to conclude that the sustainability of a project depends largely on relationships of trust and interdependence between the different voluntary agents, on non-monetary contributions and rewards, and on freely usable resources and infrastructure. Altogether, this leads us to understand that this is and will be a very important area for the future of audiovisual content and its sustainability, which will imply changes in the policies that govern it.

  15. Comparison of audio and audiovisual measures of adult stuttering: Implications for clinical trials.

    Science.gov (United States)

    O'Brian, Sue; Jones, Mark; Onslow, Mark; Packman, Ann; Menzies, Ross; Lowe, Robyn

    2015-04-15

    This study investigated whether measures of percentage syllables stuttered (%SS) and stuttering severity ratings with a 9-point scale differ when made from audiovisual compared with audio-only recordings. Four experienced speech-language pathologists measured %SS and assigned stuttering severity ratings to 10-minute audiovisual and audio-only recordings of 36 adults. There was a mean 18% increase in %SS scores when samples were presented in audiovisual compared with audio-only mode. This result was consistent across both higher and lower %SS scores and was found to be directly attributable to counts of stuttered syllables rather than the total number of syllables. There was no significant difference between stuttering severity ratings made from the two modes. In clinical trials research, when using %SS as the primary outcome measure, audiovisual samples would be preferred as long as clear, good quality, front-on images can be easily captured. Alternatively, stuttering severity ratings may be a more valid measure to use as they correlate well with %SS and values are not influenced by the presentation mode.
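
    Percentage of syllables stuttered is simply the stuttered-syllable count divided by the total syllable count, times 100. The sketch below also shows the kind of audio-only versus audiovisual comparison reported above, with made-up counts rather than the study's data.

    ```python
    # %SS and the audio-only vs. audiovisual comparison, with invented counts.
    def percent_syllables_stuttered(stuttered: int, total: int) -> float:
        """%SS: stuttered syllables as a percentage of all syllables spoken."""
        return 100.0 * stuttered / total

    # Made-up counts for one 10-minute sample scored twice by the same judge
    audio_only = percent_syllables_stuttered(stuttered=42, total=1400)    # 3.0 %SS
    audiovisual = percent_syllables_stuttered(stuttered=50, total=1400)   # ~3.6 %SS

    relative_increase = 100.0 * (audiovisual - audio_only) / audio_only   # ~19%, same order as the ~18% reported
    print(f"{audio_only:.1f} vs {audiovisual:.1f} %SS ({relative_increase:.0f}% higher audiovisually)")
    ```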

  16. Neurofunctional Underpinnings of Audiovisual Emotion Processing in Teens with Autism Spectrum Disorders

    Science.gov (United States)

    Doyle-Thomas, Krissy A.R.; Goldberg, Jeremy; Szatmari, Peter; Hall, Geoffrey B.C.

    2013-01-01

    Despite successful performance on some audiovisual emotion tasks, hypoactivity has been observed in frontal and temporal integration cortices in individuals with autism spectrum disorders (ASD). Little is understood about the neurofunctional network underlying this ability in individuals with ASD. Research suggests that there may be processing biases in individuals with ASD, based on their ability to obtain meaningful information from the face and/or the voice. This functional magnetic resonance imaging study examined brain activity in teens with ASD (n = 18) and typically developing controls (n = 16) during audiovisual and unimodal emotion processing. Teens with ASD had a significantly lower accuracy when matching an emotional face to an emotion label. However, no differences in accuracy were observed between groups when matching an emotional voice or face-voice pair to an emotion label. In both groups brain activity during audiovisual emotion matching differed significantly from activity during unimodal emotion matching. Between-group analyses of audiovisual processing revealed significantly greater activation in teens with ASD in a parietofrontal network believed to be implicated in attention, goal-directed behaviors, and semantic processing. In contrast, controls showed greater activity in frontal and temporal association cortices during this task. These results suggest that in the absence of engaging integrative emotional networks during audiovisual emotion matching, teens with ASD may have recruited the parietofrontal network as an alternate compensatory system. PMID:23750139

  17. Visual Temporal Acuity Is Related to Auditory Speech Perception Abilities in Cochlear Implant Users.

    Science.gov (United States)

    Jahn, Kelly N; Stevenson, Ryan A; Wallace, Mark T

    Despite significant improvements in speech perception abilities following cochlear implantation, many prelingually deafened cochlear implant (CI) recipients continue to rely heavily on visual information to develop speech and language. Increased reliance on visual cues for understanding spoken language could lead to the development of unique audiovisual integration and visual-only processing abilities in these individuals. Brain imaging studies have demonstrated that good CI performers, as indexed by auditory-only speech perception abilities, have different patterns of visual cortex activation in response to visual and auditory stimuli as compared with poor CI performers. However, no studies have examined whether speech perception performance is related to any type of visual processing abilities following cochlear implantation. The purpose of the present study was to provide a preliminary examination of the relationship between clinical, auditory-only speech perception tests, and visual temporal acuity in prelingually deafened adult CI users. It was hypothesized that prelingually deafened CI users, who exhibit better (i.e., more acute) visual temporal processing abilities would demonstrate better auditory-only speech perception performance than those with poorer visual temporal acuity. Ten prelingually deafened adult CI users were recruited for this study. Participants completed a visual temporal order judgment task to quantify visual temporal acuity. To assess auditory-only speech perception abilities, participants completed the consonant-nucleus-consonant word recognition test and the AzBio sentence recognition test. Results were analyzed using two-tailed partial Pearson correlations, Spearman's rho correlations, and independent samples t tests. Visual temporal acuity was significantly correlated with auditory-only word and sentence recognition abilities. In addition, proficient CI users, as assessed via auditory-only speech perception performance, demonstrated

  18. Audiovisual Styling and the Film Experience

    DEFF Research Database (Denmark)

    Langkjær, Birger

    2015-01-01

    Approaches to music and audiovisual meaning in film appear to be very different in nature and scope when considered from the point of view of experimental psychology or humanistic studies. Nevertheless, this article argues that experimental studies square with ideas of audiovisual perception... and meaning in humanistic film music studies in two ways: through studies of vertical synchronous interaction and through studies of horizontal narrative effects. Also, it is argued that the combination of insights from quantitative experimental studies and qualitative audiovisual film analysis may actually... be combined into a more complex understanding of how audiovisual features interact in the minds of their audiences. This is demonstrated through a review of a series of experimental studies. Yet, it is also argued that textual analysis and concepts from within film and music studies can provide insights...

  19. Neural circuits in auditory and audiovisual memory.

    Science.gov (United States)

    Plakke, B; Romanski, L M

    2016-06-01

    Working memory is the ability to employ recently seen or heard stimuli and apply them to changing cognitive context. Although much is known about language processing and visual working memory, the neurobiological basis of auditory working memory is less clear. Historically, part of the problem has been the difficulty in obtaining a robust animal model to study auditory short-term memory. In recent years there have been neurophysiological and lesion studies indicating a cortical network involving both temporal and frontal cortices. Studies specifically targeting the role of the prefrontal cortex (PFC) in auditory working memory have suggested that dorsal and ventral prefrontal regions perform different roles during the processing of auditory mnemonic information, with the dorsolateral PFC performing similar functions for both auditory and visual working memory. In contrast, the ventrolateral PFC (VLPFC), which contains cells that respond robustly to auditory stimuli and that process both face and vocal stimuli, may be an essential locus for both auditory and audiovisual working memory. These findings suggest a critical role for the VLPFC in the processing, integrating, and retaining of communication information. This article is part of a Special Issue entitled SI: Auditory working memory. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Development of Trivia Game for speech understanding in background noise.

    Science.gov (United States)

    Schwartz, Kathryn; Ringleb, Stacie I; Sandberg, Hilary; Raymer, Anastasia; Watson, Ginger S

    2015-01-01

    Listening in noise is an everyday activity and poses a challenge for many people. To improve the ability to understand speech in noise, a computerized auditory rehabilitation game was developed. In Trivia Game players are challenged to answer trivia questions spoken aloud. As players progress through the game, the level of background noise increases. A study using Trivia Game was conducted as a proof-of-concept investigation in healthy participants. College students with normal hearing were randomly assigned to a control (n = 13) or a treatment (n = 14) group. Treatment participants played Trivia Game 12 times over a 4-week period. All participants completed objective (auditory-only and audiovisual formats) and subjective listening in noise measures at baseline and 4 weeks later. There were no statistical differences between the groups at baseline. At post-test, the treatment group significantly improved their overall speech understanding in noise in the audiovisual condition and reported significant benefits in their functional listening abilities. Playing Trivia Game improved speech understanding in noise in healthy listeners. Significant findings for the audiovisual condition suggest that participants improved face-reading abilities. Trivia Game may be a platform for investigating changes in speech understanding in individuals with sensory, linguistic and cognitive impairments.
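
    The core mechanic described above, presenting spoken questions in progressively louder background noise, amounts to mixing speech and noise at a target signal-to-noise ratio that decreases with game level. The level-to-SNR mapping and the signals in the sketch below are illustrative assumptions, not the game's actual parameters.

    ```python
    # Mix a speech signal with noise at a target SNR that drops with game level.
    import numpy as np

    def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
        """Scale the noise so that speech-to-noise power equals the target SNR, then mix."""
        noise = noise[: len(speech)]
        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2)
        gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
        return speech + gain * noise

    def snr_for_level(level: int, start_db: float = 15.0, step_db: float = 3.0) -> float:
        """Hypothetical difficulty curve: each level lowers the SNR by a fixed step."""
        return start_db - step_db * (level - 1)

    # Example: a level-6 question would be presented at 0 dB SNR.
    rng = np.random.default_rng(0)
    speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)   # stand-in for a spoken question
    noise = rng.standard_normal(16000)
    mixed = mix_at_snr(speech, noise, snr_for_level(6))
    ```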

  1. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage that the speech signal was corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely on the basis of a binary decision. Hence the end-to-end performance of a digital link becomes essentially independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service-provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding usually refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized from the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the
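    As a hedged illustration of the waveform-coding family mentioned above (a minimal sketch, not drawn from this record), the snippet below applies the standard G.711 mu-law companding curve, F(x) = sign(x)·ln(1 + mu|x|)/ln(1 + mu) with mu = 255, to quantize speech samples to 8-bit codes and reconstruct them; the toy waveform and sampling choices are illustrative assumptions.

```python
import numpy as np

def mu_law_encode(x, mu=255, bits=8):
    """Compress samples in [-1, 1] with the mu-law curve and quantize them."""
    x = np.clip(x, -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Map [-1, 1] onto integer codes 0 .. 2**bits - 1.
    return np.round((compressed + 1) / 2 * (2**bits - 1)).astype(np.int32)

def mu_law_decode(codes, mu=255, bits=8):
    """Invert the quantization and the mu-law compression."""
    compressed = codes.astype(np.float64) / (2**bits - 1) * 2 - 1
    return np.sign(compressed) * ((1 + mu) ** np.abs(compressed) - 1) / mu

# Toy "speech" waveform: a decaying 200 Hz tone sampled at 8 kHz.
t = np.arange(0, 0.05, 1 / 8000)
speech = 0.8 * np.sin(2 * np.pi * 200 * t) * np.exp(-20 * t)

codes = mu_law_encode(speech)
reconstructed = mu_law_decode(codes)
print("max reconstruction error:", np.max(np.abs(speech - reconstructed)))
```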

  2. Mild developmental foreign accent syndrome and psychiatric comorbidity: Altered white matter integrity in speech and emotion regulation networks

    Directory of Open Access Journals (Sweden)

    Marcelo L Berthier

    2016-08-01

    Full Text Available Foreign accent syndrome (FAS) is a speech disorder that is defined by the emergence of a peculiar manner of articulation and intonation which is perceived as foreign. In most cases of acquired FAS (AFAS) the new accent is secondary to small focal lesions involving components of the bilaterally distributed neural network for speech production. In the past few years FAS has also been described in different psychiatric conditions (conversion disorder, bipolar disorder, schizophrenia) as well as in developmental disorders (specific language impairment, apraxia of speech). In the present study, two adult males, one with atypical phonetic production and the other one with cluttering, reported having developmental FAS (DFAS) since their adolescence. Perceptual analysis by naïve judges could not confirm the presence of foreign accent, possibly due to the mildness of the speech disorder. However, detailed linguistic analysis provided evidence of prosodic and segmental errors previously reported in AFAS cases. Cognitive testing showed reduced communication in activities of daily living and mild deficits related to psychiatric disorders. Psychiatric evaluation revealed long-lasting internalizing disorders (neuroticism, anxiety, obsessive-compulsive disorder, social phobia, depression, alexithymia, hopelessness, and apathy) in both subjects. Diffusion tensor imaging (DTI) data from each subject with DFAS were compared with data from a group of 21 age- and gender-matched healthy control subjects. Diffusion parameters (MD, AD, and RD) in predefined regions of interest showed changes of white matter microstructure in regions previously related with AFAS and psychiatric disorders. In conclusion, the present findings militate against the possibility that these two subjects have FAS of psychogenic origin. Rather, our findings provide evidence that mild DFAS occurring in the context of subtle, yet persistent, developmental speech disorders may be associated with

  3. Semantic congruency but not temporal synchrony enhances long-term memory performance for audio-visual scenes.

    Science.gov (United States)

    Meyerhoff, Hauke S; Huff, Markus

    2016-04-01

    Human long-term memory for visual objects and scenes is tremendous. Here, we test how auditory information contributes to long-term memory performance for realistic scenes. In a total of six experiments, we manipulated the presentation modality (auditory, visual, audio-visual) as well as semantic congruency and temporal synchrony between auditory and visual information of brief filmic clips. Our results show that audio-visual clips generally elicit more accurate memory performance than unimodal clips. This advantage even increases with congruent visual and auditory information. However, violations of audio-visual synchrony hardly have any influence on memory performance. Memory performance remained intact even with a sequential presentation of auditory and visual information, but finally declined when the matching tracks of one scene were presented separately with intervening tracks during learning. With respect to memory performance, our results therefore show that audio-visual integration is sensitive to semantic congruency but remarkably robust against asynchronies between the different modalities.

  4. Electrophysiological correlates of predictive coding of auditory location in the perception of natural audiovisual events

    Directory of Open Access Journals (Sweden)

    Jeroen Stekelenburg

    2012-05-01

    Full Text Available In many natural audiovisual events (e.g., a clap of the two hands), the visual signal precedes the sound and thus allows observers to predict when, where, and which sound will occur. Previous studies have already reported that there are distinct neural correlates of temporal (when) versus phonetic/semantic (which) content on audiovisual integration. Here we examined the effect of visual prediction of auditory location (where) in audiovisual biological motion stimuli by varying the spatial congruency between the auditory and visual part of the audiovisual stimulus. Visual stimuli were presented centrally, whereas auditory stimuli were presented either centrally or at 90° azimuth. Typical subadditive amplitude reductions (AV – V < A) were found for the auditory N1 and P2 for spatially congruent and incongruent conditions. The new finding is that the N1 suppression was larger for spatially congruent stimuli. A very early audiovisual interaction was also found at 30-50 ms in the spatially congruent condition, while no effect of congruency was found on the suppression of the P2. This indicates that visual prediction of auditory location can be coded very early in auditory processing.

  5. A Novel Audiovisual Brain-Computer Interface and Its Application in Awareness Detection

    Science.gov (United States)

    Wang, Fei; He, Yanbin; Pan, Jiahui; Xie, Qiuyou; Yu, Ronghao; Zhang, Rui; Li, Yuanqing

    2015-01-01

    Currently, detecting awareness in patients with disorders of consciousness (DOC) is a challenging task, which is commonly addressed through behavioral observation scales such as the JFK Coma Recovery Scale-Revised. Brain-computer interfaces (BCIs) provide an alternative approach to detect awareness in patients with DOC. However, these patients have a much lower capability of using BCIs compared to healthy individuals. This study proposed a novel BCI using temporally, spatially, and semantically congruent audiovisual stimuli involving numbers (i.e., visual and spoken numbers). Subjects were instructed to selectively attend to the target stimuli cued by instruction. Ten healthy subjects first participated in the experiment to evaluate the system. The results indicated that the audiovisual BCI system outperformed auditory-only and visual-only systems. Through event-related potential analysis, we observed audiovisual integration effects for target stimuli, which enhanced the discriminability between brain responses for target and nontarget stimuli and thus improved the performance of the audiovisual BCI. This system was then applied to detect the awareness of seven DOC patients, five of whom exhibited command following as well as number recognition. Thus, this audiovisual BCI system may be used as a supportive bedside tool for awareness detection in patients with DOC. PMID:26123281

  6. Effects of Audiovisual Media on L2 Listening Comprehension: A Preliminary Study in French

    Science.gov (United States)

    Becker, Shannon R.; Sturm, Jessica L.

    2017-01-01

    The purpose of the present study was to determine whether integrating online audiovisual materials into the listening instruction of L2 French learners would have a measurable impact on their listening comprehension development. Students from two intact sections of second-semester French were tested on their listening comprehension before and…

  7. Visual Cues Contribute Differentially to Audiovisual Perception of Consonants and Vowels in Improving Recognition and Reducing Cognitive Demands in Listeners With Hearing Impairment Using Hearing Aids.

    Science.gov (United States)

    Moradi, Shahram; Lidestam, Björn; Danielsson, Henrik; Ng, Elaine Hoi Ning; Rönnberg, Jerker

    2017-09-18

    We sought to examine the contribution of visual cues in audiovisual identification of consonants and vowels, in terms of isolation points (the shortest time required for correct identification of a speech stimulus), accuracy, and cognitive demands, in listeners with hearing impairment using hearing aids. The study comprised 199 participants with hearing impairment (mean age = 61.1 years) with bilateral, symmetrical, mild-to-severe sensorineural hearing loss. Gated Swedish consonants and vowels were presented aurally and audiovisually to participants. Linear amplification was adjusted for each participant to assure audibility. The reading span test was used to measure participants' working memory capacity. Audiovisual presentation resulted in shortened isolation points and improved accuracy for consonants and vowels relative to auditory-only presentation. This benefit was more evident for consonants than vowels. In addition, correlations and subsequent analyses revealed that listeners with higher scores on the reading span test identified both consonants and vowels earlier in auditory-only presentation, but only vowels (not consonants) in audiovisual presentation. Consonants and vowels differed in terms of the benefits afforded from their associative visual cues, as indicated by the degree of audiovisual benefit and reduction in cognitive demands linked to the identification of consonants and vowels presented audiovisually.

  8. Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

    Science.gov (United States)

    Borrie, Stephanie A

    2015-03-01

    This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal (the AV advantage) has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.

  9. Hysteresis in audiovisual synchrony perception.

    Directory of Open Access Journals (Sweden)

    Jean-Rémy Martin

    Full Text Available The effect of stimulation history on the perception of a current event can yield two opposite effects, namely: adaptation or hysteresis. The perception of the current event thus goes in the opposite or in the same direction as prior stimulation, respectively. In audiovisual (AV) synchrony perception, adaptation effects have primarily been reported. Here, we tested if perceptual hysteresis could also be observed over adaptation in AV timing perception by varying different experimental conditions. Participants were asked to judge the synchrony of the last (test) stimulus of an AV sequence with either constant or gradually changing AV intervals (constant and dynamic condition, respectively). The onset timing of the test stimulus could be cued or not (prospective vs. retrospective condition, respectively). We observed hysteretic effects for AV synchrony judgments in the retrospective condition that were independent of the constant or dynamic nature of the adapted stimuli; these effects disappeared in the prospective condition. The present findings suggest that knowing when to estimate a stimulus property has a crucial impact on perceptual simultaneity judgments. Our results extend beyond AV timing perception, and have strong implications regarding the comparative study of hysteresis and adaptation phenomena.

  10. A promessa do audiovisual interativo

    Directory of Open Access Journals (Sweden)

    João Baptista Winck

    Full Text Available The audiovisual production chain uses cultural capital, especially creativity, as its main source of resources, inaugurating what has come to be called the creative economy. This value chain manufactures inventiveness as raw material, transforming ideas into objects of large-scale consumption. The television industry is embedded in a larger conglomerate of industries, such as fashion, the arts, music and so on. This gigantic technological park brings together activities that take creation as their value, its production at scale as their means, and the growth of intellectual property as an end in itself. The industrialization of creativity is gradually altering the body of theory concerning how we think about labor relations, tools and, above all, the concept of goods as products of intelligence.

  11. On the Role of Crossmodal Prediction in Audiovisual Emotion Perception

    Directory of Open Access Journals (Sweden)

    Sarah Jessen

    2013-07-01

    Full Text Available Humans rely on multiple sensory modalities to determine the emotional state of others. In fact, such multisensory perception may be one of the mechanisms explaining the ease and efficiency by which others’ emotions are recognized. But how and when exactly do the different modalities interact? One aspect in multisensory perception that has received increasing interest in recent years is the concept of crossmodal prediction. In emotion perception, as in most other settings, visual information precedes the auditory one. Thereby, leading in visual information can facilitate subsequent auditory processing. While this mechanism has often been described in audiovisual speech perception, it has not been addressed so far in audiovisual emotion perception. Based on the current state of the art in (a) crossmodal prediction and (b) multisensory emotion perception research, we propose that it is essential to consider the former in order to fully understand the latter. Focusing on electroencephalographic (EEG) and magnetoencephalographic (MEG) studies, we provide a brief overview of the current research in both fields. In discussing these findings, we suggest that emotional visual information may allow for a more reliable prediction of auditory information compared to non-emotional visual information. In support of this hypothesis, we present a re-analysis of a previous data set that shows an inverse correlation between the N1 response in the EEG and the duration of visual emotional but not non-emotional information. If the assumption that emotional content allows for more reliable predictions can be corroborated in future studies, crossmodal prediction is a crucial factor in our understanding of multisensory emotion perception.

  12. On the role of crossmodal prediction in audiovisual emotion perception.

    Science.gov (United States)

    Jessen, Sarah; Kotz, Sonja A

    2013-01-01

    Humans rely on multiple sensory modalities to determine the emotional state of others. In fact, such multisensory perception may be one of the mechanisms explaining the ease and efficiency by which others' emotions are recognized. But how and when exactly do the different modalities interact? One aspect in multisensory perception that has received increasing interest in recent years is the concept of cross-modal prediction. In emotion perception, as in most other settings, visual information precedes the auditory information. Thereby, leading in visual information can facilitate subsequent auditory processing. While this mechanism has often been described in audiovisual speech perception, so far it has not been addressed in audiovisual emotion perception. Based on the current state of the art in (a) cross-modal prediction and (b) multisensory emotion perception research, we propose that it is essential to consider the former in order to fully understand the latter. Focusing on electroencephalographic (EEG) and magnetoencephalographic (MEG) studies, we provide a brief overview of the current research in both fields. In discussing these findings, we suggest that emotional visual information may allow more reliable prediction of auditory information compared to non-emotional visual information. In support of this hypothesis, we present a re-analysis of a previous data set that shows an inverse correlation between the N1 EEG response and the duration of visual emotional, but not non-emotional information. If the assumption that emotional content allows more reliable prediction can be corroborated in future studies, cross-modal prediction is a crucial factor in our understanding of multisensory emotion perception.

  13. Neural pathways for visual speech perception

    Directory of Open Access Journals (Sweden)

    Lynne E Bernstein

    2014-12-01

    Full Text Available This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread and diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA), has been demonstrated in posterior temporal cortex, ventral and posterior to the multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.

  14. The effectiveness of Speech-Music Therapy for Aphasia (SMTA) in five speakers with Apraxia of Speech and aphasia

    NARCIS (Netherlands)

    Hurkmans, Joost; Jonkers, Roel; de Bruijn, Madeleen; Boonstra, Anne M.; Hartman, Paul P.; Arendzen, Hans; Reinders - Messelink, Heelen

    2015-01-01

    Background: Several studies using musical elements in the treatment of neurological language and speech disorders have reported improvement of speech production. One such programme, Speech-Music Therapy for Aphasia (SMTA), integrates speech therapy and music therapy (MT) to treat the individual with

  15. Audiovisual preservation strategies, data models and value-chains

    OpenAIRE

    Addis, Matthew; Wright, Richard

    2010-01-01

    This is a report on preservation strategies, models and value-chains for digital file-based audiovisual content. The report includes: (a) current and emerging value-chains and business-models for audiovisual preservation; (b) a comparison of preservation strategies for audiovisual content, including their strengths and weaknesses; and (c) a review of current preservation metadata models, and requirements for extension to support audiovisual files.

  16. A Catalan code of best practices for the audiovisual sector

    OpenAIRE

    Teodoro, Emma; Casanovas, Pompeu

    2010-01-01

    In spite of a new general law regarding Audiovisual Communication, the regulatory framework of the audiovisual sector in Spain can still be defined as huge, dispersed and obsolete. The first part of this paper provides an overview of the major challenges of the Spanish audiovisual sector as a result of the convergence of platforms, services and operators, paying special attention to the audiovisual sector in Catalonia. In the second part, we will present an example of self-regulation through...

  17. A simple and efficient method to enhance audiovisual binding tendencies

    Directory of Open Access Journals (Sweden)

    Brian Odegaard

    2017-04-01

    Full Text Available Individuals vary in their tendency to bind signals from multiple senses. For the same set of sights and sounds, one individual may frequently integrate multisensory signals and experience a unified percept, whereas another individual may rarely bind them and often experience two distinct sensations. Thus, while this binding/integration tendency is specific to each individual, it is not clear how plastic this tendency is in adulthood, and how sensory experiences may cause it to change. Here, we conducted an exploratory investigation which provides evidence that (1) the brain’s tendency to bind in spatial perception is plastic, (2) that it can change following brief exposure to simple audiovisual stimuli, and (3) that exposure to temporally synchronous, spatially discrepant stimuli provides the most effective method to modify it. These results can inform current theories about how the brain updates its internal model of the surrounding sensory world, as well as future investigations seeking to increase integration tendencies.

  18. 29 CFR 2.12 - Audiovisual coverage permitted.

    Science.gov (United States)

    2010-07-01

    29 CFR 2.12 (Labor, Office of the Secretary of Labor, General Regulations, Audiovisual Coverage of Administrative Hearings), Audiovisual coverage permitted: The following are the types of hearings where the Department...

  19. Hearing faces: how the infant brain matches the face it sees with the speech it hears.

    Science.gov (United States)

    Bristow, Davina; Dehaene-Lambertz, Ghislaine; Mattout, Jeremie; Soares, Catherine; Gliga, Teodora; Baillet, Sylvain; Mangin, Jean-François

    2009-05-01

    Speech is not a purely auditory signal. From around 2 months of age, infants are able to correctly match the vowel they hear with the appropriate articulating face. However, there is no behavioral evidence of integrated audiovisual perception until 4 months of age, at the earliest, when an illusory percept can be created by the fusion of the auditory stimulus and of the facial cues (McGurk effect). To understand how infants initially match the articulatory movements they see with the sounds they hear, we recorded high-density ERPs in response to auditory vowels that followed a congruent or incongruent silently articulating face in 10-week-old infants. In a first experiment, we determined that auditory-visual integration occurs during the early stages of perception as in adults. The mismatch response was similar in timing and in topography whether the preceding vowels were presented visually or aurally. In the second experiment, we studied audiovisual integration in the linguistic (vowel perception) and nonlinguistic (gender perception) domain. We observed a mismatch response for both types of change at similar latencies. Their topographies were significantly different, demonstrating that cross-modal integration of these features is computed in parallel by two different networks. Indeed, brain source modeling revealed that phoneme and gender computations were lateralized toward the left and toward the right hemisphere, respectively, suggesting that each hemisphere possesses an early processing bias. We also observed repetition suppression in temporal regions and repetition enhancement in frontal regions. These results underscore the complexity and structure of the human cortical organization that sustains communication from the first weeks of life onward.

  20. Bayesian calibration of simultaneity in audiovisual temporal order judgments.

    Directory of Open Access Journals (Sweden)

    Shinya Yamamoto

    Full Text Available After repeated exposures to two successive audiovisual stimuli presented in one frequent order, participants eventually perceive a pair separated by some lag time in the same order as occurring simultaneously (lag adaptation). In contrast, we previously found that perceptual changes occurred in the opposite direction in response to tactile stimuli, conforming to Bayesian integration theory (Bayesian calibration). We further showed, in theory, that the effect of Bayesian calibration cannot be observed when the lag adaptation was fully operational. This led to the hypothesis that Bayesian calibration affects judgments regarding the order of audiovisual stimuli, but that this effect is concealed behind the lag adaptation mechanism. In the present study, we showed that lag adaptation is pitch-insensitive using two sounds at 1046 and 1480 Hz. This enabled us to cancel lag adaptation by associating one pitch with sound-first stimuli and the other with light-first stimuli. When we presented each type of stimulus (high- or low-tone) in a different block, the point of simultaneity shifted to "sound-first" for the pitch associated with sound-first stimuli, and to "light-first" for the pitch associated with light-first stimuli. These results are consistent with lag adaptation. In contrast, when we delivered each type of stimulus in a randomized order, the point of simultaneity shifted to "light-first" for the pitch associated with sound-first stimuli, and to "sound-first" for the pitch associated with light-first stimuli. The results clearly show that Bayesian calibration is pitch-specific and is at work behind pitch-insensitive lag adaptation during temporal order judgment of audiovisual stimuli.
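    As a hedged illustration of the Bayesian-calibration idea summarized above (a minimal sketch under simplifying Gaussian assumptions; this is not the authors' model, and all numerical values are invented), the snippet below shows how a prior over audiovisual lag, re-centered on recently experienced sound-first lags, pulls the perceived lag of a new pair toward that prior, so that a physically synchronous pair is perceived as sound-leading and the point of subjective simultaneity shifts toward light-first values.

```python
import numpy as np

def perceived_lag(measured_lag_ms, prior_mean_ms, sigma_meas=60.0, sigma_prior=80.0):
    """Posterior mean of the true lag under a Gaussian prior and Gaussian likelihood."""
    w_prior = sigma_meas**2 / (sigma_meas**2 + sigma_prior**2)
    return w_prior * prior_mean_ms + (1 - w_prior) * measured_lag_ms

# Exposure phase: mostly sound-first pairs (negative lag = sound leads).
exposure_lags = np.random.default_rng(0).normal(loc=-100, scale=20, size=50)
adapted_prior = exposure_lags.mean()  # the prior drifts toward the exposed lags

# Test phase: a physically synchronous pair (lag = 0) is now perceived as sound-leading,
# so the point of subjective simultaneity shifts toward light-first lags.
print("perceived lag of a synchronous pair:",
      round(perceived_lag(0.0, adapted_prior), 1), "ms")
```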

  1. Neural Entrainment to Speech Modulates Speech Intelligibility

    NARCIS (Netherlands)

    Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne

    2018-01-01

    Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and

  2. Congruent and Incongruent Cues in Highly Familiar Audiovisual Action Sequences: An ERP Study

    Directory of Open Access Journals (Sweden)

    SM Wuerger

    2012-07-01

    Full Text Available In a previous fMRI study we found significant differences in BOLD responses for congruent and incongruent semantic audio-visual action sequences (whole-body actions and speech actions) in bilateral pSTS, left SMA, left IFG, and IPL (Meyer, Greenlee, & Wuerger, JOCN, 2011). Here, we present results from a 128-channel ERP study that examined the time-course of these interactions using a one-back task. ERPs in response to congruent and incongruent audio-visual actions were compared to identify regions and latencies of differences. Responses to congruent and incongruent stimuli differed between 240–280 ms, 340–420 ms, and 460–660 ms after stimulus onset. A dipole analysis revealed that the difference around 250 ms can be partly explained by a modulation of sources in the vicinity of the superior temporal area, while the responses after 400 ms are consistent with sources in inferior frontal areas. Our results are in line with a model that postulates early recognition of congruent audiovisual actions in the pSTS, perhaps as a sensory memory buffer, and a later role of the IFG, perhaps in a generative capacity, in reconciling incongruent signals.

  3. Reproducibility and discriminability of brain patterns of semantic categories enhanced by congruent audiovisual stimuli.

    Directory of Open Access Journals (Sweden)

    Yuanqing Li

    Full Text Available One of the central questions in cognitive neuroscience is the precise neural representation, or brain pattern, associated with a semantic category. In this study, we explored the influence of audiovisual stimuli on the brain patterns of concepts or semantic categories through a functional magnetic resonance imaging (fMRI) experiment. We used a pattern search method to extract brain patterns corresponding to two semantic categories: "old people" and "young people." These brain patterns were elicited by semantically congruent audiovisual, semantically incongruent audiovisual, unimodal visual, and unimodal auditory stimuli belonging to the two semantic categories. We calculated the reproducibility index, which measures the similarity of the patterns within the same category. We also decoded the semantic categories from these brain patterns. The decoding accuracy reflects the discriminability of the brain patterns between two categories. The results showed that both the reproducibility index of brain patterns and the decoding accuracy were significantly higher for semantically congruent audiovisual stimuli than for unimodal visual and unimodal auditory stimuli, while the semantically incongruent stimuli did not elicit brain patterns with a significantly higher reproducibility index or decoding accuracy. Thus, the semantically congruent audiovisual stimuli enhanced the within-class reproducibility of brain patterns and the between-class discriminability of brain patterns, facilitating the neural representation of semantic categories or concepts. Furthermore, we analyzed the brain activity in the superior temporal sulcus and middle temporal gyrus (STS/MTG). The strength of the fMRI signal and the reproducibility index were enhanced by the semantically congruent audiovisual stimuli. Our results support the use of the reproducibility index as a potential tool to supplement the fMRI signal amplitude for evaluating multimodal integration.
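    As a hedged illustration of the two measures named above (a minimal sketch on synthetic patterns; the trial counts, voxel count and classifier choice are assumptions, not details from the study), the snippet below computes a within-class reproducibility index as the mean pairwise correlation of patterns from one category, and a cross-validated decoding accuracy between the two categories.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 40, 200  # assumed sizes for the toy example

# Synthetic "brain patterns": each category is a noisy copy of its own template.
template_old = rng.normal(size=n_voxels)
template_young = rng.normal(size=n_voxels)
patterns_old = template_old + rng.normal(scale=1.5, size=(n_trials, n_voxels))
patterns_young = template_young + rng.normal(scale=1.5, size=(n_trials, n_voxels))

def reproducibility_index(patterns):
    """Mean pairwise correlation between patterns of the same category."""
    corr = np.corrcoef(patterns)  # trial-by-trial correlation matrix
    return corr[np.triu_indices_from(corr, k=1)].mean()

X = np.vstack([patterns_old, patterns_young])
y = np.array([0] * n_trials + [1] * n_trials)

print("reproducibility (old):", round(reproducibility_index(patterns_old), 3))
print("decoding accuracy:", cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean())
```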

  4. Effect of Perceptual Load on Semantic Access by Speech in Children

    Science.gov (United States)

    Jerger, Susan; Damian, Markus F.; Mills, Candice; Bartlett, James; Tye-Murray, Nancy; Abdi, Herve

    2013-01-01

    Purpose: To examine whether semantic access by speech requires attention in children. Method: Children (N = 200) named pictures and ignored distractors on a cross-modal (distractors: auditory-no face) or multimodal (distractors: auditory-static face and audiovisual-dynamic face) picture word task. The cross-modal task had a low load,…

  5. Visual speech alters the discrimination and identification of non-intact auditory speech in children with hearing loss.

    Science.gov (United States)

    Jerger, Susan; Damian, Markus F; McAlpine, Rachel P; Abdi, Hervé

    2017-03-01

    Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes, yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets) as, for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to non-intact onset/rhyme in the auditory track (/-B/aa or /-B/az). The items started with an easy-to-speechread /B/ or difficult-to-speechread /G/ onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/-B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more same (as opposed to different) responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g., /-B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz (as opposed to az) responses in the audiovisual than auditory mode. Performance in the audiovisual mode showed more same

  6. Visual Speech Alters the Discrimination and Identification of Non-Intact Auditory Speech in Children with Hearing Loss

    Science.gov (United States)

    Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Hervé

    2017-01-01

    Objectives Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. Methods Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets) as, for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to non-intact onset/rhyme in the auditory track (/–B/aa or /–B/az). The items started with an easy-to-speechread /B/ or difficult-to-speechread /G/ onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/–B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more same—as opposed to different—responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g., /–B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz—as opposed to az— responses in the audiovisual than auditory mode. Results

  7. Speak, Move, Play and Learn with Children on the Autism Spectrum: Activities to Boost Communication Skills, Sensory Integration and Coordination Using Simple Ideas from Speech and Language Pathology and Occupational Therapy

    Science.gov (United States)

    Brady, Lois Jean; Gonzalez, America X.; Zawadzki, Maciej; Presley, Corinda

    2012-01-01

    This practical resource is brimming with ideas and guidance for using simple ideas from speech and language pathology and occupational therapy to boost communication, sensory integration, and coordination skills in children on the autism spectrum. Suitable for use in the classroom, at home, and in community settings, it is packed with…

  8. Audiovisual Blindsight: Audiovisual learning in the absence of primary visual cortex

    OpenAIRE

    Mehrdad Seirafi; Peter De Weerd; Alan J Pegna; Beatrice de Gelder

    2016-01-01

    Learning audiovisual associations is mediated by the primary cortical areas; however, recent animal studies suggest that such learning can take place even in the absence of the primary visual cortex. Other studies have demonstrated the involvement of extra-geniculate pathways and especially the superior colliculus (SC) in audiovisual association learning. Here, we investigated such learning in a rare human patient with complete loss of the bilateral striate cortex. We carried out an implicit...

  9. Audio-visual identification of place of articulation and voicing in white and babble noise.

    Science.gov (United States)

    Alm, Magnus; Behne, Dawn M; Wang, Yue; Eg, Ragnhild

    2009-07-01

    Research shows that noise and phonetic attributes influence the degree to which auditory and visual modalities are used in audio-visual speech perception (AVSP). Research has, however, mainly focused on white noise and single phonetic attributes, thus neglecting the more common babble noise and possible interactions between phonetic attributes. This study explores whether white and babble noise differentially influence AVSP and whether these differences depend on phonetic attributes. White and babble noise of 0 and -12 dB signal-to-noise ratio were added to congruent and incongruent audio-visual stop consonant-vowel stimuli. The audio (A) and video (V) of incongruent stimuli differed either in place of articulation (POA) or voicing. Responses from 15 young adults show that, compared to white noise, babble resulted in more audio responses for POA stimuli, and fewer for voicing stimuli. Voiced syllables received more audio responses than voiceless syllables. Results can be attributed to discrepancies in the acoustic spectra of both the noise and speech target. Voiced consonants may be more auditorily salient than voiceless consonants which are more spectrally similar to white noise. Visual cues contribute to identification of voicing, but only if the POA is visually salient and auditorily susceptible to the noise type.
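    As a hedged illustration of the stimulus-preparation step described above (a minimal sketch using the standard scaling rule, not the authors' procedure; the toy signals are invented), the snippet below scales a noise signal so that it is mixed with a speech signal at a target signal-to-noise ratio such as 0 or -12 dB.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech + noise has the requested SNR in dB."""
    speech_power = np.mean(speech**2)
    noise_power = np.mean(noise**2)
    # Required noise power for the target SNR: SNR_dB = 10*log10(Ps / Pn).
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return speech + scaled_noise

# Toy example: a 1 kHz tone standing in for speech, white noise as the masker.
fs = 16000
t = np.arange(0, 1.0, 1 / fs)
speech = 0.1 * np.sin(2 * np.pi * 1000 * t)
noise = np.random.default_rng(0).normal(size=t.size)

for snr in (0, -12):
    mixed = mix_at_snr(speech, noise, snr)
    achieved = 10 * np.log10(np.mean(speech**2) / np.mean((mixed - speech)**2))
    print(f"target {snr} dB SNR -> achieved {achieved:.1f} dB")
```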

  10. Speech Research

    Science.gov (United States)

    Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: a biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic frictions.

  11. Hate speech

    Directory of Open Access Journals (Sweden)

    Anne Birgitta Nilsen

    2014-12-01

    Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance. The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience. The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the

  12. Search in audiovisual broadcast archives : doctoral abstract

    NARCIS (Netherlands)

    Huurnink, B.

    Documentary makers, journalists, news editors, and other media professionals routinely require previously recorded audiovisual material for new productions. For example, a news editor might wish to reuse footage shot by overseas services for the evening news, or a documentary maker might require

  13. Planning and Producing Audiovisual Materials. Third Edition.

    Science.gov (United States)

    Kemp, Jerrold E.

    A revised edition of this handbook provides illustrated, step-by-step explanations of how to plan and produce audiovisual materials. Included are sections on the fundamental skills--photography, graphics and recording sound--followed by individual sections on photographic print series, slide series, filmstrips, tape recordings, overhead…

  14. Longevity and Depreciation of Audiovisual Equipment.

    Science.gov (United States)

    Post, Richard

    1987-01-01

    Describes results of survey of media service directors at public universities in Ohio to determine the expected longevity of audiovisual equipment. Use of the Delphi technique for estimates is explained, results are compared with an earlier survey done in 1977, and use of spreadsheet software to calculate depreciation is discussed. (LRW)
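    As a hedged illustration of the depreciation arithmetic the survey touches on (a minimal sketch; the schedule and figures are hypothetical and not taken from the survey), the snippet below computes a straight-line depreciation schedule for a piece of audiovisual equipment from its purchase cost, salvage value, and expected service life.

```python
def straight_line_schedule(cost, salvage, years):
    """Yearly book values under straight-line depreciation."""
    annual = (cost - salvage) / years
    return [round(cost - annual * y, 2) for y in range(years + 1)]

# Hypothetical projector: $2,400 purchase cost, $200 salvage value, 8-year expected life.
print(straight_line_schedule(2400, 200, 8))
# -> [2400, 2125.0, 1850.0, 1575.0, 1300.0, 1025.0, 750.0, 475.0, 200.0]
```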

  15. Speech enhancement

    CERN Document Server

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red

  16. Visual feedback of tongue movement for novel speech sound learning

    Directory of Open Access Journals (Sweden)

    William F Katz

    2015-11-01

    Full Text Available Pronunciation training studies have yielded important information concerning the processing of audiovisual (AV) information. Second language (L2) learners show increased reliance on bottom-up, multimodal input for speech perception (compared to monolingual individuals). However, little is known about the role of viewing one’s own speech articulation processes during speech training. The current study investigated whether real-time, visual feedback for tongue movement can improve a speaker’s learning of non-native speech sounds. An interactive 3D tongue visualization system based on electromagnetic articulography (EMA) was used in a speech training experiment. Native speakers of American English produced a novel speech sound (/ɖ̠/, a voiced, coronal, palatal stop) before, during, and after trials in which they viewed their own speech movements using the 3D model. Talkers’ productions were evaluated using kinematic (tongue-tip spatial positioning) and acoustic (burst spectra) measures. The results indicated a rapid gain in accuracy associated with visual feedback training. The findings are discussed with respect to neural models for multimodal speech processing.

  17. Co-speech gestures influence neural activity in brain regions associated with processing semantic information.

    Science.gov (United States)

    Dick, Anthony Steven; Goldin-Meadow, Susan; Hasson, Uri; Skipper, Jeremy I; Small, Steven L

    2009-11-01

    Everyday communication is accompanied by visual information from several sources, including co-speech gestures, which provide semantic information listeners use to help disambiguate the speaker's message. Using fMRI, we examined how gestures influence neural activity in brain regions associated with processing semantic information. The BOLD response was recorded while participants listened to stories under three audiovisual conditions and one auditory-only (speech alone) condition. In the first audiovisual condition, the storyteller produced gestures that naturally accompany speech. In the second, the storyteller made semantically unrelated hand movements. In the third, the storyteller kept her hands still. In addition to inferior parietal and posterior superior and middle temporal regions, bilateral posterior superior temporal sulcus and left anterior inferior frontal gyrus responded more strongly to speech when it was further accompanied by gesture, regardless of the semantic relation to speech. However, the right inferior frontal gyrus was sensitive to the semantic import of the hand movements, demonstrating more activity when hand movements were semantically unrelated to the accompanying speech. These findings show that perceiving hand movements during speech modulates the distributed pattern of neural activation involved in both biological motion perception and discourse comprehension, suggesting listeners attempt to find meaning, not only in the words speakers produce, but also in the hand movements that accompany speech.

  18. Reduced audiovisual recalibration in the elderly.

    Science.gov (United States)

    Chan, Yu Man; Pianta, Michael J; McKendrick, Allison M

    2014-01-01

    Perceived synchrony of visual and auditory signals can be altered by exposure to a stream of temporally offset stimulus pairs. Previous literature suggests that adapting to audiovisual temporal offsets is an important recalibration to correctly combine audiovisual stimuli into a single percept across a range of source distances. Healthy aging results in synchrony perception over a wider range of temporally offset visual and auditory signals, independent of age-related unisensory declines in vision and hearing sensitivities. However, the impact of aging on audiovisual recalibration is unknown. Audiovisual synchrony perception for sound-lead and sound-lag stimuli was measured for 15 younger (22-32 years old) and 15 older (64-74 years old) healthy adults using a method-of-constant-stimuli, after adapting to a stream of visual and auditory pairs. The adaptation pairs were either synchronous or asynchronous (sound-lag of 230 ms). The adaptation effect for each observer was computed as the shift in the mean of the individually fitted psychometric functions after adapting to asynchrony. Post-adaptation to synchrony, the younger and older observers had average window widths (±standard deviation) of 326 (±80) and 448 (±105) ms, respectively. There was no adaptation effect for sound-lead pairs. Both the younger and older observers, however, perceived more sound-lag pairs as synchronous. The magnitude of the adaptation effect in the older observers was not correlated with how often they saw the adapting sound-lag stimuli as asynchronous. Our finding demonstrates that audiovisual synchrony perception adapts less with advancing age.
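    As a hedged illustration of the analysis step described above (a minimal sketch on simulated judgments, not the authors' code; the window shape, SOA grid and noise levels are assumptions), the snippet below fits a Gaussian-shaped synchrony window to the proportion of "synchronous" responses at each stimulus-onset asynchrony and reports the adaptation effect as the shift of the fitted window centre after adapting to sound-lag pairs.

```python
import numpy as np
from scipy.optimize import curve_fit

def synchrony_window(soa, peak, mu, sigma):
    """Gaussian-shaped proportion of 'synchronous' responses as a function of SOA."""
    return peak * np.exp(-(soa - mu) ** 2 / (2 * sigma ** 2))

soas = np.array([-300, -200, -100, 0, 100, 200, 300], dtype=float)  # ms, positive = sound lag

# Simulated response proportions before and after adapting to a 230 ms sound-lag
# stream (after adaptation the window centre drifts toward sound-lag SOAs).
p_baseline = synchrony_window(soas, 0.95, 0, 120) + np.random.default_rng(1).normal(0, 0.02, soas.size)
p_adapted = synchrony_window(soas, 0.95, 60, 120) + np.random.default_rng(2).normal(0, 0.02, soas.size)

popt_base, _ = curve_fit(synchrony_window, soas, p_baseline, p0=[0.9, 0.0, 100.0])
popt_adapt, _ = curve_fit(synchrony_window, soas, p_adapted, p0=[0.9, 0.0, 100.0])
print(f"adaptation effect (shift of the window centre): {popt_adapt[1] - popt_base[1]:.1f} ms")
```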

  19. Enhanced audio-visual interactions in the auditory cortex of elderly cochlear-implant users.

    Science.gov (United States)

    Schierholz, Irina; Finke, Mareike; Schulte, Svenja; Hauthal, Nadine; Kantzke, Christoph; Rach, Stefan; Büchner, Andreas; Dengler, Reinhard; Sandmann, Pascale

    2015-10-01

    Auditory deprivation and the restoration of hearing via a cochlear implant (CI) can induce functional plasticity in auditory cortical areas. How these plastic changes affect the ability to integrate combined auditory (A) and visual (V) information is not yet well understood. In the present study, we used electroencephalography (EEG) to examine whether age, temporary deafness and altered sensory experience with a CI can affect audio-visual (AV) interactions in post-lingually deafened CI users. Young and elderly CI users and age-matched NH listeners performed a speeded response task on basic auditory, visual and audio-visual stimuli. Regarding the behavioral results, a redundant signals effect, that is, faster response times to cross-modal (AV) than to both of the two modality-specific stimuli (A, V), was revealed for all groups of participants. Moreover, in all four groups, we found evidence for audio-visual integration. Regarding event-related responses (ERPs), we observed a more pronounced visual modulation of the cortical auditory response at N1 latency (approximately 100 ms after stimulus onset) in the elderly CI users when compared with young CI users and elderly NH listeners. Thus, elderly CI users showed enhanced audio-visual binding which may be a consequence of compensatory strategies developed due to temporary deafness and/or degraded sensory input after implantation. These results indicate that the combination of aging, sensory deprivation and CI facilitates the coupling between the auditory and the visual modality. We suggest that this enhancement in multisensory interactions could be used to optimize auditory rehabilitation, especially in elderly CI users, by the application of strong audio-visually based rehabilitation strategies after implant switch-on. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Does dynamic information about the speaker's face contribute to semantic speech processing? ERP evidence.

    Science.gov (United States)

    Hernández-Gutiérrez, David; Abdel Rahman, Rasha; Martín-Loeches, Manuel; Muñoz, Francisco; Schacht, Annekathrin; Sommer, Werner

    2018-07-01

    Face-to-face interactions characterize communication in social contexts. These situations are typically multimodal, requiring the integration of linguistic auditory input with facial information from the speaker. In particular, eye gaze and visual speech provide the listener with social and linguistic information, respectively. Despite the importance of this context for an ecological study of language, research on audiovisual integration has mainly focused on the phonological level, leaving aside effects on semantic comprehension. Here we used event-related potentials (ERPs) to investigate the influence of facial dynamic information on semantic processing of connected speech. Participants were presented with either a video or a still picture of the speaker, concomitant to auditory sentences. Along three experiments, we manipulated the presence or absence of the speaker's dynamic facial features (mouth and eyes) and compared the amplitudes of the semantic N400 elicited by unexpected words. Contrary to our predictions, the N400 was not modulated by dynamic facial information; therefore, semantic processing seems to be unaffected by the speaker's gaze and visual speech. Even though, during the processing of expected words, dynamic faces elicited a long-lasting late posterior positivity compared to the static condition. This effect was significantly reduced when the mouth of the speaker was covered. Our findings may indicate an increase of attentional processing to richer communicative contexts. The present findings also demonstrate that in natural communicative face-to-face encounters, perceiving the face of a speaker in motion provides supplementary information that is taken into account by the listener, especially when auditory comprehension is non-demanding. Copyright © 2018 Elsevier Ltd. All rights reserved.

  1. Speech Intelligibility

    Science.gov (United States)

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena, such as the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, the benefit of using hearing aids, or combinations of these things.

  2. Catching Audiovisual Interactions With a First-Person Fisherman Video Game.

    Science.gov (United States)

    Sun, Yile; Hickey, Timothy J; Shinn-Cunningham, Barbara; Sekuler, Robert

    2017-07-01

    The human brain is excellent at integrating information from different sources across multiple sensory modalities. To examine one particularly important form of multisensory interaction, we manipulated the temporal correlation between visual and auditory stimuli in a first-person fisherman video game. Subjects saw rapidly swimming fish whose size oscillated, either at 6 or 8 Hz. Subjects categorized each fish according to its rate of size oscillation, while trying to ignore a concurrent broadband sound seemingly emitted by the fish. In three experiments, categorization was faster and more accurate when the rate at which a fish oscillated in size matched the rate at which the accompanying, task-irrelevant sound was amplitude modulated. Control conditions showed that the difference between responses to matched and mismatched audiovisual signals reflected a performance gain in the matched condition, rather than a cost from the mismatched condition. The performance advantage with matched audiovisual signals was remarkably robust over changes in task demands between experiments. Performance with matched or unmatched audiovisual signals improved over successive trials at about the same rate, emblematic of perceptual learning in which visual oscillation rate becomes more discriminable with experience. Finally, analysis at the level of individual subjects' performance pointed to differences in the rates at which subjects can extract information from audiovisual stimuli.

  3. Audiovisual preconditioning enhances the efficacy of an anatomical dissection course: A randomised study.

    Science.gov (United States)

    Collins, Anne M; Quinlan, Christine S; Dolan, Roisin T; O'Neill, Shane P; Tierney, Paul; Cronin, Kevin J; Ridgway, Paul F

    2015-07-01

    The benefits of incorporating audiovisual materials into learning are well recognised. The outcome of integrating such a modality into anatomical education has not been reported previously. The aim of this randomised study was to determine whether audiovisual preconditioning is a useful adjunct to learning at an upper limb dissection course. Prior to instruction, participants completed a standardised pre-course multiple-choice questionnaire (MCQ). The intervention group was subsequently shown a video with a pre-recorded commentary. Following initial dissection, both groups completed a second MCQ. The final MCQ was completed at the conclusion of the course. Statistical analysis confirmed a significant improvement in the performance of both groups over the duration of the three MCQs. The intervention group significantly outperformed their control group counterparts immediately following audiovisual preconditioning and in the post-course MCQ. Audiovisual preconditioning is a practical and effective tool that should be incorporated into future course curricula to optimise learning. Level of evidence: This study appraises an intervention in medical education. Kirkpatrick Level 2b (modification of knowledge). Copyright © 2015 British Association of Plastic, Reconstructive and Aesthetic Surgeons. Published by Elsevier Ltd. All rights reserved.

  4. The audiovisual communication policy of the socialist Government (2004-2009): A neoliberal turn

    Directory of Open Access Journals (Sweden)

    Ramón Zallo, Ph. D.

    2010-01-01

    Full Text Available The first legislature of Jose Luis Rodriguez Zapatero’s government (2004-08) generated important initiatives for some progressive changes in the public communicative system. However, all of these initiatives were dissolved in the second legislature to give way to a non-regulated and privatizing model that is detrimental to the public service. Three phases can be distinguished chronologically: the first is characterized by interesting reforms; it is followed by contradictory reforms and, in the second legislature, by an accumulation of counter-reforms that lead the system towards a communicative model completely different from the one devised in the first legislature. This indicates that there have been not one but two different audiovisual policies, running the cyclical route of audiovisual policy from one end to the other. The emphasis has changed from the public service to private concentration; from decentralization to centralization; from the diffusion of knowledge to the accumulation and appropriation of cognitive capital; from the Keynesian model - combined with the Schumpeterian model and a preference for social access - to a delayed return to the neoliberal model, after having distorted the market through public decisions that benefit the most important audiovisual service providers. All this seems to crystallize in the impressive process of concentration occurring among audiovisual service providers into two large groups, one integrating Mediaset and Sogecable and another - still under negotiation - between Antena 3 and Imagina. A combination of neo-statist restructuring of the market and neo-liberalism.

  5. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    ...-Speech Services for Individuals with Hearing and Speech Disabilities, Report and Order (Order), document...] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...

  6. 77 FR 16561 - Certain Audiovisual Components and Products Containing the Same; Notice of Receipt of Complaint...

    Science.gov (United States)

    2012-03-21

    ... INTERNATIONAL TRADE COMMISSION [DN 2884] Certain Audiovisual Components and Products Containing.... International Trade Commission has received a complaint entitled Certain Audiovisual Components and Products... audiovisual components and products containing the same. The complaint names as respondents Funai Electric...

  7. 77 FR 16560 - Certain Audiovisual Components and Products Containing the Same; Notice of Receipt of Complaint...

    Science.gov (United States)

    2012-03-21

    ... INTERNATIONAL TRADE COMMISSION [DN 2884] Certain Audiovisual Components and Products Containing.... International Trade Commission has received a complaint entitled Certain Audiovisual Components and Products... audiovisual components and products containing the same. The complaint names as respondents Funai Electric...

  8. Automated social skills training with audiovisual information.

    Science.gov (United States)

    Tanaka, Hiroki; Sakti, Sakriani; Neubig, Graham; Negoro, Hideki; Iwasaka, Hidemi; Nakamura, Satoshi

    2016-08-01

    People with social communication difficulties tend to have superior skills using computers, and as a result computer-based social skills training systems are flourishing. Social skills training, performed by human trainers, is a well-established method for acquiring appropriate skills in social interaction. Previous works have attempted to automate one or several parts of social skills training through human-computer interaction. However, while previous work on simulating social skills training considered only acoustic and linguistic features, human social skills trainers also take into account visual features (e.g. facial expression, posture). In this paper, we create and evaluate a social skills training system that closes this gap by considering audiovisual features: the ratio of smiling, head yaw, and head pitch. An experimental evaluation measures the difference in the effectiveness of social skills training when using audio features versus audiovisual features. Results showed that the visual features were effective in improving users' social skills.

  9. Inactivation of Primate Prefrontal Cortex Impairs Auditory and Audiovisual Working Memory.

    Science.gov (United States)

    Plakke, Bethany; Hwang, Jaewon; Romanski, Lizabeth M

    2015-07-01

    The prefrontal cortex is associated with cognitive functions that include planning, reasoning, decision-making, working memory, and communication. Neurophysiology and neuropsychology studies have established that the dorsolateral prefrontal cortex is essential in spatial working memory while the ventral frontal lobe processes language and communication signals. Single-unit recordings in nonhuman primates have shown that ventral prefrontal (VLPFC) neurons integrate face and vocal information and are active during audiovisual working memory. However, whether VLPFC is essential in remembering face and voice information is unknown. We therefore trained nonhuman primates in an audiovisual working memory paradigm using naturalistic face-vocalization movies as memoranda. We inactivated VLPFC, with reversible cortical cooling, and examined performance when faces, vocalizations, or both faces and vocalizations had to be remembered. We found that VLPFC inactivation impaired subjects' performance in audiovisual and auditory-alone versions of the task. In contrast, VLPFC inactivation did not disrupt visual working memory. Our studies demonstrate the importance of VLPFC in auditory and audiovisual working memory for social stimuli but suggest a different role for VLPFC in unimodal visual processing. The ventral frontal lobe, or inferior frontal gyrus, plays an important role in audiovisual communication in the human brain. Studies with nonhuman primates have found that neurons within ventral prefrontal cortex (VLPFC) encode both faces and vocalizations and that VLPFC is active when animals need to remember these social stimuli. In the present study, we temporarily inactivated VLPFC by cooling the cortex while nonhuman primates performed a working memory task. This impaired the ability of subjects to remember a face and vocalization pair or just the vocalization alone. Our work highlights the importance of the primate VLPFC in the processing of faces and vocalizations in a manner that

  10. Alterations in audiovisual simultaneity perception in amblyopia

    OpenAIRE

    Richards, Michael D.; Goltz, Herbert C.; Wong, Agnes M. F.

    2017-01-01

    Amblyopia is a developmental visual impairment that is increasingly recognized to affect higher-level perceptual and multisensory processes. To further investigate the audiovisual (AV) perceptual impairments associated with this condition, we characterized the temporal interval in which asynchronous auditory and visual stimuli are perceived as simultaneous 50% of the time (i.e., the AV simultaneity window). Adults with unilateral amblyopia (n = 17) and visually normal controls (n = 17) judged...

  11. Unimodal Learning Enhances Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2017-01-01

    Crossmodal sensory integration is a fundamental feature of the brain that aids in forming a coherent and unified representation of observed events in the world. Spatiotemporally correlated sensory stimuli brought about by rich sensorimotor experiences drive the development of crossmodal integrat...... a non-holonomic robotic agent towards a moving audio-visual target. Simulation results demonstrate that unimodal learning enhances crossmodal learning and improves both the overall accuracy and precision of multisensory orientation response.

  12. Unimodal Learning Enhances Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2018-01-01

    Crossmodal sensory integration is a fundamental feature of the brain that aids in forming a coherent and unified representation of observed events in the world. Spatiotemporally correlated sensory stimuli brought about by rich sensorimotor experiences drive the development of crossmodal integrat...... a non-holonomic robotic agent towards a moving audio-visual target. Simulation results demonstrate that unimodal learning enhances crossmodal learning and improves both the overall accuracy and precision of multisensory orientation response.

  13. [Accommodation effects of the audiovisual stimulation in the patients experiencing eyestrain with the concomitant disturbances of psychological adaptation].

    Science.gov (United States)

    Shakula, A V; Emel'ianov, G A

    2014-01-01

    The present study was designed to evaluate the effectiveness of audiovisual stimulation on the state of the eye accommodation system in patients experiencing eyestrain with concomitant disturbances of psychological adaptation. It was shown that a course of audiovisual stimulation (watching a psychorelaxing film accompanied by appropriate music) results in positive (5.9-21.9%) dynamics of the objective accommodation parameters and of the subjective status (4.5-33.2%). Taken together, these findings allow this method to be regarded as a "relaxing preparation" in the integral complex of measures for the preservation of professional vision in this group of patients.

  14. Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Georgios Mantokoudis

    Full Text Available OBJECTIVE: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. METHODS: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. RESULTS: Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). CONCLUSION: Webcameras have the potential to improve telecommunication of hearing-impaired individuals.
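    A minimal sketch of how the reported thresholds could be turned into a simple configuration check for a video call; the function name and example values are hypothetical, and the thresholds are simply the ones quoted in the abstract (>7 fps, >640 × 480 px, <100 ms picture/sound delay).

```python
def meets_speechreading_thresholds(width_px, height_px, fps, av_delay_ms):
    """Check a video-call configuration against the thresholds the study
    associated with higher speech-perception scores: frame rate above 7 fps,
    resolution above 640 x 480 px, and picture/sound delay under 100 ms."""
    return fps > 7 and (width_px * height_px) > (640 * 480) and av_delay_ms < 100

# Example: a 1280x720 call at 20 fps with 80 ms delay passes all three criteria.
print(meets_speechreading_thresholds(1280, 720, fps=20, av_delay_ms=80))  # True
```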

  15. Seeing the Talker's Face Improves Free Recall of Speech for Young Adults With Normal Hearing but Not Older Adults With Hearing Loss.

    Science.gov (United States)

    Rudner, Mary; Mishra, Sushmit; Stenfelt, Stefan; Lunner, Thomas; Rönnberg, Jerker

    2016-06-01

    Seeing the talker's face improves speech understanding in noise, possibly releasing resources for cognitive processing. We investigated whether it improves free recall of spoken two-digit numbers. Twenty younger adults with normal hearing and 24 older adults with hearing loss listened to and subsequently recalled lists of 13 two-digit numbers, with alternating male and female talkers. Lists were presented in quiet as well as in stationary and speech-like noise at a signal-to-noise ratio giving approximately 90% intelligibility. Amplification compensated for loss of audibility. Seeing the talker's face improved free recall performance for the younger but not the older group. Poorer performance in background noise was contingent on individual differences in working memory capacity. The effect of seeing the talker's face did not differ in quiet and noise. We have argued that the absence of an effect of seeing the talker's face for older adults with hearing loss may be due to modulation of audiovisual integration mechanisms caused by an interaction between task demands and participant characteristics. In particular, we suggest that executive task demands and interindividual executive skills may play a key role in determining the benefit of seeing the talker's face during a speech-based cognitive task.

  16. Commencement Speech as a Hybrid Polydiscursive Practice

    Directory of Open Access Journals (Sweden)

    Светлана Викторовна Иванова

    2017-12-01

    Full Text Available Discourse and media communication researchers note that popular discursive and communicative practices tend towards hybridization and convergence. Discourse, understood as language in use, is flexible; consequently, one and the same text can represent several types of discourse. A vivid example of this tendency is the American commencement speech (also called a commencement address or graduation speech). A commencement speech is a speech addressed to university graduates which, in line with the modern trend, is delivered by outstanding media personalities (politicians, athletes, actors, etc.). The objective of this study is to define the specificity of the realization of polydiscursive practices within commencement speech. The research involves discursive, contextual, stylistic and definitive analyses. Methodologically, the study is based on discourse analysis theory; in particular, the notion of a discursive practice as a verbalized social practice makes up the conceptual basis of the research. The research draws upon a hundred commencement speeches delivered by prominent representatives of American society from the 1980s until now. In brief, commencement speech belongs to the institutional discourse that public speech embodies. Its institutional parameters are well represented in speeches delivered by people in power, such as American and university presidents. Nevertheless, as the results of the research indicate, the institutional character of commencement speech is not its only feature. Conceptual information analysis makes it possible to relate commencement speech to didactic discourse, as it is aimed at teaching university graduates how to deal with the challenges life is rich in. Discursive practices of personal discourse are also actively integrated into commencement speech discourse. More than that, existential discursive practices also find their way into the discourse under study. Commencement

  17. The Galker test of speech reception in noise

    DEFF Research Database (Denmark)

    Lauritsen, Maj-Britt Glenn; Söderström, Margareta; Kreiner, Svend

    2016-01-01

    PURPOSE: We tested "the Galker test", a speech reception in noise test developed for primary care for Danish preschool children, to explore whether the children's ability to hear and understand speech was associated with gender, age, middle ear status, and the level of background noise. METHODS......: The Galker test is a 35-item audio-visual, computerized word discrimination test in background noise. Included were 370 normally developed children attending day care centers. The children were examined with the Galker test, tympanometry, audiometry, and the Reynell test of verbal comprehension. Parents...... and daycare teachers completed questionnaires on the children's ability to hear and understand speech. As most of the variables were not assessed using interval scales, non-parametric statistics (Goodman-Kruskal's gamma) were used for analyzing associations with the Galker test score. For comparisons...
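    The Goodman-Kruskal gamma statistic named in the methods can be computed from concordant and discordant pairs of observations. The sketch below is a minimal illustration with hypothetical ordinal data, not the study's analysis code.

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal's gamma for two ordinal variables:
    (concordant - discordant) / (concordant + discordant), ignoring ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    if concordant + discordant == 0:
        return 0.0
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical ordinal data: age group (1-3) vs. Galker score band (1-4).
age_group  = [1, 1, 2, 2, 3, 3, 3]
score_band = [1, 2, 2, 3, 3, 4, 4]
print(round(goodman_kruskal_gamma(age_group, score_band), 2))  # 1.0 here: fully concordant
```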

  18. Speech disorders - children

    Science.gov (United States)

    ... disorder; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... evaluation tools that can help identify and diagnose speech disorders: Denver Articulation Screening Examination Goldman-Fristoe Test of ...

  19. Speech Processing.

    Science.gov (United States)

    1983-05-01

    The VDE system developed had the capability of recognizing up to 248 separate words in syntactic structures. The two systems described are isolated... Contents include: ...and Speaker Recognition, by M.J. Hunt; Assessment of Speech Systems, by R.K. Moore; A Survey of Current Equipment and Research, by J.S. Bridle; ...Technology in Navy Training Systems, by R. Breaux, M. Blind and R. Lynchard; General Review of Military Applications of Voice Processing, by Dr. Bruno...

  20. Speech Recognition

    Directory of Open Access Journals (Sweden)

    Adrian Morariu

    2009-01-01

    Full Text Available This paper presents a method of speech recognition using pattern recognition techniques. Learning consists of determining the unique characteristics of a word (cepstral coefficients) by eliminating those characteristics that are different from one word to another. For learning and recognition, the system builds a dictionary of words by determining the characteristics of each word to be used in the recognition. Determining the characteristics of an audio signal consists of the following steps: noise removal, sampling, applying a Hamming window, switching to the frequency domain through the Fourier transform, calculating the magnitude spectrum, filtering the data, and determining the cepstral coefficients.
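    A minimal sketch of the per-frame pipeline listed above (Hamming window, Fourier transform, magnitude spectrum, cepstral coefficients), assuming a real-cepstrum formulation; the frame length, hop size, number of coefficients, and the toy signal are illustrative assumptions, and the paper's noise-removal and filtering steps are not reproduced here.

```python
import numpy as np

def frame_cepstrum(frame, num_coeffs=13):
    """Cepstral coefficients for a single speech frame, following the
    generic pipeline in the abstract: Hamming window -> FFT -> magnitude
    spectrum -> log -> inverse transform (real cepstrum)."""
    windowed = frame * np.hamming(len(frame))
    spectrum = np.fft.rfft(windowed)
    log_magnitude = np.log(np.abs(spectrum) + 1e-10)  # avoid log(0)
    cepstrum = np.fft.irfft(log_magnitude)
    return cepstrum[:num_coeffs]

def extract_features(signal, fs=16000, frame_ms=25, hop_ms=10, num_coeffs=13):
    """Slice a (pre-cleaned) signal into overlapping frames and stack
    the per-frame cepstral coefficients into a feature matrix."""
    frame_len = int(fs * frame_ms / 1000)
    hop_len = int(fs * hop_ms / 1000)
    starts = range(0, len(signal) - frame_len + 1, hop_len)
    return np.array([frame_cepstrum(signal[s:s + frame_len], num_coeffs) for s in starts])

# Example: a synthetic 0.5 s "word" at 16 kHz yields a (frames x 13) feature matrix.
t = np.arange(0, 0.5, 1 / 16000)
toy_word = np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.randn(t.size)
print(extract_features(toy_word).shape)
```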

  1. The Effects of Audiovisual Stimulation on the Acceptance of Background Noise.

    Science.gov (United States)

    Plyler, Patrick N; Lang, Rowan; Monroe, Amy L; Gaudiano, Paul

    2015-05-01

    Previous examinations of noise acceptance have been conducted using an auditory stimulus only; however, the effect of visual speech supplementation of the auditory stimulus on acceptance of noise remains limited. The purpose of the present study was to determine the effect of audiovisual stimulation on the acceptance of noise in listeners with normal and impaired hearing. A repeated measures design was utilized. A total of 92 adult participants were recruited for this experiment. Of these participants, 54 were listeners with normal hearing and 38 were listeners with sensorineural hearing impairment. Most comfortable levels and acceptable noise levels (ANL) were obtained using auditory and auditory-visual stimulation modes for the unaided listening condition for each participant and for the aided listening condition for 35 of the participants with impaired hearing that owned hearing aids. Speech reading ability was assessed using the Utley test for each participant. The addition of visual input did not impact the most comfortable level values for listeners in either group; however, visual input improved unaided ANL values for listeners with normal hearing and aided ANL values in listeners with impaired hearing. ANL benefit received from visual speech input was related to the auditory ANL in listeners in each group; however, it was not related to speech reading ability for either listener group in any experimental condition. Visual speech input can significantly impact measures of noise acceptance. The current ANL measure may not accurately reflect acceptance of noise values when in more realistic environments, where the signal of interest is both audible and visible to the listener. American Academy of Audiology.
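    Assuming the conventional definition of the acceptable noise level (the most comfortable listening level minus the highest accepted background noise level), the sketch below shows how an ANL value, and the kind of audiovisual benefit reported above, would be computed; the listener values are hypothetical.

```python
def acceptable_noise_level(mcl_db, bnl_db):
    """Acceptable noise level, using the conventional definition:
    most comfortable listening level (MCL) minus the highest background
    noise level (BNL) the listener is willing to accept, both in dB HL.
    Lower ANL values indicate greater acceptance of background noise."""
    return mcl_db - bnl_db

# Hypothetical listener: MCL of 55 dB HL, accepted noise up to 47 dB HL in the
# auditory-only condition but up to 51 dB HL with visual speech added.
print(acceptable_noise_level(55, 47))  # 8 dB  (auditory-only ANL)
print(acceptable_noise_level(55, 51))  # 4 dB  (audiovisual ANL: improved acceptance)
```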

  2. Understanding the basics of audiovisual archiving in Africa and the ...

    African Journals Online (AJOL)

    In the developed world, the cultural value of the audiovisual media gained legitimacy and widening acceptance after World War II, and this is what Africa still requires. There are a lot of problems in Africa, and because of this, activities such as preservation of a historical record, especially in the audiovisual media are seen as ...

  3. Trigger videos on the Web: Impact of audiovisual design

    NARCIS (Netherlands)

    Verleur, R.; Heuvelman, A.; Verhagen, Pleunes Willem

    2011-01-01

    Audiovisual design might impact emotional responses, as studies from the 1970s and 1980s on movie and television content show. Given today's abundant presence of web-based videos, this study investigates whether audiovisual design will impact web-video content in a similar way. The study is

  4. Audiovisual Archive Exploitation in the Networked Information Society

    NARCIS (Netherlands)

    Ordelman, Roeland J.F.

    2011-01-01

    Safeguarding the massive body of audiovisual content, including rich music collections, in audiovisual archives and enabling access for various types of user groups is a prerequisite for unlocking the social-economic value of these collections. Data quantities and the need for specific content

  5. Decision-level fusion for audio-visual laughter detection

    NARCIS (Netherlands)

    Reuderink, B.; Poel, M.; Truong, K.; Poppe, R.; Pantic, M.

    2008-01-01

    Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is

  6. Haptic and Audio-visual Stimuli: Enhancing Experiences and Interaction

    NARCIS (Netherlands)

    Nijholt, Antinus; Dijk, Esko O.; Lemmens, Paul M.C.; Luitjens, S.B.

    2010-01-01

    The intention of the symposium on Haptic and Audio-visual stimuli at the EuroHaptics 2010 conference is to deepen the understanding of the effect of combined Haptic and Audio-visual stimuli. The knowledge gained will be used to enhance experiences and interactions in daily life. To this end, a

  7. Knowledge Generated by Audiovisual Narrative Action Research Loops

    Science.gov (United States)

    Bautista Garcia-Vera, Antonio

    2012-01-01

    We present data collected from the research project funded by the Ministry of Education and Science of Spain entitled "Audiovisual Narratives and Intercultural Relations in Education." One of the aims of the research was to determine the nature of thought processes occurring during audiovisual narratives. We studied the possibility of…

  8. Use of Audiovisual Texts in University Education Process

    Science.gov (United States)

    Aleksandrov, Evgeniy P.

    2014-01-01

    Audio-visual learning technologies offer great opportunities in the development of students' analytical and projective abilities. These technologies can be used in classroom activities and for homework. This article discusses the features of using audiovisual media texts in a series of social sciences and humanities courses in the university curriculum.

  9. Trigger Videos on the Web: Impact of Audiovisual Design

    Science.gov (United States)

    Verleur, Ria; Heuvelman, Ard; Verhagen, Plon W.

    2011-01-01

    Audiovisual design might impact emotional responses, as studies from the 1970s and 1980s on movie and television content show. Given today's abundant presence of web-based videos, this study investigates whether audiovisual design will impact web-video content in a similar way. The study is motivated by the potential influence of video-evoked…

  10. Audiovisual consumption and its social logics on the web

    OpenAIRE

    Rose Marie Santini; Juan C. Calvi

    2013-01-01

    This article analyzes the social logics underlying audiovisual consumption on digital networks. We retrieved some data on the Internet global traffic of audiovisual files since 2008 to identify formats, modes of distribution and consumption of audiovisual contents that tend to prevail on the Web. This research shows the types of social practices which are dominant among users and their relation to what we designate as “Internet culture”.

  11. EXPLICITATION AND ADDITION TECHNIQUES IN AUDIOVISUAL TRANSLATION: A MULTIMODAL APPROACH OF ENGLISH-INDONESIAN SUBTITLES

    Directory of Open Access Journals (Sweden)

    Ichwan Suyudi

    2017-12-01

    Full Text Available In audiovisual translation, the multimodality of the audiovisual text is both a challenge and a resource for subtitlers. This paper illustrates how multi-modes provide information that helps subtitlers gain a better understanding of the meaning-making practices that will influence their decision-making in translating a given verbal text. Subtitlers may explicitate, add, and condense the texts based on the multi-modes as seen in the visual frames. Subtitlers have to consider the distribution and integration of the meanings of the multi-modes in order to create comprehensive equivalence between the source and target texts. Excerpts of visual frames in this paper are taken from the English films Forrest Gump (drama, 1996) and James Bond (thriller, 2010).

  12. Vicarious audiovisual learning in perfusion education.

    Science.gov (United States)

    Rath, Thomas E; Holt, David W

    2010-12-01

    Perfusion technology is a mechanical and visual science traditionally taught with didactic instruction combined with clinical experience. It is difficult to provide perfusion students the opportunity to experience difficult clinical situations, set up complex perfusion equipment, or observe corrective measures taken during catastrophic events because of patient safety concerns. Although high-fidelity simulators offer exciting opportunities for future perfusion training, we explore the use of a less costly, low-fidelity form of simulation instruction: vicarious audiovisual learning. Two low-fidelity modes of instruction, a description with text and a vicarious, first-person audiovisual production depicting the same content, were compared. Students (n = 37) sampled from five North American perfusion schools were prospectively randomized to one of two online learning modules, text or video. These modules described the setup and operation of the MAQUET ROTAFLOW stand-alone centrifugal console and pump. Using a 10-question multiple-choice test, students were assessed immediately after viewing the module (test #1) and then again 2 weeks later (test #2) to determine cognition and recall of the module content. In addition, students completed a questionnaire assessing the learning preferences of today's perfusion student. Mean test scores from test #1 for video learners (n = 18) were significantly higher (88.89%) than for text learners (n = 19) (74.74%). Vicarious audiovisual learning modules may be an efficacious, low-cost means of delivering perfusion training on subjects such as equipment setup and operation. Video learning appears to improve cognition and retention of learned content and may play an important role in how we teach perfusion in the future, as simulation technology becomes more prevalent.

  13. Audiovisual Interaction in Time Perception

    Directory of Open Access Journals (Sweden)

    Kuan-Ming Chen

    2011-10-01

    Full Text Available We examined the cross-modal effect of an irrelevant sound (or disk) on the perceived visual (or auditory) duration, and how visual and auditory signals are integrated when perceiving duration. Participants conducted a duration discrimination task with a 2-Interval-Forced-Choice procedure, with one interval containing the standard duration and the other the comparison duration. In study 1, the standard and comparison durations were either in the same modality or with another modality added. The point-of-subjective-equality and threshold were measured from the psychometric functions. Results showed that sound expanded the perceived visual duration at the intermediate durations, but there was no effect of the disk on the perceived auditory duration. In study 2, bimodal signals were used in both the standard and comparison durations, and the Maximum-Likelihood-Estimation (MLE) model was used to predict bimodal performance from the observed unimodal results. The contribution of auditory signals to the bimodal estimate of duration was greater than that predicted by the MLE model, and so was the contribution of visual signals when these signals were temporally informative (i.e., looming disks). We propose a hybrid model that considers both the prior bias for the auditory signal and the reliability of both auditory and visual signals to explain the results.
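    The MLE model referred to above predicts the bimodal estimate as a reliability-weighted average of the unimodal estimates, with weights inversely proportional to the unimodal variances. The sketch below illustrates that prediction with hypothetical unimodal values; it is not the authors' analysis code.

```python
def mle_prediction(est_a, sigma_a, est_v, sigma_v):
    """Maximum-likelihood-estimation (MLE) cue combination: the bimodal
    duration estimate is a reliability-weighted average of the unimodal
    estimates, and the predicted bimodal variability is lower than either
    unimodal variability."""
    w_a = sigma_v**2 / (sigma_a**2 + sigma_v**2)   # weight on the auditory estimate
    w_v = 1.0 - w_a                                # weight on the visual estimate
    est_av = w_a * est_a + w_v * est_v
    sigma_av = (sigma_a**2 * sigma_v**2 / (sigma_a**2 + sigma_v**2)) ** 0.5
    return est_av, sigma_av

# Hypothetical unimodal results (ms): a precise auditory estimate and a noisier visual one.
est_av, sigma_av = mle_prediction(est_a=500, sigma_a=40, est_v=540, sigma_v=80)
print(round(est_av, 1), round(sigma_av, 1))  # 508.0 35.8 -> the auditory cue dominates
```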

  14. Speech and Language Delay

    Science.gov (United States)

    What is a speech and language delay? A speech and language delay ...

  15. Categorization of natural dynamic audiovisual scenes.

    Directory of Open Access Journals (Sweden)

    Olli Rummukainen

    Full Text Available This work analyzed the perceptual attributes of natural dynamic audiovisual scenes. We presented thirty participants with 19 natural scenes in a similarity categorization task, followed by a semi-structured interview. The scenes were reproduced with an immersive audiovisual display. Natural scene perception has been studied mainly with unimodal settings, which have identified motion as one of the most salient attributes related to visual scenes, and sound intensity along with pitch trajectories related to auditory scenes. However, controlled laboratory experiments with natural multimodal stimuli are still scarce. Our results show that humans pay attention to similar perceptual attributes in natural scenes, and a two-dimensional perceptual map of the stimulus scenes and perceptual attributes was obtained in this work. The exploratory results show the amount of movement, perceived noisiness, and eventfulness of the scene to be the most important perceptual attributes in naturalistically reproduced real-world urban environments. We found the scene gist properties openness and expansion to remain as important factors in scenes with no salient auditory or visual events. We propose that the study of scene perception should move forward to understand better the processes behind multimodal scene processing in real-world environments. We publish our stimulus scenes as spherical video recordings and sound field recordings in a publicly available database.

  16. Alterations in audiovisual simultaneity perception in amblyopia.

    Science.gov (United States)

    Richards, Michael D; Goltz, Herbert C; Wong, Agnes M F

    2017-01-01

    Amblyopia is a developmental visual impairment that is increasingly recognized to affect higher-level perceptual and multisensory processes. To further investigate the audiovisual (AV) perceptual impairments associated with this condition, we characterized the temporal interval in which asynchronous auditory and visual stimuli are perceived as simultaneous 50% of the time (i.e., the AV simultaneity window). Adults with unilateral amblyopia (n = 17) and visually normal controls (n = 17) judged the simultaneity of a flash and a click presented with both eyes viewing. The signal onset asynchrony (SOA) varied from 0 ms to 450 ms for auditory-lead and visual-lead conditions. A subset of participants with amblyopia (n = 6) was tested monocularly. Compared to the control group, the auditory-lead side of the AV simultaneity window was widened by 48 ms (36%; p = 0.002), whereas that of the visual-lead side was widened by 86 ms (37%; p = 0.02). The overall mean window width was 500 ms, compared to 366 ms among controls (37% wider; p = 0.002). Among participants with amblyopia, the simultaneity window parameters were unchanged by viewing condition, but subgroup analysis revealed differential effects on the parameters by amblyopia severity, etiology, and foveal suppression status. Possible mechanisms to explain these findings include visual temporal uncertainty, interocular perceptual latency asymmetry, and disruption of normal developmental tuning of sensitivity to audiovisual asynchrony.
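    A minimal sketch of how a 50%-simultaneity window of this kind could be estimated from group judgment proportions, assuming a Gaussian-shaped simultaneity curve; the SOA grid, response proportions, and parameterization are illustrative, not the study's data or fitting procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def simultaneity_curve(soa, amplitude, center, width):
    """Gaussian-shaped probability of a 'simultaneous' response as a
    function of stimulus onset asynchrony (negative SOA = auditory lead)."""
    return amplitude * np.exp(-((soa - center) ** 2) / (2 * width ** 2))

# Hypothetical group data: proportion of 'simultaneous' judgments per SOA (ms).
soas = np.array([-450, -300, -150, -50, 0, 50, 150, 300, 450], dtype=float)
p_simultaneous = np.array([0.05, 0.20, 0.60, 0.85, 0.95, 0.90, 0.75, 0.35, 0.10])

params, _ = curve_fit(simultaneity_curve, soas, p_simultaneous, p0=[1.0, 0.0, 150.0])
amplitude, center, width = params

# The AV simultaneity window: the SOA range over which the fitted curve
# exceeds 0.5 (the 50%-simultaneous criterion used in the study).
half_window = width * np.sqrt(2 * np.log(amplitude / 0.5))
print(f"window: {center - half_window:.0f} ms to {center + half_window:.0f} ms")
```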

  17. Alterations in audiovisual simultaneity perception in amblyopia.

    Directory of Open Access Journals (Sweden)

    Michael D Richards

    Full Text Available Amblyopia is a developmental visual impairment that is increasingly recognized to affect higher-level perceptual and multisensory processes. To further investigate the audiovisual (AV) perceptual impairments associated with this condition, we characterized the temporal interval in which asynchronous auditory and visual stimuli are perceived as simultaneous 50% of the time (i.e., the AV simultaneity window). Adults with unilateral amblyopia (n = 17) and visually normal controls (n = 17) judged the simultaneity of a flash and a click presented with both eyes viewing. The signal onset asynchrony (SOA) varied from 0 ms to 450 ms for auditory-lead and visual-lead conditions. A subset of participants with amblyopia (n = 6) was tested monocularly. Compared to the control group, the auditory-lead side of the AV simultaneity window was widened by 48 ms (36%; p = 0.002), whereas that of the visual-lead side was widened by 86 ms (37%; p = 0.02). The overall mean window width was 500 ms, compared to 366 ms among controls (37% wider; p = 0.002). Among participants with amblyopia, the simultaneity window parameters were unchanged by viewing condition, but subgroup analysis revealed differential effects on the parameters by amblyopia severity, etiology, and foveal suppression status. Possible mechanisms to explain these findings include visual temporal uncertainty, interocular perceptual latency asymmetry, and disruption of normal developmental tuning of sensitivity to audiovisual asynchrony.

  18. Feature Fusion Based Audio-Visual Speaker Identification Using Hidden Markov Model under Different Lighting Variations

    Directory of Open Access Journals (Sweden)

    Md. Rabiul Islam

    2014-01-01

    Full Text Available The aim of the paper is to propose a feature fusion based Audio-Visual Speaker Identification (AVSI) system with varied conditions of illumination environments. Among the different fusion strategies, feature level fusion has been used for the proposed AVSI system where Hidden Markov Model (HMM) is used for learning and classification. Since the feature set contains richer information about the raw biometric data than any other levels, integration at feature level is expected to provide better authentication results. In this paper, both Mel Frequency Cepstral Coefficients (MFCCs) and Linear Prediction Cepstral Coefficients (LPCCs) are combined to get the audio feature vectors and Active Shape Model (ASM) based appearance and shape facial features are concatenated to take the visual feature vectors. These combined audio and visual features are used for the feature-fusion. To reduce the dimension of the audio and visual feature vectors, the Principal Component Analysis (PCA) method is used. The VALID audio-visual database is used to measure the performance of the proposed system where four different illumination levels of lighting conditions are considered. Experimental results focus on the significance of the proposed audio-visual speaker identification system with various combinations of audio and visual features.
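    A minimal sketch of the feature-level fusion step described above: per-frame audio features (MFCC + LPCC) and visual features (ASM shape and appearance) are concatenated and reduced with PCA. The feature dimensions and data are hypothetical; in a real system the PCA projection would be fitted on training data and the reduced sequences modeled with one HMM per enrolled speaker.

```python
import numpy as np
from sklearn.decomposition import PCA

def fuse_audio_visual(mfcc, lpcc, shape_feats, appearance_feats, n_components=30):
    """Feature-level fusion: per-frame audio features (MFCC + LPCC) and
    visual features (ASM shape + appearance) are concatenated into one
    vector per frame, then reduced with PCA. The reduced sequences would
    then be modeled with one HMM per enrolled speaker (not shown here)."""
    audio = np.hstack([mfcc, lpcc])                      # (frames, audio_dims)
    visual = np.hstack([shape_feats, appearance_feats])  # (frames, visual_dims)
    fused = np.hstack([audio, visual])                   # (frames, audio + visual dims)
    return PCA(n_components=n_components).fit_transform(fused)

# Hypothetical per-frame features for a 200-frame utterance.
rng = np.random.default_rng(0)
reduced = fuse_audio_visual(mfcc=rng.normal(size=(200, 13)),
                            lpcc=rng.normal(size=(200, 12)),
                            shape_feats=rng.normal(size=(200, 40)),
                            appearance_feats=rng.normal(size=(200, 40)))
print(reduced.shape)  # (200, 30)
```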

  19. Audio-visual assistance in co-creating transition knowledge

    Science.gov (United States)

    Hezel, Bernd; Broschkowski, Ephraim; Kropp, Jürgen P.

    2013-04-01

    Earth system and climate impact research results point to the tremendous ecologic, economic and societal implications of climate change. Specifically, people will have to adopt lifestyles that are very different from those they currently strive for in order to mitigate severe changes of our known environment. It will most likely not suffice to transfer the scientific findings into international agreements and appropriate legislation. A transition is rather reliant on pioneers that define new role models, on change agents that mainstream the concept of sufficiency and on narratives that make different futures appealing. In order for the research community to be able to provide sustainable transition pathways that are viable, an integration of the physical constraints and the societal dynamics is needed. Hence the necessary transition knowledge is to be co-created by social and natural science and society. To this end, the Climate Media Factory - in itself a massively transdisciplinary venture - strives to provide an audio-visual connection between the different scientific cultures and a bi-directional link to stakeholders and society. Since the methodology, particular language and knowledge level of those involved are not the same, we develop new entertaining formats on the basis of a "complexity on demand" approach. They present scientific information in an integrated and entertaining way with different levels of detail that provide entry points to users with different requirements. Two examples shall illustrate the advantages and restrictions of the approach.

  20. Automatic Speech Recognition from Neural Signals: A Focused Review

    Directory of Open Access Journals (Sweden)

    Christian Herff

    2016-09-01

    Full Text Available Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the need not to bother bystanders, or the inability to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable not to speak but to simply envision oneself saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to their low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data, with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques used with neural signals, we discuss the Brain-to-text system.

  1. Audiovisual Interval Size Estimation Is Associated with Early Musical Training.

    Directory of Open Access Journals (Sweden)

    Mary Kathryn Abel

    Full Text Available Although pitch is a fundamental attribute of auditory perception, substantial individual differences exist in our ability to perceive differences in pitch. Little is known about how these individual differences in the auditory modality might affect crossmodal processes such as audiovisual perception. In this study, we asked whether individual differences in pitch perception might affect audiovisual perception, as it relates to age of onset and number of years of musical training. Fifty-seven subjects made subjective ratings of interval size when given point-light displays of audio, visual, and audiovisual stimuli of sung intervals. Audiovisual stimuli were divided into congruent and incongruent (audiovisual-mismatched) stimuli. Participants' ratings correlated strongly with interval size in audio-only, visual-only, and audiovisual-congruent conditions. In the audiovisual-incongruent condition, ratings correlated more with audio than with visual stimuli, particularly for subjects who had better pitch perception abilities and higher nonverbal IQ scores. To further investigate the effects of age of onset and length of musical training, subjects were divided into musically trained and untrained groups. Results showed that among subjects with musical training, the degree to which participants' ratings correlated with auditory interval size during incongruent audiovisual perception was correlated with both nonverbal IQ and age of onset of musical training. After partialing out nonverbal IQ, pitch discrimination thresholds were no longer associated with incongruent audio scores, whereas age of onset of musical training remained associated with incongruent audio scores. These findings invite future research on the developmental effects of musical training, particularly those relating to the process of audiovisual perception.

  2. Audiovisual Interval Size Estimation Is Associated with Early Musical Training.

    Science.gov (United States)

    Abel, Mary Kathryn; Li, H Charles; Russo, Frank A; Schlaug, Gottfried; Loui, Psyche

    2016-01-01

    Although pitch is a fundamental attribute of auditory perception, substantial individual differences exist in our ability to perceive differences in pitch. Little is known about how these individual differences in the auditory modality might affect crossmodal processes such as audiovisual perception. In this study, we asked whether individual differences in pitch perception might affect audiovisual perception, as it relates to age of onset and number of years of musical training. Fifty-seven subjects made subjective ratings of interval size when given point-light displays of audio, visual, and audiovisual stimuli of sung intervals. Audiovisual stimuli were divided into congruent and incongruent (audiovisual-mismatched) stimuli. Participants' ratings correlated strongly with interval size in audio-only, visual-only, and audiovisual-congruent conditions. In the audiovisual-incongruent condition, ratings correlated more with audio than with visual stimuli, particularly for subjects who had better pitch perception abilities and higher nonverbal IQ scores. To further investigate the effects of age of onset and length of musical training, subjects were divided into musically trained and untrained groups. Results showed that among subjects with musical training, the degree to which participants' ratings correlated with auditory interval size during incongruent audiovisual perception was correlated with both nonverbal IQ and age of onset of musical training. After partialing out nonverbal IQ, pitch discrimination thresholds were no longer associated with incongruent audio scores, whereas age of onset of musical training remained associated with incongruent audio scores. These findings invite future research on the developmental effects of musical training, particularly those relating to the process of audiovisual perception.
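    The "partialing out" step mentioned above can be implemented by regressing the control variable out of both measures and correlating the residuals. The sketch below illustrates this with simulated data in which both measures depend on nonverbal IQ; the variable names and effect sizes are assumptions, not the study's data.

```python
import numpy as np

def partial_correlation(x, y, control):
    """Correlation between x and y after partialing out a control variable:
    regress the control out of each variable and correlate the residuals."""
    x, y, control = (np.asarray(a, dtype=float) for a in (x, y, control))

    def residualize(v):
        slope, intercept = np.polyfit(control, v, 1)
        return v - (slope * control + intercept)

    return np.corrcoef(residualize(x), residualize(y))[0, 1]

# Simulated data: pitch thresholds and incongruent-audio scores both depend on nonverbal IQ.
rng = np.random.default_rng(1)
iq = rng.normal(100, 15, size=57)
pitch_threshold = -0.04 * iq + rng.normal(0, 0.5, size=57)
incongruent_audio = 0.01 * iq + rng.normal(0, 0.2, size=57)
print(round(np.corrcoef(pitch_threshold, incongruent_audio)[0, 1], 2))       # raw correlation
print(round(partial_correlation(pitch_threshold, incongruent_audio, iq), 2))  # shrinks toward 0
```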

  3. Impact of audio-visual storytelling in simulation learning experiences of undergraduate nursing students.

    Science.gov (United States)

    Johnston, Sandra; Parker, Christina N; Fox, Amanda

    2017-09-01

    Use of high fidelity simulation has become increasingly popular in nursing education to the extent that it is now an integral component of most nursing programs. Anecdotal evidence suggests that students have difficulty engaging with simulation manikins due to their unrealistic appearance. Introduction of the manikin as a 'real patient' with the use of an audio-visual narrative may engage students in the simulated learning experience and impact on their learning. A paucity of literature currently exists on the use of audio-visual narratives to enhance simulated learning experiences. This study aimed to determine if viewing an audio-visual narrative during a simulation pre-brief altered undergraduate nursing student perceptions of the learning experience. A quasi-experimental post-test design was utilised with a convenience sample of final year baccalaureate nursing students at a large metropolitan university. Participants completed a modified version of the Student Satisfaction with Simulation Experiences survey. This 12-item questionnaire contained questions relating to the ability to transfer skills learned in simulation to the real clinical world, the realism of the simulation and the overall value of the learning experience. Descriptive statistics were used to summarise demographic information. Two-tailed, independent group t-tests were used to determine statistical differences within the categories. Findings indicated that students reported high levels of value, realism and transferability in relation to the viewing of an audio-visual narrative. A statistically significant result (t = 2.38) was found for the transferability of learning from simulation to clinical practice. The subgroups of age and gender, although not significant, indicated some interesting results. High satisfaction with simulation was indicated by all students in relation to value and realism. There was a significant finding in relation to transferability of knowledge, and this is vital to quality educational outcomes. Copyright © 2017. Published by

  4. The neural processing of foreign-accented speech and its relationship to listener bias

    Directory of Open Access Journals (Sweden)

    Han-Gyol Yi

    2014-10-01

    Full Text Available Foreign-accented speech often presents a challenging listening condition. In addition to deviations from the target speech norms related to the inexperience of the nonnative speaker, listener characteristics may play a role in determining intelligibility levels. We have previously shown that an implicit visual bias for associating East Asian faces and foreignness predicts listeners' perceptual ability to process Korean-accented English audiovisual speech (Yi et al., 2013). Here, we examine the neural mechanism underlying the influence of listener bias toward foreign faces on speech perception. In a functional magnetic resonance imaging (fMRI) study, native English speakers listened to native- and Korean-accented English sentences, with or without faces. The participants' Asian-foreign association was measured using an implicit association test (IAT), conducted outside the scanner. We found that foreign-accented speech evoked greater activity in the bilateral primary auditory cortices and the inferior frontal gyri, potentially reflecting greater computational demand. Higher IAT scores, indicating greater bias, were associated with increased BOLD response to foreign-accented speech with faces in the primary auditory cortex, the early node for spectrotemporal analysis. We conclude the following: (1) foreign-accented speech perception places greater demand on the neural systems underlying speech perception; (2) the face of the talker can exaggerate the perceived foreignness of foreign-accented speech; (3) implicit Asian-foreign association is associated with decreased neural efficiency in early spectrotemporal processing.

  5. Neuronal basis of speech comprehension.

    Science.gov (United States)

    Specht, Karsten

    2014-01-01

    Verbal communication does not rely only on the simple perception of auditory signals. It is rather a parallel and integrative processing of linguistic and non-linguistic information, involving temporal and frontal areas in particular. This review describes the inherent complexity of auditory speech comprehension from a functional-neuroanatomical perspective. The review is divided into two parts. In the first part, structural and functional asymmetry of language-relevant structures will be discussed. The second part of the review discusses recent neuroimaging studies, which coherently demonstrate that speech comprehension processes rely on a hierarchical network involving the temporal, parietal, and frontal lobes. Further, the results support the dual-stream model for speech comprehension, with a dorsal stream for auditory-motor integration, and a ventral stream for extracting meaning but also the processing of sentences and narratives. Specific patterns of functional asymmetry between the left and right hemisphere can also be demonstrated. The review article concludes with a discussion on interactions between the dorsal and ventral streams, particularly the involvement of motor-related areas in speech perception processes, and outlines some remaining unresolved issues. This article is part of a Special Issue entitled Human Auditory Neuroimaging. Copyright © 2013 Elsevier B.V. All rights reserved.

  6. Sensorimotor Representation of Speech Perception. Cross-Decoding of Place of Articulation Features during Selective Attention to Syllables in 7T fMRI

    NARCIS (Netherlands)

    Archila-Meléndez, Mario E.; Valente, Giancarlo; Correia, Joao M.; Rouhl, Rob P. W.; van Kranen-Mastenbroek, Vivianne H.; Jansma, Bernadette M.

    2018-01-01

    Sensorimotor integration, the translation between acoustic signals and motoric programs, may constitute a crucial mechanism for speech. During speech perception, the acoustic-motoric translations include the recruitment of cortical areas for the representation of speech articulatory features, such

  7. Audiovisual interpretative skills: between textual culture and formalized literacy

    Directory of Open Access Journals (Sweden)

    Estefanía Jiménez, Ph. D.

    2010-01-01

    Full Text Available This paper presents the results of a study on the process of acquiring interpretative skills to decode audiovisual texts among adolescents and youth. Based on the conception of such competence as the ability to understand the meanings connoted beneath the literal discourses of audiovisual texts, this study compared two variables: on the one hand, the acquisition of such skills from personal and social experience in the consumption of audiovisual products (which is affected by age differences), and, on the other hand, the differences marked by the existence of formalized processes of media literacy. Based on focus groups of young students, the research assesses the existing academic debate about these processes of acquiring skills to interpret audiovisual materials.

  8. Exposure to audiovisual programs as sources of authentic language ...

    African Journals Online (AJOL)

    Exposure to audiovisual programs as sources of authentic language input and second ... Southern African Linguistics and Applied Language Studies ... The findings of the present research contribute more insights on the type and amount of ...

  9. On-line repository of audiovisual material feminist research methodology

    Directory of Open Access Journals (Sweden)

    Lena Prado

    2014-12-01

    Full Text Available This paper includes a collection of audiovisual material available in the repository of the Interdisciplinary Seminar of Feminist Research Methodology SIMReF (http://www.simref.net).

  10. Proper Use of Audio-Visual Aids: Essential for Educators.

    Science.gov (United States)

    Dejardin, Conrad

    1989-01-01

    Criticizes educators as the worst users of audio-visual aids and among the worst public speakers. Offers guidelines for the proper use of an overhead projector and the development of transparencies. (DMM)

  11. An Instrumented Glove for Control Audiovisual Elements in Performing Arts

    Directory of Open Access Journals (Sweden)

    Rafael Tavares

    2018-02-01

    Full Text Available The use of cutting-edge technologies such as wearable devices to control reactive audiovisual systems is rarely applied in more conventional stage performances, such as opera. This work reports a cross-disciplinary approach to the research and development of the WMTSensorGlove, a data-glove used in an opera performance to control audiovisual elements on stage through gestural movements. A system architecture of the interaction between the wireless wearable device and the different audiovisual systems is presented, taking advantage of the Open Sound Control (OSC) protocol. The developed wearable system was used as an audiovisual controller in "As sete mulheres de Jeremias Epicentro", a Portuguese opera by Quarteto Contratempus, which premiered in September 2017.

  12. Audiovisual consumption and its social logics on the web

    Directory of Open Access Journals (Sweden)

    Rose Marie Santini

    2013-06-01

    Full Text Available This article analyzes the social logics underlying audiovisualconsumption on digital networks. We retrieved some data on the Internet globaltraffic of audiovisual files since 2008 to identify formats, modes of distributionand consumption of audiovisual contents that tend to prevail on the Web. Thisresearch shows the types of social practices which are dominant among usersand its relation to what we designate as “Internet culture”.

  13. Narrativa audiovisual. Estrategias y recursos [Reseña]

    OpenAIRE

    Cuenca Jaramillo, María Dolores

    2011-01-01

    Review of the book "Narrativa audiovisual. Estrategias y recursos" by Fernando Canet and Josep Prósper. Cuenca Jaramillo, MD. (2011). Narrativa audiovisual. Estrategias y recursos [Reseña]. Vivat Academia. Revista de Comunicación. Año XIV(117):125-130. http://hdl.handle.net/10251/46210

  14. [Audio-visual communication in the history of psychiatry].

    Science.gov (United States)

    Farina, B; Remoli, V; Russo, F

    1993-12-01

    The authors analyse the evolution of visual communication in the history of psychiatry. From 18th-century oil paintings to the first daguerreotype prints, and on to cinematography and modern audiovisual systems, they observed an increasing diffusion of the new communication techniques in psychiatry and described the use of the different techniques in psychiatric practice. The article ends with a brief review of the current applications of audiovisual media in therapy, training, teaching, and research.

  15. Plan empresa productora de audiovisuales : La Central Audiovisual y Publicidad

    OpenAIRE

    Arroyave Velasquez, Alejandro

    2015-01-01

    This document corresponds to the business plan for the creation of the company La Central Publicidad y Audiovisual, a company dedicated to the pre-production, production and post-production of audiovisual material. The company will be located in the city of Cali, and its target market comprises the different types of companies in the city, including small, medium and large enterprises.

  16. Audiovisual Association Learning in the Absence of Primary Visual Cortex

    OpenAIRE

    Seirafi, Mehrdad; De Weerd, Peter; Pegna, Alan J.; de Gelder, Beatrice

    2016-01-01

    Learning audiovisual associations is mediated by the primary cortical areas; however, recent animal studies suggest that such learning can take place even in the absence of the primary visual cortex. Other studies have demonstrated the involvement of extra-geniculate pathways and especially the superior colliculus (SC) in audiovisual association learning. Here, we investigated such learning in a rare human patient with complete loss of the bilateral striate cortex. We carried out an implicit ...

  17. THE ONTOGENESIS OF SPEECH DEVELOPMENT

    Directory of Open Access Journals (Sweden)

    T. E. Braudo

    2017-01-01

    Full Text Available The purpose of this article is to acquaint specialists working with children who have developmental disorders with the age-related norms of speech development. Many well-known linguists and psychologists have studied speech ontogenesis (logogenesis). Speech is a higher mental function which integrates many functional systems. Speech development in infants during the first months after birth is ensured by innate hearing and the emerging ability to fix the gaze on the face of an adult. Innate emotional reactions also develop during this period, turning into nonverbal forms of communication. At about 6 months a baby starts to pronounce some syllables; at 7–9 months the baby repeats various sound combinations pronounced by adults. At 10–11 months a baby begins to react to words addressed to him or her. The first words usually appear at the age of 1 year; this is the start of the stage of active speech development. At this time it is acceptable if a child confuses or rearranges sounds, distorts or omits them. By the age of 1.5 years a child begins to understand the abstract explanations of adults. Significant vocabulary enlargement occurs between 2 and 3 years; the grammatical structures of the language are formed during this period (a child starts to use phrases and sentences). Preschool age (3–7 y. o.) is characterized by incorrect but steadily improving pronunciation of sounds and phonemic perception. The vocabulary increases; abstract speech and retelling are formed. Children over 7 y. o. continue to improve grammar, writing and reading skills. The described stages may not have strict age boundaries, since they depend not only on the environment but also on the child's mental constitution, heredity and character.

  18. Cortical activity patterns predict robust speech discrimination ability in noise

    Science.gov (United States)

    Shetake, Jai A.; Wolf, Jordan T.; Cheung, Ryan J.; Engineer, Crystal T.; Ram, Satyananda K.; Kilgard, Michael P.

    2012-01-01

    The neural mechanisms that support speech discrimination in noisy conditions are poorly understood. In quiet conditions, spike timing information appears to be used in the discrimination of speech sounds. In this study, we evaluated the hypothesis that spike timing is also used to distinguish between speech sounds in noisy conditions that significantly degrade neural responses to speech sounds. We tested speech sound discrimination in rats and recorded primary auditory cortex (A1) responses to speech sounds in background noise of different intensities and spectral compositions. Our behavioral results indicate that rats, like humans, are able to accurately discriminate consonant sounds even in the presence of background noise that is as loud as the speech signal. Our neural recordings confirm that speech sounds evoke degraded but detectable responses in noise. Finally, we developed a novel neural classifier that mimics behavioral discrimination. The classifier discriminates between speech sounds by comparing the A1 spatiotemporal activity patterns evoked on single trials with the average spatiotemporal patterns evoked by known sounds. Unlike classifiers in most previous studies, this classifier is not provided with the stimulus onset time. Neural activity analyzed with the use of relative spike timing was well correlated with behavioral speech discrimination in quiet and in noise. Spike timing information integrated over longer intervals was required to accurately predict rat behavioral speech discrimination in noisy conditions. The similarity of neural and behavioral discrimination of speech in noise suggests that humans and rats may employ similar brain mechanisms to solve this problem. PMID:22098331
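
    The classifier idea described above, comparing a single trial's spatiotemporal activity pattern against the average pattern evoked by each known sound, can be sketched as a nearest-template rule. The code below is a hedged illustration with toy data; the array shapes, the Euclidean metric and the alignment of trials are assumptions, and the paper's key feature of withholding stimulus onset time is not modeled here.

      import numpy as np

      def classify_trial(trial, templates):
          """Assign a single-trial pattern (recording sites x time bins) to the closest class template."""
          distances = {label: np.linalg.norm(trial - tpl) for label, tpl in templates.items()}
          return min(distances, key=distances.get)

      # Toy usage: two consonant sounds, 20 recording sites, 40 time bins
      rng = np.random.default_rng(0)
      templates = {"dad": rng.normal(size=(20, 40)), "tad": rng.normal(size=(20, 40))}
      noisy_trial = templates["dad"] + 0.5 * rng.normal(size=(20, 40))
      print(classify_trial(noisy_trial, templates))  # expected "dad" at this noise level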

  19. Neurophysiological evidence for the interplay of speech segmentation and word-referent mapping during novel word learning.

    Science.gov (United States)

    François, Clément; Cunillera, Toni; Garcia, Enara; Laine, Matti; Rodriguez-Fornells, Antoni

    2017-04-01

    Learning a new language requires the identification of word units from continuous speech (the speech segmentation problem) and mapping them onto conceptual representation (the word to world mapping problem). Recent behavioral studies have revealed that the statistical properties found within and across modalities can serve as cues for both processes. However, segmentation and mapping have been largely studied separately, and thus it remains unclear whether both processes can be accomplished at the same time and if they share common neurophysiological features. To address this question, we recorded EEG of 20 adult participants during both an audio alone speech segmentation task and an audiovisual word-to-picture association task. The participants were tested for both the implicit detection of online mismatches (structural auditory and visual semantic violations) as well as for the explicit recognition of words and word-to-picture associations. The ERP results from the learning phase revealed a delayed learning-related fronto-central negativity (FN400) in the audiovisual condition compared to the audio alone condition. Interestingly, while online structural auditory violations elicited clear MMN/N200 components in the audio alone condition, visual-semantic violations induced meaning-related N400 modulations in the audiovisual condition. The present results support the idea that speech segmentation and meaning mapping can take place in parallel and act in synergy to enhance novel word learning. Copyright © 2016 Elsevier Ltd. All rights reserved.

  20. Teleconferences and Audiovisual Materials in Earth Science Education

    Science.gov (United States)

    Cortina, L. M.

    2007-05-01

    Unidad de Educacion Continua y a Distancia, Universidad Nacional Autonoma de Mexico, Coyoacán 04510, Mexico. As stated in the special session description, 21st-century undergraduate education has access to resources and experiences that go beyond university classrooms. However, in some cases these resources go largely unused for reasons such as logistic problems, restricted internet and telecommunication access, and misinformation. We present and comment on our efforts and experiences at the National University of Mexico in a new unit dedicated to teleconferences and audio-visual materials. The unit forms part of the geosciences institutes, located on the central UNAM campus and on campuses in other states. The use of teleconferencing in formal graduate and undergraduate education allows teachers and lecturers to distribute course material as in classrooms. Courses by teleconference require learning and effort from students and teachers without physical contact, but participants have access to multimedia to support the presentation. Well-selected multimedia material allows students to identify and recognize digital information that aids the understanding of natural phenomena integral to the Earth Sciences. Cooperation with international partners providing access to new materials, experiences and field practices will greatly add to our efforts. We will present specific examples of our experiences at the Earth Sciences Postgraduate Program of UNAM with the use of technology in geoscience education.

  1. Speech and Communication Disorders

    Science.gov (United States)

    ... to being completely unable to speak or understand speech. Causes include Hearing disorders and deafness Voice problems, ... or those caused by cleft lip or palate Speech problems like stuttering Developmental disabilities Learning disorders Autism ...

  2. 36 CFR 1237.14 - What are the additional scheduling requirements for audiovisual, cartographic, and related records?

    Science.gov (United States)

    2010-07-01

    ... scheduling requirements for audiovisual, cartographic, and related records? 1237.14 Section 1237.14 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION RECORDS MANAGEMENT AUDIOVISUAL... audiovisual, cartographic, and related records? The disposition instructions should also provide that...

  3. Multisensory integration of emotional faces and voices in schizophrenics

    NARCIS (Netherlands)

    Gelder, B. de; Vroomen, J.H.M.; Jong, S. de; Masthoff, E.D.M.; Trompenaars, F.J.; Hodiamont, P.P.G.

    2005-01-01

    In their natural environment, organisms receive information through multiple sensory channels and these inputs from different sensory systems are routinely combined into integrated percepts. Previously, we reported that in a population of schizophrenics, deficits in audiovisual integration were

  4. Multisensory integration of emotional faces and voices in schizophrenics

    NARCIS (Netherlands)

    Gelder, B. de; Vroomen, J.H.M.; Jong, S.J. de; Masthoff, E.D.M.; Trompenaars, F.J.; Hodiamont, P.P.G.

    2005-01-01

    In their natural environment, organisms receive information through multiple sensory channels and these inputs from different sensory systems are routinely combined into integrated percepts. Previously, we reported that in a population of schizophrenics, deficits in audiovisual integration were

  5. Free Speech Yearbook 1978.

    Science.gov (United States)

    Phifer, Gregg, Ed.

    The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…

  6. Audiovisual Modulation in Mouse Primary Visual Cortex Depends on Cross-Modal Stimulus Configuration and Congruency.

    Science.gov (United States)

    Meijer, Guido T; Montijn, Jorrit S; Pennartz, Cyriel M A; Lansink, Carien S

    2017-09-06

    The sensory neocortex is a highly connected associative network that integrates information from multiple senses, even at the level of the primary sensory areas. Although a growing body of empirical evidence supports this view, the neural mechanisms of cross-modal integration in primary sensory areas, such as the primary visual cortex (V1), are still largely unknown. Using two-photon calcium imaging in awake mice, we show that the encoding of audiovisual stimuli in V1 neuronal populations is highly dependent on the features of the stimulus constituents. When the visual and auditory stimulus features were modulated at the same rate (i.e., temporally congruent), neurons responded with either an enhancement or suppression compared with unisensory visual stimuli, and their prevalence was balanced. Temporally incongruent tones or white-noise bursts included in audiovisual stimulus pairs resulted in predominant response suppression across the neuronal population. Visual contrast did not influence multisensory processing when the audiovisual stimulus pairs were congruent; however, when white-noise bursts were used, neurons generally showed response suppression when the visual stimulus contrast was high whereas this effect was absent when the visual contrast was low. Furthermore, a small fraction of V1 neurons, predominantly those located near the lateral border of V1, responded to sound alone. These results show that V1 is involved in the encoding of cross-modal interactions in a more versatile way than previously thought. SIGNIFICANCE STATEMENT The neural substrate of cross-modal integration is not limited to specialized cortical association areas but extends to primary sensory areas. Using two-photon imaging of large groups of neurons, we show that multisensory modulation of V1 populations is strongly determined by the individual and shared features of cross-modal stimulus constituents, such as contrast, frequency, congruency, and temporal structure. Congruent

  7. Integrating Clinical Neuropsychology into the Undergraduate Curriculum.

    Science.gov (United States)

    Puente, Antonio E.; And Others

    1991-01-01

    Claims little information exists in undergraduate education about clinical neuropsychology. Outlines an undergraduate neuropsychology course and proposes ways to integrate the subject into existing undergraduate psychology courses. Suggests developing specialized audio-visual materials for telecourses or existing courses. (NL)

  8. The Functional Connectome of Speech Control.

    Directory of Open Access Journals (Sweden)

    Stefan Fuertinger

    2015-07-01

    Full Text Available In the past few years, several studies have been directed to understanding the complexity of functional interactions between different brain regions during various human behaviors. Among these, neuroimaging research installed the notion that speech and language require an orchestration of brain regions for comprehension, planning, and integration of a heard sound with a spoken word. However, these studies have been largely limited to mapping the neural correlates of separate speech elements and examining distinct cortical or subcortical circuits involved in different aspects of speech control. As a result, the complexity of the brain network machinery controlling speech and language remained largely unknown. Using graph theoretical analysis of functional MRI (fMRI) data in healthy subjects, we quantified the large-scale speech network topology by constructing functional brain networks of increasing hierarchy from the resting state to motor output of meaningless syllables to complex production of real-life speech as well as compared to non-speech-related sequential finger tapping and pure tone discrimination networks. We identified a segregated network of highly connected local neural communities (hubs) in the primary sensorimotor and parietal regions, which formed a commonly shared core hub network across the examined conditions, with the left area 4p playing an important role in speech network organization. These sensorimotor core hubs exhibited features of flexible hubs based on their participation in several functional domains across different networks and ability to adaptively switch long-range functional connectivity depending on task content, resulting in a distinct community structure of each examined network. Specifically, compared to other tasks, speech production was characterized by the formation of six distinct neural communities with specialized recruitment of the prefrontal cortex, insula, putamen, and thalamus, which collectively

  9. Inner Speech: Development, Cognitive Functions, Phenomenology, and Neurobiology

    Science.gov (United States)

    2015-01-01

    Inner speech—also known as covert speech or verbal thinking—has been implicated in theories of cognitive development, speech monitoring, executive function, and psychopathology. Despite a growing body of knowledge on its phenomenology, development, and function, approaches to the scientific study of inner speech have remained diffuse and largely unintegrated. This review examines prominent theoretical approaches to inner speech and methodological challenges in its study, before reviewing current evidence on inner speech in children and adults from both typical and atypical populations. We conclude by considering prospects for an integrated cognitive science of inner speech, and present a multicomponent model of the phenomenon informed by developmental, cognitive, and psycholinguistic considerations. Despite its variability among individuals and across the life span, inner speech appears to perform significant functions in human cognition, which in some cases reflect its developmental origins and its sharing of resources with other cognitive processes. PMID:26011789

  10. Dissociated roles of the inferior frontal gyrus and superior temporal sulcus in audiovisual processing: top-down and bottom-up mismatch detection.

    Science.gov (United States)

    Uno, Takeshi; Kawai, Kensuke; Sakai, Katsuyuki; Wakebe, Toshihiro; Ibaraki, Takuya; Kunii, Naoto; Matsuo, Takeshi; Saito, Nobuhito

    2015-01-01

    Visual inputs can distort auditory perception, and accurate auditory processing requires the ability to detect and ignore visual input that is simultaneous and incongruent with auditory information. However, the neural basis of this auditory selection from audiovisual information is unknown, whereas the integration of audiovisual inputs has been intensively researched. Here, we tested the hypothesis that the inferior frontal gyrus (IFG) and superior temporal sulcus (STS) are involved in top-down and bottom-up processing, respectively, of target auditory information from audiovisual inputs. We recorded high gamma activity (HGA), which is associated with neuronal firing in local brain regions, using electrocorticography while patients with epilepsy judged the syllable spoken by a voice while looking at a voice-congruent or -incongruent lip movement from the speaker. The STS exhibited stronger HGA when the patient was presented with large audiovisual incongruence than with small incongruence, especially if the auditory information was correctly identified. On the other hand, the IFG exhibited stronger HGA in trials with small audiovisual incongruence when patients correctly perceived the auditory information than when patients incorrectly perceived the auditory information due to the mismatched visual information. These results indicate that the IFG and STS have dissociated roles in selective auditory processing, and suggest that the neural basis of selective auditory processing changes dynamically in accordance with the degree of incongruity between auditory and visual information.

  11. Audio-visual feedback improves the BCI performance in the navigational control of a humanoid robot

    Directory of Open Access Journals (Sweden)

    Emmanuele eTidoni

    2014-06-01

    Full Text Available Advancement in brain computer interface (BCI) technology allows people to actively interact in the world through surrogates. Controlling real humanoid robots using BCI as intuitively as we control our body represents a challenge for current research in robotics and neuroscience. In order to successfully interact with the environment, the brain integrates multiple sensory cues to form a coherent representation of the world. Cognitive neuroscience studies demonstrate that multisensory integration may imply a gain with respect to a single modality and ultimately improve the overall sensorimotor performance. For example, reactivity to simultaneous visual and auditory stimuli may be higher than to the sum of the same stimuli delivered in isolation or in temporal sequence. Yet, knowledge about whether audio-visual integration may improve the control of a surrogate is meager. To explore this issue, we provided human footstep sounds as audio feedback to BCI users while controlling a humanoid robot. Participants were asked to steer their robot surrogate and perform a pick-and-place task through BCI-SSVEPs. We found that audio-visual synchrony between footstep sounds and the humanoid’s actual walk reduces the time required for steering the robot. Thus, auditory feedback congruent with the humanoid actions may improve motor decisions of the BCI user and help in the feeling of control over it. Our results shed light on the possibility to increase robot control through the combination of multisensory feedback to a BCI user.

  12. Internet Video Telephony Allows Speech Reading by Deaf Individuals and Improves Speech Perception by Cochlear Implant Users

    Science.gov (United States)

    Mantokoudis, Georgios; Dähler, Claudia; Dubach, Patrick; Kompis, Martin; Caversaccio, Marco D.; Senn, Pascal

    2013-01-01

    Objective To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. Methods Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280×720, 640×480, 320×240, 160×120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0–500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. Results Higher frame rate (>7 fps), higher camera resolution (>640×480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). Conclusion Webcameras have the potential to improve telecommunication of hearing-impaired individuals. PMID:23359119

  13. Speech in spinocerebellar ataxia.

    Science.gov (United States)

    Schalling, Ellika; Hartelius, Lena

    2013-12-01

    Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. Interdependent processing and encoding of speech and concurrent background noise.

    Science.gov (United States)

    Cooper, Angela; Brouwer, Susanne; Bradlow, Ann R

    2015-05-01

    Speech processing can often take place in adverse listening conditions that involve the mixing of speech and background noise. In this study, we investigated processing dependencies between background noise and indexical speech features, using a speeded classification paradigm (Garner, 1974; Exp. 1), and whether background noise is encoded and represented in memory for spoken words in a continuous recognition memory paradigm (Exp. 2). Whether or not the noise spectrally overlapped with the speech signal was also manipulated. The results of Experiment 1 indicated that background noise and indexical features of speech (gender, talker identity) cannot be completely segregated during processing, even when the two auditory streams are spectrally nonoverlapping. Perceptual interference was asymmetric, whereby irrelevant indexical feature variation in the speech signal slowed noise classification to a greater extent than irrelevant noise variation slowed speech classification. This asymmetry may stem from the fact that speech features have greater functional relevance to listeners, and are thus more difficult to selectively ignore than background noise. Experiment 2 revealed that a recognition cost for words embedded in different types of background noise on the first and second occurrences only emerged when the noise and the speech signal were spectrally overlapping. Together, these data suggest integral processing of speech and background noise, modulated by the level of processing and the spectral separation of the speech and noise.

  15. Digital speech processing using Matlab

    CERN Document Server

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.

  16. Memory and learning with rapid audiovisual sequences

    Science.gov (United States)

    Keller, Arielle S.; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193

  17. Memory and learning with rapid audiovisual sequences.

    Science.gov (United States)

    Keller, Arielle S; Sekuler, Robert

    2015-01-01

    We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed.

  18. Electrophysiological Correlates of Semantic Dissimilarity Reflect the Comprehension of Natural, Narrative Speech.

    Science.gov (United States)

    Broderick, Michael P; Anderson, Andrew J; Di Liberto, Giovanni M; Crosse, Michael J; Lalor, Edmund C

    2018-03-05

    People routinely hear and understand speech at rates of 120-200 words per minute [1, 2]. Thus, speech comprehension must involve rapid, online neural mechanisms that process words' meanings in an approximately time-locked fashion. However, electrophysiological evidence for such time-locked processing has been lacking for continuous speech. Although valuable insights into semantic processing have been provided by the "N400 component" of the event-related potential [3-6], this literature has been dominated by paradigms using incongruous words within specially constructed sentences, with less emphasis on natural, narrative speech comprehension. Building on the discovery that cortical activity "tracks" the dynamics of running speech [7-9] and psycholinguistic work demonstrating [10-12] and modeling [13-15] how context impacts on word processing, we describe a new approach for deriving an electrophysiological correlate of natural speech comprehension. We used a computational model [16] to quantify the meaning carried by words based on how semantically dissimilar they were to their preceding context and then regressed this measure against electroencephalographic (EEG) data recorded from subjects as they listened to narrative speech. This produced a prominent negativity at a time lag of 200-600 ms on centro-parietal EEG channels, characteristics common to the N400. Applying this approach to EEG datasets involving time-reversed speech, cocktail party attention, and audiovisual speech-in-noise demonstrated that this response was very sensitive to whether or not subjects understood the speech they heard. These findings demonstrate that, when successfully comprehending natural speech, the human brain responds to the contextual semantic content of each word in a relatively time-locked fashion. Copyright © 2018 Elsevier Ltd. All rights reserved.
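
    The core analysis step, regressing a lagged word-level semantic-dissimilarity predictor against continuous EEG, can be sketched as a ridge-regularized temporal response function. The code below is a hedged illustration with toy data; the sampling rate, lag window, ridge parameter and the circular-shift shortcut are assumptions, not the authors' pipeline.

      import numpy as np

      def lagged_ridge(predictor, eeg, lags, lam=100.0):
          """Fit per-channel kernels mapping a stimulus feature to EEG at the given sample lags.

          predictor: (time,) impulse train, e.g. semantic dissimilarity at word onsets
          eeg:       (time, channels)
          Returns weights of shape (len(lags), channels)."""
          # np.roll wraps around at the edges; acceptable for a sketch, not for real analyses.
          X = np.stack([np.roll(predictor, lag) for lag in lags], axis=1)
          XtX = X.T @ X + lam * np.eye(X.shape[1])
          return np.linalg.solve(XtX, X.T @ eeg)

      # Toy usage: 10 s at 250 Hz, 64 channels, 30 'word onsets'
      rng = np.random.default_rng(1)
      dissim = np.zeros(2500)
      dissim[rng.choice(2500, 30, replace=False)] = rng.random(30)
      eeg = rng.normal(size=(2500, 64))
      weights = lagged_ridge(dissim, eeg, range(0, 150))  # lags covering roughly 0-600 ms
      print(weights.shape)  # (150, 64)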

  19. Gesture facilitates the syntactic analysis of speech

    Directory of Open Access Journals (Sweden)

    Henning eHolle

    2012-03-01

    Full Text Available Recent research suggests that the brain routinely binds together information from gesture and speech. However, most of this research focused on the integration of representational gestures with the semantic content of speech. Much less is known about how other aspects of gesture, such as emphasis, influence the interpretation of the syntactic relations in a spoken message. Here, we investigated whether beat gestures alter which syntactic structure is assigned to ambiguous spoken German sentences. The P600 component of the Event Related Brain Potential indicated that the more complex syntactic structure is easier to process when the speaker emphasizes the subject of a sentence with a beat. Thus, a simple flick of the hand can change our interpretation of who has been doing what to whom in a spoken sentence. We conclude that gestures and speech are an integrated system. Unlike previous studies, which have shown that the brain effortlessly integrates semantic information from gesture and speech, our study is the first to demonstrate that this integration also occurs for syntactic information. Moreover, the effect appears to be gesture-specific and was not found for other stimuli that draw attention to certain parts of speech, including prosodic emphasis, or a moving visual stimulus with the same trajectory as the gesture. This suggests that only visual emphasis produced with a communicative intention in mind (that is, beat gestures) influences language comprehension, but not a simple visual movement lacking such an intention.

  20. Challenges and opportunities for audiovisual diversity in the Internet

    Directory of Open Access Journals (Sweden)

    Trinidad García Leiva

    2017-06-01

    Full Text Available http://dx.doi.org/10.5007/2175-7984.2017v16n35p132 At the gates of the first quarter of the XXI century, nobody doubts that the value chain of the audiovisual industry has undergone important transformations. The digital era presents opportunities for cultural enrichment as well as new challenges. After presenting a general portrait of the audiovisual industries in the digital era, taking the Spanish case as a point of departure and paying attention to the players and logics in tension, this paper presents some notes on the advantages and disadvantages that exist for the diversity of audiovisual production, distribution and consumption online. It is argued here that the diversity of the audiovisual sector online is not guaranteed, because the formula that has made some players successful and powerful is based on walled-garden models to monetize content (which, in addition, add restrictions to its reproduction and circulation by and among consumers). The final objective is to present some ideas about the elements that prevent the strengthening of the diversity of the audiovisual industry in the digital scenario. The barriers to overcome are classified as technological, financial, social, legal and political.

  1. The production of audiovisual teaching tools in minimally invasive surgery.

    Science.gov (United States)

    Tolerton, Sarah K; Hugh, Thomas J; Cosman, Peter H

    2012-01-01

    Audiovisual learning resources have become valuable adjuncts to formal teaching in surgical training. This report discusses the process and challenges of preparing an audiovisual teaching tool for laparoscopic cholecystectomy. The relative value in surgical education and training, for both the creator and the viewer, is addressed. This audiovisual teaching resource was prepared as part of the Master of Surgery program at the University of Sydney, Australia. The different methods of video production used to create operative teaching tools are discussed. Collating and editing material for an audiovisual teaching resource can be a time-consuming and technically challenging process. However, quality learning resources can now be produced even with limited prior video editing experience. With minimal cost and suitable guidance to ensure clinically relevant content, most surgeons should be able to produce short, high-quality education videos of both open and minimally invasive surgery. Despite the challenges faced during production of audiovisual teaching tools, these resources are now relatively easy to produce using readily available software. These resources are particularly attractive to surgical trainees when real-time operative footage is used. They serve as valuable adjuncts to formal teaching, particularly in the setting of minimally invasive surgery. Copyright © 2012 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.

  2. Parametric packet-based audiovisual quality model for IPTV services

    CERN Document Server

    Garcia, Marie-Neige

    2014-01-01

    This volume presents a parametric packet-based audiovisual quality model for Internet Protocol TeleVision (IPTV) services. The model is composed of three quality modules for the respective audio, video and audiovisual components. The audio and video quality modules take as input a parametric description of the audiovisual processing path, and deliver an estimate of the audio and video quality. These outputs are sent to the audiovisual quality module which provides an estimate of the audiovisual quality. Estimates of perceived quality are typically used both in the network planning phase and as part of the quality monitoring. The same audio quality model is used for both these phases, while two variants of the video quality model have been developed for addressing the two application scenarios. The addressed packetization scheme is MPEG2 Transport Stream over Real-time Transport Protocol over Internet Protocol. In the case of quality monitoring, that is the case for which the network is already set-up, the aud...
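
    Models of this family typically fuse the per-modality estimates with a low-order polynomial that includes an audio-video interaction term. The sketch below shows that common functional form; the coefficients are illustrative placeholders, not values from this volume, and real deployments fit them to subjective test data for each application scenario.

      def audiovisual_quality(q_audio, q_video, a=0.30, b=0.15, c=0.25, d=0.12):
          """Combine audio and video quality estimates (1-5 MOS scale) into one audiovisual score."""
          q_av = a + b * q_audio + c * q_video + d * q_audio * q_video
          return min(5.0, max(1.0, q_av))  # clip to the MOS range

      print(audiovisual_quality(4.2, 3.1))  # ~3.27 with these illustrative coefficients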

  3. Audiovisual Association Learning in the Absence of Primary Visual Cortex.

    Science.gov (United States)

    Seirafi, Mehrdad; De Weerd, Peter; Pegna, Alan J; de Gelder, Beatrice

    2015-01-01

    Learning audiovisual associations is mediated by the primary cortical areas; however, recent animal studies suggest that such learning can take place even in the absence of the primary visual cortex. Other studies have demonstrated the involvement of extra-geniculate pathways and especially the superior colliculus (SC) in audiovisual association learning. Here, we investigated such learning in a rare human patient with complete loss of the bilateral striate cortex. We carried out an implicit audiovisual association learning task with two different colors, red and purple (the latter color known to minimally activate the extra-geniculate pathway). Interestingly, the patient learned the association between an auditory cue and a visual stimulus only when the unseen visual stimulus was red, but not when it was purple. The current study presents the first evidence showing the possibility of audiovisual association learning in humans with lesioned striate cortex. Furthermore, in line with animal studies, it supports an important role for the SC in audiovisual associative learning.

  4. The Fungible Audio-Visual Mapping and its Experience

    Directory of Open Access Journals (Sweden)

    Adriana Sa

    2014-12-01

    Full Text Available This article draws a perceptual approach to audio-visual mapping. Clearly perceivable cause and effect relationships can be problematic if one desires the audience to experience the music. Indeed perception would bias those sonic qualities that fit previous concepts of causation, subordinating other sonic qualities, which may form the relations between the sounds themselves. The question is, how can an audio-visual mapping produce a sense of causation, and simultaneously confound the actual cause-effect relationships. We call this a fungible audio-visual mapping. Our aim here is to glean its constitution and aspect. We will report a study, which draws upon methods from experimental psychology to inform audio-visual instrument design and composition. The participants are shown several audio-visual mapping prototypes, after which we pose quantitative and qualitative questions regarding their sense of causation, and their sense of understanding the cause-effect relationships. The study shows that a fungible mapping requires both synchronized and seemingly non-related components – sufficient complexity to be confusing. As the specific cause-effect concepts remain inconclusive, the sense of causation embraces the whole. 

  5. Media Aid Beyond the Factual: Culture, Development, and Audiovisual Assistance

    Directory of Open Access Journals (Sweden)

    Benjamin A. J. Pearson

    2015-01-01

    Full Text Available This paper discusses audiovisual assistance, a form of development aid that focuses on the production and distribution of cultural and entertainment media such as fictional films and TV shows. While the first audiovisual assistance program dates back to UNESCO’s International Fund for the Promotion of Culture in the 1970s, the past two decades have seen a proliferation of audiovisual assistance that, I argue, is related to a growing concern for culture in post-2015 global development agendas. In this paper, I examine the aims and motivations behind the EU’s audiovisual assistance programs to countries in the Global South, using data from policy documents and semi-structured, in-depth interviews with Program Managers and administrative staff in Brussels. These programs prioritize forms of audiovisual content that are locally specific, yet globally tradable. Furthermore, I argue that they have an ambivalent relationship with traditional notions of international development, one that conceptualizes media not only as a means to achieve economic development and human rights aims, but as a form of development itself.

  6. A Network Model of Observation and Imitation of Speech

    Science.gov (United States)

    Mashal, Nira; Solodkin, Ana; Dick, Anthony Steven; Chen, E. Elinor; Small, Steven L.

    2012-01-01

    Much evidence has now accumulated demonstrating and quantifying the extent of shared regional brain activation for observation and execution of speech. However, the nature of the actual networks that implement these functions, i.e., both the brain regions and the connections among them, and the similarities and differences across these networks has not been elucidated. The current study aims to characterize formally a network for observation and imitation of syllables in the healthy adult brain and to compare their structure and effective connectivity. Eleven healthy participants observed or imitated audiovisual syllables spoken by a human actor. We constructed four structural equation models to characterize the networks for observation and imitation in each of the two hemispheres. Our results show that the network models for observation and imitation comprise the same essential structure but differ in important ways from each other (in both hemispheres) based on connectivity. In particular, our results show that the connections from posterior superior temporal gyrus and sulcus to ventral premotor, ventral premotor to dorsal premotor, and dorsal premotor to primary motor cortex in the left hemisphere are stronger during imitation than during observation. The first two connections are implicated in a putative dorsal stream of speech perception, thought to involve translating auditory speech signals into motor representations. Thus, the current results suggest that flow of information during imitation, starting at the posterior superior temporal cortex and ending in the motor cortex, enhances input to the motor cortex in the service of speech execution. PMID:22470360

  7. Documentary management of the sport audio-visual information in the generalist televisions

    OpenAIRE

    Jorge Caldera Serrano; Felipe Alonso

    2007-01-01

    The management of sport audio-visual documentation in the information systems of state, regional and local television channels is analyzed. To this end, the documentary chain through which sport audio-visual information passes is traced in order to analyze each of its parameters, leading to a series of recommendations and norms for the preparation of the sport audio-visual record. Evidently, sport audio-visual documentation differs i...

  8. 36 CFR 1237.26 - What materials and processes must agencies use to create audiovisual records?

    Science.gov (United States)

    2010-07-01

    ... must agencies use to create audiovisual records? 1237.26 Section 1237.26 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION RECORDS MANAGEMENT AUDIOVISUAL, CARTOGRAPHIC, AND RELATED RECORDS MANAGEMENT § 1237.26 What materials and processes must agencies use to create audiovisual...

  9. 36 CFR 1237.20 - What are special considerations in the maintenance of audiovisual records?

    Science.gov (United States)

    2010-07-01

    ... considerations in the maintenance of audiovisual records? 1237.20 Section 1237.20 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION RECORDS MANAGEMENT AUDIOVISUAL, CARTOGRAPHIC, AND RELATED RECORDS MANAGEMENT § 1237.20 What are special considerations in the maintenance of audiovisual...

  10. 36 CFR 1237.18 - What are the environmental standards for audiovisual records storage?

    Science.gov (United States)

    2010-07-01

    ... standards for audiovisual records storage? 1237.18 Section 1237.18 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION RECORDS MANAGEMENT AUDIOVISUAL, CARTOGRAPHIC, AND RELATED RECORDS MANAGEMENT § 1237.18 What are the environmental standards for audiovisual records storage? (a...

  11. 77 FR 22803 - Certain Audiovisual Components and Products Containing the Same; Institution of Investigation...

    Science.gov (United States)

    2012-04-17

    ... INTERNATIONAL TRADE COMMISSION [Inv. No. 337-TA-837] Certain Audiovisual Components and Products... importation of certain audiovisual components and products containing the same by reason of infringement of... importation, or the sale within the United States after importation of certain audiovisual components and...

  12. 36 CFR 1237.16 - How do agencies store audiovisual records?

    Science.gov (United States)

    2010-07-01

    ... audiovisual records? 1237.16 Section 1237.16 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION RECORDS MANAGEMENT AUDIOVISUAL, CARTOGRAPHIC, AND RELATED RECORDS MANAGEMENT § 1237.16 How do agencies store audiovisual records? Agencies must maintain appropriate storage conditions for permanent...

  13. 36 CFR 1237.10 - How must agencies manage their audiovisual, cartographic, and related records?

    Science.gov (United States)

    2010-07-01

    ... their audiovisual, cartographic, and related records? 1237.10 Section 1237.10 Parks, Forests, and Public Property NATIONAL ARCHIVES AND RECORDS ADMINISTRATION RECORDS MANAGEMENT AUDIOVISUAL, CARTOGRAPHIC, AND RELATED RECORDS MANAGEMENT § 1237.10 How must agencies manage their audiovisual, cartographic, and related...

  14. Finding the Correspondence of Audio-Visual Events by Object Manipulation

    Science.gov (United States)

    Nishibori, Kento; Takeuchi, Yoshinori; Matsumoto, Tetsuya; Kudo, Hiroaki; Ohnishi, Noboru

    A human being understands the objects in the environment by integrating information obtained by the senses of sight, hearing and touch. In this integration, active manipulation of objects plays an important role. We propose a method for finding the correspondence of audio-visual events by manipulating an object. The method uses the general grouping rules of Gestalt psychology, i.e. “simultaneity” and “similarity” among the motion command, sound onsets and the motion of the object in images. In experiments, we used a microphone, a camera, and a robot which has a hand manipulator. The robot grasps an object like a bell and shakes it, or grasps an object like a stick and beats a drum, in a periodic or non-periodic motion. The object then emits periodic or non-periodic events. To create a more realistic scenario, we put another event source (a metronome) in the environment. As a result, we obtained a success rate of 73.8 percent in finding the correspondence between audio-visual events (afferent signal) related to robot motion (efferent signal).
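
    The grouping idea, accepting an audio source as belonging to the manipulated object when its onsets coincide in time with the object's visually detected motion onsets, can be sketched as a simple coincidence score. The tolerance window, toy data and scoring below are illustrative assumptions, not the authors' algorithm.

      import numpy as np

      def coincidence_score(audio_onsets, motion_onsets, tolerance=0.05):
          """Fraction of audio onsets with a motion onset within `tolerance` seconds."""
          audio_onsets = np.asarray(audio_onsets)
          motion_onsets = np.asarray(motion_onsets)
          if audio_onsets.size == 0:
              return 0.0
          hits = sum(np.any(np.abs(motion_onsets - t) <= tolerance) for t in audio_onsets)
          return hits / audio_onsets.size

      shake_motion = np.array([0.49, 1.01, 1.50, 2.02])   # motion onsets of the shaken bell (s)
      bell = np.array([0.50, 1.00, 1.52, 2.01])           # onsets of the bell sound
      metronome = np.array([0.33, 0.93, 1.53, 2.13])      # unrelated periodic source
      print(coincidence_score(bell, shake_motion))        # 1.0  -> matches the manipulation
      print(coincidence_score(metronome, shake_motion))   # 0.25 -> rejected as unrelated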

  15. Future-saving audiovisual content for Data Science: Preservation of geoinformatics video heritage with the TIB|AV-Portal

    Science.gov (United States)

    Löwe, Peter; Plank, Margret; Ziedorn, Frauke

    2015-04-01

    of Science and Technology. The web-based portal allows for extended search capabilities based on enhanced metadata derived by automated video analysis. By combining state-of-the-art multimedia retrieval techniques such as speech, text, and image recognition with semantic analysis, content-based access to videos at the segment level is provided. Further, by using the open standard Media Fragment Identifier (MFID), a citable Digital Object Identifier is displayed for each video segment. In addition to the continuously growing footprint of contemporary content, the importance of vintage audiovisual information needs to be considered: this paper showcases the successful application of the TIB|AV-Portal in the preservation and provision of a newly discovered version of a GRASS GIS promotional video produced by the US Army Corps of Engineers Laboratory (US-CERL) in 1987. The video provides insight into the constraints of the very early days of the GRASS GIS project, the oldest active Free and Open Source Software (FOSS) GIS project, which has been active for over thirty years. GRASS itself has turned into a collaborative scientific platform, a repository of scientific peer-reviewed code and an algorithm/knowledge hub for future generations of scientists [1]. This is a reference case for future preservation activities regarding semantic-enhanced Web 2.0 content from geospatial software projects within academia and beyond. References: [1] Chemin, Y., Petras, V., Petrasova, A., Landa, M., Gebbert, S., Zambelli, P., Neteler, M., Löwe, P.: GRASS GIS: a peer-reviewed scientific platform and future research repository, Geophysical Research Abstracts, Vol. 17, EGU2015-8314-1, 2015 (submitted)
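
    For illustration, segment-level citation of this kind can be expressed with the W3C Media Fragments temporal syntax appended to a resolved video URL. In the sketch below the base URL is a hypothetical placeholder; only the #t=start,end fragment form is taken from the standard.

      # Hedged sketch: build a citable link to seconds 120-180 of a video.
      # The base URL is a placeholder, not an actual TIB|AV-Portal address.
      def segment_uri(video_url: str, start_s: int, end_s: int) -> str:
          return f"{video_url}#t={start_s},{end_s}"

      print(segment_uri("https://av.example.org/media/12345", 120, 180))
      # -> https://av.example.org/media/12345#t=120,180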