WorldWideScience

Sample records for audio-visual speech cue

  1. Dynamic Bayesian Networks for Audio-Visual Speech Recognition

    Directory of Open Access Journals (Sweden)

    Liang Luhong

    2002-01-01

    Full Text Available. The use of visual features in audio-visual speech recognition (AVSR) is justified both by the speech generation mechanism, which is essentially bimodal in its audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare the performance of these models with existing models used in speaker-dependent audio-visual isolated word recognition. The statistical properties of both the CHMM and the FHMM allow the state asynchrony of the audio and visual observation sequences to be modeled while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming the FHMM and all the existing models.
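
    As an illustration of the kind of model compared here, the sketch below evaluates a coupled two-stream HMM by running the standard forward algorithm over the product of the audio and visual state spaces. It is a minimal sketch under simplifying assumptions (each chain's transition depends only on its own previous state, and the array shapes and parameters are invented), not the authors' implementation.

        import numpy as np

        def coupled_forward(log_b_audio, log_b_video, A_audio, A_video, pi_audio, pi_video):
            """Forward log-likelihood of a simplified coupled two-stream HMM.

            The coupled state is the pair (i, j) of an audio state i and a visual state j,
            so the two chains may be asynchronous while remaining correlated through joint
            decoding.  log_b_audio: (T, Na) per-frame log-likelihoods of the audio
            observations under each audio state; log_b_video: (T, Nv) likewise for video.
            """
            T, Na = log_b_audio.shape
            _, Nv = log_b_video.shape
            # product-state transition, prior and emission terms
            log_A = (np.log(A_audio)[:, None, :, None]
                     + np.log(A_video)[None, :, None, :]).reshape(Na * Nv, Na * Nv)
            log_pi = (np.log(pi_audio)[:, None] + np.log(pi_video)[None, :]).ravel()
            log_b = (log_b_audio[:, :, None] + log_b_video[:, None, :]).reshape(T, Na * Nv)

            alpha = log_pi + log_b[0]
            for t in range(1, T):
                alpha = log_b[t] + np.logaddexp.reduce(alpha[:, None] + log_A, axis=0)
            return np.logaddexp.reduce(alpha)  # total log-likelihood of the AV sequence

    In an isolated-word recognizer of this kind, one such model would be trained per word and the word with the highest forward log-likelihood selected.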

  2. Cross-modal matching of audio-visual German and French fluent speech in infancy.

    Science.gov (United States)

    Kubicek, Claudia; Hillairet de Boisferon, Anne; Dupierrix, Eve; Pascalis, Olivier; Lœvenbruck, Hélène; Gervain, Judit; Schwarzer, Gudrun

    2014-01-01

    The present study examined when and how the ability to cross-modally match audio-visual fluent speech develops in 4.5-, 6- and 12-month-old German-learning infants. In Experiment 1, 4.5- and 6-month-old infants' audio-visual matching ability of native (German) and non-native (French) fluent speech was assessed by presenting auditory and visual speech information sequentially, that is, in the absence of temporal synchrony cues. The results showed that 4.5-month-old infants were capable of matching native as well as non-native audio and visual speech stimuli, whereas 6-month-olds perceived the audio-visual correspondence of native language stimuli only. This suggests that intersensory matching narrows for fluent speech between 4.5 and 6 months of age. In Experiment 2, auditory and visual speech information was presented simultaneously, therefore, providing temporal synchrony cues. Here, 6-month-olds were found to match native as well as non-native speech indicating facilitation of temporal synchrony cues on the intersensory perception of non-native fluent speech. Intriguingly, despite the fact that audio and visual stimuli cohered temporally, 12-month-olds matched the non-native language only. Results were discussed with regard to multisensory perceptual narrowing during the first year of life.

  3. Cross-modal matching of audio-visual German and French fluent speech in infancy.

    Directory of Open Access Journals (Sweden)

    Claudia Kubicek

    Full Text Available. The present study examined when and how the ability to cross-modally match audio-visual fluent speech develops in 4.5-, 6- and 12-month-old German-learning infants. In Experiment 1, 4.5- and 6-month-old infants' audio-visual matching ability of native (German) and non-native (French) fluent speech was assessed by presenting auditory and visual speech information sequentially, that is, in the absence of temporal synchrony cues. The results showed that 4.5-month-old infants were capable of matching native as well as non-native audio and visual speech stimuli, whereas 6-month-olds perceived the audio-visual correspondence of native language stimuli only. This suggests that intersensory matching narrows for fluent speech between 4.5 and 6 months of age. In Experiment 2, auditory and visual speech information was presented simultaneously, therefore, providing temporal synchrony cues. Here, 6-month-olds were found to match native as well as non-native speech indicating facilitation of temporal synchrony cues on the intersensory perception of non-native fluent speech. Intriguingly, despite the fact that audio and visual stimuli cohered temporally, 12-month-olds matched the non-native language only. Results were discussed with regard to multisensory perceptual narrowing during the first year of life.

  4. Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?

    Directory of Open Access Journals (Sweden)

    Magnus Alm

    2015-07-01

    Full Text Available. Gender and age have been found to affect adults' audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood of cognitive and sensory decline, which may confound positive effects of age-related AV experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently, both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20-30 years) and middle-aged adults (50-60 years), with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. In contrast, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females' general AV perceptual strategy. Although young females' speech-reading proficiency may not readily contribute to greater visual influence, between young and middle adulthood the recurrent confirmation of the contribution of visual cues, induced by speech-reading proficiency, may gradually shift females' AV perceptual strategy towards more visually dominated responses.

  5. Talker variability in audio-visual speech perception.

    Science.gov (United States)

    Heald, Shannon L M; Nusbaum, Howard C

    2014-01-01

    A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories, and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition than in the audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred.

  6. Audio-Visual Speech Perception: A Developmental ERP Investigation

    Science.gov (United States)

    Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.

    2014-01-01

    Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…

  7. Audio-visual temporal recalibration can be constrained by content cues regardless of spatial overlap

    Directory of Open Access Journals (Sweden)

    Warrick Roseboom

    2013-04-01

    Full Text Available. It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated, and opposing, estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated if this was necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex (audio-visual speech; Experiment 1) and simple stimuli (high and low pitch audio matched with either vertically or horizontally oriented Gabors; Experiment 2) we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap.

  8. Audio-Visual Temporal Recalibration Can be Constrained by Content Cues Regardless of Spatial Overlap.

    Science.gov (United States)

    Roseboom, Warrick; Kawabe, Takahiro; Nishida, Shin'ya

    2013-01-01

    It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated if this is necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex (audio visual speech; see Experiment 1) and simple stimuli (high and low pitch audio matched with either vertically or horizontally oriented Gabors; see Experiment 2) we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair alone can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap.

  9. Superior Temporal Activation in Response to Dynamic Audio-Visual Emotional Cues

    Science.gov (United States)

    Robins, Diana L.; Hunyadi, Elinora; Schultz, Robert T.

    2009-01-01

    Perception of emotion is critical for successful social interaction, yet the neural mechanisms underlying the perception of dynamic, audio-visual emotional cues are poorly understood. Evidence from language and sensory paradigms suggests that the superior temporal sulcus and gyrus (STS/STG) play a key role in the integration of auditory and visual…

  10. Robot Command Interface Using an Audio-Visual Speech Recognition System

    Science.gov (United States)

    Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy

    In recent years audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This paper presents an automatic command recognition system using audio-visual information. The system is intended to control the da Vinci laparoscopic robot. The audio signal is parametrized using Mel-frequency cepstral coefficients (MFCCs). In addition, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used to extract the visual speech information.
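
    A rough sketch of the two feature streams described above: MFCC parametrization of the audio, and simple geometric measurements of the outer mouth contour. librosa, the 16 kHz sampling rate, and the mouth_points input format are assumptions for illustration; the actual MPEG-4-compliant contour extraction used by the system is not shown.

        import numpy as np
        import librosa

        def audio_visual_features(wav_path, mouth_points):
            """MFCCs from the audio track stacked with per-frame outer-lip features.

            mouth_points: (T, K, 2) array of outer-lip contour points per video frame
            (hypothetical input; how the points are tracked is outside this sketch).
            """
            y, sr = librosa.load(wav_path, sr=16000)
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T          # (frames, 13)
            # crude geometric lip features: mouth width, height and an area proxy
            width = np.ptp(mouth_points[:, :, 0], axis=1)
            height = np.ptp(mouth_points[:, :, 1], axis=1)
            visual = np.stack([width, height, width * height], axis=1)     # (T, 3)
            # resample the visual features to the audio frame rate before stacking
            idx = np.linspace(0, len(visual) - 1, num=len(mfcc)).astype(int)
            return np.hstack([mfcc, visual[idx]])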

  11. Using multiple visual tandem streams in audio-visual speech recognition

    OpenAIRE

    Topkaya, İbrahim Saygın; Erdoğan, Hakan

    2011-01-01

    The method which is called the "tandem approach" in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a hidden Markov model. We study the effect of using visual tandem features in audio-visual speech recognition using a novel setup which uses multiple classifiers to obtain multiple visual tandem features. We adopt the approach of multi-stream hidden Markov models where visual tandem features from two different classifiers ...
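
    The tandem idea can be sketched in a few lines: a frame-level classifier is trained on raw visual features, and its log posterior probabilities (optionally decorrelated) become the observation features for the HMM stage. The scikit-learn classifier, its layer size, and the PCA post-processing below are illustrative assumptions, not the setup used in the paper.

        import numpy as np
        from sklearn.neural_network import MLPClassifier
        from sklearn.decomposition import PCA

        def visual_tandem_features(train_X, train_y, X, n_components=10):
            """Turn raw per-frame visual features X into tandem features:
            log class posteriors of a frame classifier, decorrelated with PCA."""
            clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(train_X, train_y)
            log_post = np.log(clf.predict_proba(X) + 1e-10)        # (frames, n_classes)
            n = min(n_components, log_post.shape[1])
            return PCA(n_components=n).fit_transform(log_post)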

  12. Audio-visual speech in noise perception in dyslexia

    NARCIS (Netherlands)

    van Laarhoven, T.; Keetels, M.N.; Schakel, L.; Vroomen, J.

    2017-01-01

    Individuals with developmental dyslexia (DD) may experience, besides reading problems, other speech-related processing deficits. Here, we examined the influence of visual articulatory information (lip-read speech) at various levels of background noise on auditory word recognition in children and

  13. Classifying laughter and speech using audio-visual feature prediction

    NARCIS (Netherlands)

    Petridis, Stavros; Asghar, Ali; Pantic, Maja

    2010-01-01

    In this study, a system that discriminates laughter from speech by modelling the relationship between audio and visual features is presented. The underlying assumption is that this relationship is different between speech and laughter. Neural networks are trained which learn the audio-to-visual and
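
    The approach described, modelling the audio-to-visual mapping separately for speech and for laughter and classifying by prediction error, might be sketched as follows; the scikit-learn regressors and the feature arrays are placeholders, not the networks used in the study.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        def train_av_models(audio_speech, visual_speech, audio_laugh, visual_laugh):
            """One audio-to-visual regressor per class (speech / laughter)."""
            speech_model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500).fit(audio_speech, visual_speech)
            laugh_model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500).fit(audio_laugh, visual_laugh)
            return speech_model, laugh_model

        def classify_clip(speech_model, laugh_model, audio_frames, visual_frames):
            """Label a clip by which class's model better predicts its visual features."""
            err_speech = np.mean((speech_model.predict(audio_frames) - visual_frames) ** 2)
            err_laugh = np.mean((laugh_model.predict(audio_frames) - visual_frames) ** 2)
            return "speech" if err_speech < err_laugh else "laughter"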

  14. Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

    Directory of Open Access Journals (Sweden)

    Yue Zhao

    2012-12-01

    Full Text Available. Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi-stream Dynamic Bayesian Networks and coupled HMMs are widely used for audio-visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) to perform unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and build an accurate audio-visual speech recognition model without assuming frame independence. Experimental results on Tibetan speech data from real-world environments show that the proposed DDBN outperforms state-of-the-art methods in word recognition accuracy.

  15. Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition

    Directory of Open Access Journals (Sweden)

    Berthommier Frédéric

    2002-01-01

    Full Text Available. It has been shown that integration of acoustic and visual information, especially in noisy conditions, yields improved speech recognition results. This raises the question of how to weight the two modalities in different noise conditions. Throughout this paper we develop a weighting process adaptive to various background noise situations. In the presented recognition system, audio and video data are combined following a Separate Integration (SI) architecture. A hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) system is used for the experiments. The neural networks were in all cases trained on clean data. Firstly, we evaluate the performance of different weighting schemes in a manually controlled recognition task with different types of noise. Next, we compare different criteria to estimate the reliability of the audio stream. Based on this, a mapping between the measurements and the free parameter of the fusion process is derived and its applicability is demonstrated. Finally, the possibilities and limitations of adaptive weighting are compared and discussed.
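
    The separate-integration step amounts to an exponent-weighted combination of the two streams' scores, with the weight driven by an estimate of audio-stream reliability. The linear SNR-to-weight mapping below is purely illustrative; the paper derives its own mapping from reliability measurements.

        import numpy as np

        def audio_weight_from_snr(snr_db, low=-5.0, high=20.0):
            """Map an estimated audio SNR (dB) to a stream weight in [0, 1]
            (illustrative linear mapping, not the paper's fitted one)."""
            return float(np.clip((snr_db - low) / (high - low), 0.0, 1.0))

        def fused_log_score(log_p_audio, log_p_video, snr_db):
            """Separate-integration fusion: weighted sum of per-stream word log-scores."""
            lam = audio_weight_from_snr(snr_db)
            return lam * log_p_audio + (1.0 - lam) * log_p_video

    With this mapping, clean audio (around 20 dB) lets the audio stream dominate, while at very low SNR the decision falls back almost entirely on the visual stream.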

  16. Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

    Directory of Open Access Journals (Sweden)

    Petar S. Aleksic

    2002-11-01

    Full Text Available. We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs) supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. Principal component analysis (PCA) was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR) experiments. Both single-stream and multistream hidden Markov models (HMMs) were used to model the ASR system, integrate audio and visual information, and perform relatively large vocabulary (approximately 1000 words) speech recognition experiments. The experiments were performed using clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER) by 20% to 23% relative to audio-only speech recognition WERs at various SNRs (0–30 dB) with additive white Gaussian noise, and by 19% relative to the audio-only speech recognition WER under clean audio conditions.
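
    The dimensionality-reduction step can be approximated with an off-the-shelf PCA: fit on training FAP vectors and use the projection weights as the visual feature stream. scikit-learn and the choice of six components are assumptions for illustration, not the authors' code.

        import numpy as np
        from sklearn.decomposition import PCA

        def fap_projection_weights(fap_train, fap_stream, n_components=6):
            """Fit PCA on training FAP vectors (frames x FAP dims) and project a new
            FAP stream onto the leading components to obtain the visual features."""
            pca = PCA(n_components=n_components).fit(fap_train)
            return pca.transform(fap_stream)          # (frames, n_components)

    Note that the reported gains are relative reductions: for example, a 20% relative reduction of a hypothetical 30% audio-only WER corresponds to 0.30 × (1 − 0.20) = 24% absolute WER.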

  17. Infant perception of audio-visual speech synchrony in familiar and unfamiliar fluent speech.

    Science.gov (United States)

    Pons, Ferran; Lewkowicz, David J

    2014-06-01

    We investigated the effects of linguistic experience and language familiarity on the perception of audio-visual (A-V) synchrony in fluent speech. In Experiment 1, we tested a group of monolingual Spanish- and Catalan-learning 8-month-old infants with a video clip of a person speaking Spanish. Following habituation to the audiovisually synchronous video, infants saw and heard desynchronized clips of the same video where the audio stream now preceded the video stream by 366, 500, or 666 ms. In Experiment 2, monolingual Catalan and Spanish infants were tested with a video clip of a person speaking English. Results indicated that in both experiments, infants detected a 666 and a 500 ms asynchrony. That is, their responsiveness to A-V synchrony was the same regardless of their specific linguistic experience or familiarity with the tested language. Compared to previous results from infant studies with isolated audiovisual syllables, these results show that infants are more sensitive to A-V temporal relations inherent in fluent speech. Furthermore, the absence of a language familiarity effect on the detection of A-V speech asynchrony at eight months of age is consistent with the broad perceptual tuning usually observed in infant response to linguistic input at this age. Copyright © 2014 Elsevier B.V. All rights reserved.

  18. ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION

    Directory of Open Access Journals (Sweden)

    D.V. Ivanko

    2016-05-01

    Full Text Available. The paper presents an analytical review covering the latest achievements in the field of audio-visual (AV) fusion, i.e., the integration of multimodal information. We discuss the main challenges and report on approaches to address them. One of the most important tasks of AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review we set out the basic principles of AV speech recognition and give a classification of audio and visual speech features. Special attention is paid to the systematization of existing techniques and AV data fusion methods. In the second part we provide a consolidated list of tasks and applications that use AV fusion, based on our analysis of the research area, and indicate the methods, techniques, and audio and video features used. We propose a classification of AV integration approaches and discuss the advantages and disadvantages of the different approaches. We draw conclusions and offer our assessment of the future of the field of AV fusion. In further research we plan to implement an audio-visual Russian continuous speech recognition system using advanced methods of multimodal fusion.
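
    The two basic fusion families surveyed here, feature-level (early) and decision-level (late) integration, can be contrasted in a small sketch; the classifiers, weights, and data shapes are illustrative assumptions rather than any specific system from the review.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def early_fusion(audio_X, visual_X, y):
            """Feature-level fusion: concatenate modalities and train one classifier."""
            return LogisticRegression(max_iter=1000).fit(np.hstack([audio_X, visual_X]), y)

        def late_fusion_scores(audio_clf, visual_clf, audio_X, visual_X, w_audio=0.7):
            """Decision-level fusion: weighted combination of per-modality posteriors."""
            return (w_audio * audio_clf.predict_proba(audio_X)
                    + (1.0 - w_audio) * visual_clf.predict_proba(visual_X))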

  19. Large scale functional brain networks underlying temporal integration of audio-visual speech perception: An EEG study

    OpenAIRE

    G. Vinodh Kumar; Tamesh Halder; Amit Kumar Jaiswal; Abhishek Mukherjee; Dipanjan Roy; Arpan Banerjee

    2016-01-01

    Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk-effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. How...

  20. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Koji Iwano

    2007-03-01

    Full Text Available. This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected-digit speech contaminated with white noise at various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.
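
    The relation between the two visual feature types mentioned, lip-contour geometry and lip-motion velocity, is simple frame differencing; the sketch below assumes the geometric features have already been extracted from the side-face images.

        import numpy as np

        def lip_motion_features(geom_features):
            """Append frame-to-frame velocity to lip-contour geometric features.
            geom_features: (T, D) array, e.g. per-frame lip width/height measurements."""
            vel = np.diff(geom_features, axis=0, prepend=geom_features[:1])
            return np.hstack([geom_features, vel])    # static + dynamic visual stream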

  1. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Iwano Koji

    2007-01-01

    Full Text Available. This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected-digit speech contaminated with white noise at various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.

  2. Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs

    NARCIS (Netherlands)

    Ten Oever, Sanne; Sack, Alexander T; Wheat, Katherine L; Bien, Nina; van Atteveldt, Nienke

    2013-01-01

    Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal

  3. Audio-visual speech timing sensitivity is enhanced in cluttered conditions.

    Directory of Open Access Journals (Sweden)

    Warrick Roseboom

    Full Text Available. Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.

  4. The effect of combined sensory and semantic components on audio-visual speech perception in older adults.

    Science.gov (United States)

    Maguinness, Corrina; Setti, Annalisa; Burke, Kate E; Kenny, Rose Anne; Newell, Fiona N

    2011-01-01

    Previous studies have found that perception in older people benefits from multisensory over unisensory information. As normal speech recognition is affected by both the auditory input and the visual lip movements of the speaker, we investigated the efficiency of audio and visual integration in an older population by manipulating the relative reliability of the auditory and visual information in speech. We also investigated the role of the semantic context of the sentence to assess whether audio-visual integration is affected by top-down semantic processing. We presented participants with audio-visual sentences in which the visual component was either blurred or not blurred. We found that there was a greater cost in recall performance for semantically meaningless speech in the audio-visual 'blur' compared to audio-visual 'no blur' condition and this effect was specific to the older group. Our findings have implications for understanding how aging affects efficient multisensory integration for the perception of speech and suggest that multisensory inputs may benefit speech perception in older adults when the semantic content of the speech is unpredictable.

  5. The effect of combined sensory and semantic components on audio-visual speech perception in older adults

    Directory of Open Access Journals (Sweden)

    Corrina Maguinness

    2011-12-01

    Full Text Available. Previous studies have found that perception in older people benefits from multisensory over uni-sensory information. As normal speech recognition is affected by both the auditory input and the visual lip-movements of the speaker, we investigated the efficiency of audio and visual integration in an older population by manipulating the relative reliability of the auditory and visual information in speech. We also investigated the role of the semantic context of the sentence to assess whether audio-visual integration is affected by top-down semantic processing. We presented participants with audio-visual sentences in which the visual component was either blurred or not blurred. We found that there was a greater cost in recall performance for semantically meaningless speech in the audio-visual 'blur' compared to audio-visual 'no blur' condition and this effect was specific to the older group. Our findings have implications for understanding how aging affects efficient multisensory integration for the perception of speech and suggest that multisensory inputs may benefit speech perception in older adults when the semantic content of the speech is unpredictable.

  6. Audio-visual speech perception in prelingually deafened Japanese children following sequential bilateral cochlear implantation.

    Science.gov (United States)

    Yamamoto, Ryosuke; Naito, Yasushi; Tona, Risa; Moroto, Saburo; Tamaya, Rinko; Fujiwara, Keizo; Shinohara, Shogo; Takebayashi, Shinji; Kikuchi, Masahiro; Michida, Tetsuhiko

    2017-11-01

    An effect of audio-visual (AV) integration is observed when the auditory and visual stimuli are incongruent (the McGurk effect). In general, AV integration is helpful, especially for subjects wearing hearing aids or cochlear implants (CIs). However, the influence of AV integration on spoken word recognition in individuals with bilateral CIs (Bi-CIs) has not been fully investigated so far. In this study, we investigated AV integration in children with Bi-CIs. The study sample included thirty-one prelingually deafened children who underwent sequential bilateral cochlear implantation. We assessed their responses to congruent and incongruent AV stimuli with three CI-listening modes: only the 1st CI, only the 2nd CI, and Bi-CIs. The responses were assessed in the whole group as well as in two sub-groups: a proficient group (syllable intelligibility ≥80% with the 1st CI) and a non-proficient group (syllable intelligibility <80% with the 1st CI). The results showed that prelingually deafened Japanese children who underwent sequential bilateral cochlear implantation exhibit AV integration abilities, both in monaural and in binaural listening. We also observed a higher influence of visual stimuli on speech perception with the 2nd CI in the non-proficient group, suggesting that Bi-CI listeners with poorer speech recognition rely on visual information more than proficient subjects to compensate for poorer auditory input. Nevertheless, poorer-quality auditory input with the 2nd CI did not interfere with AV integration in binaural listening (with Bi-CIs). Overall, the findings of this study might be used to inform future research to identify the best strategies for speech training using AV integration effectively in prelingually deafened children. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Syllable Congruency of Audio-Visual Speech Stimuli Facilitates the Spatial Ventriloquism Only with Bilateral Visual Presentations

    Directory of Open Access Journals (Sweden)

    Shoko Kanaya

    2011-10-01

    Full Text Available. Spatial ventriloquism refers to a shift in the perceived location of a sound toward a synchronized visual stimulus. It has been assumed to reflect early processes uninfluenced by cognitive factors such as syllable congruency between audio-visual speech stimuli. Conventional experiments have examined compelling situations which typically entail pairs of single audio and visual stimuli to be bound. However, in natural environments our multisensory system must select the relevant sensory signals to be bound from among adjacent stimuli. This selection process may depend upon higher (cognitive) mechanisms. We investigated whether a cognitive factor affects the size of the ventriloquism when an additional visual stimulus is presented with a conventional audio-visual pair. Participants were presented with a set of audio-visual speech stimuli, comprising one or two bilateral movies of a person uttering single syllables together with recordings of this person speaking the same syllables. One of the movies and the speech sound were combined in either congruent or incongruent ways. Participants had to identify sound locations. Results show that syllable congruency affected the size of the ventriloquism only when two movies were presented simultaneously. The selection of a relevant stimulus pair among two or more candidates can thus be regulated by higher processes.

  8. Contributions of local speech encoding and functional connectivity to audio-visual speech perception.

    Science.gov (United States)

    Giordano, Bruno L; Ince, Robin A A; Gross, Joachim; Schyns, Philippe G; Panzeri, Stefano; Kayser, Christoph

    2017-06-07

    Seeing a speaker's face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR, speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker's face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments.

  9. Contributions of local speech encoding and functional connectivity to audio-visual speech perception

    Science.gov (United States)

    Giordano, Bruno L; Ince, Robin A A; Gross, Joachim; Schyns, Philippe G; Panzeri, Stefano; Kayser, Christoph

    2017-01-01

    Seeing a speaker’s face enhances speech intelligibility in adverse environments. We investigated the underlying network mechanisms by quantifying local speech representations and directed connectivity in MEG data obtained while human participants listened to speech of varying acoustic SNR and visual context. During high acoustic SNR, speech encoding by temporally entrained brain activity was strong in temporal and inferior frontal cortex, while during low SNR strong entrainment emerged in premotor and superior frontal cortex. These changes in local encoding were accompanied by changes in directed connectivity along the ventral stream and the auditory-premotor axis. Importantly, the behavioral benefit arising from seeing the speaker’s face was not predicted by changes in local encoding but rather by enhanced functional connectivity between temporal and inferior frontal cortex. Our results demonstrate a role of auditory-frontal interactions in visual speech representations and suggest that functional connectivity along the ventral pathway facilitates speech comprehension in multisensory environments. DOI: http://dx.doi.org/10.7554/eLife.24763.001 PMID:28590903

  10. Large Scale Functional Brain Networks Underlying Temporal Integration of Audio-Visual Speech Perception: An EEG Study.

    Science.gov (United States)

    Kumar, G Vinodh; Halder, Tamesh; Jaiswal, Amit K; Mukherjee, Abhishek; Roy, Dipanjan; Banerjee, Arpan

    2016-01-01

    Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, whether and how the network across the whole brain participates in multisensory perception processing remains an open question. We posit that large-scale functional connectivity among neural populations situated in distributed brain sites may provide valuable insights into the processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent audio-visual (AV) speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs was computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha- and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception around a temporal window of 300-600 ms following onset of stimuli. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times along with pre-stimulus decreases of lower frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus, our

  11. Large scale functional brain networks underlying temporal integration of audio-visual speech perception: An EEG study

    Directory of Open Access Journals (Sweden)

    G. Vinodh Kumar

    2016-10-01

    Full Text Available. Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, whether and how the network across the whole brain participates in multisensory perception processing remains an open question. We posit that large-scale functional connectivity among neural populations situated in distributed brain sites may provide valuable insights into the processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent audio-visual (AV) speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs was computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha- and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception around a temporal window of 300-600 ms following onset of stimuli. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times along with pre-stimulus decreases of lower frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags.
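
    A simplified stand-in for the connectivity measure: magnitude-squared coherence computed for every sensor pair and averaged within a frequency band. The authors' time-frequency global coherence (a vector sum of pairwise coherence changes over time) is richer than this; scipy's Welch-based estimator, the band limits, and the window length are assumptions.

        import numpy as np
        from scipy.signal import coherence

        def global_band_coherence(eeg, fs, band=(30.0, 45.0), nperseg=256):
            """Pairwise magnitude-squared coherence summed over all sensor pairs,
            averaged within a frequency band. eeg: (n_sensors, n_samples)."""
            n_sensors = eeg.shape[0]
            total = 0.0
            for i in range(n_sensors):
                for j in range(i + 1, n_sensors):
                    f, cxy = coherence(eeg[i], eeg[j], fs=fs, nperseg=nperseg)
                    in_band = (f >= band[0]) & (f <= band[1])
                    total += cxy[in_band].mean()
            return total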

  12. Correlation between audio-visual enhancement of speech in different noise environments and SNR: a combined behavioral and electrophysiological study.

    Science.gov (United States)

    Liu, B; Lin, Y; Gao, X; Dang, J

    2013-09-05

    In the present study, we investigated the multisensory gain as the difference in speech recognition accuracy between the audio-visual (AV) and auditory-only (A) conditions, and the multisensory gain as the difference between the event-related potentials (ERPs) evoked under the AV condition and the sum of the ERPs evoked under the A and visual-only (V) conditions, in different noise environments. Videos of a female speaker articulating Chinese monosyllabic words accompanied by different levels of pink noise were used as the stimulus materials. The selected signal-to-noise ratios (SNRs) were -16, -12, -8, -4 and 0 dB. Under the A, V and AV conditions the accuracy of speech recognition was measured and the ERPs evoked under the different conditions were analyzed, respectively. The behavioral results showed that the maximum gain in speech recognition accuracy between the AV and A conditions occurred at the -12 dB SNR. The ERP results showed that the multisensory gain, the difference between the ERPs evoked under the AV condition and the sum of the ERPs evoked under the A and V conditions, was significantly higher at the -12 dB SNR than at the other SNRs in the time window of 130-200 ms over the area from frontal to central regions. The multisensory gains in audio-visual speech recognition at different SNRs were not completely in accordance with the principle of inverse effectiveness, but conformed to cross-modal stochastic resonance. Copyright © 2013 IBRO. Published by Elsevier Ltd. All rights reserved.
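
    The two gain measures used in this study are plain differences, made explicit below; the inputs are placeholders for the measured recognition accuracies and the averaged ERP waveforms.

        import numpy as np

        def behavioral_gain(acc_av, acc_a):
            """Multisensory gain in recognition accuracy: AV minus auditory-only."""
            return acc_av - acc_a

        def erp_gain(erp_av, erp_a, erp_v):
            """Additive-model ERP gain: AV response minus the sum of unimodal responses.
            Each input: (n_channels, n_timepoints) averaged waveform."""
            return np.asarray(erp_av) - (np.asarray(erp_a) + np.asarray(erp_v))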

  13. Audio-visual speech perception in noise: Implanted children and young adults versus normal hearing peers.

    Science.gov (United States)

    Taitelbaum-Swead, Riki; Fostick, Leah

    2017-01-01

    The purpose of the current study was to evaluate the auditory, visual and audiovisual speech perception abilities of two groups of cochlear implant (CI) users, prelingual children and long-term young adults, as compared to their normal hearing (NH) peers. This was a prospective cohort study that included 50 participants, divided into two groups of CI users (10 children and 10 adults) and two groups of normal hearing peers (15 participants each). Speech stimuli included monosyllabic meaningful and nonsense words at a signal-to-noise ratio of 0 dB. Speech stimuli were presented via auditory, visual and audiovisual modalities. (1) CI children and adults show lower speech perception accuracy with background noise in the audiovisual and auditory modalities, as compared to NH peers, but significantly higher visual speech perception scores. (2) CI children are superior to CI adults in speech perception in noise via the auditory modality, but inferior in the visual one. Both CI children and CI adults had similar audiovisual integration. The findings of the current study show that in spite of the fact that the CI children were implanted bilaterally, at a very young age, and with advanced technology, they still have difficulties perceiving speech in adverse listening conditions even when the visual modality is added. This suggests that audiovisual training might be beneficial for this group by improving their audiovisual integration in difficult listening situations. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  14. The Effect of Onset Asynchrony in Audio Visual Speech and the Uncanny Valley in Virtual Characters

    DEFF Research Database (Denmark)

    Tinwell, Angela; Grimshaw, Mark; Abdel Nabi, Deborah

    2015-01-01

    This study investigates whether the Uncanny Valley phenomenon is increased for realistic, human-like characters with an asynchrony of lip movement during speech. An experiment was conducted in which 113 participants rated a human and a realistic, talking-head, human-like virtual character over a ran...

  15. A Possible Neurophysiological Correlate of AudioVisual Binding and Unbinding in Speech Perception

    Directory of Open Access Journals (Sweden)

    Attigodu Chandrashekara Ganesh

    2014-11-01

    Full Text Available. Audiovisual speech integration of auditory and visual streams generally ends up in a fusion into a single percept. One classical example is the McGurk effect in which incongruent auditory and visual speech signals may lead to a fused percept different from either visual or auditory inputs. In a previous set of experiments, we showed that if a McGurk stimulus is preceded by an incongruent audiovisual context (composed of incongruent auditory and visual speech materials) the amount of McGurk fusion is largely decreased. We interpreted this result in the framework of a two-stage binding and fusion model of audiovisual speech perception, with an early audiovisual binding stage controlling the fusion/decision process and likely to produce unbinding, with less fusion, if the context is incoherent. In order to provide further electrophysiological evidence for this binding/unbinding stage, early auditory evoked N1/P2 responses were here compared during auditory, congruent and incongruent audiovisual speech perception, according to either prior coherent or incoherent audiovisual contexts. Following the coherent context, in line with previous EEG/MEG studies, visual information in the congruent audiovisual condition was found to modify auditory evoked potentials, with a latency decrease of P2 responses compared to the auditory condition. Importantly, both P2 amplitude and latency in the congruent audiovisual condition increased from the coherent to the incoherent context. Although potential contamination by visual responses from the visual cortex cannot be discarded, our results might provide a possible neurophysiological correlate of an early binding/unbinding process applied to audiovisual interactions.

  16. Speech Acquisition in Meetings with an Audio-Visual Sensor Array

    OpenAIRE

    McCowan, Iain A.; Krishna, Maganti Hari; Gatica-Perez, Daniel; Moore, Darren; Ba, Silèye O.

    2005-01-01

    Close-talk headset microphones have been traditionally used for speech acquisition in a number of applications, as they naturally provide a higher signal-to-noise ratio (needed for recognition tasks) than single distant microphones. However, in multi-party conversational settings like meetings, microphone arrays represent an important alternative to close-talking microphones, as they allow for localisation and tracking of speakers and signal-independent enhancement, while providing a non-intr...

  17. Audio-Visual Perception of Gender by Infants Emerges Earlier for Adult-Directed Speech.

    Directory of Open Access Journals (Sweden)

    Anne-Raphaëlle Richoz

    Full Text Available. Early multisensory perceptual experiences shape the abilities of infants to perform socially-relevant visual categorization, such as the extraction of gender, age, and emotion from faces. Here, we investigated whether multisensory perception of gender is influenced by infant-directed (IDS) or adult-directed (ADS) speech. Six-, 9-, and 12-month-old infants saw side-by-side silent video-clips of talking faces (a male and a female) and heard either a soundtrack of a female or a male voice telling a story in IDS or ADS. Infants participated in only one condition, either IDS or ADS. Consistent with earlier work, infants displayed advantages in matching female relative to male faces and voices. Moreover, the new finding that emerged in the current study was that extraction of gender from face and voice was stronger at 6 months with ADS than with IDS, whereas at 9 and 12 months, matching did not differ for IDS versus ADS. The results indicate that the ability to perceive gender in audiovisual speech is influenced by speech manner. Our data suggest that infants may extract multisensory gender information developmentally earlier when looking at adults engaged in conversation with other adults (i.e., ADS) than when adults are directly talking to them (i.e., IDS). Overall, our findings imply that the circumstances of social interaction may shape early multisensory abilities to perceive gender.

  18. Audio-Visual Perception of Gender by Infants Emerges Earlier for Adult-Directed Speech.

    Science.gov (United States)

    Richoz, Anne-Raphaëlle; Quinn, Paul C; Hillairet de Boisferon, Anne; Berger, Carole; Loevenbruck, Hélène; Lewkowicz, David J; Lee, Kang; Dole, Marjorie; Caldara, Roberto; Pascalis, Olivier

    2017-01-01

    Early multisensory perceptual experiences shape the abilities of infants to perform socially-relevant visual categorization, such as the extraction of gender, age, and emotion from faces. Here, we investigated whether multisensory perception of gender is influenced by infant-directed (IDS) or adult-directed (ADS) speech. Six-, 9-, and 12-month-old infants saw side-by-side silent video-clips of talking faces (a male and a female) and heard either a soundtrack of a female or a male voice telling a story in IDS or ADS. Infants participated in only one condition, either IDS or ADS. Consistent with earlier work, infants displayed advantages in matching female relative to male faces and voices. Moreover, the new finding that emerged in the current study was that extraction of gender from face and voice was stronger at 6 months with ADS than with IDS, whereas at 9 and 12 months, matching did not differ for IDS versus ADS. The results indicate that the ability to perceive gender in audiovisual speech is influenced by speech manner. Our data suggest that infants may extract multisensory gender information developmentally earlier when looking at adults engaged in conversation with other adults (i.e., ADS) than when adults are directly talking to them (i.e., IDS). Overall, our findings imply that the circumstances of social interaction may shape early multisensory abilities to perceive gender.

  19. McGurk stimuli for the investigation of multisensory integration in cochlear implant users: The Oldenburg Audio Visual Speech Stimuli (OLAVS).

    Science.gov (United States)

    Stropahl, Maren; Schellhardt, Sebastian; Debener, Stefan

    2017-06-01

    The concurrent presentation of different auditory and visual syllables may result in the perception of a third syllable, reflecting an illusory fusion of visual and auditory information. This well-known McGurk effect is frequently used for the study of audio-visual integration. Recently, it was shown that the McGurk effect is strongly stimulus-dependent, which complicates comparisons across perceivers and inferences across studies. To overcome this limitation, we developed the freely available Oldenburg audio-visual speech stimuli (OLAVS), consisting of 8 different talkers and 12 different syllable combinations. The quality of the OLAVS set was evaluated with 24 normal-hearing subjects. All 96 stimuli were characterized based on their stimulus disparity, which was obtained from a probabilistic model (cf. Magnotti & Beauchamp, 2015). Moreover, the McGurk effect was studied in eight adult cochlear implant (CI) users. By applying the individual, stimulus-independent parameters of the probabilistic model, the predicted effect of stronger audio-visual integration in CI users could be confirmed, demonstrating the validity of the new stimulus material.

  20. Audio-visual speech perception in infants and toddlers with Down syndrome, fragile X syndrome, and Williams syndrome.

    Science.gov (United States)

    D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette

    2016-08-01

    Typically-developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on Chronological and Mental Age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Effects of Audio-Visual Integration on the Detection of Masked Speech and Non-Speech Sounds

    Science.gov (United States)

    Eramudugolla, Ranmalee; Henderson, Rachel; Mattingley, Jason B.

    2011-01-01

    Integration of simultaneous auditory and visual information about an event can enhance our ability to detect that event. This is particularly evident in the perception of speech, where the articulatory gestures of the speaker's lips and face can significantly improve the listener's detection and identification of the message, especially when that…

  2. Atypical audio-visual speech perception and McGurk effects in children with specific language impairment.

    Science.gov (United States)

    Leybaert, Jacqueline; Macchi, Lucie; Huyse, Aurélie; Champoux, François; Bayard, Clémence; Colin, Cécile; Berthommier, Frédéric

    2014-01-01

    Audiovisual speech perception of children with specific language impairment (SLI) and children with typical language development (TLD) was compared in two experiments using /aCa/ syllables presented in the context of a masking release paradigm. Children had to repeat syllables presented in auditory alone, visual alone (speechreading), audiovisual congruent and incongruent (McGurk) conditions. Stimuli were masked by either stationary (ST) or amplitude-modulated (AM) noise. Although children with SLI were less accurate in auditory and audiovisual speech perception, they showed an auditory masking release effect similar to that of children with TLD. Children with SLI also gave fewer correct responses in speechreading than children with TLD, indicating an impairment in the phonemic processing of visual speech information. In response to McGurk stimuli, children with TLD showed more fusions in AM noise than in ST noise, a consequence of the auditory masking release effect and of the influence of visual information. Children with SLI did not show this effect systematically, suggesting that they were less influenced by visual speech. However, when the visual cues were easily identified, the profile of responses to McGurk stimuli was similar in both groups, suggesting that children with SLI do not suffer from an impairment of audiovisual integration. An analysis of the percentage of information transmitted revealed a deficit in the children with SLI, particularly for the place-of-articulation feature. Taken together, the data support the hypothesis of intact peripheral processing of auditory speech information, coupled with a supramodal deficit of phonemic categorization in children with SLI. Clinical implications are discussed.

  3. Audio-visual gender recognition

    Science.gov (United States)

    Liu, Ming; Xu, Xun; Huang, Thomas S.

    2007-11-01

    Combining different modalities for pattern recognition tasks is a very promising field. Humans routinely fuse information from different modalities to recognize objects and perform inference. Audio-visual gender recognition is one of the most common tasks in human social communication: people can identify gender from facial appearance, from speech, and also from body gait. Human gender recognition is thus a multi-modal data acquisition and processing procedure. However, computational multimodal gender recognition has not been extensively investigated in the literature. In this paper, speech and facial images are fused to perform multi-modal gender recognition and to explore the improvement gained by combining different modalities.
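
    The record above fuses speech and facial appearance for gender recognition but does not spell out the fusion rule. The sketch below illustrates generic score-level (late) fusion of two per-modality classifiers; the toy features, the logistic-regression classifiers, and the audio weight are assumptions for illustration, not the authors' system.

```python
# Hypothetical score-level fusion of audio and face gender classifiers.
# A generic late-fusion sketch, not the system described in the record above.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy features: 20-dim audio descriptors and 50-dim face descriptors.
n = 200
y = rng.integers(0, 2, n)                                  # 0 / 1: toy gender labels
X_audio = rng.normal(y[:, None] * 0.8, 1.0, (n, 20))
X_face = rng.normal(y[:, None] * 0.5, 1.0, (n, 50))

# One classifier per modality, trained independently.
clf_audio = LogisticRegression(max_iter=1000).fit(X_audio, y)
clf_face = LogisticRegression(max_iter=1000).fit(X_face, y)

def fuse(p_audio, p_face, w_audio=0.6):
    """Weighted log-linear fusion of per-modality class posteriors."""
    log_p = (w_audio * np.log(np.clip(p_audio, 1e-12, 1.0))
             + (1 - w_audio) * np.log(np.clip(p_face, 1e-12, 1.0)))
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

p_fused = fuse(clf_audio.predict_proba(X_audio), clf_face.predict_proba(X_face))
print("fused accuracy:", np.mean(p_fused.argmax(axis=1) == y))
```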

  4. Perception of Audio-Visual Speech Synchrony in Spanish-Speaking Children with and without Specific Language Impairment

    Science.gov (United States)

    Pons, Ferran; Andreu, Llorenc; Sanz-Torrent, Monica; Buil-Legaz, Lucia; Lewkowicz, David J.

    2013-01-01

    Speech perception involves the integration of auditory and visual articulatory information, and thus requires the perception of temporal synchrony between this information. There is evidence that children with specific language impairment (SLI) have difficulty with auditory speech perception but it is not known if this is also true for the…

  5. [Audio-visual aids and tropical medicine].

    Science.gov (United States)

    Morand, J J

    1989-01-01

    The author presents a list of audio-visual productions on tropical medicine, together with their main characteristics. He notes that audio-visual educational productions are often dissociated from their promotion and therefore invites future creators to forward their work to the Audio-Visual Health Committee.

  6. Audio-visual Materials and Rural Libraries

    Science.gov (United States)

    Escolar-Sobrino, Hipolito

    1972-01-01

    Audio-visual materials enlarge the educational work being done in the classroom and the library. This article examines the various types of audio-visual material and equipment and suggests ways in which audio-visual media can be used economically and efficiently in rural libraries. (Author)

  7. Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.

    Science.gov (United States)

    Gebru, Israel; Ba, Sileye; Li, Xiaofei; Horaud, Radu

    2017-01-05

    Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semisupervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes, in a principled way, speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset is introduced that contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the-art diarization algorithms.
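
    The diarization model above maps binaural spectral features into the image plane and then assigns them to visible persons. As a schematic illustration of that association step only, the sketch below soft-assigns an audio-derived image position to tracked faces with Gaussian responsibilities; the pixel-noise parameter and the assumed availability of the mapped position are illustrative and do not reproduce the paper's semi-supervised clustering or graphical model.

```python
# Schematic soft assignment of an audio-derived image position to tracked
# persons; an illustration only, not the paper's semi-supervised clustering.
import numpy as np

def responsibilities(audio_xy, person_xy, sigma=40.0):
    """Gaussian responsibilities of each tracked person for one audio observation.

    audio_xy  : (2,) image location mapped from binaural features (assumed given)
    person_xy : (K, 2) current face-track positions in pixels
    sigma     : assumed localisation noise in pixels
    """
    d2 = np.sum((person_xy - audio_xy) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / sigma ** 2)
    return w / w.sum()

person_xy = np.array([[120.0, 200.0], [400.0, 210.0], [640.0, 190.0]])
audio_xy = np.array([410.0, 205.0])   # e.g. output of the audio-visual alignment
print(responsibilities(audio_xy, person_xy))  # highest weight on the middle track (index 1)
```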

  8. Audio-visual identification of place of articulation and voicing in white and babble noise.

    Science.gov (United States)

    Alm, Magnus; Behne, Dawn M; Wang, Yue; Eg, Ragnhild

    2009-07-01

    Research shows that noise and phonetic attributes influence the degree to which auditory and visual modalities are used in audio-visual speech perception (AVSP). Research has, however, mainly focused on white noise and single phonetic attributes, thus neglecting the more common babble noise and possible interactions between phonetic attributes. This study explores whether white and babble noise differentially influence AVSP and whether these differences depend on phonetic attributes. White and babble noise of 0 and -12 dB signal-to-noise ratio were added to congruent and incongruent audio-visual stop consonant-vowel stimuli. The audio (A) and video (V) of incongruent stimuli differed either in place of articulation (POA) or voicing. Responses from 15 young adults show that, compared to white noise, babble resulted in more audio responses for POA stimuli, and fewer for voicing stimuli. Voiced syllables received more audio responses than voiceless syllables. Results can be attributed to discrepancies in the acoustic spectra of both the noise and speech target. Voiced consonants may be more auditorily salient than voiceless consonants which are more spectrally similar to white noise. Visual cues contribute to identification of voicing, but only if the POA is visually salient and auditorily susceptible to the noise type.

  9. Alfasecuencialización: la enseñanza del cine en la era del audiovisual Sequential literacy: the teaching of cinema in the age of audio-visual speech

    Directory of Open Access Journals (Sweden)

    José Antonio Palao Errando

    2007-10-01

    Full Text Available In the so-called «information society», film studies have been diluted by pragmatic and technological approaches to audio-visual discourse, just as the enjoyment of cinema itself has been caught in the net of the DVD and hypertext. Cinema itself reacts to this through complex narrative structures that distance it from standard audio-visual discourse. The role of film studies and of their university teaching should be to reintroduce the subject rejected by informative knowledge by means of the interpretation of the film text.

  10. Audio-Visual Technician | IDRC - International Development ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    Controls the inventory of portable audio-visual equipment and mobile telephones within IDRC's loans library. Delivers, installs, uninstalls and removes equipment reserved by IDRC staff through the automated booking system. Participates in the planning process for upgrade and /or acquisition of new audio-visual ...

  11. "Look who's talking!" Gaze Patterns for Implicit and Explicit Audio-Visual Speech Synchrony Detection in Children With High-Functioning Autism.

    Science.gov (United States)

    Grossman, Ruth B; Steinhart, Erin; Mitchell, Teresa; McIlvane, William

    2015-06-01

    Conversation requires integration of information from faces and voices to fully understand the speaker's message. To detect auditory-visual asynchrony of speech, listeners must integrate visual movements of the face, particularly the mouth, with auditory speech information. Individuals with autism spectrum disorder may be less successful at such multisensory integration, despite their demonstrated preference for looking at the mouth region of a speaker. We showed participants (individuals with and without high-functioning autism (HFA) aged 8-19) a split-screen video of two identical individuals speaking side by side. Only one of the speakers was in synchrony with the corresponding audio track and synchrony switched between the two speakers every few seconds. Participants were asked to watch the video without further instructions (implicit condition) or to specifically watch the in-synch speaker (explicit condition). We recorded which part of the screen and face their eyes targeted. Both groups looked at the in-synch video significantly more with explicit instructions. However, participants with HFA looked at the in-synch video less than typically developing (TD) peers and did not increase their gaze time as much as TD participants in the explicit task. Importantly, the HFA group looked significantly less at the mouth than their TD peers, and significantly more at non-face regions of the image. There were no between-group differences for eye-directed gaze. Overall, individuals with HFA spend less time looking at the crucially important mouth region of the face during auditory-visual speech integration, which is maladaptive gaze behavior for this type of task. © 2015 International Society for Autism Research, Wiley Periodicals, Inc.

  12. Using audio visuals to illustrate concepts

    OpenAIRE

    Hodgson, Tom

    2005-01-01

    This short pedagogic paper investigates the use of audio visual presentation techniques to enhance teaching and learning in the classroom. It looks at the current 'MTV' generation of students who find it difficult to concentrate for long periods of time.

  13. Joint evaluation of communication quality and user experience in an audio-visual virtual reality meeting

    DEFF Research Database (Denmark)

    Møller, Anders Kalsgaard; Hoffmann, Pablo F.; Carrozzino, Marcello

    2013-01-01

    State-of-the-art speech intelligibility tests are designed to evaluate acoustic communication devices, not audio-visual virtual reality systems. This paper presents a novel method to evaluate a communication situation based on both the speech intelligibility...

  14. Audio-visual Classification and Fusion of Spontaneous Affect Data in Likelihood Space

    NARCIS (Netherlands)

    Nicolaou, Mihalis A.; Gunes, Hatice; Pantic, Maja

    2010-01-01

    This paper focuses on audio-visual (using facial expression, shoulder and audio cues) classification of spontaneous affect, utilising generative models for classification (i) in terms of Maximum Likelihood Classification with the assumption that the generative model structure in the classifier is

  15. The effects of hearing protectors on auditory localization: evidence from audio-visual target acquisition.

    Science.gov (United States)

    Bolia, R S; McKinley, R L

    2000-01-01

    Response times (RT) in an audio-visual target acquisition task were collected from 3 participants while wearing either circumaural earmuffs, foam earplugs, or no hearing protection. Analyses revealed that participants took significantly longer to locate and identify an audio-visual target in both hearing protector conditions than they did in the unoccluded condition, suggesting a disturbance of the cues used by listeners to localize sounds in space. RTs were significantly faster in both hearing protector conditions than in a non-audio control condition, indicating that auditory localization was not completely disrupted. Results are discussed in terms of safety issues involved with wearing hearing protectors in an occupational environment.

  16. Acoustic cues identifying phonetic transitions for speech segmentation

    CSIR Research Space (South Africa)

    Van Niekerk, DR

    2008-11-01

    Full Text Available The quality of corpus-based text-to-speech (TTS) systems depends strongly on the consistency of boundary placements during phonetic alignments. Expert human transcribers use visually represented acoustic cues in order to consistently place...

  17. Segmentation cues in conversational speech: Robust semantics and fragile phonotactics

    Directory of Open Access Journals (Sweden)

    Laurence eWhite

    2012-10-01

    Full Text Available Multiple cues influence listeners' segmentation of connected speech into words, but most previous studies have used stimuli elicited in careful readings rather than natural conversation. Discerning word boundaries in conversational speech may differ from the laboratory setting. In particular, a speaker's articulatory effort – hyperarticulation vs hypoarticulation (H&H) – may vary according to communicative demands, suggesting a compensatory relationship whereby acoustic-phonetic cues are attenuated when other information sources strongly guide segmentation. We examined how listeners' interpretation of segmentation cues is affected by speech style (spontaneous conversation vs read), using cross-modal identity priming. To elicit spontaneous stimuli, we used a map task in which speakers discussed routes around stylised landmarks. These landmarks were two-word phrases in which the strength of potential segmentation cues – semantic likelihood and cross-boundary diphone phonotactics – was systematically varied. Landmark-carrying utterances were transcribed and later re-recorded as read speech. Independent of speech style, we found an interaction between cue valence (favourable/unfavourable) and cue type (phonotactics/semantics). Thus, there was an effect of semantic plausibility, but no effect of cross-boundary phonotactics, indicating that the importance of phonotactic segmentation may have been overstated in studies where lexical information was artificially suppressed. These patterns were unaffected by whether the stimuli were elicited in a spontaneous or read context, even though the difference in speech styles was evident in a main effect. Durational analyses suggested speaker-driven cue trade-offs congruent with an H&H account, but these modulations did not impact on listener behaviour. We conclude that previous research exploiting read speech is reliable in indicating the primacy of lexically-based cues in the segmentation of natural

  18. Linguistic cues and memory for synthetic and natural speech.

    Science.gov (United States)

    Paris, C R; Thomas, M H; Gilson, R D; Kincaid, J P

    2000-01-01

    Past research has demonstrated that there are cognitive processing costs associated with comprehension of speech generated by text-to-speech synthesizers, relative to comprehension of natural speech. This finding has important performance implications for the many applications that use such systems. The purpose of this study was to ascertain whether certain characteristics of synthetic speech slow on-line, real-time cognitive processing. Whereas past research has focused on the phonemic acoustic structure of synthetic speech, we manipulated prosodic, syntactic, and semantic cues in a task requiring participants to recall sentences spoken either by a human or by one of two speech synthesizers. The findings were interpreted to suggest that inappropriate prosodic modeling in synthetic speech was the major source of a performance differential between natural and synthetic speech. Prosodic cues, along with others, guide the parsing of speech and provide redundancy. When these cues are absent or inaccurate, the additional burden placed on working memory may exceed its capacity, particularly in time-limited, demanding tasks. Actual or potential applications of this research include improvement of text-to-speech output systems in warning systems, feedback devices in aerospace vehicles, educational and training modules, aids for the handicapped, consumer products, and technologies designed to increase the functional independence of older adults.

  19. Modeling the Development of Audiovisual Cue Integration in Speech Perception.

    Science.gov (United States)

    Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C

    2017-03-21

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.
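
    The abstract above describes Gaussian mixture models that acquire phonological categories from the joint distribution of auditory and visual cues. The toy sketch below captures that idea with synthetic cue values (an F2-onset-like auditory cue and a lip-closure-like visual cue) and scikit-learn's GaussianMixture; the categories, cue values, and the mismatched test token are invented for illustration and are not the authors' simulations.

```python
# Toy illustration of learning audiovisual categories from cue distributions
# with a Gaussian mixture. Cue values and category structure are invented.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Two synthetic categories described jointly by an auditory cue (F2 onset, Hz)
# and a visual cue (degree of lip closure, 0-1): a /ba/-like and a /da/-like set.
ba = np.column_stack([rng.normal(1000, 80, 500), rng.normal(0.9, 0.05, 500)])
da = np.column_stack([rng.normal(1700, 80, 500), rng.normal(0.1, 0.05, 500)])
X = np.vstack([ba, da])

# Unsupervised learning of the two categories from their distributional statistics.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# A mismatched (McGurk-like) token: /ba/-like audio paired with a /da/-like visual cue.
mismatched = np.array([[1000.0, 0.1]])
print("posterior over learned categories:", gmm.predict_proba(mismatched).round(3))
```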

  20. Decision-Level Fusion for Audio-Visual Laughter Detection

    NARCIS (Netherlands)

    Reuderink, B.; Poel, Mannes; Truong, Khiet Phuong; Poppe, Ronald Walter; Pantic, Maja; Popescu-Belis, Andrei; Stiefelhagen, Rainer

    Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is

  1. Decision-level fusion for audio-visual laughter detection

    NARCIS (Netherlands)

    Reuderink, B.; Poel, M.; Truong, K.; Poppe, R.; Pantic, M.

    2008-01-01

    Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is
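
    Both records above concern decision-level (late) fusion of audio and visual laughter detectors. As a generic illustration of such fusion, the sketch below combines per-segment unimodal scores with a weighted sum and a threshold; the scores, weight, and threshold are assumptions, not values from the AMI-based system.

```python
# Generic decision-level fusion of unimodal laughter detectors.
# Scores and weights are illustrative; this is not the AMI-based system above.
import numpy as np

def fuse_decisions(audio_score, video_score, w=0.7, threshold=0.5):
    """Fuse per-segment detector scores (0..1) with a weighted sum, then threshold."""
    fused = w * np.asarray(audio_score) + (1 - w) * np.asarray(video_score)
    return fused >= threshold

audio_scores = np.array([0.9, 0.2, 0.6, 0.1])   # e.g. audio classifier outputs per segment
video_scores = np.array([0.8, 0.3, 0.2, 0.7])   # e.g. facial-feature classifier outputs
print(fuse_decisions(audio_scores, video_scores))  # -> [ True False False False]
```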

  2. The contribution of dynamic visual cues to audiovisual speech perception.

    Science.gov (United States)

    Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador

    2015-08-01

    Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues: two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this aim, we measured word identification performance in noise using unimodal auditory stimuli, and with audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point light displays achieved via motion capture of the original talker. Point light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. Cross-language differences in cue use for speech segmentation

    NARCIS (Netherlands)

    Tyler, M.D.; Cutler, A.

    2009-01-01

    Two artificial-language learning experiments directly compared English, French, and Dutch listeners' use of suprasegmental cues for continuous-speech segmentation. In both experiments, listeners heard unbroken sequences of consonant-vowel syllables, composed of recurring three- and four-syllable

  4. Audio-visual perception system for a humanoid robotic head.

    Science.gov (United States)

    Viciana-Abad, Raquel; Marfil, Rebeca; Perez-Lorenzo, Jose M; Bandera, Juan P; Romero-Garces, Adrian; Reche-Lopez, Pedro

    2014-05-28

    One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, there is a lack in terms of evaluating the benefits of audio-visual attention mechanisms, compared to only audio or visual approaches, in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with a Bayes inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared via considering the technical limitations of unimodal systems. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.

  5. Audio-Visual Perception System for a Humanoid Robotic Head

    Directory of Open Access Journals (Sweden)

    Raquel Viciana-Abad

    2014-05-01

    Full Text Available One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, there is a lack in terms of evaluating the benefits of audio-visual attention mechanisms, compared to only audio or visual approaches, in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with a Bayes inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared via considering the technical limitations of unimodal systems. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.

  6. Audio-visual detection benefits in the rat

    National Research Council Canada - National Science Library

    Gleiss, Stephanie; Kayser, Christoph

    2012-01-01

    ... multisensory protocols. We here demonstrate the feasibility of an audio-visual stimulus detection task for rats, in which the animals detect lateralized uni- and multi-sensory stimuli in a two-response forced choice paradigm...

  7. Audio-visual perception of new wind parks

    OpenAIRE

    Yu, T.; Behm, H.; Bill, R.; Kang, J.

    2017-01-01

    Previous studies have reported negative impacts of wind parks on the public. These studies considered the noise levels or visual levels separately but not audio-visual interactive factors. This study investigated the audio-visual impact of a new wind park using virtual technology that combined audio and visual features of the environment. Participants were immersed through Google Cardboard in an actual landscape without wind parks (ante operam) and in the same landscape with wind parks (post ...

  8. Face, body and speech cues independently predict judgments of attractiveness

    OpenAIRE

    Saxton, Tamsin; Burriss, Robert; Murray, Alice; Rowland, Hannah; Roberts, S. Craig

    2009-01-01

    Research on human attraction frequently makes use of single-modality stimuli such as neutral-expression facial photographs as proxy indicators of an individual’s attractiveness. However, we know little about how judgments of these single-modality stimuli correspond to judgments of stimuli that incorporate multi-modal cues of face, body and speech. In the present study, ratings of attractiveness judged from videos of participants introducing themselves were independently predicted by judgments...

  9. The Fungible Audio-Visual Mapping and its Experience

    Directory of Open Access Journals (Sweden)

    Adriana Sa

    2014-12-01

    Full Text Available This article takes a perceptual approach to audio-visual mapping. Clearly perceivable cause and effect relationships can be problematic if one desires the audience to experience the music. Indeed perception would bias those sonic qualities that fit previous concepts of causation, subordinating other sonic qualities, which may form the relations between the sounds themselves. The question is: how can an audio-visual mapping produce a sense of causation while simultaneously confounding the actual cause-effect relationships? We call this a fungible audio-visual mapping. Our aim here is to glean its constitution and aspect. We will report a study, which draws upon methods from experimental psychology to inform audio-visual instrument design and composition. The participants are shown several audio-visual mapping prototypes, after which we pose quantitative and qualitative questions regarding their sense of causation, and their sense of understanding the cause-effect relationships. The study shows that a fungible mapping requires both synchronized and seemingly non-related components – sufficient complexity to be confusing. As the specific cause-effect concepts remain inconclusive, the sense of causation embraces the whole.

  10. Audio Visual Materials for Pupil Personnel Services.

    Science.gov (United States)

    Huckins, Robert L.; And Others

    This publication lists various types of visual aids including films, filmstrips, and programs. They are listed by the following areas: (1) education, (2) guidance-professional, (3) occupational, (4) personal-social, (5) special education, and (6) speech and hearing. A brief description of content is provided. Age level is sometimes mentioned. (KJ)

  11. The sound of your lips: electrophysiological cross-modal interactions during hand-to-face and face-to-face speech perception.

    Science.gov (United States)

    Treille, Avril; Vilain, Coriandre; Sato, Marc

    2014-01-01

    Recent magneto-encephalographic and electro-encephalographic studies provide evidence for cross-modal integration during audio-visual and audio-haptic speech perception, with speech gestures viewed or felt from manual tactile contact with the speaker's face. Given the temporal precedence of the haptic and visual signals on the acoustic signal in these studies, the observed modulation of N1/P2 auditory evoked responses during bimodal compared to unimodal speech perception suggest that relevant and predictive visual and haptic cues may facilitate auditory speech processing. To further investigate this hypothesis, auditory evoked potentials were here compared during auditory-only, audio-visual and audio-haptic speech perception in live dyadic interactions between a listener and a speaker. In line with previous studies, auditory evoked potentials were attenuated and speeded up during both audio-haptic and audio-visual compared to auditory speech perception. Importantly, the observed latency and amplitude reduction did not significantly depend on the degree of visual and haptic recognition of the speech targets. Altogether, these results further demonstrate cross-modal interactions between the auditory, visual and haptic speech signals. Although they do not contradict the hypothesis that visual and haptic sensory inputs convey predictive information with respect to the incoming auditory speech input, these results suggest that, at least in live conversational interactions, systematic conclusions on sensory predictability in bimodal speech integration have to be taken with caution, with the extraction of predictive cues likely depending on the variability of the speech stimuli.

  12. Auditory cross-modal reorganization in cochlear implant users indicates audio-visual integration

    Directory of Open Access Journals (Sweden)

    Maren Stropahl

    2017-01-01

    Full Text Available There is clear evidence for cross-modal cortical reorganization in the auditory system of post-lingually deafened cochlear implant (CI) users. A recent report suggests that moderate sensori-neural hearing loss is already sufficient to initiate corresponding cortical changes. To what extent these changes are deprivation-induced or related to sensory recovery is still debated. Moreover, the influence of cross-modal reorganization on CI benefit is also still unclear. While reorganization during deafness may impede speech recovery, reorganization also has beneficial influences on face recognition and lip-reading. As CI users were observed to show differences in multisensory integration, the question arises if cross-modal reorganization is related to audio-visual integration skills. The current electroencephalography study investigated cortical reorganization in experienced post-lingually deafened CI users (n = 18), untreated mild to moderately hearing impaired individuals (n = 18) and normal hearing controls (n = 17). Cross-modal activation of the auditory cortex, assessed by means of EEG source localization in response to human faces, and audio-visual integration, quantified with the McGurk illusion, were measured. CI users revealed stronger cross-modal activations compared to age-matched normal hearing individuals. Furthermore, CI users showed a relationship between cross-modal activation and audio-visual integration strength. This may further support a beneficial relationship between cross-modal activation and daily-life communication skills that may not be fully captured by laboratory-based speech perception tests. Interestingly, hearing impaired individuals showed behavioral and neurophysiological results that were numerically between the other two groups, and they showed a moderate relationship between cross-modal activation and the degree of hearing loss. This further supports the notion that auditory deprivation evokes a reorganization of the

  13. Experience with speech sounds is not necessary for cue trading by budgerigars (Melopsittacus undulatus).

    Science.gov (United States)

    Flaherty, Mary; Dent, Micheal L; Sawusch, James R

    2017-01-01

    The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with "d" or "t" and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal.

  14. Experience with speech sounds is not necessary for cue trading by budgerigars (Melopsittacus undulatus).

    Directory of Open Access Journals (Sweden)

    Mary Flaherty

    Full Text Available The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with "d" or "t" and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal.

  15. Is Alzheimer's disease a disconnection syndrome? Evidence from a crossmodal audio-visual illusory experiment.

    Science.gov (United States)

    Delbeuck, X; Collette, F; Van der Linden, M

    2007-11-05

    In Alzheimer's disease (AD), loss of connectivity in the patient's brain has been evidenced by a range of electrophysiological and neuroimaging studies. However, few neuropsychological research projects have sought to interpret the cognitive modifications following the appearance of AD in terms of a disconnection syndrome. In this paper, we sought to investigate brain connectivity in AD via the study of a crossmodal effect. More precisely, we examined the integration of auditory and visual speech information (the McGurk effect) in AD patients and matched control subjects. Our results revealed impaired crossmodal integration during speech perception in AD, which was not associated with disturbances in the separate processing of auditory and visual speech stimuli. In conclusion, our data suggest the occurrence of a specific, audio-visual integration deficit in AD, which might be the consequence of a connectivity breakdown and corroborate the observation from other studies of crossmodal deficits between the auditory and visual modalities in this population.

  16. Market potential for interactive audio-visual media

    NARCIS (Netherlands)

    Leurdijk, A.; Limonard, S.

    2005-01-01

    NM2 (New Media for a New Millennium) develops tools for interactive, personalised and non-linear audio-visual content that will be tested in seven pilot productions. This paper looks at the market potential for these productions from a technological, a business and a users' perspective. It shows

  17. Recent Audio-Visual Materials on the Soviet Union.

    Science.gov (United States)

    Clarke, Edith Campbell

    1981-01-01

    Identifies and describes audio-visual materials (films, filmstrips, and audio cassette tapes) about the Soviet Union which have been produced since 1977. For each entry, information is presented on title, time required, date of release, cost (purchase and rental), and an abstract. (DB)

  18. Selected Audio-Visual Materials for Consumer Education. [New Version.

    Science.gov (United States)

    Johnston, William L.

    Ninety-two films, filmstrips, multi-media kits, slides, and audio cassettes, produced between 1964 and 1974, are listed in this selective annotated bibliography on consumer education. The major portion of the bibliography is devoted to films and filmstrips. The main topics of the audio-visual materials include purchasing, advertising, money…

  19. Making Audio-Visual Teaching Materials for Elementary Science

    OpenAIRE

    永田, 四郎

    1980-01-01

    For elementary science, several audio-visual teaching materials were made by the author and our students. These materials are slides for projectors, transparencies and materials for OHP, 8 mm sound films, and video tapes. We hope this kind of study will continue.

  20. Effect of Audio-Visual Intervention Program on Cognitive ...

    African Journals Online (AJOL)

    Thus the purpose of the study was to examine the effectiveness of the audio-visual intervention program on the cognitive development of preschool children in relation to their socioeconomic status. The researcher employed an experimental method to conduct the study. The sample consisted of 100 students from preschool of ...

  1. Audio-Visual Communications, A Tool for the Professional

    Science.gov (United States)

    Journal of Environmental Health, 1976

    1976-01-01

    The manner in which the Cuyahoga County, Ohio Department of Environmental Health utilizes audio-visual presentations for communication with business and industry, professional public health agencies and the general public is presented. Subjects including food sanitation, radiation protection and safety are described. (BT)

  2. Audio-Visual Aid in Teaching "Fatty Liver"

    Science.gov (United States)

    Dash, Sambit; Kamath, Ullas; Rao, Guruprasad; Prakash, Jay; Mishra, Snigdha

    2016-01-01

    Use of audio-visual tools to aid medical education is ever on the rise. Our study assesses the efficacy of a video prepared on "fatty liver," a topic that is often a challenge for pre-clinical teachers, in enhancing cognitive processing and ultimately learning. We prepared a video presentation of 11:36 min, incorporating various…

  3. The Audio-Visual Marketing Handbook for Independent Schools.

    Science.gov (United States)

    Griffith, Tom

    This how-to booklet offers specific advice on producing video or slide/tape programs for marketing independent schools. Five chapters present guidelines for various stages in the process: (1) Audio-Visual Marketing in Context (aesthetics and economics of audiovisual marketing); (2) A Question of Identity (identifying the audience and deciding on…

  4. Audio-visual materials usage preference among agricultural ...

    African Journals Online (AJOL)

    It was found that respondents preferred radio, television, poster, advert, photographs, specimen, bulletin, magazine, cinema, videotape, chalkboard, and bulletin board as audio-visual materials for extension work. These are the materials that can easily be manipulated and utilized for extension work. Nigerian Journal of ...

  5. The unity assumption facilitates cross-modal binding of musical, non-speech stimuli: The role of spectral and amplitude envelope cues.

    Science.gov (United States)

    Chuen, Lorraine; Schutz, Michael

    2016-07-01

    An observer's inference that multimodal signals originate from a common underlying source facilitates cross-modal binding. This 'unity assumption' causes asynchronous auditory and visual speech streams to seem simultaneous (Vatakis & Spence, Perception & Psychophysics, 69(5), 744-756, 2007). Subsequent tests of non-speech stimuli such as musical and impact events found no evidence for the unity assumption, suggesting the effect is speech-specific (Vatakis & Spence, Acta Psychologica, 127(1), 12-23, 2008). However, the role of amplitude envelope (the changes in energy of a sound over time) was not previously appreciated within this paradigm. Here, we explore whether previous findings suggesting speech-specificity of the unity assumption were confounded by similarities in the amplitude envelopes of the contrasted auditory stimuli. Experiment 1 used natural events with clearly differentiated envelopes: single notes played on either a cello (bowing motion) or marimba (striking motion). Participants performed an unspeeded temporal order judgment task, viewing audio-visually matched (e.g., marimba auditory with marimba video) and mismatched (e.g., cello auditory with marimba video) versions of stimuli at various stimulus onset asynchronies, and were required to indicate which modality was presented first. As predicted, participants were less sensitive to temporal order in matched conditions, demonstrating that the unity assumption can facilitate the perception of synchrony outside of speech stimuli. Results from Experiments 2 and 3 revealed that when spectral information was removed from the original auditory stimuli, amplitude envelope alone could not facilitate the influence of audiovisual unity. We propose that both amplitude envelope and spectral acoustic cues affect the percept of audiovisual unity, working in concert to help an observer determine when to integrate across modalities.

  6. Planning Schools for Use of Audio-Visual Materials. No. 3: The Audio-Visual Materials Center.

    Science.gov (United States)

    National Education Association, Washington, DC. Dept. of Audiovisual Instruction.

    This manual discusses the role, organizational patterns, expected services, and space and housing needs of the audio-visual instructional materials center. In considering the housing of basic functions, photographs, floor layouts, diagrams, and specifications of equipment are presented. An appendix includes a 77-item bibliography, a 7-page list of…

  7. Learning to Match Auditory and Visual Speech Cues: Social Influences on Acquisition of Phonological Categories

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Grossmann, Tobias

    2015-01-01

    Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential…

  8. Audio-visual feedback improves the BCI performance in the navigational control of a humanoid robot.

    Science.gov (United States)

    Tidoni, Emmanuele; Gergondet, Pierre; Kheddar, Abderrahmane; Aglioti, Salvatore M

    2014-01-01

    Advancement in brain computer interfaces (BCI) technology allows people to actively interact in the world through surrogates. Controlling real humanoid robots using BCI as intuitively as we control our body represents a challenge for current research in robotics and neuroscience. In order to successfully interact with the environment the brain integrates multiple sensory cues to form a coherent representation of the world. Cognitive neuroscience studies demonstrate that multisensory integration may imply a gain with respect to a single modality and ultimately improve the overall sensorimotor performance. For example, reactivity to simultaneous visual and auditory stimuli may be higher than to the sum of the same stimuli delivered in isolation or in temporal sequence. Yet, knowledge about whether audio-visual integration may improve the control of a surrogate is meager. To explore this issue, we provided human footstep sounds as audio feedback to BCI users while controlling a humanoid robot. Participants were asked to steer their robot surrogate and perform a pick-and-place task through BCI-SSVEPs. We found that audio-visual synchrony between footsteps sound and actual humanoid's walk reduces the time required for steering the robot. Thus, auditory feedback congruent with the humanoid actions may improve motor decisions of the BCI's user and help in the feeling of control over it. Our results shed light on the possibility to increase robot's control through the combination of multisensory feedback to a BCI user.

  9. Audio-visual feedback improves the BCI performance in the navigational control of a humanoid robot

    Directory of Open Access Journals (Sweden)

    Emmanuele eTidoni

    2014-06-01

    Full Text Available Advancement in brain computer interfaces (BCI) technology allows people to actively interact in the world through surrogates. Controlling real humanoid robots using BCI as intuitively as we control our body represents a challenge for current research in robotics and neuroscience. In order to successfully interact with the environment the brain integrates multiple sensory cues to form a coherent representation of the world. Cognitive neuroscience studies demonstrate that multisensory integration may imply a gain with respect to a single modality and ultimately improve the overall sensorimotor performance. For example, reactivity to simultaneous visual and auditory stimuli may be higher than to the sum of the same stimuli delivered in isolation or in temporal sequence. Yet, knowledge about whether audio-visual integration may improve the control of a surrogate is meager. To explore this issue, we provided human footstep sounds as audio feedback to BCI users while controlling a humanoid robot. Participants were asked to steer their robot surrogate and perform a pick-and-place task through BCI-SSVEPs. We found that audio-visual synchrony between footsteps sound and actual humanoid’s walk reduces the time required for steering the robot. Thus, auditory feedback congruent with the humanoid actions may improve motor decisions of the BCI’s user and help in the feeling of control over it. Our results shed light on the possibility to increase robot’s control through the combination of multisensory feedback to a BCI user.

  10. Modular Sensor Environment : Audio Visual Industry Monitoring Applications

    OpenAIRE

    Guillot, Calvin

    2017-01-01

    This work was carried out for Electro Waves Oy. The company specializes in audio-visual services and interactive systems. The purpose of this work is to design and implement a modular sensor environment for the company, which will be used for developing automated systems. This thesis begins with an introduction to sensor systems and their different topologies. It is followed by an introduction to the technologies used in this project. The system is divided into three parts. The client, tha...

  11. Voice over: Audio-visual congruency and content recall in the gallery setting.

    Science.gov (United States)

    Fairhurst, Merle T; Scott, Minnie; Deroy, Ophelia

    2017-01-01

    Experimental research has shown that pairs of stimuli which are congruent and assumed to 'go together' are recalled more effectively than an item presented in isolation. Will this multisensory memory benefit occur when stimuli are richer and longer, in an ecological setting? In the present study, we focused on an everyday situation of audio-visual learning and manipulated the relationship between audio guide tracks and viewed portraits in the galleries of the Tate Britain. By varying the gender and narrative style of the voice-over, we examined how the perceived congruency and assumed unity of the audio guide track with painted portraits affected subsequent recall. We show that tracks perceived as best matching the viewed portraits led to greater recall of both sensory and linguistic content. We provide the first evidence that manipulating crossmodal congruence and unity assumptions can effectively impact memory in a multisensory ecological setting, even in the absence of precise temporal alignment between sensory cues.

  12. Voice activity detection using audio-visual information

    DEFF Research Database (Denmark)

    Petsatodis, Theodore; Pnevmatikakis, Aristodemos; Boukis, Christos

    2009-01-01

    An audio-visual voice activity detector that uses sensors positioned distantly from the speaker is presented. Its constituting unimodal detectors are based on the modeling of the temporal variation of audio and visual features using Hidden Markov Models; their outcomes are fused using a post-decision scheme. The Mel-Frequency Cepstral Coefficients and the vertical mouth opening are the chosen audio and visual features respectively, both augmented with their first-order derivatives. The proposed system is assessed using far-field recordings from four different speakers and under various levels
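
    The detector above models MFCCs and vertical mouth opening with hidden Markov models and fuses the unimodal outcomes in a post-decision stage. The fragment below sketches only that final stage over boolean per-frame decisions; the simple AND/OR logic is an assumption for illustration, not the published fusion scheme.

```python
# Illustrative post-decision fusion of unimodal voice-activity decisions.
# The AND/OR logic below is an assumption, not the published fusion scheme.
import numpy as np

def fuse_vad(audio_vad, visual_vad, mode="and"):
    """Combine boolean per-frame decisions from the audio and visual detectors."""
    audio_vad = np.asarray(audio_vad, dtype=bool)
    visual_vad = np.asarray(visual_vad, dtype=bool)
    if mode == "and":              # conservative: both modalities must agree on speech
        return audio_vad & visual_vad
    return audio_vad | visual_vad  # permissive: either modality suffices

audio_vad = [1, 1, 0, 1, 0, 0]    # e.g. MFCC-HMM decisions per frame
visual_vad = [1, 0, 0, 1, 1, 0]   # e.g. mouth-opening-HMM decisions per frame
print(fuse_vad(audio_vad, visual_vad, mode="and"))  # -> [ True False False  True False False]
```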

  13. Linguistic experience and audio-visual perception of non-native fricatives.

    Science.gov (United States)

    Wang, Yue; Behne, Dawn M; Jiang, Haisheng

    2008-09-01

    This study examined the effects of linguistic experience on audio-visual (AV) perception of non-native (L2) speech. Canadian English natives and Mandarin Chinese natives differing in degree of English exposure [long and short length of residence (LOR) in Canada] were presented with English fricatives of three visually distinct places of articulation: interdentals nonexistent in Mandarin and labiodentals and alveolars common in both languages. Stimuli were presented in quiet and in a cafe-noise background in four ways: audio only (A), visual only (V), congruent AV (AVc), and incongruent AV (AVi). Identification results showed that overall performance was better in the AVc than in the A or V condition and better in quiet than in cafe noise. While the Mandarin long LOR group approximated the native English patterns, the short LOR group showed poorer interdental identification, more reliance on visual information, and greater AV-fusion with the AVi materials, indicating the failure of L2 visual speech category formation with the short LOR non-natives and the positive effects of linguistic experience with the long LOR non-natives. These results point to an integrated network in AV speech processing as a function of linguistic background and provide evidence to extend auditory-based L2 speech learning theories to the visual domain.

  14. Information-Driven Active Audio-Visual Source Localization.

    Directory of Open Access Journals (Sweden)

    Niclas Schult

    Full Text Available We present a system for sensorimotor audio-visual source localization on a mobile robot. We utilize a particle filter for the combination of audio-visual information and for the temporal integration of consecutive measurements. Although the system only measures the current direction of the source, the position of the source can be estimated because the robot is able to move and can therefore obtain measurements from different directions. These actions by the robot successively reduce uncertainty about the source's position. An information gain mechanism is used for selecting the most informative actions in order to minimize the number of actions required to achieve accurate and precise position estimates in azimuth and distance. We show that this mechanism is an efficient solution to the action selection problem for source localization, and that it is able to produce precise position estimates despite simplified unisensory preprocessing. Because of the robot's mobility, this approach is suitable for use in complex and cluttered environments. We present qualitative and quantitative results of the system's performance and discuss possible areas of application.
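
    The system above combines a particle filter over the source position with an information-gain criterion for selecting the robot's next action. The compact sketch below illustrates that loop for a static 2-D source observed through noisy bearing measurements, with a greedy entropy-based choice of the next observation point; the motion and measurement models, noise levels, and the entropy proxy for information gain are generic assumptions, not the authors' implementation.

```python
# Sketch: particle filter over a static 2-D source position, updated from noisy
# bearing measurements at the robot's pose, with greedy entropy-based action
# selection. Generic assumptions throughout; not the system described above.
import numpy as np

rng = np.random.default_rng(2)
SIGMA = np.deg2rad(10.0)                     # assumed bearing noise (rad)

def bearing(robot, points):
    d = points - robot
    return np.arctan2(d[..., 1], d[..., 0])

def update(particles, weights, robot, measured):
    """Reweight particles by bearing likelihood, then resample with jitter."""
    err = np.angle(np.exp(1j * (bearing(robot, particles) - measured)))  # wrap to [-pi, pi]
    weights = weights * np.exp(-0.5 * (err / SIGMA) ** 2)
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    resampled = particles[idx] + rng.normal(0, 0.05, particles.shape)    # jitter avoids depletion
    return resampled, np.full(len(particles), 1.0 / len(particles))

def entropy(particles, bins=20):
    """Shannon entropy of a 2-D histogram of the particle cloud (uncertainty proxy)."""
    h, _, _ = np.histogram2d(particles[:, 0], particles[:, 1], bins=bins)
    p = h.ravel() / h.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

source = np.array([3.0, 2.0])                # unknown ground truth
robot = np.array([0.0, 0.0])
particles = rng.uniform(-5, 5, (2000, 2))
weights = np.full(len(particles), 1.0 / len(particles))
actions = [np.array([1.0, 0.0]), np.array([-1.0, 0.0]),
           np.array([0.0, 1.0]), np.array([0.0, -1.0])]

for step in range(6):
    z = bearing(robot, source) + rng.normal(0, SIGMA)    # noisy measurement
    particles, weights = update(particles, weights, robot, z)
    # Greedy action selection: simulate one measurement per candidate move and
    # pick the move whose predicted posterior has the lowest entropy.
    best = min(
        actions,
        key=lambda a: entropy(update(particles, weights, robot + a,
                                     bearing(robot + a, particles.mean(axis=0)))[0]),
    )
    robot = robot + best
    print(f"step {step}: estimate {particles.mean(axis=0).round(2)}, robot {robot}")
```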

  15. Effects of virtual speaker density and room reverberation on spatiotemporal thresholds of audio-visual motion coherence.

    Science.gov (United States)

    Sankaran, Narayan; Leung, Johahn; Carlile, Simon

    2014-01-01

    The present study examined the effects of spatial sound-source density and reverberation on the spatiotemporal window for audio-visual motion coherence. Three different acoustic stimuli were generated in Virtual Auditory Space: two acoustically "dry" stimuli via the measurement of anechoic head-related impulse responses recorded at either 1° or 5° spatial intervals (Experiment 1), and a reverberant stimulus rendered from binaural room impulse responses recorded at 5° intervals in situ in order to capture reverberant acoustics in addition to head-related cues (Experiment 2). A moving visual stimulus with invariant localization cues was generated by sequentially activating LED's along the same radial path as the virtual auditory motion. Stimuli were presented at 25°/s, 50°/s and 100°/s with a random spatial offset between audition and vision. In a 2AFC task, subjects made a judgment of the leading modality (auditory or visual). No significant differences were observed in the spatial threshold based on the point of subjective equivalence (PSE) or the slope of psychometric functions (β) across all three acoustic conditions. Additionally, both the PSE and β did not significantly differ across velocity, suggesting a fixed spatial window of audio-visual separation. Findings suggest that there was no loss in spatial information accompanying the reduction in spatial cues and reverberation levels tested, and establish a perceptual measure for assessing the veracity of motion generated from discrete locations and in echoic environments.

  16. Effects of virtual speaker density and room reverberation on spatiotemporal thresholds of audio-visual motion coherence.

    Directory of Open Access Journals (Sweden)

    Narayan Sankaran

    Full Text Available The present study examined the effects of spatial sound-source density and reverberation on the spatiotemporal window for audio-visual motion coherence. Three different acoustic stimuli were generated in Virtual Auditory Space: two acoustically "dry" stimuli via the measurement of anechoic head-related impulse responses recorded at either 1° or 5° spatial intervals (Experiment 1), and a reverberant stimulus rendered from binaural room impulse responses recorded at 5° intervals in situ in order to capture reverberant acoustics in addition to head-related cues (Experiment 2). A moving visual stimulus with invariant localization cues was generated by sequentially activating LED's along the same radial path as the virtual auditory motion. Stimuli were presented at 25°/s, 50°/s and 100°/s with a random spatial offset between audition and vision. In a 2AFC task, subjects made a judgment of the leading modality (auditory or visual). No significant differences were observed in the spatial threshold based on the point of subjective equivalence (PSE) or the slope of psychometric functions (β) across all three acoustic conditions. Additionally, both the PSE and β did not significantly differ across velocity, suggesting a fixed spatial window of audio-visual separation. Findings suggest that there was no loss in spatial information accompanying the reduction in spatial cues and reverberation levels tested, and establish a perceptual measure for assessing the veracity of motion generated from discrete locations and in echoic environments.
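
    The two records above summarize performance with the point of subjective equivalence (PSE) and the slope (β) of fitted psychometric functions. As a worked illustration of how such parameters are obtained from 2AFC data, the sketch below fits a logistic function to invented proportions of "visual leading" responses across audio-visual offsets; the data are made up and the logistic form is a common convention, not necessarily the analysis used in the study.

```python
# Illustrative psychometric-function fit: PSE and slope from 2AFC proportions.
# The offsets and response proportions below are invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, pse, beta):
    """Proportion of 'visual leading' responses as a function of AV offset (ms)."""
    return 1.0 / (1.0 + np.exp(-beta * (x - pse)))

offsets = np.array([-200, -100, -50, 0, 50, 100, 200], dtype=float)  # ms (audio leads for < 0)
p_visual_first = np.array([0.05, 0.15, 0.35, 0.55, 0.70, 0.90, 0.97])

(pse, beta), _ = curve_fit(logistic, offsets, p_visual_first, p0=[0.0, 0.02])
print(f"PSE = {pse:.1f} ms, slope beta = {beta:.4f} per ms")
```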

  17. Normal-Hearing Listeners’ and Cochlear Implant Users’ Perception of Pitch Cues in Emotional Speech

    NARCIS (Netherlands)

    Gilbers, Steven; Fuller, Christina; Gilbers, Dicky; Broersma, M.; Goudbeek, Martijn; Free, Rolien; Başkent, Deniz

    2015-01-01

    In cochlear implants (CIs), acoustic speech cues, especially for pitch, are delivered in a degraded form. This study’s aim is to assess whether due to degraded pitch cues, normal-hearing listeners and CI users employ different perceptual strategies to recognize vocal emotions, and, if so, how these

  18. Teaching Children with Autism Conversational Speech Using a Cue Card/Written Script Program.

    Science.gov (United States)

    Charlop-Christy, Marjorie H.; Kelso, Susan E.

    2003-01-01

    A study assessed the efficacy of a written script/cue card program to teach conversational speech skills to three verbal, literate boys (ages 8-10) with autism. Initially boys demonstrated low frequencies of conversational speech. Following intervention, all three quickly met the training criteria and maintained correct responding without cue…

  19. Training the Brain to Weight Speech Cues Differently: A Study of Finnish Second-language Users of English

    Science.gov (United States)

    Ylinen, Sari; Uther, Maria; Latvala, Antti; Vepsalainen, Sara; Iverson, Paul; Akahane-Yamada, Reiko; Naatanen, Risto

    2010-01-01

    Foreign-language learning is a prime example of a task that entails perceptual learning. The correct comprehension of foreign-language speech requires the correct recognition of speech sounds. The most difficult speech-sound contrasts for foreign-language learners often are the ones that have multiple phonetic cues, especially if the cues are…

  20. Contribution of Prosody in Audio-Visual Integration to Emotional Perception of Virtual Characters

    Directory of Open Access Journals (Sweden)

    Ekaterina Volkova

    2011-10-01

    Full Text Available Recent technology provides us with realistic looking virtual characters. Motion capture and elaborate mathematical models supply data for natural looking, controllable facial and bodily animations. With the help of computational linguistics and artificial intelligence, we can automatically assign emotional categories to appropriate stretches of text for a simulation of those social scenarios where verbal communication is important. All this makes virtual characters a valuable tool for creation of versatile stimuli for research on the integration of emotion information from different modalities. We conducted an audio-visual experiment to investigate the differential contributions of emotional speech and facial expressions on emotion identification. We used recorded and synthesized speech as well as dynamic virtual faces, all enhanced for seven emotional categories. The participants were asked to recognize the prevalent emotion of paired faces and audio. Results showed that when the voice was recorded, the vocalized emotion influenced participants' emotion identification more than the facial expression. However, when the voice was synthesized, facial expression influenced participants' emotion identification more than vocalized emotion. Additionally, individuals did worse on identifying either the facial expression or vocalized emotion when the voice was synthesized. Our experimental method can help to determine how to improve synthesized emotional speech.

  1. The Neural Basis of Speech Perception through Lipreading and Manual Cues: Evidence from Deaf Native Users of Cued Speech

    Science.gov (United States)

    Aparicio, Mario; Peigneux, Philippe; Charlier, Brigitte; Balériaux, Danielle; Kavec, Martin; Leybaert, Jacqueline

    2017-01-01

    We present here the first neuroimaging data for perception of Cued Speech (CS) by deaf adults who are native users of CS. CS is a visual mode of communicating a spoken language through a set of manual cues which accompany lipreading and disambiguate it. With CS, sublexical units of the oral language are conveyed clearly and completely through the visual modality without requiring hearing. The comparison of neural processing of CS in deaf individuals with processing of audiovisual (AV) speech in normally hearing individuals represents a unique opportunity to explore the similarities and differences in neural processing of an oral language delivered in a visuo-manual vs. an AV modality. The study included deaf adult participants who were early CS users and native hearing users of French who process speech audiovisually. Words were presented in an event-related fMRI design. Three conditions were presented to each group of participants. The deaf participants saw CS words (manual + lipread), words presented as manual cues alone, and words presented to be lipread without manual cues. The hearing group saw AV spoken words, audio-alone and lipread-alone. Three findings are highlighted. First, the middle and superior temporal gyrus (excluding Heschl’s gyrus) and left inferior frontal gyrus pars triangularis constituted a common, amodal neural basis for AV and CS perception. Second, integration was inferred in posterior parts of superior temporal sulcus for audio and lipread information in AV speech, but in the occipito-temporal junction, including MT/V5, for the manual cues and lipreading in CS. Third, the perception of manual cues showed a much greater overlap with the regions activated by CS (manual + lipreading) than lipreading alone did. This supports the notion that manual cues play a larger role than lipreading for CS processing. The present study contributes to a better understanding of the role of manual cues as support of visual speech perception in the framework

  2. Audio Visual Media Components in Educational Game for Elementary Students

    Directory of Open Access Journals (Sweden)

    Meilani Hartono

    2016-12-01

    Full Text Available The purpose of this research was to review and implement interactive audio visual media used in an educational game to improve elementary students’ interest in learning mathematics. The game was developed for the desktop platform. The art of the game was set as 2D cartoon art with animation and audio in order to make students more interested. Four mini games were developed based on research in mathematics education. The development method used was the Multimedia Development Life Cycle (MDLC), which consists of requirement, design, development, testing, and implementation phases. Data collection methods used were questionnaires, literature study, and interviews. The conclusion is that elementary students are interested in an educational game that is fun and active (moving objects), with a fast tempo of music and a carefree color like blue. This educational game is hoped to serve as an alternative teaching tool combined with conventional teaching methods.

  3. Emotional speech processing: disentangling the effects of prosody and semantic cues.

    Science.gov (United States)

    Pell, Marc D; Jaywant, Abhishek; Monetta, Laura; Kotz, Sonja A

    2011-08-01

    To inform how emotions in speech are implicitly processed and registered in memory, we compared how emotional prosody, emotional semantics, and both cues in tandem prime decisions about conjoined emotional faces. Fifty-two participants rendered facial affect decisions (Pell, 2005a), indicating whether a target face represented an emotion (happiness or sadness) or not (a facial grimace), after passively listening to happy, sad, or neutral prime utterances. Emotional information from primes was conveyed by: (1) prosody only; (2) semantic cues only; or (3) combined prosody and semantic cues. Results indicated that prosody, semantics, and combined prosody-semantic cues facilitate emotional decisions about target faces in an emotion-congruent manner. However, the magnitude of priming did not vary across tasks. Our findings highlight that emotional meanings of prosody and semantic cues are systematically registered during speech processing, but with similar effects on associative knowledge about emotions, which is presumably shared by prosody, semantics, and faces.

  4. The audio-visual revolution: do we really need it?

    Science.gov (United States)

    Townsend, I

    1979-03-01

    In the United Kingdom, the audio-visual revolution has steadily gained converts in the nursing profession. Nurse tutor courses now contain information on the techniques of educational technology and schools of nursing increasingly own (or wish to own) many of the sophisticated electronic aids to teaching that abound. This is taking place at a time of hitherto inexperienced crisis and change. Funds have been or are being made available to buy audio-visual equipment. But its purchase and use rely on satisfying personal whim, prejudice or educational fashion, not on considerations of educational efficiency. In the rush of enthusiasm, the overwhelmed teacher (everywhere; the phenomenon is not confined to nursing) forgets to ask the searching, critical questions: 'Why should we use this aid?', 'How effective is it?', 'And, at what?'. Influential writers in this profession have repeatedly called for a more responsible attitude towards published research work of other fields. In an attempt to discover what is known about the answers to this group of questions, an eclectic look at media research is taken and the widespread dissatisfaction existing amongst international educational technologists is noted. The paper isolates out of the literature several causative factors responsible for the present state of affairs. Findings from the field of educational television are cited as representative of an aid which has had a considerable amount of time and research directed at it. The concluding part of the paper shows the decisions to be taken in using or not using educational media as being more complicated than might at first appear.

  5. Easy Method for Inventory-Taking and Classification of Audio-Visual Material. First Edition, Revised.

    Science.gov (United States)

    Lamy-Rousseau, Francoise

    The alphanumeric code is a system put forward with the hope that it will bring uniformity in methods of inventory-taking and describing all sorts of audio-visual material which can be used in either French or English. The alphanumeric code classifies audio-visual materials in such a way as to indicate the exact nature of the media, the format, the…

  6. Planning Schools for Use of Audio-Visual Materials. No. 1--Classrooms, 3rd Edition.

    Science.gov (United States)

    National Education Association, Washington, DC.

    Intended to inform school board administrators and teachers of the current (1958) thinking on audio-visual instruction for use in planning new buildings, purchasing equipment, and planning instruction. Attention is given the problem of overcoming obstacles to the incorporation of audio-visual materials into the curriculum. Discussion includes--(1)…

  7. Culture through comparison: creating audio-visual listening materials for a CLIL course

    National Research Council Canada - National Science Library

    Zhyrun, Iryna

    2016-01-01

    ... of audio-visual materials design for listening comprehension taking into consideration educational and cultural contexts, course content, and language learning outcomes of the program. In addition, it discusses advantages and limitations of created audio-visual materials by contrasting them with authentic materials of similar type foun...

  8. Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation

    Directory of Open Access Journals (Sweden)

    Briony Banks

    2015-08-01

    Full Text Available Perceptual adaptation allows humans to understand a variety of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker’s facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants’ eye gaze was recorded to verify that they looked at the speaker’s face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, they do not improve perceptual adaptation.

  9. Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation

    Science.gov (United States)

    Kong, Ying-Yee; Donaldson, Gail; Somarowthu, Ala

    2015-01-01

    Low-frequency acoustic cues have been shown to improve speech perception in cochlear-implant listeners. However, the mechanisms underlying this benefit are still not well understood. This study investigated the extent to which low-frequency cues can facilitate listeners' use of linguistic knowledge in simulated electric-acoustic stimulation (EAS). Experiment 1 examined differences in the magnitude of EAS benefit at the phoneme, word, and sentence levels. Speech materials were processed via noise-channel vocoding and lowpass (LP) filtering. The amount of spectral degradation in the vocoder speech was varied by applying different numbers of vocoder channels. Normal-hearing listeners were tested on vocoder-alone, LP-alone, and vocoder + LP conditions. Experiment 2 further examined factors that underlie the context effect on EAS benefit at the sentence level by limiting the low-frequency cues to temporal envelope and periodicity (AM + FM). Results showed that EAS benefit was greater for higher-context than for lower-context speech materials even when the LP ear received only low-frequency AM + FM cues. Possible explanations for the greater EAS benefit observed with higher-context materials may lie in the interplay between perceptual and expectation-driven processes for EAS speech recognition, and/or the band-importance functions for different types of speech materials. PMID:25994712
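
    The vocoder-plus-lowpass processing described above can be sketched in outline. This is an illustrative reconstruction rather than the authors' implementation: the channel edges, filter orders, and cutoffs are assumptions, and a sampling rate of at least 16 kHz is assumed.

      # Sketch of a noise-channel vocoder combined with a low-pass (LP) branch,
      # in the spirit of simulated electric-acoustic stimulation (EAS).
      import numpy as np
      from scipy.signal import butter, sosfiltfilt

      def noise_vocode(x, fs, edges):
          """Replace spectral fine structure with noise in each analysis band."""
          x = np.asarray(x, dtype=float)
          out = np.zeros_like(x)
          env_lp = butter(4, 300, btype="low", fs=fs, output="sos")
          for lo, hi in zip(edges[:-1], edges[1:]):
              band = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
              env = sosfiltfilt(env_lp, np.abs(sosfiltfilt(band, x)))   # envelope
              carrier = sosfiltfilt(band, np.random.randn(len(x)))      # band noise
              out += np.maximum(env, 0.0) * carrier
          return out

      def simulate_eas(x, fs, n_channels=4, lp_cutoff=500):
          edges = np.geomspace(200, 7000, n_channels + 1)    # assumed analysis bands
          vocoded = noise_vocode(x, fs, edges)               # "electric" ear
          lp_sos = butter(6, lp_cutoff, btype="low", fs=fs, output="sos")
          return vocoded + sosfiltfilt(lp_sos, x)            # vocoder + LP ear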

  10. Effects of pitch, level, and tactile cues on speech segregation

    Science.gov (United States)

    Drullman, Rob; Bronkhorst, Adelbert W.

    2003-04-01

    Sentence intelligibility for interfering speech was investigated as a function of level difference, pitch difference, and presence of tactile support. A previous study by the present authors [J. Acoust. Soc. Am. 111, 2432-2433 (2002)] had shown a small benefit of tactile support in the speech-reception threshold measured against a background of one to eight competing talkers. The present experiment focused on the effects of informational and energetic masking for one competing talker. Competing speech was obtained by manipulating the speech of the male target talker (different sentences). The PSOLA technique was used to increase the average pitch of competing speech by 2, 4, 8, or 12 semitones. Level differences between target and competing speech ranged from -16 to +4 dB. Tactile support (B&K 4810 shaker) was given to the index finger by presenting the temporal envelope of the low-pass-filtered speech (0-200 Hz). Sentences were presented diotically and the percentage of correctly perceived words was measured. Results show a significant overall increase in intelligibility score from 71% to 77% due to tactile support. Performance improves monotonically with increasing pitch difference. Louder target speech generally helps perception, but results for level differences are considerably dependent on pitch differences.
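
    The tactile support signal described above is, in essence, the temporal envelope of speech after low-pass filtering at 200 Hz. A minimal sketch of that extraction follows; the envelope smoothing cutoff and filter orders are assumptions rather than the authors' exact settings.

      # Sketch: derive a drive signal for a tactile shaker from the envelope of
      # low-pass-filtered speech (0-200 Hz).
      import numpy as np
      from scipy.signal import butter, sosfiltfilt, hilbert

      def tactile_envelope(x, fs, lp_cutoff=200, smooth_cutoff=50):
          lp = sosfiltfilt(butter(4, lp_cutoff, btype="low", fs=fs, output="sos"), x)
          env = np.abs(hilbert(lp))                          # instantaneous envelope
          smooth = butter(2, smooth_cutoff, btype="low", fs=fs, output="sos")
          return sosfiltfilt(smooth, env)                    # smoothed shaker drive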

  11. The influence of spectral distinctiveness on acoustic cue weighting in children's and adults' speech perception

    Science.gov (United States)

    Mayo, Catherine; Turk, Alice

    2005-09-01

    Children and adults appear to weight some acoustic cues differently in perceiving certain speech contrasts. One possible explanation for this difference is that children and adults make use of different strategies in the way that they process speech. An alternative explanation is that adult-child cue weighting differences are due to more general sensory (auditory) processing differences between the two groups. It has been proposed that children may be less able to deal with incomplete or insufficient acoustic information than are adults, and thus may require cues that are longer, louder, or more spectrally distinct to identify or discriminate between auditory stimuli. The current study tested this hypothesis by examining adults' and 3- to 7-year-old children's cue weighting for contrasts in which vowel-onset formant transitions varied from spectrally distinct (/no/-/mo/, /do/-/bo/, and /ta/-/da/) to spectrally similar (/ni/-/mi/, /de/-/be/, and /ti/-/di/). Spectrally distinct cues were more likely to yield different consonantal responses than were spectrally similar cues, for all listeners. Furthermore, as predicted by a sensory hypothesis, children were less likely to give different consonantal responses to stimuli distinguished by spectrally similar transitional cues than were adults. However, this pattern of behavior did not hold for all contrasts. Implications for theories of adult-child cue weighting differences are discussed.

  12. The role of reverberation-related binaural cues in the externalization of speech

    DEFF Research Database (Denmark)

    Catic, Jasmina; Santurette, Sébastien; Dau, Torsten

    2015-01-01

    The perception of externalization of speech sounds was investigated with respect to the monaural and binaural cues available at the listeners’ ears in a reverberant environment. Individualized binaural room impulse responses (BRIRs) were used to simulate externalized sound sources via headphones. The measured BRIRs were subsequently modified such that the proportion of the response containing binaural vs monaural information was varied. Normal-hearing listeners were presented with speech sounds convolved with such modified BRIRs. Monaural reverberation cues were found to be sufficient for the externalization of a lateral sound source. In contrast, for a frontal source, an increased amount of binaural cues from reflections was required in order to obtain well externalized sound images. It was demonstrated that the interaction between the interaural cues of the direct sound and the reverberation...

  13. The effects of working memory capacity and semantic cues on the intelligibility of speech in noise.

    Science.gov (United States)

    Zekveld, Adriana A; Rudner, Mary; Johnsrude, Ingrid S; Rönnberg, Jerker

    2013-09-01

    This study examined how semantically related information facilitates the intelligibility of spoken sentences in the presence of masking sound, and how this facilitation is influenced by masker type and by individual differences in cognitive functioning. Dutch sentences were masked by stationary noise, fluctuating noise, or an interfering talker. Each sentence was preceded by a text cue; cues were either three words that were semantically related to the sentence or three unpronounceable nonwords. Speech reception thresholds were adaptively measured. Additional measures included working memory capacity (reading span and size comparison span), linguistic closure ability (text reception threshold), and delayed sentence recognition. Word cues facilitated speech perception in noise similarly for all masker types. Cue benefit was related to reading span performance when the masker was interfering speech, but not when other maskers were used, and it did not correlate with text reception threshold or size comparison span. Better reading span performance was furthermore associated with enhanced delayed recognition of sentences preceded by word relative to nonword cues, across masker types. The results suggest that working memory capacity is associated with release from informational masking by semantically related information, and additionally with the encoding, storage, or retrieval of speech content in memory.

  14. Perception of the Multisensory Coherence of Fluent Audiovisual Speech in Infancy: Its Emergence & the Role of Experience

    Science.gov (United States)

    Lewkowicz, David J.; Minar, Nicholas J.; Tift, Amy H.; Brandon, Melissa

    2014-01-01

    To investigate the developmental emergence of the ability to perceive the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8–10, and 12–14 month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor the 8–10 month-old infants exhibited audio-visual matching in that neither group exhibited greater looking at the matching monologue. In contrast, the 12–14 month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, they perceived the multisensory coherence of native-language monologues earlier in the test trials than of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12–14 month olds did not depend on audio-visual synchrony whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audio-visual synchrony cues are more important in the perception of the multisensory coherence of non-native than native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing. PMID:25462038

  15. The role of temporal speech cues in facilitating the fluency of adults who stutter.

    Science.gov (United States)

    Park, Jin; Logan, Kenneth J

    2015-12-01

    Adults who stutter speak more fluently during choral speech contexts than they do during solo speech contexts. The underlying mechanisms for this effect remain unclear, however. In this study, we examined the extent to which the choral speech effect depended on presentation of intact temporal speech cues. We also examined whether speakers who stutter followed choral signals more closely than typical speakers did. 8 adults who stuttered and 8 adults who did not stutter read 60 sentences aloud during a solo speaking condition and three choral speaking conditions (240 total sentences), two of which featured either temporally altered or indeterminate word duration patterns. Effects of these manipulations on speech fluency, rate, and temporal entrainment with the choral speech signal were assessed. Adults who stutter spoke more fluently in all choral speaking conditions than they did when speaking solo. They also spoke slower and exhibited closer temporal entrainment with the choral signal during the mid- to late-stages of sentence production than the adults who did not stutter. Both groups entrained more closely with unaltered choral signals than they did with altered choral signals. Findings suggest that adults who stutter make greater use of speech-related information in choral signals when talking than adults with typical fluency do. The presence of fluency facilitation during temporally altered choral speech and conversation babble, however, suggests that temporal/gestural cueing alone cannot account for fluency facilitation in speakers who stutter. Other potential fluency enhancing mechanisms are discussed. The reader will be able to (a) summarize competing views on stuttering as a speech timing disorder, (b) describe the extent to which adults who stutter depend on an accurate rendering of temporal information in order to benefit from choral speech, and (c) discuss possible explanations for fluency facilitation in the presence of inaccurate or indeterminate

  16. Multidimensional Attributes of the Sense of Presence in Audio-Visual Content

    Directory of Open Access Journals (Sweden)

    Kazutomo Fukue

    2011-10-01

    Full Text Available The sense of presence is crucial for evaluating audio-visual equipment and content. To clarify the multidimensional attributes of the sense, we conducted three experiments on audio, visual, and audio-visual content items. Initially 345 adjectives, which express the sense of presence, were collected and the number of adjectives was reduced to 40 pairs based on the KJ method. Forty scenes were recorded with a high-definition video camera while their sounds were recorded using a dummy head. Each content item was reproduced with a 65-inch display and headphones in three conditions of audio-only, visual-only and audio-visual. Twenty-one subjects evaluated them using the 40 pairs of adjectives by the Semantic Differential method with seven-point scales. The sense of presence in each content item was also evaluated using a Likert scale. The experimental data was analyzed by the factor analysis and four, five and five factors were extracted for audio, visual, and audio-visual conditions, respectively. The multiple regression analysis revealed that audio and audio-visual presences were explained by the extracted factors, although further consideration is required for the visual presence. These results indicated that the factors of psychological loading and activity are relevant for the sense of presence.
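
    The analysis pipeline sketched above (Semantic Differential ratings, factor extraction, regression of presence ratings on the factors) can be approximated in outline. This is an illustrative sketch with simulated ratings, not the authors' analysis; the array shapes and the number of factors are assumptions.

      # Sketch: factor analysis of 7-point Semantic Differential ratings, then a
      # regression of presence ratings on the factor scores. Data are simulated.
      import numpy as np
      from sklearn.decomposition import FactorAnalysis
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(0)
      ratings = rng.integers(1, 8, size=(21 * 40, 40)).astype(float)  # subject x scene rows, 40 adjective pairs
      presence = rng.integers(1, 8, size=21 * 40).astype(float)       # Likert presence ratings

      fa = FactorAnalysis(n_components=5, random_state=0).fit(ratings)
      scores = fa.transform(ratings)                   # factor scores per observation

      reg = LinearRegression().fit(scores, presence)
      print("R^2 of presence on the extracted factors:", reg.score(scores, presence))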

  17. Audio-Visual Integration Modifies Emotional Judgment in Music

    Directory of Open Access Journals (Sweden)

    Shen-Yuan Su

    2011-10-01

    Full Text Available The conventional view that perceived emotion in music is derived mainly from auditory signals has led to neglect of the contribution of visual image. In this study, we manipulated mode (major vs. minor) and examined the influence of a video image on emotional judgment in music. Melodies in either major or minor mode were controlled for tempo and rhythm and played to the participants. We found that Taiwanese participants, like Westerners, judged major melodies as expressing positive, and minor melodies negative, emotions. The major or minor melodies were then paired with video images of the singers, which were either emotionally congruent or incongruent with their modes. Results showed that participants perceived stronger positive or negative emotions with congruent audio-visual stimuli. Compared to listening to music alone, stronger emotions were perceived when an emotionally congruent video image was added and weaker emotions were perceived when an incongruent image was added. We therefore demonstrate that mode is important to perceive the emotional valence in music and that treating musical art as a purely auditory event might lose the enhanced emotional strength perceived in music, since going to a concert may lead to stronger perceived emotion than listening to the CD at home.

  18. Audio-visual assistance in co-creating transition knowledge

    Science.gov (United States)

    Hezel, Bernd; Broschkowski, Ephraim; Kropp, Jürgen P.

    2013-04-01

    Earth system and climate impact research results point to the tremendous ecologic, economic and societal implications of climate change. Specifically people will have to adopt lifestyles that are very different from those they currently strive for in order to mitigate severe changes of our known environment. It will most likely not suffice to transfer the scientific findings into international agreements and appropriate legislation. A transition is rather reliant on pioneers that define new role models, on change agents that mainstream the concept of sufficiency and on narratives that make different futures appealing. In order for the research community to be able to provide sustainable transition pathways that are viable, an integration of the physical constraints and the societal dynamics is needed. Hence the necessary transition knowledge is to be co-created by social and natural science and society. To this end, the Climate Media Factory - in itself a massively transdisciplinary venture - strives to provide an audio-visual connection between the different scientific cultures and a bi-directional link to stake holders and society. Since methodology, particular language and knowledge level of the involved is not the same, we develop new entertaining formats on the basis of a "complexity on demand" approach. They present scientific information in an integrated and entertaining way with different levels of detail that provide entry points to users with different requirements. Two examples shall illustrate the advantages and restrictions of the approach.

  19. The role of reverberation-related binaural cues in the externalization of speech.

    Science.gov (United States)

    Catic, Jasmina; Santurette, Sébastien; Dau, Torsten

    2015-08-01

    The perception of externalization of speech sounds was investigated with respect to the monaural and binaural cues available at the listeners' ears in a reverberant environment. Individualized binaural room impulse responses (BRIRs) were used to simulate externalized sound sources via headphones. The measured BRIRs were subsequently modified such that the proportion of the response containing binaural vs monaural information was varied. Normal-hearing listeners were presented with speech sounds convolved with such modified BRIRs. Monaural reverberation cues were found to be sufficient for the externalization of a lateral sound source. In contrast, for a frontal source, an increased amount of binaural cues from reflections was required in order to obtain well externalized sound images. It was demonstrated that the interaction between the interaural cues of the direct sound and the reverberation strongly affects the perception of externalization. An analysis of the short-term binaural cues showed that the amount of fluctuations of the binaural cues corresponded well to the externalization ratings obtained in the listening tests. The results further suggested that the precedence effect is involved in the auditory processing of the dynamic binaural cues that are utilized for externalization perception.
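
    One analysis step mentioned above, quantifying the fluctuation of short-term binaural cues, might be sketched as follows. The window length and the use of the standard deviation of interaural level differences (ILDs) as the fluctuation measure are illustrative assumptions, not the authors' exact method.

      # Sketch: convolve dry speech with a left/right BRIR pair and measure how
      # much the short-term ILD fluctuates across 20-ms windows.
      import numpy as np
      from scipy.signal import fftconvolve

      def short_term_ild_fluctuation(speech, brir_left, brir_right, fs, win_ms=20):
          left = fftconvolve(speech, brir_left)[: len(speech)]
          right = fftconvolve(speech, brir_right)[: len(speech)]
          win = int(fs * win_ms / 1000)
          ilds = []
          for i in range(len(left) // win):
              seg_l = left[i * win:(i + 1) * win]
              seg_r = right[i * win:(i + 1) * win]
              ilds.append(10 * np.log10((np.sum(seg_l ** 2) + 1e-12) /
                                        (np.sum(seg_r ** 2) + 1e-12)))
          return np.std(ilds)      # larger value = stronger binaural fluctuations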

  20. The sound of your lips: electrophysiological cross-modal interactions during hand-to-face and face-to-face speech perception

    Directory of Open Access Journals (Sweden)

    Avril Treille

    2014-05-01

    Full Text Available Recent magneto-encephalographic and electro-encephalographic studies provide evidence for cross-modal integration during audio-visual and audio-haptic speech perception, with speech gestures viewed or felt from manual tactile contact with the speaker’s face. Given the temporal precedence of the haptic and visual signals on the acoustic signal in these studies, the observed modulation of N1/P2 auditory evoked responses during bimodal compared to unimodal speech perception suggest that relevant and predictive visual and haptic cues may facilitate auditory speech processing. To further investigate this hypothesis, auditory evoked potentials were here compared during auditory-only, audio-visual and audio-haptic speech perception in live dyadic interactions between a listener and a speaker. In line with previous studies, auditory evoked potentials were attenuated and speeded up during both audio-haptic and audio-visual compared to auditory speech perception. Importantly, the observed latency and amplitude reduction did not significantly depend on the degree of visual and haptic recognition of the speech targets. Altogether, these results further demonstrate cross-modal interactions between the auditory, visual and haptic speech signals. Although they do not contradict the hypothesis that visual and haptic sensory inputs convey predictive information with respect to the incoming auditory speech input, these results suggest that, at least in live conversational interactions, systematic conclusions on sensory predictability in bimodal speech integration have to be taken with caution, with the extraction of predictive cues likely depending on the variability of the speech stimuli.

  1. The effect of context and audio-visual modality on emotions elicited by a musical performance.

    Science.gov (United States)

    Coutinho, Eduardo; Scherer, Klaus R

    2017-07-01

    In this work, we compared emotions induced by the same performance of Schubert Lieder during a live concert and in a laboratory viewing/listening setting to determine the extent to which laboratory research on affective reactions to music approximates real listening conditions in dedicated performances. We measured emotions experienced by volunteer members of an audience that attended a Lieder recital in a church (Context 1) and emotional reactions to an audio-video-recording of the same performance in a university lecture hall (Context 2). Three groups of participants were exposed to three presentation versions in Context 2: (1) an audio-visual recording, (2) an audio-only recording, and (3) a video-only recording. Participants achieved statistically higher levels of emotional convergence in the live performance than in the laboratory context, and the experience of particular emotions was determined by complex interactions between auditory and visual cues in the performance. This study demonstrates the contribution of the performance setting and the performers' appearance and nonverbal expression to emotion induction by music, encouraging further systematic research into the factors involved.

  2. Real-time decreased sensitivity to an audio-visual illusion during goal-directed reaching.

    Directory of Open Access Journals (Sweden)

    Luc Tremblay

    Full Text Available In humans, sensory afferences are combined and integrated by the central nervous system (Ernst MO, Bülthoff HH (2004) Trends Cogn. Sci. 8: 162-169) and appear to provide a holistic representation of the environment. Empirical studies have repeatedly shown that vision dominates the other senses, especially for tasks with spatial demands. In contrast, it has also been observed that sound can strongly alter the perception of visual events. For example, when presented with 2 flashes and 1 beep in a very brief period of time, humans often report seeing 1 flash (i.e. fusion illusion; Andersen TS, Tiippana K, Sams M (2004) Brain Res. Cogn. Brain Res. 21: 301-308). However, it is not known how an unfolding movement modulates the contribution of vision to perception. Here, we used the audio-visual illusion to demonstrate that goal-directed movements can alter visual information processing in real-time. Specifically, the fusion illusion was linearly reduced as a function of limb velocity. These results suggest that cue combination and integration can be modulated in real-time by goal-directed behaviors; perhaps through sensory gating (Chapman CE, Beauchamp E (2006) J. Neurophysiol. 96: 1664-1675) and/or altered sensory noise (Ernst MO, Bülthoff HH (2004) Trends Cogn. Sci. 8: 162-169) during limb movements.

  3. The effect of context and audio-visual modality on emotions elicited by a musical performance

    Science.gov (United States)

    Coutinho, Eduardo; Scherer, Klaus R.

    2016-01-01

    In this work, we compared emotions induced by the same performance of Schubert Lieder during a live concert and in a laboratory viewing/listening setting to determine the extent to which laboratory research on affective reactions to music approximates real listening conditions in dedicated performances. We measured emotions experienced by volunteer members of an audience that attended a Lieder recital in a church (Context 1) and emotional reactions to an audio-video-recording of the same performance in a university lecture hall (Context 2). Three groups of participants were exposed to three presentation versions in Context 2: (1) an audio-visual recording, (2) an audio-only recording, and (3) a video-only recording. Participants achieved statistically higher levels of emotional convergence in the live performance than in the laboratory context, and the experience of particular emotions was determined by complex interactions between auditory and visual cues in the performance. This study demonstrates the contribution of the performance setting and the performers’ appearance and nonverbal expression to emotion induction by music, encouraging further systematic research into the factors involved. PMID:28781419

  4. Voice over: Audio-visual congruency and content recall in the gallery setting

    Science.gov (United States)

    Fairhurst, Merle T.; Scott, Minnie; Deroy, Ophelia

    2017-01-01

    Experimental research has shown that pairs of stimuli which are congruent and assumed to ‘go together’ are recalled more effectively than an item presented in isolation. Will this multisensory memory benefit occur when stimuli are richer and longer, in an ecological setting? In the present study, we focused on an everyday situation of audio-visual learning and manipulated the relationship between audio guide tracks and viewed portraits in the galleries of the Tate Britain. By varying the gender and narrative style of the voice-over, we examined how the perceived congruency and assumed unity of the audio guide track with painted portraits affected subsequent recall. We show that tracks perceived as best matching the viewed portraits led to greater recall of both sensory and linguistic content. We provide the first evidence that manipulating crossmodal congruence and unity assumptions can effectively impact memory in a multisensory ecological setting, even in the absence of precise temporal alignment between sensory cues. PMID:28636667

  5. The effect of guidance and counseling information services assisted by audio-visual media on students' empathy

    Directory of Open Access Journals (Sweden)

    Rita Kumalasari

    2017-05-01

    The results show that the audio-visual media counseling technique is effective and practical for increasing students' empathy; its components are a rational design, key concepts, understanding, purpose, content models, the expected role and qualifications of the tutor (counselor), procedures or steps in implementing the audio-visual material, evaluation, follow-up, and a support system. The intervention proved effective in improving student behavior: students' empathy increased by 28.9 percentage points, from 45.08% to 73.98%, and this increase occurred in all aspects of empathy. Keywords: Effective, Audio visual, Empathy

  6. An additive-factors design to disambiguate neuronal and areal convergence: measuring multisensory interactions between audio, visual, and haptic sensory streams using fMRI.

    Science.gov (United States)

    Stevenson, Ryan A; Kim, Sunah; James, Thomas W

    2009-09-01

    It can be shown empirically and theoretically that inferences based on established metrics used to assess multisensory integration with BOLD fMRI data, such as superadditivity, are dependent on the particular experimental situation. For example, the law of inverse effectiveness shows that the likelihood of finding superadditivity in a known multisensory region increases with decreasing stimulus discriminability. In this paper, we suggest that Sternberg's additive-factors design allows for an unbiased assessment of multisensory integration. Through the manipulation of signal-to-noise ratio as an additive factor, we have identified networks of cortical regions that show properties of audio-visual or visuo-haptic neuronal convergence. These networks contained previously identified multisensory regions and also many new regions, for example, the caudate nucleus for audio-visual integration, and the fusiform gyrus for visuo-haptic integration. A comparison of integrative networks across audio-visual and visuo-haptic conditions showed very little overlap, suggesting that neural mechanisms of integration are unique to particular sensory pairings. Our results provide evidence for the utility of the additive-factors approach by demonstrating its effectiveness across modality (vision, audition, and haptics), stimulus type (speech and non-speech), experimental design (blocked and event-related), method of analysis (SPM and ROI), and experimenter-chosen baseline. The additive-factors approach provides a method for investigating multisensory interactions that goes beyond what can be achieved with more established metric-based, subtraction-type methods.
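
    The core of the additive-factors logic is a test for an interaction between the additive factor (here, signal-to-noise ratio) and the stimulus condition: pure additivity predicts no interaction, whereas neuronal convergence predicts one. The toy sketch below, with simulated ROI beta values and a two-way ANOVA in statsmodels, only illustrates that logic; every name and number is invented.

      # Toy sketch: does the effect of SNR on ROI responses depend on modality?
      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(0)
      rows = []
      for modality in ("audio", "visual", "audiovisual"):
          for snr in ("low", "high"):
              for _subject in range(20):
                  beta = rng.normal(1.0, 0.3)
                  if modality == "audiovisual" and snr == "low":
                      beta += 0.5        # simulated inverse-effectiveness boost
                  rows.append({"modality": modality, "snr": snr, "beta": beta})
      df = pd.DataFrame(rows)

      model = smf.ols("beta ~ C(modality) * C(snr)", data=df).fit()
      print(sm.stats.anova_lm(model, typ=2))   # the interaction term indexes convergence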

  7. Psychoacoustic cues to emotion in speech prosody and music.

    Science.gov (United States)

    Coutinho, Eduardo; Dibben, Nicola

    2013-01-01

    There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain.
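
    A few of the psychoacoustic features named above (a loudness proxy, spectral centroid, spectral flux) can be extracted with standard audio tooling; sharpness and roughness need dedicated psychoacoustic models and are omitted here. The sketch below uses librosa with a placeholder file name and is not the feature extraction used in the study.

      # Sketch: frame-wise RMS (loudness proxy), spectral centroid, and spectral flux.
      import numpy as np
      import librosa

      y, sr = librosa.load("example.wav", sr=None)           # placeholder file

      rms = librosa.feature.rms(y=y)[0]                      # crude loudness proxy
      centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]

      S = np.abs(librosa.stft(y))
      flux = np.sqrt(np.sum(np.diff(S, axis=1).clip(min=0) ** 2, axis=0))

      print(rms.mean(), centroid.mean(), flux.mean())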

  8. Adaptive spatial filtering improves speech reception in noise while preserving binaural cues.

    Science.gov (United States)

    Bissmeyer, Susan R S; Goldsworthy, Raymond L

    2017-09-01

    Hearing loss greatly reduces an individual's ability to comprehend speech in the presence of background noise. Over the past decades, numerous signal-processing algorithms have been developed to improve speech reception in these situations for cochlear implant and hearing aid users. One challenge is to reduce background noise while not introducing interaural distortion that would degrade binaural hearing. The present study evaluates a noise reduction algorithm, referred to as binaural Fennec, that was designed to improve speech reception in background noise while preserving binaural cues. Speech reception thresholds were measured for normal-hearing listeners in a simulated environment with target speech generated in front of the listener and background noise originating 90° to the right of the listener. Lateralization thresholds were also measured in the presence of background noise. These measures were conducted in anechoic and reverberant environments. Results indicate that the algorithm improved speech reception thresholds, even in highly reverberant environments. Results indicate that the algorithm also improved lateralization thresholds for the anechoic environment while not affecting lateralization thresholds for the reverberant environments. These results provide clear evidence that this algorithm can improve speech reception in background noise while preserving binaural cues used to lateralize sound.

  9. Co-Occurrence Statistics as a Language-Dependent Cue for Speech Segmentation

    Science.gov (United States)

    Saksida, Amanda; Langus, Alan; Nespor, Marina

    2017-01-01

    To what extent can language acquisition be explained in terms of different associative learning mechanisms? It has been hypothesized that distributional regularities in spoken languages are strong enough to elicit statistical learning about dependencies among speech units. Distributional regularities could be a useful cue for word learning even…
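
    The co-occurrence statistic usually at issue in this literature is the forward transitional probability between adjacent syllables, with word boundaries posited where the probability dips. A toy sketch with a made-up syllable corpus:

      # Sketch: forward transitional probabilities P(next syllable | current syllable).
      from collections import Counter

      corpus = "ba-bu-pa do-ti-ku ba-bu-pa go-la-ti do-ti-ku ba-bu-pa".split()
      syllables = [syl for word in corpus for syl in word.split("-")]

      pair_counts = Counter(zip(syllables, syllables[1:]))
      first_counts = Counter(syllables[:-1])

      def transitional_probability(a, b):
          return pair_counts[(a, b)] / first_counts[a]

      print(transitional_probability("ba", "bu"))   # high: within-word transition
      print(transitional_probability("pa", "do"))   # lower: spans a word boundary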

  10. Magazine Production: A Selected, Annotated Bibliography of Audio-Visual Materials.

    Science.gov (United States)

    Applegate, Edd

    This bibliography, which contains 13 annotations, is designed to help instructors choose appropriate audio-visual materials for a course in magazine production. Names and addresses of institutions from which the materials may be secured have been included. (MS)

  11. Voice over: Audio-visual congruency and content recall in the gallery setting

    National Research Council Canada - National Science Library

    Merle T Fairhurst; Minnie Scott; Ophelia Deroy

    2017-01-01

    ...? In the present study, we focused on an everyday situation of audio-visual learning and manipulated the relationship between audio guide tracks and viewed portraits in the galleries of the Tate Britain...

  12. Audio/visual analysis for high-speed TV advertisement detection from MPEG bitstream

    OpenAIRE

    Sadlier, David A.

    2002-01-01

    Advertisement breaks during or between television programmes are typically flagged by series of black-and-silent video frames, which recurrently occur in order to audio-visually separate individual advertisement spots from one another. It is the regular prevalence of these flags that enables automatic differentiation between what is programme content and what is advertisement break. Detection of these audio-visual depressions within broadcast television content provides a basis on which advertise...
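
    The flag-detection idea can be sketched as a joint threshold on frame luminance and short-term audio energy. The sketch below is illustrative only: the thresholds and frame size are invented, and it assumes already-decoded frames, whereas the thesis works directly on the compressed MPEG bitstream.

      # Sketch: a frame is a candidate break flag when it is both dark and silent.
      import numpy as np

      def is_break_flag(frame_luma, audio_chunk, luma_thresh=16.0, rms_thresh=0.01):
          """frame_luma: 2-D 8-bit luminance plane; audio_chunk: samples in [-1, 1]."""
          dark = frame_luma.mean() < luma_thresh
          silent = np.sqrt(np.mean(audio_chunk ** 2)) < rms_thresh
          return dark and silent

      # Example: a nearly black frame with near-silent audio is flagged.
      frame = np.full((288, 352), 5, dtype=np.uint8)
      audio = 0.001 * np.random.randn(1600)
      print(is_break_flag(frame, audio))            # True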

  13. THE AUDIO-VISUAL DISTRACTION MINIMIZES THE CHILDREN’S LEVEL OF ANXIETY DURING CIRCUMCISION

    Directory of Open Access Journals (Sweden)

    Farida Juanita

    2017-07-01

    Full Text Available Introduction: Circumcision is a minor surgical procedure usually performed on school-age children, and most children appear quite anxious during it. Audio-visual distraction is a method the researchers applied to decrease children's anxiety level during circumcision. The objective of this study was to identify the effect of audio-visual distraction on children's anxiety level during circumcision. Method: A non-randomized pretest-posttest control group design was used. Twenty-one children were divided into two groups: the control group (n=13) received the usual care, while the intervention group (n=8) received audio-visual distraction during circumcision. Using self-report (an anxiety scale) and a physiological measure of anxiety (pulse rate per minute), the children were evaluated before and after the intervention. Result: Audio-visual distraction was effective in decreasing the anxiety level of school-age children during circumcision, with a significant difference in the decrease of anxiety level between the control and intervention groups (p=0.000) and a significant difference in pulse rate per minute between the groups (p=0.006). Discussion: It can be concluded that applying audio-visual distraction during circumcision can minimize children's anxiety. Audio-visual distraction helps children manage and reduce anxiety during invasive procedures through the mechanism of distraction.

  14. Inconspicuous portable audio/visual recording: transforming an IV pole into a mobile video capture stand.

    Science.gov (United States)

    Pettineo, Christopher M; Vozenilek, John A; Kharasch, Morris; Wang, Ernest; Aitchison, Pam; Arreguin, Andrew

    2008-01-01

    Although a traditional simulation laboratory may have excellent installed audio/visual capabilities, often large classes overwhelm the limited space in the laboratory. With minimal monetary investment, it is possible to create a portable audio/visual stand from an old IV pole. An IV pole was transformed into an audio/visual stand to overcome the burden of transporting individual electronic components during a patient safety research project conducted in an empty patient room with a standardized patient. The materials and methods for making the modified IV pole are outlined in this article. The limiting factor of production is access to an old IV pole; otherwise a few purchases from an electronics store complete the audio/visual IV pole. The modified IV pole is a cost-effective and portable solution to limited space or the need for audio/visual capabilities outside of a simulation laboratory. The familiarity of an IV pole in a clinical setting reduces the visual disturbance of relocated audio/visual equipment in a room previously void of such instrumentation.

  15. Audio-visual interaction in visual motion detection: Synchrony versus Asynchrony.

    Science.gov (United States)

    Rosemann, Stephanie; Wefel, Inga-Maria; Elis, Volkan; Fahle, Manfred

    Detection and identification of moving targets is of paramount importance in everyday life, even if it is not widely tested in optometric practice, mostly for technical reasons. There are clear indications in the literature that in perception of moving targets, vision and hearing interact, for example in noisy surrounds and in understanding speech. The main aim of visual perception, the ability that optometry aims to optimize, is the identification of objects, from everyday objects to letters, but also the spatial orientation of subjects in natural surrounds. To subserve this aim, corresponding visual and acoustic features from the rich spectrum of signals supplied by natural environments have to be combined. Here, we investigated the influence of an auditory motion stimulus on visual motion detection, both with a concrete (left/right movement) and an abstract auditory motion (increase/decrease of pitch). We found that incongruent audiovisual stimuli led to significantly inferior detection compared to the visual only condition. Additionally, detection was significantly better in abstract congruent than incongruent trials. For the concrete stimuli the detection threshold was significantly better in asynchronous audiovisual conditions than in the unimodal visual condition. We find a clear but complex pattern of partly synergistic and partly inhibitory audio-visual interactions. It seems that asynchrony plays only a positive role in audiovisual motion while incongruence mostly disturbs in simultaneous abstract configurations but not in concrete configurations. As in speech perception in hearing-impaired patients, patients suffering from visual deficits should be able to benefit from acoustic information. Copyright © 2017 Spanish General Council of Optometry. Published by Elsevier España, S.L.U. All rights reserved.

  16. When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.

    Science.gov (United States)

    Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola

    2017-11-01

    Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Language identification with suprasegmental cues: a study based on speech resynthesis.

    Science.gov (United States)

    Ramus, F; Mehler, J

    1999-01-01

    This paper proposes a new experimental paradigm to explore the discriminability of languages, a question which is crucial to the child born in a bilingual environment. This paradigm employs the speech resynthesis technique, enabling the experimenter to preserve or degrade acoustic cues such as phonotactics, syllabic rhythm, or intonation from natural utterances. English and Japanese sentences were resynthesized, preserving broad phonotactics, rhythm, and intonation (condition 1), rhythm and intonation (condition 2), intonation only (condition 3), or rhythm only (condition 4). The findings support the notion that syllabic rhythm is a necessary and sufficient cue for French adult subjects to discriminate English from Japanese sentences. The results are consistent with previous research using low-pass filtered speech, as well as with phonological theories predicting rhythmic differences between languages. Thus, the new methodology proposed appears to be well suited to study language discrimination. Applications for other domains of psycholinguistic research and for automatic language identification are considered.

  18. Language identification with suprasegmental cues: A study based on speech resynthesis

    OpenAIRE

    Ramus, Franck; Mehler, Jacques

    1999-01-01

    This paper proposes a new experimental paradigm to explore the discriminability of languages, a question which is crucial to the child born in a bilingual environment. This paradigm employs the speech resynthesis technique, enabling the experimenter to preserve or degrade acoustic cues such as phonotactics, syllabic rhythm or intonation from natural utterances. English and Japanese sentences were resynthesized, preserving broad phonotactics, rhythm and intonation (Condition 1), rhythm and int...

  19. Listener deficits in hypokinetic dysarthria: Which cues are most important in speech segmentation?

    Science.gov (United States)

    Wade, Carolyn Ann

    Listeners use prosodic cues to help them quickly process running speech. In English, listeners effortlessly use strong syllables to help them to find words in the continuous stream of speech produced by neurologically-intact individuals. However, listeners are not always presented with speech under such ideal circumstances. This thesis explores the question of word segmentation of English speech under one of these less ideal conditions; specifically, when the speaker may be impaired in his/her production of strong syllables, as in the case of hypokinetic dysarthria. Further, we attempt to discern which acoustic cue(s) are most degraded in hypokinetic dysarthria and the effect that this degradation has on listeners' segmentation when no additional semantic or pragmatic cues are present. Two individuals with Parkinson's disease, one with a rate disturbance and one with articulatory disruption, along with a typically aging control, were recorded repeating a series of nonsense syllables. Young adult listeners were then presented with recordings from one of these three speakers producing non-words (imprecise consonant articulation, rate disturbance, and control). After familiarization, the listeners were asked to rate the familiarity of the non-words produced by a second typically aging speaker. Results indicated speakers with hypokinetic dysarthria were able to modulate their intensity and duration for stressed and unstressed syllables in a way similar to that of control speakers. In addition, their mean and peak fundamental frequency for both stressed and unstressed syllables were significantly higher than that of the normally aging controls. ANOVA results revealed a marginal main effect of frequency in normal and consonant conditions for word versus nonwords listener ratings.

  20. A novel speech processing algorithm based on harmonicity cues in cochlear implant

    Science.gov (United States)

    Wang, Jian; Chen, Yousheng; Zhang, Zongping; Chen, Yan; Zhang, Weifeng

    2017-08-01

    This paper proposes a novel speech processing algorithm for cochlear implants that uses harmonicity cues to enhance tonal information in Mandarin Chinese speech recognition. The input speech was filtered by a 4-channel band-pass filter bank. The frequency ranges for the four bands were 300-621, 621-1285, 1285-2657, and 2657-5499 Hz. In each pass band, temporal envelope and periodicity cues (TEPCs) below 400 Hz were extracted by full-wave rectification and low-pass filtering. The TEPCs were modulated by a sinusoidal carrier whose frequency was the harmonic of the fundamental frequency (F0) closest to the center frequency of each band. Signals from all bands were then combined to obtain the output speech. Mandarin tone, word, and sentence recognition in quiet listening conditions were tested for the widely used continuous interleaved sampling (CIS) strategy and the novel F0-harmonic algorithm. The results showed that the F0-harmonic algorithm performed consistently better than the CIS strategy in Mandarin tone, word, and sentence recognition. In addition, the sentence recognition rate was higher than the word recognition rate, owing to contextual information in the sentences. Moreover, tones 3 and 4 were recognized better than tones 1 and 2, due to the more easily identified features of the former. In conclusion, the F0-harmonic algorithm can enhance tonal information in cochlear implant speech processing through the use of harmonicity cues, thereby improving Mandarin tone, word, and sentence recognition. Further work will test the F0-harmonic algorithm in noisy listening conditions.
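
    The band-splitting, envelope extraction and F0-harmonic carrier modulation described above can be sketched in a few lines of signal-processing code. The following Python fragment is only a minimal illustration of that pipeline under stated assumptions, not the authors' implementation: the filter orders, the fixed F0 value and all function names are invented for the example.

    ```python
    import numpy as np
    from scipy.signal import butter, sosfilt

    FS = 16000                                     # assumed sampling rate (Hz)
    BANDS = [(300, 621), (621, 1285), (1285, 2657), (2657, 5499)]  # band edges from the paper

    def band_envelope(x, lo, hi, fs=FS):
        """Band-pass one channel, then extract temporal envelope and periodicity
        cues below 400 Hz (full-wave rectification followed by low-pass filtering)."""
        bp = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        lp = butter(4, 400, btype="lowpass", fs=fs, output="sos")
        return sosfilt(lp, np.abs(sosfilt(bp, x)))

    def f0_harmonic_synthesis(x, f0=200.0, fs=FS):
        """Modulate each band's envelope with a sinusoid at the F0 harmonic closest
        to the band's centre frequency, then sum the bands into the output signal."""
        t = np.arange(len(x)) / fs
        out = np.zeros(len(x))
        for lo, hi in BANDS:
            centre = np.sqrt(lo * hi)                    # geometric centre frequency
            carrier = f0 * max(1, round(centre / f0))    # nearest harmonic of F0
            out += band_envelope(x, lo, hi, fs) * np.sin(2 * np.pi * carrier * t)
        return out
    ```

    In the actual strategy F0 would be estimated frame by frame from the input rather than held constant; the fixed value here only keeps the sketch short.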

  1. Effects of syntactic cueing therapy on picture naming and connected speech in acquired aphasia.

    Science.gov (United States)

    Herbert, Ruth; Webster, Dianne; Dyson, Lucy

    2012-01-01

    Language therapy for word-finding difficulties in aphasia usually involves picture naming of single words with the support of cues. Most studies have addressed nouns in isolation, even though in connected speech nouns are more frequently produced with determiners. We hypothesised that improved word finding in connected speech would be most likely if intervention treated nouns in usual syntactic contexts. Six speakers with aphasia underwent language therapy using a software program developed for the purpose, which provided lexical and syntactic (determiner) cues. Exposure to determiners with nouns would potentially lead to improved picture naming of both treated and untreated nouns, and increased production of determiner plus noun combinations in connected speech. After intervention, picture naming of treated words improved for five of the six speakers, but naming of untreated words was unchanged. The number of determiner plus noun combinations in connected speech increased for four speakers. These findings attest to the close relationship between frequently co-occurring content and function words, and indicate that intervention for word-finding deficits can profitably proceed beyond single word naming, to retrieval in appropriate syntactic contexts. We also examined the relationship between effects of therapy, and amount and intensity of therapy. We found no relationship between immediate effects and amount or intensity of therapy. However, those participants whose naming maintained at follow-up completed the therapy regime in fewer sessions, of relatively longer duration. We explore the relationship between therapy regime and outcomes, and propose future considerations for research.

  2. Audio visual information fusion for human activity analysis

    OpenAIRE

    Thagadur Shivappa, Shankar

    2010-01-01

    Human activity analysis in unconstrained environments using far-field sensors is a challenging task. The fusion of audio and visual cues enables us to build robust and efficient human activity analysis systems. Traditional fusion schemes including feature-level, classifier-level and decision-level fusion have been explored in task- specific contexts to provide robustness to sensor and environmental noise. However, human activity analysis involves the extraction of information from audio and v...

  3. The development of audio-visual materials to prepare patients for medical procedures: an oncology application.

    Science.gov (United States)

    Carey, M; Schofield, P; Jefford, M; Krishnasamy, M; Aranda, S

    2007-09-01

    This paper describes a systematic process for the development of educational audio-visual materials that are designed to prepare patients for potentially threatening procedures. Literature relating to the preparation of patients for potentially threatening medical procedures, psychological theory, theory of diffusion of innovations and patient information was examined. Four key principles were identified as being important: (1) stakeholder consultation, (2) provision of information to prepare patients for the medical procedure, (3) evidence-based content, and (4) promotion of patient confidence. These principles are described along with an example of the development of an audio-visual resource to prepare patients for chemotherapy treatment. Using this example, practical strategies for the application of each of the principles are described. The principles and strategies described may provide a practical, evidence-based guide to the development of other types of patient audio-visual materials.

  4. Embodiment and Materialization in "Neutral" Materials: Using Audio-Visual Analysis to Discern Social Representations

    Directory of Open Access Journals (Sweden)

    Anna Hedenus

    2015-11-01

    Full Text Available The use of audio-visual media puts bodies literally in focus, but there is as yet surprisingly little in the methodology literature about how to analyze the body in this kind of material. The aim of this article is to illustrate how qualitative audio-visual analysis, focusing on embodiment and materialization, may be used to discern social representations; this is of especial interest when studying materials which have an explicit ambition to achieve "neutrality" without reference to certain kinds of bodies. Filmed occupational descriptions—produced by the Swedish Employment Agency (SEA)—are analyzed and discussed. The examples presented in the article illustrate how various forms of audio-visual analysis—content analysis, sequential analysis and narrative analysis—can be used to reveal how social representations of occupations and practitioners are embodied and materialized in these films. URN: http://nbn-resolving.de/urn:nbn:de:0114-fqs160139

  5. Normal-Hearing Listeners’ and Cochlear Implant Users’ Perception of Pitch Cues in Emotional Speech

    Directory of Open Access Journals (Sweden)

    Steven Gilbers

    2015-10-01

    Full Text Available In cochlear implants (CIs), acoustic speech cues, especially for pitch, are delivered in a degraded form. This study’s aim is to assess whether, due to degraded pitch cues, normal-hearing listeners and CI users employ different perceptual strategies to recognize vocal emotions, and, if so, how these differ. Voice actors were recorded pronouncing a nonce word in four different emotions: anger, sadness, joy, and relief. These recordings’ pitch cues were phonetically analyzed. The recordings were used to test 20 normal-hearing listeners’ and 20 CI users’ emotion recognition. In congruence with previous studies, high-arousal emotions had a higher mean pitch, wider pitch range, and more dominant pitches than low-arousal emotions. Regarding pitch, speakers did not differentiate emotions based on valence but on arousal. Normal-hearing listeners outperformed CI users in emotion recognition, even when presented with CI simulated stimuli. However, only normal-hearing listeners recognized one particular actor’s emotions worse than the other actors’. The groups behaved differently when presented with similar input, showing that they had to employ differing strategies. Considering the respective speaker’s deviating pronunciation, it appears that for normal-hearing listeners, mean pitch is a more salient cue than pitch range, whereas CI users are biased toward pitch range cues.
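
    The pitch cues compared across emotions in this study (mean pitch, pitch range and dominant pitches) are simple statistics of the F0 contour. The snippet below computes such statistics from a hypothetical F0 track; it is a generic illustration, not the phonetic analysis pipeline actually used, and the function name and contour values are assumptions.

    ```python
    import numpy as np

    def pitch_statistics(f0_hz):
        """Summary pitch cues from an F0 contour in Hz; unvoiced frames are NaN."""
        voiced = np.asarray(f0_hz, dtype=float)
        voiced = voiced[~np.isnan(voiced)]
        return {
            "mean_pitch_hz": float(np.mean(voiced)),
            "pitch_range_semitones": float(12 * np.log2(np.max(voiced) / np.min(voiced))),
            "dominant_pitch_hz": float(np.median(voiced)),   # crude stand-in for 'dominant pitches'
        }

    # Hypothetical contour of a short emotional utterance
    print(pitch_statistics([np.nan, 210, 220, 250, 280, 260, np.nan, 230]))
    ```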

  6. Audio-visual training-aid for speechreading

    DEFF Research Database (Denmark)

    Bothe, Hans-Heinrich; Gebert, H.

    2011-01-01

    on the employment of computer-based communication aids for hearing-impaired, deaf and deaf-blind people [6]. This paper presents the complete system, which is composed of a 3D facial animation with synchronized speech synthesis, a natural language dialogue unit and a student-teacher training module. Due to the very...... of fundamental knowledge of other words. The present version of the training aid can be used for the training of speechreading in English, as a consequence of the integrated English language models for facial animation and speech synthesis. Nevertheless, the training aid is prepared to handle all possible...... by the teacher must be rewritten in the new language. In the paper we present the current version of the training aid together with results from evaluation experiments with hearing-impaired persons and explain the functionality and interaction of the modules, the interaction between student, teacher, and virtual...

  7. When Meaning Is Not Enough: Distributional and Semantic Cues to Word Categorization in Child Directed Speech.

    Science.gov (United States)

    Feijoo, Sara; Muñoz, Carmen; Amadó, Anna; Serrat, Elisabet

    2017-01-01

    One of the most important tasks in first language development is assigning words to their grammatical category. The Semantic Bootstrapping Hypothesis postulates that, in order to accomplish this task, children are guided by a neat correspondence between semantic and grammatical categories, since nouns typically refer to objects and verbs to actions. It is this correspondence that guides children's initial word categorization. Other approaches, on the other hand, suggest that children might make use of distributional cues and word contexts to accomplish the word categorization task. According to such approaches, the Semantic Bootstrapping assumption offers an important limitation, as it might not be true that all the nouns that children hear refer to specific objects or people. In order to explore that, we carried out two studies based on analyses of children's linguistic input. We analyzed child-directed speech addressed to four children under the age of 2;6, taken from the CHILDES database. The corpora were selected from the Manchester corpus. The corpora from the four selected children contained a total of 10,681 word types and 364,196 word tokens. In our first study, discriminant analyses were performed using semantic cues alone. The results show that many of the nouns found in parents' speech do not relate to specific objects and that semantic information alone might not be sufficient for successful word categorization. Given that there must be an additional source of information which, alongside with semantics, might assist young learners in word categorization, our second study explores the availability of both distributional and semantic cues in child-directed speech. Our results confirm that this combination might yield better results for word categorization. These results are in line with theories that suggest the need for an integration of multiple cues from different sources in language development.

  8. Sensitivity to audio-visual synchrony and its relation to language abilities in children with and without ASD.

    Science.gov (United States)

    Righi, Giulia; Tenenbaum, Elena J; McCormick, Carolyn; Blossom, Megan; Amso, Dima; Sheinkopf, Stephen J

    2018-01-13

    Autism Spectrum Disorder (ASD) is often accompanied by deficits in speech and language processing. Speech processing relies heavily on the integration of auditory and visual information, and it has been suggested that the ability to detect correspondence between auditory and visual signals helps to lay the foundation for successful language development. The goal of the present study was to examine whether young children with ASD show reduced sensitivity to temporal asynchronies in a speech processing task when compared to typically developing controls, and to examine how this sensitivity might relate to language proficiency. Using automated eye tracking methods, we found that children with ASD failed to demonstrate sensitivity to asynchronies of 0.3s, 0.6s, or 1.0s between a video of a woman speaking and the corresponding audio track. In contrast, typically developing children who were language-matched to the ASD group were sensitive to both 0.6s and 1.0s asynchronies. We also demonstrated that individual differences in sensitivity to audiovisual asynchronies and individual differences in orientation to relevant facial features were both correlated with scores on a standardized measure of language abilities. Results are discussed in the context of attention to visual language and audio-visual processing as potential precursors to language impairment in ASD.

  9. Improving Poetry Reading Skills through Audio-Visual Media for Grade V Students of SDN Rowosari 02 Semarang

    OpenAIRE

    Ainun Alifah, Djariyo

    2013-01-01

    This research is motivated by students' very low poetry reading skills: students cannot read poetry properly and have not dared to read poetry in front of the class with their own style and expression. The issue studied in this classroom action research is whether audio-visual media can improve the poetry reading skills of Grade V students at SDN Rowosari 02 in the 2011/2012 academic year. The hypothesis of this study is that if audio-visual media are used optimally in the learning pr...

  10. Audio-Visual Perception System for a Humanoid Robotic Head

    OpenAIRE

    Raquel Viciana-Abad; Rebeca Marfil; Perez-Lorenzo, Jose M.; Juan P. Bandera; Adrian Romero-Garces; Pedro Reche-Lopez

    2014-01-01

    One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can...

  11. Listeners' expectation of room acoustical parameters based on visual cues

    Science.gov (United States)

    Valente, Daniel L.

    Despite many studies investigating auditory spatial impressions in rooms, few have addressed the impact of simultaneous visual cues on localization and the perception of spaciousness. The current research presents an immersive audio-visual study, in which participants are instructed to make spatial congruency and quantity judgments in dynamic cross-modal environments. The results of these psychophysical tests suggest the importance of consilient audio-visual presentation to the legibility of an auditory scene. Several studies have looked into audio-visual interaction in room perception in recent years, but these studies rely on static images, speech signals, or photographs alone to represent the visual scene. Building on these studies, the aim is to propose a testing method that uses monochromatic compositing (blue-screen technique) to position a studio recording of a musical performance in a number of virtual acoustical environments and ask subjects to assess these environments. In the first experiment of the study, video footage was taken from five rooms varying in physical size from a small studio to a small performance hall. Participants were asked to perceptually align two distinct acoustical parameters---early-to-late reverberant energy ratio and reverberation time---of two solo musical performances in five contrasting visual environments according to their expectations of how the room should sound given its visual appearance. In the second experiment in the study, video footage shot from four different listening positions within a general-purpose space was coupled with sounds derived from measured binaural impulse responses (IRs). The relationship between the presented image, sound, and virtual receiver position was examined. It was found that many visual cues caused different perceived events of the acoustic environment. This included the visual attributes of the space in which the performance was located as well as the visual attributes of the performer
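
    The two acoustical parameters manipulated in this study, the early-to-late reverberant energy ratio and the reverberation time, are standard quantities that can be computed directly from a measured impulse response. The sketch below shows one common way to obtain them (a C50-style clarity index and a T30-based decay estimate); it is illustrative only, and the 50 ms early/late boundary and the function names are assumptions rather than details taken from the study.

    ```python
    import numpy as np

    def early_to_late_ratio_db(ir, fs, split_ms=50.0):
        """Clarity-style ratio of early to late energy in a room impulse response (dB)."""
        k = int(fs * split_ms / 1000.0)
        early = np.sum(ir[:k] ** 2)
        late = np.sum(ir[k:] ** 2)
        return 10.0 * np.log10(early / late)

    def reverberation_time(ir, fs):
        """RT60 estimated from the -5 to -35 dB range of the Schroeder decay curve."""
        edc = np.cumsum(ir[::-1] ** 2)[::-1]            # Schroeder backward integration
        edc_db = 10.0 * np.log10(edc / edc.max())
        t = np.arange(len(ir)) / fs
        fit = (edc_db <= -5) & (edc_db >= -35)
        slope, _ = np.polyfit(t[fit], edc_db[fit], 1)   # decay slope in dB per second
        return -60.0 / slope
    ```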

  12. Audio-visual object search is changed by bilingual experience.

    Science.gov (United States)

    Chabal, Sarah; Schroeder, Scott R; Marian, Viorica

    2015-11-01

    The current study examined the impact of language experience on the ability to efficiently search for objects in the face of distractions. Monolingual and bilingual participants completed an ecologically-valid, object-finding task that contained conflicting, consistent, or neutral auditory cues. Bilinguals were faster than monolinguals at locating the target item, and eye movements revealed that this speed advantage was driven by bilinguals' ability to overcome interference from visual distractors and focus their attention on the relevant object. Bilinguals fixated the target object more often than did their monolingual peers, who, in contrast, attended more to a distracting image. Moreover, bilinguals', but not monolinguals', object-finding ability was positively associated with their executive control ability. We conclude that bilinguals' executive control advantages extend to real-world visual processing and object finding within a multi-modal environment.

  13. ACES Human Sexuality Training Network Handbook. A Compilation of Sexuality Course Syllabi and Audio-Visual Material.

    Science.gov (United States)

    American Association for Counseling and Development, Alexandria, VA.

    This handbook contains a compilation of human sexuality course syllabi and audio-visual materials. It was developed to enable sex educators to identify and contact one another, to compile Human Sexuality Course Syllabi from across the country, and to bring to attention audio-visual materials which are available for teaching Human Sexuality…

  14. Acceptance of online audio-visual cultural heritage archive services: a study of the general public

    NARCIS (Netherlands)

    Ongena, G.; van de Wijngaert, Lidwien; Huizer, E.

    2013-01-01

    Introduction. This study examines the antecedents of user acceptance of an audio-visual heritage archive for a wider audience (i.e., the general public) by extending the technology acceptance model with the concepts of perceived enjoyment, nostalgia proneness and personal innovativeness. Method. A

  15. An Annotated Guide to Audio-Visual Materials for Teaching Shakespeare.

    Science.gov (United States)

    Albert, Richard N.

    Audio-visual materials, found in a variety of periodicals, catalogs, and reference works, are listed in this guide to expedite the process of finding appropriate classroom materials for a study of William Shakespeare in the classroom. Separate listings of films, filmstrips, and recordings are provided, with subdivisions for "The Plays"…

  16. The Use of Video as an Audio-visual Material in Foreign Language Teaching Classroom

    Science.gov (United States)

    Cakir, Ismail

    2006-01-01

    In recent years, a great tendency towards the use of technology and its integration into the curriculum has gained a great importance. Particularly, the use of video as an audio-visual material in foreign language teaching classrooms has grown rapidly because of the increasing emphasis on communicative techniques, and it is obvious that the use of…

  17. Filmstrips, Phonograph Records, Cassettes: An Annotated List of Audio-Visual Materials.

    Science.gov (United States)

    Nazzaro, Lois B., Ed.

    The Reader Development Program of The Free Library of Philadelphia makes available audio-visual materials designed to aid under-educated adults and young adults in overcoming the educational, cultural and economic deficiencies in their lives. These materials are loaned for a week at a time to instructors, tutors, reading specialists, social…

  18. An Annotated List of Audio-Visual Materials, Supplement One. Reader Development Program.

    Science.gov (United States)

    Forinash, Melissa R., Ed.

    This annual supplement to the annotated list of audio-visual materials includes the filmstrips added to the Reader Development collection since June, 1971. The list is arranged alphabetically by filmstrip title, and a brief subject index follows the list. A catalog giving the addresses of filmstrip distributors is also included. A total of 43…

  19. Designing between Pedagogies and Cultures: Audio-Visual Chinese Language Resources for Australian Schools

    Science.gov (United States)

    Yuan, Yifeng; Shen, Huizhong

    2016-01-01

    This design-based study examines the creation and development of audio-visual Chinese language teaching and learning materials for Australian schools by incorporating users' feedback and content writers' input that emerged in the designing process. Data were collected from workshop feedback of two groups of Chinese-language teachers from primary…

  20. Attention to affective audio-visual information: Comparison between musicians and non-musicians

    NARCIS (Netherlands)

    Weijkamp, J.; Sadakata, M.

    2017-01-01

    Individuals with more musical training repeatedly demonstrate enhanced auditory perception abilities. The current study examined how these enhanced auditory skills interact with attention to affective audio-visual stimuli. A total of 16 participants with more than 5 years of musical training

  1. Multimodal indexing of digital audio-visual documents: A case study for cultural heritage data

    NARCIS (Netherlands)

    Carmichael, J.; Larson, M.; Marlow, J.; Newman, E.; Clough, P.; Oomen, J.; Sav, S.

    2008-01-01

    This paper describes a multimedia multimodal information access sub-system (MIAS) for digital audio-visual documents, typically presented in streaming media format. The system is designed to provide both professional and general users with entry points into video documents that are relevant to their

  2. Challenges of Using Audio-Visual Aids as Warm-Up Activity in Teaching Aviation English

    Science.gov (United States)

    Sahin, Mehmet; Sule, St.; Seçer, Y. E.

    2016-01-01

    This study aims to identify the challenges encountered in the use of video as audio-visual material for warm-up activities in an aviation English course at high school level. The study follows a qualitative design in which a focus group interview was used as the data collection procedure. The participants of the focus group are four instructors teaching…

  3. Online Dissection Audio-Visual Resources for Human Anatomy: Undergraduate Medical Students' Usage and Learning Outcomes

    Science.gov (United States)

    Choi-Lundberg, Derek L.; Cuellar, William A.; Williams, Anne-Marie M.

    2016-01-01

    In an attempt to improve undergraduate medical student preparation for and learning from dissection sessions, dissection audio-visual resources (DAVR) were developed. Data from e-learning management systems indicated DAVR were accessed by 28% ± 10 (mean ± SD for nine DAVR across three years) of students prior to the corresponding dissection…

  4. Technical Considerations in the Delivery of Audio-Visual Course Content.

    Science.gov (United States)

    Lightfoot, Jay M.

    2002-01-01

    In an attempt to provide students with the benefit of the latest technology, some instructors include multimedia content on their class Web sites. This article introduces the basic terms and concepts needed to understand the multimedia domain. Provides a brief tutorial designed to help instructors create good, consistent audio-visual content. (AEF)

  5. Development of an Estimation Model for Instantaneous Presence in Audio-Visual Content

    National Research Council Canada - National Science Library

    OZAWA, Kenji; TSUKAHARA, Shota; KINOSHITA, Yuichiro; MORISE, Masanori

    2016-01-01

    ...: system presence and content presence. In this study we focused on content presence. To estimate the overall presence of a content item, we have developed estimation models for the sense of presence in audio-only and audio-visual content...

  6. A Guide to Audio-Visual References: Selection and Ordering Sources.

    Science.gov (United States)

    Bonn, Thomas L., Comp.

    Audio-visual reference sources and finding guides to identify media for classroom utilization are compiled in this list of sources at State University of New York College at Cortland libraries. Citations with annotations and library locations are included under the subject headings: (1) general sources for all media formats; (2) reviews, guides,…

  7. Primary School Pupils' Response to Audio-Visual Learning Process in Port-Harcourt

    Science.gov (United States)

    Olube, Friday K.

    2015-01-01

    The purpose of this study is to examine primary school children's response on the use of audio-visual learning processes--a case study of Chokhmah International Academy, Port-Harcourt (owned by Salvation Ministries). It looked at the elements that enhance pupils' response to educational television programmes and their hindrances to these…

  8. Photojournalism: The Basic Course. A Selected, Annotated Bibliography of Audio-Visual Materials.

    Science.gov (United States)

    Applegate, Edd

    Designed to help instructors choose appropriate audio-visual materials for the basic course in photojournalism, this bibliography contains 11 annotated entries. Annotations include the name of the materials, running time, whether black-and-white or color, and names of institutions from which the materials can be secured, as well as brief…

  9. Evaluation of Modular EFL Educational Program (Audio-Visual Materials Translation & Translation of Deeds & Documents)

    Science.gov (United States)

    Imani, Sahar Sadat Afshar

    2013-01-01

    Modular EFL Educational Program has managed to offer specialized language education in two specific fields: Audio-visual Materials Translation and Translation of Deeds and Documents. However, no explicit empirical studies can be traced on both internal and external validity measures as well as the extent of compatibility of both courses with the…

  10. Automatic audio-visual fusion for aggression detection using meta-information

    NARCIS (Netherlands)

    Lefter, I.; Burghouts, G.J.; Rothkrantz, L.J.M.

    2012-01-01

    We propose a new method for audio-visual sensor fusion and apply it to automatic aggression detection. While a variety of definitions of aggression exist, in this paper we see it as any kind of behavior that has a disturbing effect on others. We have collected multi- and unimodal assessments by

  11. A comparative study on automatic audio-visual fusion for aggression detection using meta-information

    NARCIS (Netherlands)

    Lefter, I.; Rothkrantz, L.J.M.; Burghouts, G.J.

    2013-01-01

    Multimodal fusion is a complex topic. For surveillance applications audio-visual fusion is very promising given the complementary nature of the two streams. However, drawing the correct conclusion from multi-sensor data is not straightforward. In previous work we have analysed a database with audio-

  12. Assessment of Spectral and Temporal Resolution in Cochlear Implant Users Using Psychoacoustic Discrimination and Speech Cue Categorization.

    Science.gov (United States)

    Winn, Matthew B; Won, Jong Ho; Moon, Il Joon

    This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). The authors hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. The authors further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Nineteen cochlear implant listeners and 10 listeners with normal hearing participated in a suite of tasks that included spectral ripple discrimination, temporal modulation detection, and syllable categorization, which was split into a spectral cue-based task (targeting the /ba/-/da/ contrast) and a timing cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for cochlear implant listeners. Cochlear implant users were generally less successful at utilizing both spectral and temporal cues for categorization compared with listeners with normal hearing. For the cochlear implant listener group, spectral ripple discrimination was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. Temporal modulation detection using 100- and 10-Hz-modulated noise was not correlated either with the cochlear implant subjects' categorization of
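
    The categorization measure described above reduces, in practice, to fitting a logistic psychometric function in which the slope on the manipulated cue (for example, voice onset time) indexes perceptual sensitivity to that cue. The snippet below is a generic illustration of that analysis with invented per-trial data; it is not the authors' code and the variable names are assumptions.

    ```python
    import numpy as np
    import statsmodels.api as sm

    # Hypothetical trials: VOT (ms) of each stimulus and the listener's /p/ (1) vs /b/ (0) responses.
    vot = np.array([0, 5, 10, 15, 20, 25, 30, 35, 40, 45], dtype=float)
    resp_p = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

    X = sm.add_constant(vot)                  # intercept plus cue value
    model = sm.Logit(resp_p, X).fit(disp=0)   # logistic psychometric function
    slope = model.params[1]                   # steeper slope = sharper categorization = greater cue sensitivity
    print(f"cue sensitivity (logit slope per ms of VOT): {slope:.3f}")
    ```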

  13. Person identification for mobile robot using audio-visual modality

    Science.gov (United States)

    Kim, Young-Ouk; Chin, Sehoon; Lee, Jihoon; Paik, Joonki

    2005-10-01

    Recently, we have seen significant advancement in intelligent service robots. The remarkable features of an intelligent robot include tracking and identifying a person using biometric features. Human-robot interaction is very important because it is one of the final goals of an intelligent service robot. Research has concentrated on two fields: self-navigation of a mobile robot and human-robot interaction in natural environments. In this paper we present an effective person identification method for HRI (Human Robot Interaction) using two different types of expert systems. Most mobile robots, however, run in uncontrolled and complicated environments, which means that face and speech information cannot be guaranteed under varying conditions such as lighting, noisy sound, and robot orientation. Based on the illumination level and the signal-to-noise ratio around the mobile robot, the proposed fuzzy rules produce a reasonable person identification result. Two embedded HMMs (Hidden Markov Models) are used, one for the visual and one for the audio modality, to identify the person. The performance of the proposed system is compared with single-modality identification and with a simple mixture of the two modalities, and experimental results are reported.
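
    The environment-dependent fusion idea above, trusting the face expert more under good lighting and the speech expert more at high signal-to-noise ratios, can be sketched as a score-level combination rule. The code below is an illustrative stand-in only (a crisp weighting function rather than a full fuzzy inference system), and every threshold and name in it is an assumption.

    ```python
    def fuse_scores(face_score, speech_score, illumination_lux, snr_db):
        """Score-level audio-visual fusion with environment-dependent weights.

        The visual expert is weighted up under good lighting, the audio expert
        under high SNR; the weights are normalized so that they sum to one.
        """
        w_face = min(max(illumination_lux / 500.0, 0.1), 1.0)   # 500 lux ~ a well-lit room
        w_speech = min(max(snr_db / 30.0, 0.1), 1.0)            # 30 dB ~ quiet conditions
        return (w_face * face_score + w_speech * speech_score) / (w_face + w_speech)

    # Example: a dim room with quiet audio, so the speech expert dominates
    print(fuse_scores(face_score=0.6, speech_score=0.9, illumination_lux=80, snr_db=25))
    ```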

  14. Using auditory classification images for the identification of fine acoustic cues used in speech perception.

    Directory of Open Access Journals (Sweden)

    Léo eVarnet

    2013-12-01

    Full Text Available An essential step in understanding the processes underlying the general mechanism of perceptual categorization is to identify which portions of a physical stimulation modulate the behavior of our perceptual system. More specifically, in the context of speech comprehension, it is still a major open challenge to understand which information is used to categorize a speech stimulus as one phoneme or another, since the auditory primitives relevant for the categorical perception of speech remain unknown. Here we propose to adapt a technique relying on a Generalized Linear Model (GLM) with smoothness priors, already used in the visual domain for the estimation of so-called classification images, to auditory experiments. This statistical model offers a rigorous framework for dealing with non-Gaussian noise, as is often the case in the auditory modality, and limits the amount of noise in the estimated template by enforcing smoother solutions. By applying this technique to a two-alternative forced choice experiment between the stimuli 'aba' and 'ada' in noise with an adaptive SNR, we confirm that the second formant transition is a key cue for classifying phonemes as /b/ or /d/ in noise, and that its estimation by the auditory system is a relative measurement across spectral bands and in relation to the perceived height of the second formant in the preceding syllable. Through this example, we show how the GLM with smoothness priors approach can be applied to the identification of fine functional acoustic cues in speech perception. Finally, we discuss some assumptions of the model in the specific case of speech perception.
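
    The classification-image estimation described above can be approximated with a penalized logistic regression on the trial-by-trial noise fields. The sketch below is a deliberately simplified stand-in: it replaces the smoothness prior with a plain L2 penalty and uses random placeholder data, so it only illustrates the shape of the analysis, not the authors' actual model.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Placeholder data: one row per trial, columns are the time-frequency bins of the noise
    # added to that trial's stimulus; y holds the listener's 'aba' (0) / 'ada' (1) responses.
    rng = np.random.default_rng(0)
    n_trials, n_freq, n_time = 2000, 32, 40
    noise = rng.normal(size=(n_trials, n_freq * n_time))
    y = rng.integers(0, 2, size=n_trials)

    # L2-penalized logistic regression: the fitted weight map plays the role of a (rough)
    # classification image; the published method uses an explicit smoothness prior instead.
    clf = LogisticRegression(penalty="l2", C=0.1, max_iter=1000).fit(noise, y)
    classification_image = clf.coef_.reshape(n_freq, n_time)
    print(classification_image.shape)
    ```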

  15. Neural correlates of multisensory reliability and perceptual weights emerge at early latencies during audio-visual integration.

    Science.gov (United States)

    Boyle, Stephanie C; Kayser, Stephanie J; Kayser, Christoph

    2017-11-01

    To make accurate perceptual estimates, observers must take the reliability of sensory information into account. Despite many behavioural studies showing that subjects weight individual sensory cues in proportion to their reliabilities, it is still unclear when during a trial neuronal responses are modulated by the reliability of sensory information or when they reflect the perceptual weights attributed to each sensory input. We investigated these questions using a combination of psychophysics, EEG-based neuroimaging and single-trial decoding. Our results show that the weighted integration of sensory information in the brain is a dynamic process; effects of sensory reliability on task-relevant EEG components were evident 84 ms after stimulus onset, while neural correlates of perceptual weights emerged 120 ms after stimulus onset. These neural processes had different underlying sources, arising from sensory and parietal regions, respectively. Together these results reveal the temporal dynamics of perceptual and neural audio-visual integration and support the notion of temporally early and functionally specific multisensory processes in the brain.
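
    The reliability weighting referred to above is usually formalized with the maximum-likelihood cue-combination rule, in which each cue's weight is proportional to its inverse variance. The short sketch below computes those predicted perceptual weights from hypothetical single-cue noise levels; it illustrates the standard behavioural model rather than anything specific to this EEG study.

    ```python
    import numpy as np

    def predicted_weights(sigma_audio, sigma_visual):
        """Maximum-likelihood cue combination: weights proportional to inverse variance."""
        r_a, r_v = 1.0 / sigma_audio**2, 1.0 / sigma_visual**2
        return r_a / (r_a + r_v), r_v / (r_a + r_v)

    # Hypothetical single-cue noise levels (e.g., estimated from unimodal discrimination thresholds)
    w_audio, w_visual = predicted_weights(sigma_audio=2.0, sigma_visual=1.0)
    fused = w_audio * 10.0 + w_visual * 12.0   # reliability-weighted average of the two cue estimates
    print(w_audio, w_visual, fused)
    ```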

  16. Audio-visual relaxation training for anxiety, sleep, and relaxation among Chinese adults with cardiac disease.

    Science.gov (United States)

    Tsai, Sing-Ling

    2004-12-01

    The long-term effect of an audio-visual relaxation training (RT) treatment involving deep breathing, exercise, muscle relaxation, guided imagery, and meditation was compared with routine nursing care for reducing anxiety, improving sleep, and promoting relaxation in Chinese adults with cardiac disease. This research was a quasi-experimental, two-group, pretest-posttest study. A convenience sample of 100 cardiology patients (41 treatment, 59 control) admitted to one large medical center hospital in the Republic of China (ROC) was studied for 1 year. The hypothesized relationships were supported. RT significantly improved anxiety, sleep, and relaxation in the treatment group as compared to the control group. It appears that audio-visual RT might be a beneficial adjunctive therapy for adult cardiac patients. However, considerable further work using stronger research designs is needed to determine the most appropriate instructional methods and the factors that contribute to long-term consistent practice of RT with Chinese populations.

  17. Rehabilitation of balance-impaired stroke patients through audio-visual biofeedback

    DEFF Research Database (Denmark)

    Gheorghe, Cristina; Nissen, Thomas; Juul Rosengreen Christensen, Daniel

    2015-01-01

    This study explored how audio-visual biofeedback influences the physical balance of seven balance-impaired stroke patients between 33–70 years of age. The setup included a bespoke balance board and a music rhythm game. The procedure was designed as follows: (1) a control group who performed a balance training exercise without any technological input, (2) a visual biofeedback group, performing via visual input, and (3) an audio-visual biofeedback group, performing via audio and visual input. Results retrieved from comparisons between the data sets (2) and (3) suggested superior postural stability ... between test sessions for (2). Regarding the data set (1), the testers were less motivated to perform training exercises although their performance was superior to (2) and (3). Conclusions are that the audio component motivated patients to train although the physical performance was decreased.

  18. Impact of audio-visual storytelling in simulation learning experiences of undergraduate nursing students.

    Science.gov (United States)

    Johnston, Sandra; Parker, Christina N; Fox, Amanda

    2017-09-01

    Use of high fidelity simulation has become increasingly popular in nursing education to the extent that it is now an integral component of most nursing programs. Anecdotal evidence suggests that students have difficulty engaging with simulation manikins due to their unrealistic appearance. Introduction of the manikin as a 'real patient' with the use of an audio-visual narrative may engage students in the simulated learning experience and impact on their learning. A paucity of literature currently exists on the use of audio-visual narratives to enhance simulated learning experiences. This study aimed to determine whether viewing an audio-visual narrative during a simulation pre-brief altered undergraduate nursing student perceptions of the learning experience. A quasi-experimental post-test design was utilised with a convenience sample of final-year baccalaureate nursing students at a large metropolitan university. Participants completed a modified version of the Student Satisfaction with Simulation Experiences survey. This 12-item questionnaire contained questions relating to the ability to transfer skills learned in simulation to the real clinical world, the realism of the simulation and the overall value of the learning experience. Descriptive statistics were used to summarise demographic information. Two-tailed, independent-group t-tests were used to determine statistical differences within the categories. Findings indicated that students reported high levels of value, realism and transferability in relation to the viewing of an audio-visual narrative. Statistically significant results (t = 2.38) were found in relation to the transfer of learning from simulation to clinical practice. The subgroups of age and gender, although not significant, indicated some interesting results. High satisfaction with simulation was indicated by all students in relation to value and realism. There was also a significant finding in relation to transferability of knowledge, which is vital to quality educational outcomes.

  19. An Audio-visual Approach to Teaching the Social Aspects of Sustainable Product Design

    Directory of Open Access Journals (Sweden)

    Matthew Alan Watkins

    2015-07-01

    Full Text Available This paper considers the impact of audio-visual resources in enabling students to develop an understanding of the social aspects of sustainable product design. Building on literature concerning the learning preferences of ‘Net Generation’ learners, three audio-visual workshops were developed to introduce students to the wider social aspects of sustainability and encourage students to reflect upon the impact of their practice. The workshops were delivered in five universities in Britain and Ireland among undergraduate and postgraduate students. They were designed to encourage students to reflect upon carefully designed audio-visual materials in a group-based environment, seeking to foster the preferences of Net Generation learners through collaborative learning and learning through discovery. It also sought to address the perceived weaknesses of this generation of learners by encouraging critical reflection. The workshops proved to be popular with students and were successful in enabling them to grasp the complexity of the social aspects of sustainable design in a short span of time, as well as in encouraging personal responses and creative problem solving through an exploration of design thinking solutions.

  20. The Problems and Challenges of Managing Crowd Sourced Audio-Visual Evidence

    Directory of Open Access Journals (Sweden)

    Harjinder Singh Lallie

    2014-04-01

    Full Text Available A number of recent incidents, such as the Stanley Cup Riots, the uprisings in the Middle East and the London riots have demonstrated the value of crowd sourced audio-visual evidence wherein citizens submit audio-visual footage captured on mobile phones and other devices to aid governmental institutions, responder agencies and law enforcement authorities to confirm the authenticity of incidents and, in the case of criminal activity, to identify perpetrators. The use of such evidence can present a significant logistical challenge to investigators, particularly because of the potential size of data gathered through such mechanisms and the added problems of time-lining disparate sources of evidence and, subsequently, investigating the incident(s). In this paper we explore this problem and, in particular, outline the pressure points for an investigator. We identify and explore a number of particular problems related to the secure receipt of the evidence, imaging, tagging and then time-lining the evidence, and the problem of identifying duplicate and near duplicate items of audio-visual evidence.

  1. Semantic congruency but not temporal synchrony enhances long-term memory performance for audio-visual scenes.

    Science.gov (United States)

    Meyerhoff, Hauke S; Huff, Markus

    2016-04-01

    Human long-term memory for visual objects and scenes is tremendous. Here, we test how auditory information contributes to long-term memory performance for realistic scenes. In a total of six experiments, we manipulated the presentation modality (auditory, visual, audio-visual) as well as semantic congruency and temporal synchrony between auditory and visual information of brief filmic clips. Our results show that audio-visual clips generally elicit more accurate memory performance than unimodal clips. This advantage even increases with congruent visual and auditory information. However, violations of audio-visual synchrony hardly have any influence on memory performance. Memory performance remained intact even with a sequential presentation of auditory and visual information, but finally declined when the matching tracks of one scene were presented separately with intervening tracks during learning. With respect to memory performance, our results therefore show that audio-visual integration is sensitive to semantic congruency but remarkably robust against asymmetries between different modalities.

  2. Effects of noise and audiovisual cues on speech processing in adults with and without ADHD.

    Science.gov (United States)

    Michalek, Anne M P; Watson, Silvana M; Ash, Ivan; Ringleb, Stacie; Raymer, Anastasia

    2014-03-01

    This study examined the interplay among internal (e.g. attention, working memory abilities) and external (e.g. background noise, visual information) factors in individuals with and without ADHD. A 2 × 2 × 6 mixed design with correlational analyses was used to compare participant results on a standardized listening-in-noise sentence repetition task (QuickSIN; Killion et al., 2004), presented in an auditory and an audiovisual condition as the signal-to-noise ratio (SNR) varied from 25 to 0 dB, and to determine individual differences in working memory capacity and short-term recall. Participants were thirty-eight young adults without ADHD and twenty-five young adults with ADHD. Diagnosis, modality, and signal-to-noise ratio all affected the ability to process speech in noise. The interaction between the diagnosis of ADHD, the presence of visual cues, and the level of noise had an effect on a person's ability to process speech in noise. In conclusion, young adults with ADHD benefited less from visual information during noise than young adults without ADHD, an effect influenced by working memory abilities.
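
    Varying the signal-to-noise ratio from 25 to 0 dB, as in the listening task above, amounts to rescaling the noise track relative to the speech track before mixing. The helper below shows the usual computation; it is a generic sketch and not part of the QuickSIN materials.

    ```python
    import numpy as np

    def mix_at_snr(speech, noise, snr_db):
        """Scale `noise` so that the speech-to-noise power ratio equals `snr_db`, then mix."""
        noise = noise[: len(speech)]
        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2)
        target_noise_power = p_speech / (10 ** (snr_db / 10))
        return speech + noise * np.sqrt(target_noise_power / p_noise)
    ```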

  3. Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

    Science.gov (United States)

    Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

    2015-05-01

    The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although the brain network is supposed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering, such as single linkage distance, to analyze the connected component of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white-noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationship, similar connected components were observed in bimodal and unimodal speech conditions during filtration. However, during speech perception by congruent audiovisual stimuli, the tighter couplings of left anterior temporal gyrus-anterior insula component and right premotor-visual components were observed than auditory or visual speech cue conditions, respectively. Interestingly, visual speech is perceived under white noise by tight negative coupling in the left inferior frontal region-right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus, right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.
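
    For the zero-dimensional homology used above, tracking connected components across all thresholds is equivalent to single-linkage hierarchical clustering on a distance matrix derived from the functional connectivity. The fragment below sketches that equivalence with SciPy; the toy correlation matrix and the correlation-to-distance conversion are assumptions made for illustration.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    # Toy functional connectivity between five regions (symmetric correlation matrix).
    corr = np.array([[1.0, 0.8, 0.3, 0.2, 0.1],
                     [0.8, 1.0, 0.4, 0.2, 0.1],
                     [0.3, 0.4, 1.0, 0.7, 0.6],
                     [0.2, 0.2, 0.7, 1.0, 0.5],
                     [0.1, 0.1, 0.6, 0.5, 1.0]])
    dist = 1.0 - corr                                   # convert correlation to a distance
    Z = linkage(squareform(dist, checks=False), method="single")

    # Sweep the filtration: number of connected components at each merge threshold.
    for threshold in np.unique(Z[:, 2]):
        labels = fcluster(Z, t=threshold, criterion="distance")
        print(f"distance <= {threshold:.2f}: {labels.max()} connected component(s)")
    ```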

  4. A psychophysical imaging method evidencing auditory cue extraction during speech perception: a group analysis of auditory classification images.

    Science.gov (United States)

    Varnet, Léo; Knoblauch, Kenneth; Serniclaes, Willy; Meunier, Fanny; Hoen, Michel

    2015-01-01

    Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remains undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique that allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses on the listening strategies of participants at the group level. We exemplify this approach by identifying the acoustic cues involved in da/ga categorization with two phonetic contexts, Al- or Ar-. The application of Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme.

  5. BILINGUAL MULTIMODAL SYSTEM FOR TEXT-TO-AUDIOVISUAL SPEECH AND SIGN LANGUAGE SYNTHESIS

    OpenAIRE

    Karpov, A.A.; M. Zelezny

    2014-01-01

    We present a conceptual model, architecture and software of a multimodal system for audio-visual speech and sign language synthesis by the input text. The main components of the developed multimodal synthesis system (signing avatar) are: automatic text processor for input text analysis; simulation 3D model of human's head; computer text-to-speech synthesizer; a system for audio-visual speech synthesis; simulation 3D model of human’s hands and upper body; multimodal user interface integrating ...

  6. Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech

    Directory of Open Access Journals (Sweden)

    Matthew ePoon

    2015-11-01

    Full Text Available Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound happier than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here we describe a novel, score-based exploration of the use of pitch height and timing in a set of balanced major and minor key compositions. Our corpus contained all 24 Preludes and 24 Fugues from Bach’s Well Tempered Clavier (book 1), as well as all 24 of Chopin’s Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma (A, B, C, etc.). Consistent with predictions derived from speech, we found major-key (nominally happy) pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally sad) pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post-hoc analyses illustrate interesting trade-offs, with
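
    The corpus comparison reported above reduces to averaging a pitch-height measure and an attack-rate (timing) measure separately over the major- and minor-key pieces. The sketch below shows that computation over a hypothetical pre-extracted score summary; the data structure and the numbers are invented for illustration and are not taken from the actual corpus.

    ```python
    from statistics import mean

    # Hypothetical per-piece summaries: (mode, mean MIDI pitch, note attacks per second).
    pieces = [
        ("major", 65.0, 5.2), ("major", 67.5, 4.8), ("major", 66.0, 5.5),
        ("minor", 63.5, 3.9), ("minor", 64.0, 4.1), ("minor", 62.5, 3.7),
    ]

    for mode in ("major", "minor"):
        pitches = [pitch for m, pitch, _ in pieces if m == mode]
        rates = [rate for m, _, rate in pieces if m == mode]
        print(f"{mode}: mean pitch {mean(pitches):.1f} (MIDI), mean attack rate {mean(rates):.1f} notes/s")
    ```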

  7. Audio-Visual Feedback for Self-monitoring Posture in Ballet Training

    DEFF Research Database (Denmark)

    Knudsen, Esben Winther; Hølledig, Malte Lindholm; Bach-Nielsen, Sebastian Siem

    2017-01-01

    An application for ballet training is presented that monitors, in real time, the deviation of the posture position (straightness of the spine and rotation of the pelvis) from the ideal position. The human skeletal data are acquired through a Microsoft Kinect v2. The movement of the student is mirrored ...-coded. In an experiment with 9-12-year-old dance students from a ballet school, comparing the audio-visual feedback modality with no feedback leads to an increase in posture accuracy (p ... card feedback and expert interviews indicate that the feedback is considered fun and useful ... for training independently from the teacher.
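
    A posture monitor of the kind described above typically reduces the Kinect skeleton to two scalar deviations: how far the spine departs from vertical and how far the pelvis is rotated away from facing the sensor. The fragment below is a minimal geometric sketch of those two measurements; the joint names, coordinate conventions and example values are illustrative assumptions, not taken from the application itself.

    ```python
    import numpy as np

    def spine_deviation_deg(spine_base, spine_shoulder):
        """Angle between the spine segment and the vertical (y) axis, in degrees."""
        spine = np.asarray(spine_shoulder, dtype=float) - np.asarray(spine_base, dtype=float)
        cos_a = spine[1] / np.linalg.norm(spine)        # vertical component vs. segment length
        return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

    def pelvis_rotation_deg(hip_left, hip_right):
        """Rotation of the hip line away from the sensor's frontal plane, in degrees."""
        hips = np.asarray(hip_right, dtype=float) - np.asarray(hip_left, dtype=float)
        return np.degrees(np.arctan2(abs(hips[2]), abs(hips[0])))   # depth offset vs. lateral offset

    # Example with made-up joint coordinates (metres, Kinect camera space: x right, y up, z depth)
    print(spine_deviation_deg([0.00, 0.80, 2.00], [0.05, 1.30, 2.02]))
    print(pelvis_rotation_deg([-0.15, 0.85, 2.05], [0.15, 0.85, 1.95]))
    ```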

  8. Making sense, nonsense, and no-sense when representing audio-visual collections

    DEFF Research Database (Denmark)

    Madsen, Theis Vallø

    2017-01-01

    This chapter taps into broader discussions about digital culture, big data, user-generated content, and presence theory. It reconsiders methods for organizing and visualizing large data sets, in particular audio-visual collections, by addressing sense-making, nonsense-making, and no-sense-making in the work on mapping and representing these collections. Visualizing collections of art and other artifacts forces us to consider methods of sense-making and nonsense-making as a desirable byproduct of crowd-sourcing. In all this, we must not forget “no sense”, or perhaps more precisely “presence...

  9. An interactive audio-visual installation using ubiquitous hardware and web-based software deployment

    Directory of Open Access Journals (Sweden)

    Tiago Fernandes Tavares

    2015-05-01

    Full Text Available This paper describes an interactive audio-visual musical installation, namely MOTUS, that aims at being deployed using low-cost hardware and software. This was achieved by writing the software as a web application and using only hardware pieces that are built-in most modern personal computers. This scenario implies in specific technical restrictions, which leads to solutions combining both technical and artistic aspects of the installation. The resulting system is versatile and can be freely used from any computer with Internet access. Spontaneous feedback from the audience has shown that the provided experience is interesting and engaging, regardless of the use of minimal hardware.

  10. PRESENTATIONS OF AGLONA’S PILGRIM GROUPS: AUDIO-VISUAL CODES

    OpenAIRE

    Juško-Štekele, Angelika

    2017-01-01

    The article „Pilgrimage to Aglona: Audio-Visual Codes” is dedicated to Aglona pilgrimage, which is considered a significant element of intangible cultural heritage of Latvia. The importance of this tradition has been acknowledged by its vitality: in spite of the historical complexities, the tradition of Aglona ritual pilgrimage has survived for more than a century and in due course has strengthened its value in practice and social memory of the community. At the same time it is not a rigid va...

  11. Using Play Activities and Audio-Visual Aids to Develop Speaking Skills

    Directory of Open Access Journals (Sweden)

    Casallas Mutis Nidia

    2000-08-01

    Full Text Available A project was conducted in order to improve oral proficiency in English through the use of play activities and audio-visual aids with first-grade students in a bilingual school in La Calera. They were between 6 and 7 years old. As the sample for this study, the five students who had the lowest oral language proficiency were selected. According to the results, it is clear that the sample improved their English oral proficiency a great deal. However, the process has to be continued because this skill needs constant practice in order to develop.

  12. The sound of your lips: electrophysiological cross-modal interactions during hand-to-face and face-to-face speech perception

    National Research Council Canada - National Science Library

    Treille, Avril; Vilain, Coriandre; Sato, Marc

    2014-01-01

    Recent magneto-encephalographic and electro-encephalographic studies provide evidence for cross-modal integration during audio-visual and audio-haptic speech perception, with speech gestures viewed...

  13. Concurrent audio-visual feedback for supporting drivers at intersections: A study using two linked driving simulators.

    Science.gov (United States)

    Houtenbos, M; de Winter, J C F; Hale, A R; Wieringa, P A; Hagenzieker, M P

    2017-04-01

    A large portion of road traffic crashes occur at intersections for the reason that drivers lack necessary visual information. This research examined the effects of an audio-visual display that provides real-time sonification and visualization of the speed and direction of another car approaching the crossroads on an intersecting road. The location of red blinking lights (left vs. right on the speedometer) and the lateral input direction of beeps (left vs. right ear in headphones) corresponded to the direction from where the other car approached, and the blink and beep rates were a function of the approaching car's speed. Two driving simulators were linked so that the participant and the experimenter drove in the same virtual world. Participants (N = 25) completed four sessions (two with the audio-visual display on, two with the audio-visual display off), each session consisting of 22 intersections at which the experimenter approached from the left or right and either maintained speed or slowed down. Compared to driving with the display off, the audio-visual display resulted in enhanced traffic efficiency (i.e., greater mean speed, less coasting) while not compromising safety (i.e., the time gap between the two vehicles was equivalent). A post-experiment questionnaire showed that the beeps were regarded as more useful than the lights. It is argued that the audio-visual display is a promising means of supporting drivers until fully automated driving is technically feasible.
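
    The display logic described above maps the approaching car's direction onto the side of the blinking light and the ear receiving the beeps, and its speed onto the blink/beep rate. A minimal sketch of that mapping is given below; the linear rate function and its constants are assumptions chosen purely for illustration, since the exact function is not reproduced here.

    ```python
    def intersection_warning(approach_side, speed_kmh, base_rate_hz=1.0, rate_per_kmh=0.05):
        """Return (light position, audio channel, blink/beep rate in Hz) for the in-car display.

        approach_side: 'left' or 'right', relative to the driver.
        speed_kmh: speed of the car approaching on the intersecting road.
        """
        rate_hz = base_rate_hz + rate_per_kmh * speed_kmh   # faster approach -> faster blinking and beeping
        return approach_side, approach_side, rate_hz

    # Example: a car approaching from the right at 50 km/h
    light_side, audio_channel, rate = intersection_warning("right", 50)
    print(light_side, audio_channel, f"{rate:.1f} Hz")
    ```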

  14. Effects of audio-visual aids on foreign language test anxiety, reading and listening comprehension, and retention in EFL learners.

    Science.gov (United States)

    Lee, Shu-Ping; Lee, Shin-Da; Liao, Yuan-Lin; Wang, An-Chi

    2015-04-01

    This study examined the effects of audio-visual aids on anxiety, comprehension test scores, and retention in reading and listening to short stories in English as a Foreign Language (EFL) classrooms. Reading and listening tests, general and test anxiety, and retention were measured in English-major college students in an experimental group with audio-visual aids (n=83) and a control group without audio-visual aids (n=94) with similar general English proficiency. Lower reading test anxiety, unchanged reading comprehension scores, and better reading short-term and long-term retention after four weeks were evident in the audiovisual group relative to the control group. In addition, lower listening test anxiety, higher listening comprehension scores, and unchanged short-term and long-term retention were found in the audiovisual group relative to the control group after the intervention. Audio-visual aids may help to reduce EFL learners' listening test anxiety and enhance their listening comprehension scores without facilitating retention of such materials. Although audio-visual aids did not increase reading comprehension scores, they helped reduce EFL learners' reading test anxiety and facilitated retention of reading materials.

  15. The effect of the lecture discussion teaching method with and without audio-visual augmentation on immediate and retention learning.

    Science.gov (United States)

    Andrusyszyn, M A

    1990-06-01

    This study determined whether students taught using the lecture-discussion method augmented with audio-visuals would achieve a higher mean score on an immediate post-test and a delayed retention test than students presented with a lecture-discussion without audio-visuals. A convenience sample of 52 students divided into two groups voluntarily participated in the quasi-experiment. Two teaching sessions averaging 90 minutes in length were taught by the researcher. Learning and retention were measured by a 10-item multiple choice test with content validity. Immediate learning was measured with a post-test administered immediately following each of the teaching sessions. Delayed learning was measured with a retention test administered 25.5 days following the teaching sessions. Group data were analysed using an independent one-tailed t-test on mean scores. Students attending the lecture-discussion with audio-visual augmentation did not achieve significantly higher mean scores on the two tests than the non-augmented group (p ≤ 0.05). Analysis using a paired t-test revealed that the difference in scores between the post-test and retention test for the group without audio-visual augmentation was significant (t = 2.31; p < 0.05). Delayed retention appears to have been influenced by the use of audio-visuals. Nurse educators need to consider ways in which the lecture-discussion may be enhanced to maximise student learning and retention.

  16. Efektivitas Layanan Informasi dengan Menggunakan Media Audio Visual dalam Meningkatkan Sikap Siswa terhadap Kedisiplinan Sekolah

    Directory of Open Access Journals (Sweden)

    Nory Natalia

    2015-10-01

    Full Text Available The research is based on the phenomenon of students showing low attitudes toward school discipline, with many students breaking the rules on absence, uniform and learning activity, which can affect learning quality and quantity. Guidance and counselling aim to improve students' attitudes toward school discipline, and one of their tools is information services. The purpose of the research was to test the effectiveness of information services using audio-visual media in improving students' attitudes toward school discipline. The research used a quantitative method; its type is a quasi-experiment with a non-equivalent control group design. The population was students at SMP Muhammadiyah Padang Panjang, and the sample was selected using purposive sampling. The instrument was a questionnaire with a Likert scale, tested for validity and reliability: the validity test used the product-moment correlation with a mean correlation coefficient of 0.642, and the reliability test used Cronbach's alpha with r = 0.965. The analysis technique used the Wilcoxon signed-ranks test and the two-sample Kolmogorov-Smirnov test in SPSS 20. The results showed that the information service using audio-visual media was effective in improving students' attitudes toward school discipline.
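
    The Wilcoxon signed-ranks and two-sample Kolmogorov-Smirnov tests named above were run in SPSS 20; as a rough illustration of the same analysis steps, a minimal Python/SciPy sketch on hypothetical pre/post questionnaire scores might look as follows (all data and group sizes here are invented, not the study's).

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        pre_scores = rng.integers(60, 90, size=30)                 # hypothetical pre-test attitude scores
        post_scores = pre_scores + rng.integers(0, 15, size=30)    # hypothetical post-test scores (same students)
        control_post = rng.integers(60, 95, size=30)               # hypothetical control-group post-test scores

        # Wilcoxon signed-ranks test on the paired pre/post scores of the treated group.
        w_stat, w_p = stats.wilcoxon(pre_scores, post_scores)

        # Two-sample Kolmogorov-Smirnov test comparing treated vs. control post-test distributions.
        ks_stat, ks_p = stats.ks_2samp(post_scores, control_post)

        print(f"Wilcoxon: W={w_stat:.1f}, p={w_p:.4f}")
        print(f"Kolmogorov-Smirnov: D={ks_stat:.3f}, p={ks_p:.4f}")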

  17. Designing Promotion Strategy of Malang Raya’s Tourism Destination Branding through Audio Visual Media

    Directory of Open Access Journals (Sweden)

    Chanira Nuansa

    2014-04-01

    Full Text Available This study examines how well the concept of destination branding fits existing models of Malang tourism promotion. The research is qualitative, taking data directly from existing promotional models for Malang, namely information portal sites, blogs, social networks, and videos on the Internet. The study used SWOT analysis to find strengths, weaknesses, opportunities, and threats in the existing models of tourism promotion. The data are analysed against the indicators of the destination branding concept. The results of the analysis are used as a basis for designing solutions for Malang tourism promotion through a new integrated tourism advertising model. Through the analysis we found that video is the most suitable medium for promoting Malang tourism in the form of advertisements. Video can present facts more completely through its audio-visual form, making it easier for viewers to associate what they see with the destination. Moreover, conceptualised video advertising for Malang tourism is still rare. This is an opportunity, because the audio-visual advertisement models produced by this study are expected to serve as an example for the parties concerned when conceptualising future Malang tourism advertising. Keywords: Advertise, SWOT Analysis, Malang City, tourism promotion

  18. GRAPE - GIS Repetition Using Audio-Visual Repetition Units and its Learning Effectiveness

    Science.gov (United States)

    Niederhuber, M.; Brugger, S.

    2011-09-01

    A new audio-visual learning medium has been developed at the Department of Environmental Sciences at ETH Zurich (Switzerland), for use in geographical information sciences (GIS) courses. This new medium, presented in the form of Repetition Units, allows students to review and consolidate the most important learning concepts on an individual basis. The new material consists of: a) a short enhanced podcast (recorded and spoken slide show) with a maximum duration of 5 minutes, which focuses on only one important aspect of a lecture's theme; b) one or two relevant exercises, covering different cognitive levels of learning, with a maximum duration of 10 minutes; and c) solutions for the exercises. During a pilot phase in 2010, six Repetition Units were produced by the lecturers. Twenty more Repetition Units will be produced by our students during the fall semesters of 2011 and 2012. The project is accompanied by a 5-year study (2009 - 2013) that investigates learning success with the new material, focusing on the question of whether the new material helps to consolidate and refresh basic GIS knowledge. It will be analysed through longitudinal studies. Initial results indicate that the new medium helps to refresh knowledge, as the test groups scored higher than the control group. These results are encouraging and suggest that the new material, with its combination of short audio-visual podcasts and relevant exercises, helps to consolidate students' knowledge.

  19. Finding the Correspondence of Audio-Visual Events by Object Manipulation

    Science.gov (United States)

    Nishibori, Kento; Takeuchi, Yoshinori; Matsumoto, Tetsuya; Kudo, Hiroaki; Ohnishi, Noboru

    A human being understands the objects in the environment by integrating information obtained by the senses of sight, hearing and touch. In this integration, active manipulation of objects plays an important role. We propose a method for finding the correspondence of audio-visual events by manipulating an object. The method uses the general grouping rules of Gestalt psychology, i.e. “simultaneity” and “similarity” among the motion command, sound onsets and the motion of the object in images. In experiments, we used a microphone, a camera, and a robot with a hand manipulator. The robot grasps an object like a bell and shakes it, or grasps an object like a stick and beats a drum, in a periodic or non-periodic motion. The object then emits periodic/non-periodic events. To create a more realistic scenario, we placed another event source (a metronome) in the environment. As a result, we had a success rate of 73.8 percent in finding the correspondence between audio-visual events (afferent signals) relating to robot motion (efferent signals).

  20. The Improvement of Students’ Leadership Ethic in Studying History by Using Baratayuda Audio Visual Media

    Directory of Open Access Journals (Sweden)

    Wendhy Rachmadhany

    2018-04-01

    Full Text Available The purpose of this research is to determine the improvement of students' leadership ethic in studying History after the implementation of Baratayuda audio-visual media. The population of this research is the XI-Social Science-1 class of SMAN 1 Pare, Kediri Regency, in academic year 2016/2017, consisting of 39 students. This classroom action research (CAR) is arranged as a pre-test, Cycle 1 and Cycle 2, each consisting of planning, implementation, observation, and reflection. Data were collected using a leadership-ethic questionnaire, interviews, and documentation. The method of data analysis in this research is descriptive analysis, comparing the improvement from one cycle to another. The results show that there is an improvement in leadership ethic in studying History after the implementation of Baratayuda audio-visual media: the pre-test pass rate was about 17.95%, Cycle 1 reached 46.1%, and Cycle 2 showed a significant improvement to about 71.83%.

  1. Speech segregation based-on binaural cue: interaural time difference (itd) and interaural level difference (ild)

    Science.gov (United States)

    Nur Farid, Mifta; Arifianto, Dhany

    2016-11-01

    A person suffering from hearing loss can be helped by hearing aids, and binaural hearing aids perform best because they resemble the human auditory system. In a conversation at a cocktail party, a person can focus on a single conversation even though the background sound and other people's conversations are quite loud; this phenomenon is known as the cocktail party effect. Earlier studies have shown that binaural hearing makes an important contribution to the cocktail party effect. In this study, we therefore separate two sound sources from a binaural input recorded with two microphone sensors, based on both binaural cues, interaural time difference (ITD) and interaural level difference (ILD), using a binary mask. The ITD is estimated with a cross-correlation method, in which the ITD is represented as the time delay of the correlation peak in each time-frequency unit. The binary mask is estimated from the pattern of ITD and ILD relative to the strength of the target, computed statistically using probability density estimation. The resulting sound source separation performs well, with speech intelligibility of 86% correct words and an SNR of 3 dB.
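
    As a rough illustration of the cross-correlation ITD estimate and an energy-based ILD of the kind described in this abstract, a minimal sketch for a single time-frequency frame could look like the following; the paper's filterbank, probability-density mask estimation and resynthesis are omitted, and all signals and parameters here are hypothetical.

        import numpy as np

        def itd_ild(left, right, fs, max_itd=1e-3):
            """Return (itd_seconds, ild_db) estimated from one binaural frame."""
            n = len(left)
            lags = np.arange(-n + 1, n)                         # lags of the full cross-correlation
            xcorr = np.correlate(left, right, mode="full")
            physio = np.abs(lags) <= max_itd * fs               # keep only physiologically plausible lags
            best_lag = lags[physio][np.argmax(xcorr[physio])]
            itd = best_lag / fs                                  # ITD = lag of the cross-correlation peak
            ild = 10 * np.log10(np.sum(left ** 2) / (np.sum(right ** 2) + 1e-12))
            return itd, ild

        # Toy frame: a 500 Hz tone, delayed by 8 samples and attenuated in one ear.
        fs = 16000
        t = np.arange(0, 0.02, 1 / fs)
        src = np.sin(2 * np.pi * 500 * t)
        print(itd_ild(np.roll(src, 8), 0.7 * src, fs))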

  2. Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Georgios Mantokoudis

    Full Text Available OBJECTIVE: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. METHODS: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. RESULTS: Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). CONCLUSION: Webcameras have the potential to improve telecommunication of hearing-impaired individuals.

  3. Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

    Science.gov (United States)

    Mantokoudis, Georgios; Dähler, Claudia; Dubach, Patrick; Kompis, Martin; Caversaccio, Marco D; Senn, Pascal

    2013-01-01

    To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). Webcameras have the potential to improve telecommunication of hearing-impaired individuals.
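
    A hedged sketch of the kind of degradation applied in these simulations, reducing the effective frame rate by holding frames and offsetting the sound relative to the image; the frame data, rates and delay values are illustrative only, not the study's stimulus code.

        import numpy as np

        def degrade(frames, source_fps, target_fps, av_delay_ms, audio, audio_sr):
            """Hold frames to emulate a lower frame rate and delay the sound relative to the image."""
            hold = max(1, int(round(source_fps / target_fps)))        # e.g. 30 fps content shown at ~7 fps
            idx = np.minimum((np.arange(len(frames)) // hold) * hold, len(frames) - 1)
            low_fps_video = frames[idx]
            delay = int(audio_sr * av_delay_ms / 1000)
            delayed_audio = np.concatenate([np.zeros(delay), audio])  # sound starts av_delay_ms after the image
            return low_fps_video, delayed_audio

        frames = np.random.default_rng(1).random((60, 64, 64))        # hypothetical 2 s of 30 fps grayscale frames
        audio = np.zeros(2 * 16000)                                    # hypothetical matching 16 kHz audio track
        video_7fps, audio_100ms_late = degrade(frames, 30, 7, 100, audio, 16000)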

  4. Functional load of fundamental frequency in the native language predicts learning and use of these cues in second-language speech segmentation

    NARCIS (Netherlands)

    Tremblay, A.; Broersma, M.; Coughlin, C.E.; Wagner, M.A.

    2016-01-01

    This study investigates whether second-language (L2) learners make greater use of prosodic cues to word boundaries if these cues have a higher functional load in the native language (L1). It examines the use of fundamental-frequency (F0) rise in the segmentation of French speech by English- and

  5. PERANCANGAN MEDIA PEMBELAJARAN BERBASIS AUDIO VISUAL UNTUK MATA KULIAH TIPOGRAFI PADA PROGRAM STUDI DESAIN KOMUNIKASI VISUAL UNIVERSITAS DIAN NUSWANTORO

    Directory of Open Access Journals (Sweden)

    Puri Sulistiyawati

    2017-02-01

    Full Text Available Typography is one of the courses in the field of visual communication design that prioritizes the visual aspect. However, observation showed that the learning media used so far have been less effective because of limited use of information technology, so students do not fully understand the course material delivered by the lecturers. Current developments in information technology have had many positive impacts on education, among them the ability to support media in the learning process. The purpose of this research is to design learning media for the typography course by making use of information technology, namely audio-visual media. The method used in this research is Research and Development with the ADDIE model (Analysis, Design, Development, Implementation, Evaluation). With the creation of this audio-visual learning media, the learning process of the Typography course is expected to be more effective and the course material easier for students to understand. Keywords: audio visual, learning media, typography

  6. ‘Chronovist’ conceptualisation method: exploring new approaches to structuring narrative in interactive immersive audio/visual media.

    OpenAIRE

    Tchernakova, A. E.

    2015-01-01

    This research investigates whether the application of the initially literary concepts of Bakhtin’s ‘chronotope’ and ‘utterance’ to the field of interactive narrative audio-visual media can lead to the development of new approaches to structuring narratives. By extending Bakhtin’s concepts to the analysis of interactive immersive audio-visual media I analyse interactive immersive cinema as a first-person experience of a chronotope. Further, I propose to approach chronotope as a real physical s...

  7. Development of a Bayesian Estimator for Audio-Visual Integration: A Neurocomputational Study.

    Science.gov (United States)

    Ursino, Mauro; Crisafulli, Andrea; di Pellegrino, Giuseppe; Magosso, Elisa; Cuppini, Cristiano

    2017-01-01

    The brain integrates information from different sensory modalities to generate a coherent and accurate percept of external events. Several experimental studies suggest that this integration follows the principle of Bayesian estimation. However, the neural mechanisms responsible for this behavior, and its development in a multisensory environment, are still insufficiently understood. We recently presented a neural network model of audio-visual integration (Neural Computation, 2017) to investigate how a Bayesian estimator can spontaneously develop from the statistics of external stimuli. The model assumes the presence of two unimodal areas (auditory and visual), each topologically organized. Neurons in each area receive an input from the external environment, computed as the inner product of the sensory-specific stimulus and the receptive field synapses, and a cross-modal input from neurons of the other modality. Based on sensory experience, synapses were trained via Hebbian potentiation and a decay term. The aim of this work is to improve the previous model, including a more realistic distribution of visual stimuli: visual stimuli have a higher spatial accuracy at the central azimuthal coordinate and a lower accuracy at the periphery. Moreover, their prior probability is higher at the center, and decreases toward the periphery. Simulations show that, after training, the receptive fields of visual and auditory neurons shrink to reproduce the accuracy of the input (both at the center and at the periphery in the visual case), thus realizing the likelihood estimate of unimodal spatial position. Moreover, the preferred positions of visual neurons contract toward the center, thus encoding the prior probability of the visual input. Finally, a prior probability of the co-occurrence of audio-visual stimuli is encoded in the cross-modal synapses. The model is able to simulate the main properties of a Bayesian estimator and to reproduce behavioral data in all conditions examined.
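
    As a loose sketch of the learning rule described here, Hebbian potentiation plus a decay term acting on cross-modal synapses, one update step might be written as follows; this is not the published model, and layer sizes, activity profiles and learning parameters are hypothetical.

        import numpy as np

        n_auditory = n_visual = 180                              # hypothetical: one neuron per degree of azimuth
        rng = np.random.default_rng(1)
        W_cross = rng.uniform(0.0, 0.01, size=(n_auditory, n_visual))   # visual -> auditory cross-modal synapses

        def hebbian_step(W, pre, post, lr=0.01, decay=0.001):
            """Potentiate synapses between co-active pre/post neurons; let the rest decay."""
            return W + lr * np.outer(post, pre) - decay * W

        # One co-occurring audio-visual stimulus near 90 degrees: a sharp visual bump, a broad auditory bump.
        pos = np.arange(n_visual)
        visual_act = np.exp(-0.5 * ((pos - 90) / 3.0) ** 2)
        auditory_act = np.exp(-0.5 * ((pos - 90) / 10.0) ** 2)
        W_cross = hebbian_step(W_cross, pre=visual_act, post=auditory_act)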

  8. Pendidikan Kesehatan Media Audio Visual Lebih Efektif untuk Meningkatkan Pengetahuan Siswa tentang Kesehatan Reproduksi

    Directory of Open Access Journals (Sweden)

    Esa Rara Regina

    2013-04-01

    Full Text Available Background: Indonesian statistical data from 2008 showed unhealthy behaviour among the 43.3 million adolescents aged 15-24 years; of 23 million Indonesian adolescents aged 15-24 years, 83.3% had had sexual intercourse. Negative behaviour patterns and adolescent reproductive health risks will affect their future, so special education discussing reproductive health is required. Audio-visual media and leaflets are media that can be used to deliver reproductive health education. A preliminary study of 5 of the 216 grade VII students showed that students did not know about reproductive health. Objectives: This study aims to compare health education via audio-visual media and via leaflets with respect to students' knowledge about adolescent reproductive health at SMP Negeri 2 Ampel Boyolali. Methods: The study was quasi-experimental with a non-equivalent pre-post design. Samples were taken by proportional random sampling: 140 class VII students at SMP Negeri 2 Ampel Boyolali. The instruments were a study questionnaire and test results, analysed with the Wilcoxon signed-ranks test and the Kolmogorov-Smirnov test. Results: Before the education, students' knowledge of adolescent reproductive health at SMP Negeri 2 Ampel Boyolali was largely lacking, in 71 students (50.7%). After the adolescent reproductive health education, knowledge was good in most students, 72 students (51.4%). The Wilcoxon signed-ranks test for the leaflet media gave a p-value of 0.000 < 0.05, with 55 students increasing their knowledge; for the audio-visual media the p-value was 0.000 < 0.05, with 39 students increasing their knowledge. The Kolmogorov-Smirnov test gave a p-value of 0.020 < 0.05. Conclusion: There is a difference between health education via audio-visual media and via leaflets in students' knowledge about health

  9. Development of a Bayesian Estimator for Audio-Visual Integration: A Neurocomputational Study

    Directory of Open Access Journals (Sweden)

    Mauro Ursino

    2017-10-01

    Full Text Available The brain integrates information from different sensory modalities to generate a coherent and accurate percept of external events. Several experimental studies suggest that this integration follows the principle of Bayesian estimation. However, the neural mechanisms responsible for this behavior, and its development in a multisensory environment, are still insufficiently understood. We recently presented a neural network model of audio-visual integration (Neural Computation, 2017) to investigate how a Bayesian estimator can spontaneously develop from the statistics of external stimuli. The model assumes the presence of two unimodal areas (auditory and visual), each topologically organized. Neurons in each area receive an input from the external environment, computed as the inner product of the sensory-specific stimulus and the receptive field synapses, and a cross-modal input from neurons of the other modality. Based on sensory experience, synapses were trained via Hebbian potentiation and a decay term. The aim of this work is to improve the previous model, including a more realistic distribution of visual stimuli: visual stimuli have a higher spatial accuracy at the central azimuthal coordinate and a lower accuracy at the periphery. Moreover, their prior probability is higher at the center, and decreases toward the periphery. Simulations show that, after training, the receptive fields of visual and auditory neurons shrink to reproduce the accuracy of the input (both at the center and at the periphery in the visual case), thus realizing the likelihood estimate of unimodal spatial position. Moreover, the preferred positions of visual neurons contract toward the center, thus encoding the prior probability of the visual input. Finally, a prior probability of the co-occurrence of audio-visual stimuli is encoded in the cross-modal synapses. The model is able to simulate the main properties of a Bayesian estimator and to reproduce behavioral data in all conditions

  10. Speech perception in medico-legal assessment of hearing disabilities.

    Science.gov (United States)

    Pedersen, Ellen Raben; Juhl, Peter Møller; Wetke, Randi; Andersen, Ture Dammann

    2016-10-01

    Examination of Danish data for medico-legal compensations regarding hearing disabilities. The study purposes are: (1) to investigate whether discrimination scores (DSs) relate to patients' subjective experience of their hearing and communication ability (the latter referring to audio-visual perception), (2) to compare DSs from different discrimination tests (auditory/audio-visual perception and without/with noise), and (3) to relate different handicap measures in the scaling used for compensation purposes in Denmark. Data from a 15-year period (1999-2014) were collected and analysed. The data set includes 466 patients, of whom 50 were omitted due to suspicion of having exaggerated their hearing disabilities. The DSs relate well to the patients' subjective experience of their speech perception ability. By comparing DSs for different test setups it was found that adding noise entails a relatively more difficult listening condition than removing visual cues. The hearing and communication handicap degrees were found to agree, whereas the measured handicap degrees tended to be higher than the self-assessed handicap degrees. The DSs can be used to assess patients' hearing and communication abilities. The difference in the obtained handicap degrees emphasizes the importance of collecting self-assessed as well as measured handicap degrees.

  11. Dynamics of audio-visual interactions in the guinea pig brain: an electrophysiological study.

    Science.gov (United States)

    Demirtas, Serdar; Goksoy, Cuneyt

    2003-11-14

    This study presents audio-visual interactions in guinea pigs and their characteristics, evaluated from bioelectrical activity. The difference potential, as evidence of an interaction, was calculated by subtracting the sum of the averaged potentials recorded in visual and auditory events from the averaged potential recorded in an event where the two stimuli were combined in the same sweep. Dynamic investigations showed an interaction when the auditory stimulus is applied from 24 ms before to 201 ms after visual stimulation. The latency between the difference potential and the auditory stimulus was stable. Directional investigations showed that the interaction is not observed when auditory and/or visual stimulation is applied ipsilateral to the recording side.

  12. The Effect of Hand Gesture Cues Within the Treatment of /r/ for a College-Aged Adult With Persisting Childhood Apraxia of Speech.

    Science.gov (United States)

    Rusiewicz, Heather Leavy; Rivera, Jessica Lynch

    2017-11-08

    Despite the widespread use of hand movements as visual and kinesthetic cues to facilitate accurate speech produced by individuals with speech sound disorders (SSDs), no experimental investigation of gestural cues that mimic the spatiotemporal parameters of speech sounds (e.g., holding fingers and thumb together and "popping" them to cue /p/) currently exists. The purpose of this study was to examine the effectiveness of manual mimicry cues within a multisensory intervention for persisting childhood apraxia of speech (CAS). A single-subject ABAB withdrawal design was implemented to assess the accuracy of vowel + /r/ combinations produced by a 21-year-old woman with persisting CAS. The effect of manual mimicry gestures paired with multisensory therapy consisting of verbal instructions and visual modeling was assessed via clinician and naïve listener ratings of target sound accuracy. According to the perceptual ratings of the treating clinician and 28 naïve listeners, the participant demonstrated improved speech sound accuracy as a function of the manual mimicry/multisensory therapy. These data offer preliminary support for the incorporation of gestural cues in therapy for CAS and other SSDs. The need for continued research on the interaction of speech and manual movements for individuals with SSDs is discussed.

  13. Discrimination and streaming of speech sounds based on differences in interaural and spectral cues.

    Science.gov (United States)

    David, Marion; Lavandier, Mathieu; Grimault, Nicolas; Oxenham, Andrew J

    2017-09-01

    Differences in spatial cues, including interaural time differences (ITDs), interaural level differences (ILDs) and spectral cues, can lead to stream segregation of alternating noise bursts. It is unknown how effective such cues are for streaming sounds with realistic spectro-temporal variations. In particular, it is not known whether the high-frequency spectral cues associated with elevation remain sufficiently robust under such conditions. To answer these questions, sequences of consonant-vowel tokens were generated and filtered by non-individualized head-related transfer functions to simulate the cues associated with different positions in the horizontal and median planes. A discrimination task showed that listeners could discriminate changes in interaural cues both when the stimulus remained constant and when it varied between presentations. However, discrimination of changes in spectral cues was much poorer in the presence of stimulus variability. A streaming task, based on the detection of repeated syllables in the presence of interfering syllables, revealed that listeners can use both interaural and spectral cues to segregate alternating syllable sequences, despite the large spectro-temporal differences between stimuli. However, only the full complement of spatial cues (ILDs, ITDs, and spectral cues) resulted in obligatory streaming in a task that encouraged listeners to integrate the tokens into a single stream.
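
    A minimal sketch of the spatialisation step described above, convolving a mono token with a left/right head-related impulse-response pair; real non-individualised HRIRs, as used in the study, would be loaded from a database, so a toy delay-and-gain pair stands in here to keep the example self-contained.

        import numpy as np

        def spatialise(token, hrir_left, hrir_right):
            """Return a 2-channel signal carrying the ITD/ILD/spectral cues of the HRIR pair."""
            return np.stack([np.convolve(token, hrir_left),
                             np.convolve(token, hrir_right)], axis=-1)

        fs = 44100
        token = np.random.default_rng(2).standard_normal(int(0.3 * fs))   # stand-in for a recorded CV token
        hrir_left = np.zeros(64);  hrir_left[0] = 1.0                     # toy HRIRs: source on the left, so the
        hrir_right = np.zeros(64); hrir_right[30] = 0.6                   # right-ear response is delayed and weaker
        binaural_token = spatialise(token, hrir_left, hrir_right)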

  14. Undifferentiated Facial Electromyography Responses to Dynamic, Audio-Visual Emotion Displays in Individuals with Autism Spectrum Disorders

    Science.gov (United States)

    Rozga, Agata; King, Tricia Z.; Vuduc, Richard W.; Robins, Diana L.

    2013-01-01

    We examined facial electromyography (fEMG) activity to dynamic, audio-visual emotional displays in individuals with autism spectrum disorders (ASD) and typically developing (TD) individuals. Participants viewed clips of happy, angry, and fearful displays that contained both facial expression and affective prosody while surface electrodes measured…

  15. Concurrent audio-visual feedback for supporting drivers at intersections : a study using two linked driving simulators.

    NARCIS (Netherlands)

    Houtenbos, M.; de Winter, J.C.F.; Hale, A.R.; Wieringa, P.A. & Hagenzieker, M.P.

    2016-01-01

    A large portion of road traffic crashes occur at intersections for the reason that drivers lack necessary visual information. This research examined the effects of an audio-visual display that provides real-time sonification and visualization of the speed and direction of another car approaching the

  16. Integration of Audio Visual Multimedia for Special Education Pre-Service Teachers' Self Reflections in Developing Teaching Competencies

    Science.gov (United States)

    Sediyani, Tri; Yufiarti; Hadi, Eko

    2017-01-01

    This study aims to develop a model of learning by integrating multimedia and audio-visual self-reflective learners. This multimedia was developed as a tool for prospective teachers as learners in the education of children with special needs to reflect on their teaching competencies before entering the world of education. Research methods to…

  17. Speech perception in rats: use of duration and rise time cues in labeling of affricate/fricative sounds.

    Science.gov (United States)

    Reed, Phil; Howell, Peter; Sackin, Stevie; Pizzimenti, Lisa; Rosen, Stuart

    2003-01-01

    The voiceless affricate/fricative contrast has played an important role in developing auditory theories of speech perception. This type of theory draws some of its support from experimental data on animals. However, nothing is known about differential responding of affricate/fricative continua by animals. In the current study, the ability of hooded rats to "label" an affricate/fricative continuum was tested. Transfer (without retraining) to analogous nonspeech continua was also tested. The nonspeech continua were chosen so that if transfer occurred, it would indicate whether the animals had learned to use rise time or duration cues to differentiate affricates from fricatives. The data from 9 of 10 rats indicated that rats can discriminate between these cues and do so in a similar manner to human subjects. The data from 9 of 10 rats also demonstrated that the rise time of the stimulus was the basis of the discrimination; the remaining rat appeared to use duration. PMID:14674729

  18. The encoding of vowels and temporal speech cues in the auditory cortex of professional musicians: an EEG study.

    Science.gov (United States)

    Kühnis, Jürg; Elmer, Stefan; Meyer, Martin; Jäncke, Lutz

    2013-07-01

    Here, we applied a multi-feature mismatch negativity (MMN) paradigm in order to systematically investigate the neuronal representation of vowels and temporally manipulated CV syllables in a homogeneous sample of string players and non-musicians. Based on previous work indicating an increased sensitivity of the musicians' auditory system, we expected to find that musically trained subjects will elicit increased MMN amplitudes in response to temporal variations in CV syllables, namely voice-onset time (VOT) and duration. In addition, since different vowels are principally distinguished by means of frequency information and musicians are superior in extracting tonal (and thus frequency) information from an acoustic stream, we also expected to provide evidence for an increased auditory representation of vowels in the experts. In line with our hypothesis, we could show that musicians are not only advantaged in the pre-attentive encoding of temporal speech cues, but most notably also in processing vowels. Additional "just noticeable difference" measurements suggested that the musicians' perceptual advantage in encoding speech sounds was more likely driven by the generic constitutional properties of a highly trained auditory system, rather than by its specialisation for speech representations per se. These results shed light on the origin of the often reported advantage of musicians in processing a variety of speech sounds. Copyright © 2013 Elsevier Ltd. All rights reserved.

  19. Effectiveness of Two Topical Anaesthetic Agents used along with Audio Visual Aids in Paediatric Dental Patients.

    Science.gov (United States)

    Agarwal, Nidhi; Dhawan, Jayata; Kumar, Dipanshu; Anand, Ashish; Tangri, Karan

    2017-01-01

    Topical anaesthetic agents enable pain-free intraoral procedures and provide symptomatic relief for toothache, superficial mucosal lesions and post-extraction pain. The most common anxiety-provoking and fearful experience for children in the dental operatory is the administration of local anaesthesia, because on seeing the needle children usually become uncooperative. A recent trend in behaviour management is the use of non-aversive techniques, among which audio-visual distraction has emerged as a very successful technique for managing children in dental settings. Audio-visual distraction can decrease the procedure-related anxiety of patients undergoing dental treatment and can be very relaxing for highly anxious patients. The aim of the present study was to compare the efficacy of the topical anaesthetics EMLA (Eutectic Mixture of Local Anaesthetics) cream and benzocaine (20%) gel in reducing the pain of needle insertion, with and without the use of audio-visual (AV) aids. The study was conducted on 120 children aged 3-14 years attending the outpatient department for treatment. EMLA and benzocaine gel (20%) were assessed for their effectiveness in reducing the pain of needle insertion during local anaesthesia administration. Based on the inclusion and exclusion criteria, children requiring local anaesthesia for dental treatment were randomly divided into four equal groups of 30 children according to whether AV aids were used or not. AV aids were delivered using a Sony Vaio laptop with earphones, playing nursery rhymes and cartoon movie DVDs. Pain assessment was done using the Visual Analogue Scale (VAS), and the physiological responses of pulse rate and oxygen saturation were measured by pulse oximeter. There was a statistically significant difference in mean pain score, pulse rate and mean oxygen saturation when compared between the four groups. EMLA with AV aids was found to be a better topical

  20. Probabilistic Phonotactics as a Cue for Recognizing Spoken Cantonese Words in Speech

    Science.gov (United States)

    Yip, Michael C. W.

    2017-01-01

    Previous experimental psycholinguistic studies suggested that the probabilistic phonotactics information might likely to hint the locations of word boundaries in continuous speech and hence posed an interesting solution to the empirical question on how we recognize/segment individual spoken word in speech. We investigated this issue by using…

  1. A system to simulate and reproduce audio-visual environments for spatial hearing research.

    Science.gov (United States)

    Seeber, Bernhard U; Kerber, Stefan; Hafter, Ervin R

    2010-02-01

    The article reports the experience gained from two implementations of the "Simulated Open-Field Environment" (SOFE), a setup that allows sounds to be played at calibrated levels over a wide frequency range from multiple loudspeakers in an anechoic chamber. Playing sounds from loudspeakers in the free-field has the advantage that each participant listens with their own ears, and individual characteristics of the ears are captured in the sound they hear. This makes an easy and accurate comparison between various listeners with and without hearing devices possible. The SOFE uses custom calibration software to assure individual equalization of each loudspeaker. Room simulation software creates the spatio-temporal reflection pattern of sound sources in rooms which is played via the SOFE loudspeakers. The sound playback system is complemented by a video projection facility which can be used to collect or give feedback or to study auditory-visual interaction. The article discusses acoustical and technical requirements for accurate sound playback against the specific needs in hearing research. An introduction to software concepts is given which allow easy, high-level control of the setup and thus fast experimental development, turning the SOFE into a "Swiss army knife" tool for auditory, spatial hearing and audio-visual research. Crown Copyright 2009. Published by Elsevier B.V. All rights reserved.
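
    One piece of such a setup, per-loudspeaker level calibration, can be sketched as follows; the measured levels are invented, and the SOFE's actual calibration also equalises each loudspeaker's frequency response, which is omitted here.

        import numpy as np

        target_db_spl = 65.0
        measured_db_spl = np.array([63.2, 66.1, 64.8, 65.9, 62.7])   # hypothetical microphone readings per speaker
        gain_db = target_db_spl - measured_db_spl                     # correction needed by each loudspeaker
        gain_linear = 10 ** (gain_db / 20)

        def play_calibrated(signal, speaker_index):
            """Apply the stored correction before a signal is sent to a given loudspeaker."""
            return gain_linear[speaker_index] * signal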

  2. PENERAPAN STRATEGI LSQ BERBANTUAN MEDIA AUDIO VISUAL UNTUK MENINGKATKAN HASIL BELAJAR EKONOMI

    Directory of Open Access Journals (Sweden)

    Sholikhah Fakhratus

    2012-10-01

    Full Text Available The constraint in the learning process at High School 1 Kroya is that student activity is still lacking: students still feel scared and ashamed to ask questions unless there is encouragement from the teacher, and the teacher still lacks variation in teaching. This calls for the use of appropriate and varied methods and media as tools in teaching and learning; one alternative is applying the Learning Start With A Question (LSQ) strategy assisted by audio-visual media. The design of this study is classroom action research with two cycles, each cycle including planning, implementation, observation and reflection. The results in Cycle I show an average student learning outcome of 71.5 with 65.7% classical completeness, student activity of 67.71% (high category), and teacher activity in the learning of 67.5% (high category). In Cycle II, the average student learning outcome was 78.6 with 85.7% classical completeness, student activity of 76.57% (high category), and teacher activity of 87.5% (very high category).

  3. Standard operating procedure for audio visual recording of informed consent: an initiative to facilitate regulatory compliance.

    Science.gov (United States)

    Parikh, P M; Prabhash, K; Govind, K B; Digumarti, R; Pandit, S; Banerjee, I; Biyani, R; Deshmukh, A; Doval, D; Bhattacharyya, G S; Gupta, S

    2014-01-01

    The office of the Drugs Controller General (India) vide order dated 19th November 2013 has made audio visual (AV) recording of the informed consent mandatory for the conduct of all clinical trials in India. We therefore developed a standard operating procedure (SOP) to ensure that this is performed in compliance with the regulatory requirements and internationally accepted ethical standards, and that the recording is stored as well as archived in an appropriate manner. The SOP was developed keeping in mind all relevant orders, regulations, laws and guidelines and has been made available online. Since we are faced with unique legal and regulatory requirements that are unprecedented globally, this SOP will allow the AV recording of the informed consent to be performed, archived and retrieved to demonstrate ethical, legal and regulatory compliance. We also compared this to the draft guidelines for AV recording dated 9th January 2014 developed by the Central Drugs Standard Control Organization. Our future efforts will include regular testing, feedback and updates of the SOP.

  4. Multilevel alterations in the processing of audio-visual emotion expressions in autism spectrum disorders.

    Science.gov (United States)

    Charbonneau, Geneviève; Bertone, Armando; Lepore, Franco; Nassim, Marouane; Lassonde, Maryse; Mottron, Laurent; Collignon, Olivier

    2013-04-01

    The abilities to recognize and integrate emotions from another person's facial and vocal expressions are fundamental cognitive skills involved in the effective regulation of social interactions. Deficits in such abilities have been suggested as a possible source for certain atypical social behaviors manifested by persons with autism spectrum disorders (ASD). In the present study, we assessed the recognition and integration of emotional expressions in ASD using a validated set of ecological stimuli comprised of dynamic visual and auditory (non-verbal) vocal clips. Autistic participants and typically developing controls (TD) were asked to discriminate between clips depicting expressions of disgust and fear presented either visually, auditorily or audio-visually. The group of autistic participants was less efficient to discriminate emotional expressions across all conditions (unimodal and bimodal). Moreover, they necessitated a higher signal-to-noise ratio for the discrimination of visual or auditory presentations of disgust versus fear expressions. These results suggest an altered sensitivity to emotion expressions in this population that is not modality-specific. In addition, the group of autistic participants benefited from exposure to bimodal information to a lesser extent than did the TD group, indicative of a decreased multisensory gain in this population. These results are the first to compellingly demonstrate joint alterations for both the perception and the integration of multisensory emotion expressions in ASD. Copyright © 2013 Elsevier Ltd. All rights reserved.

  5. Online dissection audio-visual resources for human anatomy: Undergraduate medical students' usage and learning outcomes.

    Science.gov (United States)

    Choi-Lundberg, Derek L; Cuellar, William A; Williams, Anne-Marie M

    2016-11-01

    In an attempt to improve undergraduate medical student preparation for and learning from dissection sessions, dissection audio-visual resources (DAVR) were developed. Data from e-learning management systems indicated DAVR were accessed by 28% ± 10 (mean ± SD for nine DAVR across three years) of students prior to the corresponding dissection sessions, representing at most 58% ± 20 of assigned dissectors. Approximately 50% of students accessed all available DAVR by the end of semester, while 10% accessed none. Ninety percent of survey respondents (response rate 58%) generally agreed that DAVR improved their preparation for and learning from dissection when used. Of several learning resources, only DAVR usage had a significant positive correlation (P = 0.002) with feeling prepared for dissection. Results on cadaveric anatomy practical examination questions in year 2 (Y2) and year 3 (Y3) cohorts were 3.9% (P learning outcomes of more students. Anat Sci Educ 9: 545-554. © 2016 American Association of Anatomists.

  6. Speech Misperception: Speaking and Seeing Interfere Differently with Hearing

    OpenAIRE

    Takemi Mochida; Toshitaka Kimura; Sadao Hiroya; Norimichi Kitagawa; Hiroaki Gomi; Tadahisa Kondo

    2013-01-01

    Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phoneme...

  7. [Type V phosphodiesterase inhibitor erection-provoking test with audio-visual sexual stimulation for the diagnosis of erectile dysfunction].

    Science.gov (United States)

    Zhu, Xuan-Wen; Guo, Jun-Ping; Zhang, Feng-Bin; Zhong, Da-Chuan; Fang, Jia-Jie; Li, Fang-Yin

    2008-05-01

    To evaluate the type V phosphodiesterase (PDE-5) inhibitor erection-provoking test with audio-visual sexual stimulation in the diagnosis of erectile dysfunction. A total of 853 out-patients diagnosed with erectile dysfunction were divided into an injury and a non-injury group. After being scored on IIEF-5 questionnaires, all the patients received oral administration of PDE-5 inhibitors and, 30 minutes later, audio-visual sexual stimulation. The data on penile erection were recorded with Rigiscan Plus. The patients with mild, moderate and severe ED accounted for 18.8, 31.9 and 49.3% in the injury group, and 50.6, 39.8 and 9.6% in the non-injury group, with statistically significant differences between the two groups in the mild and severe categories.

  8. Roar of the Thunder Dragon: The Bhutanese Audio-visual Industry and the Shaping and Representation of Contemporary Culture

    OpenAIRE

    Dendup, Tshewang

    2007-01-01

    The Bhutanese audio-visual industry plays a critical and important role in the creation of cultural products, which are consumed by the masses. The industry's significant role in the preservation and promotion of culture is worthy of state support. Although comprehensive data is not available on the industry, available data and anecdotal evidence prove that the industry is growing and playing its own role in shaping and representing contemporary culture in Bhutan.

  9. Improving Students' Speaking Skill Through Audio Visual Media at 4th Grade of Labschool Elementary School East Jakarta

    OpenAIRE

    Herlina

    2014-01-01

    This research aims at helping students to improve their speaking skill by using audio visual media. The subject of the research was fourth grade students of Labschool Elementary School. This research was conducted at Labschool Elementary School Rawamangun East Jakarta at 2nd semester of 2013 with 28 students as participants. The method conducted in this research was classroom action research by Kemmis and McTaggart. The research was carried out in two cycles. The research method is conducted...

  10. Evaluation of an Audio-Visual Novela to Improve Beliefs, Attitudes and Knowledge toward Dementia: A Mixed-Methods Approach.

    Science.gov (United States)

    Grigsby, Timothy J; Unger, Jennifer B; Molina, Gregory B; Baron, Mel

    2017-01-01

    Dementia is a clinical syndrome characterized by progressive degeneration in cognitive ability that limits the capacity for independent living. Interventions are needed to target the medical, social, psychological, and knowledge needs of caregivers and patients. This study used a mixed methods approach to evaluate the effectiveness of a dementia novela presented in an audio-visual format in improving dementia attitudes, beliefs and knowledge. Adults from Los Angeles (N = 42, 83% female, 90% Hispanic/Latino, mean age = 42.2 years, 41.5% with less than a high school education) viewed an audio-visual novela on dementia. Participants completed surveys immediately before and after viewing the material. The novela produced significant improvements in overall knowledge (t(41) = -9.79). The novela can be useful for improving attitudes and knowledge about dementia, but further work is needed to investigate the relation with health disparities in screening and treatment behaviors. Audio-visual novelas are an innovative format for health education and can change attitudes and knowledge about dementia.

  11. Synchronized audio-visual transients drive efficient visual search for motion-in-depth.

    Directory of Open Access Journals (Sweden)

    Marina Zannoli

    Full Text Available In natural audio-visual environments, a change in depth is usually correlated with a change in loudness. In the present study, we investigated whether correlating changes in disparity and loudness would provide a functional advantage in binding disparity and sound amplitude in a visual search paradigm. To test this hypothesis, we used a method similar to that used by van der Burg et al. to show that non-spatial transient (square-wave) modulations of loudness can drastically improve spatial visual search for a correlated luminance modulation. We used dynamic random-dot stereogram displays to produce pure disparity modulations. Target and distractors were small disparity-defined squares (either 6 or 10 in total). Each square moved back and forth in depth in front of the background plane at different phases. The target's depth modulation was synchronized with an amplitude-modulated auditory tone. Visual and auditory modulations were always congruent (both sine-wave or square-wave). In a speeded search task, five observers were asked to identify the target as quickly as possible. Results show a significant improvement in visual search times in the square-wave condition compared to the sine condition, suggesting that transient auditory information can efficiently drive visual search in the disparity domain. In a second experiment, participants performed the same task in the absence of sound and showed a clear set-size effect in both modulation conditions. In a third experiment, we correlated the sound with a distractor instead of the target. This produced longer search times, indicating that the correlation is not easily ignored.
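
    A rough sketch of the kind of synchronised stimulus described here: a square-wave amplitude-modulated tone whose modulation is in phase with a square-wave disparity modulation of the target; the carrier frequency, modulation rate and disparity amplitude below are hypothetical, not the authors' values.

        import numpy as np
        from scipy.signal import square

        fs, dur, mod_rate = 44100, 2.0, 1.0                      # sample rate, duration (s), modulation rate (Hz)
        t = np.arange(int(fs * dur)) / fs
        carrier = np.sin(2 * np.pi * 500 * t)                    # hypothetical 500 Hz tone
        am_envelope = 0.5 * (1 + square(2 * np.pi * mod_rate * t))
        tone = am_envelope * carrier                              # square-wave amplitude-modulated tone

        frame_rate = 60
        tf = np.arange(int(frame_rate * dur)) / frame_rate
        disparity = 0.1 * 0.5 * (1 + square(2 * np.pi * mod_rate * tf))   # target disparity trace, in phase with the tone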

  12. THE EFFECT OF USING AUDIO-VISUAL AIDS VERSUS PICTURES ON FOREIGN LANGUAGE VOCABULARY LEARNING OF INDIVIDUALS WITH MILD INTELLECTUAL DISABILITY

    National Research Council Canada - National Science Library

    Zahra Sadat Noori; Mohammad Taghi Farvardin

    2016-01-01

    ... as is being discussed (7). [...]using audio-visual aids for practice and drill in the classrooms of students with intellectual disability has advantages such as immediacy of feedback to the learners, the novelty effect of using...

  13. THE EFFECT OF USING AUDIO-VISUAL AIDS VERSUS PICTURES ON FOREIGN LANGUAGE VOCABULARY LEARNING OF INDIVIDUALS WITH MILD INTELLECTUAL DISABILITY

    National Research Council Canada - National Science Library

    Zahra Sadat Noori; Mohammad Taghi Farvardin

    2016-01-01

    ... as is being discussed (7). [...]using audio-visual aids for practice and drill in the classrooms of students with intellectual disability has advantages such as immediacy of feedback to the learners...

  14. N1 enhancement in synesthesia during visual and audio-visual perception in semantic cross-modal conflict situations: an ERP study

    National Research Council Canada - National Science Library

    Sinke, Christopher; Neufeld, Janina; Wiswede, Daniel; Emrich, Hinderk M; Bleich, Stefan; Münte, Thomas F; Szycik, Gregor R

    2014-01-01

    ... perceptual alterations at a global level. Here, we focused on audio-visual processing in synesthesia using a semantic classification task in combination with visually or auditory-visually presented animated and inanimate...

  15. Visual-Auditory Integration during Speech Imitation in Autism

    Science.gov (United States)

    Williams, Justin H. G.; Massaro, Dominic W.; Peel, Natalie J.; Bosseler, Alexis; Suddendorf, Thomas

    2004-01-01

    Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional "mirror neuron" systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a "virtual" head (Baldi), delivered speech stimuli for…

  16. Bilingualism and Children's Use of Paralinguistic Cues to Interpret Emotion in Speech

    Science.gov (United States)

    Yow, W. Quin; Markman, Ellen M.

    2011-01-01

    Preschoolers tend to rely on what speakers say rather than how they sound when interpreting a speaker's emotion while adults rely instead on tone of voice. However, children who have a greater need to attend to speakers' communicative requirements, such as bilingual children, may be more adept in using paralinguistic cues (e.g. tone of voice) when…

  17. Robust speech dereverberation with a neural network-based post-filter that exploits multi-conditional training of binaural cues

    DEFF Research Database (Denmark)

    May, Tobias

    2018-01-01

    This study presents an algorithm for binaural speech dereverberation based on the supervised learning of short-term binaural cues. The proposed system combined a delay-and-sum beamformer (DSB) with a neural network-based post-filter that attenuated reverberant components in individual time...
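
    A minimal sketch of the delay-and-sum front end named in this abstract (the neural-network post-filter and its multi-conditional training are omitted); the microphone spacing and steering angle are hypothetical.

        import numpy as np

        def delay_and_sum(left, right, fs, angle_deg=0.0, mic_distance=0.17, c=343.0):
            """Time-align the two channels for a source at `angle_deg` and average them."""
            tau = mic_distance * np.sin(np.radians(angle_deg)) / c   # inter-microphone delay in seconds
            shift = int(round(tau * fs))
            aligned_right = np.roll(right, shift)                    # circular shift keeps the sketch short
            return 0.5 * (left + aligned_right)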

  18. Working Memory and Speech Recognition in Noise under Ecologically Relevant Listening Conditions: Effects of Visual Cues and Noise Type among Adults with Hearing Loss

    Science.gov (United States)

    Miller, Christi W.; Stewart, Erin K.; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A.; Tremblay, Kelly

    2017-01-01

    Purpose: This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Method: Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2…

  19. Low-frequency fine-structure cues allow for the online use of lexical stress during spoken-word recognition in spectrally degraded speech.

    Science.gov (United States)

    Kong, Ying-Yee; Jesse, Alexandra

    2017-01-01

    English listeners use suprasegmental cues to lexical stress during spoken-word recognition. Prosodic cues are, however, less salient in spectrally degraded speech, as provided by cochlear implants. The present study examined how spectral degradation with and without low-frequency fine-structure information affects normal-hearing listeners' ability to benefit from suprasegmental cues to lexical stress in online spoken-word recognition. To simulate electric hearing, an eight-channel vocoder spectrally degraded the stimuli while preserving temporal envelope information. Additional lowpass-filtered speech was presented to the opposite ear to simulate bimodal hearing. Using a visual world paradigm, listeners' eye fixations to four printed words (target, competitor, two distractors) were tracked, while hearing a word. The target and competitor overlapped segmentally in their first two syllables but mismatched suprasegmentally in their first syllables, as the initial syllable received primary stress in one word and secondary stress in the other (e.g., "'admiral," "'admi'ration"). In the vocoder-only condition, listeners were unable to use lexical stress to recognize targets before segmental information disambiguated them from competitors. With additional lowpass-filtered speech, however, listeners efficiently processed prosodic information to speed up online word recognition. Low-frequency fine-structure cues in simulated bimodal hearing allowed listeners to benefit from suprasegmental cues to lexical stress during word recognition.
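
    The spectral degradation described above relies on a channel vocoder that discards spectral fine structure while preserving per-band temporal envelopes. The sketch below is a minimal noise-excited vocoder of that general kind, assuming SciPy and placeholder band edges and filter orders; it is not the exact processing used in the cited study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocoder(x, fs, n_channels=8, f_lo=100.0, f_hi=7000.0):
    """Spectrally degrade speech while preserving the temporal envelope in each band."""
    # Logarithmically spaced analysis bands (a common, simplified choice).
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    out = np.zeros_like(x, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))            # temporal envelope of the band
        carrier = np.random.randn(len(x))      # noise carrier
        carrier = sosfiltfilt(sos, carrier)    # band-limit the carrier
        out += env * carrier                   # modulate and sum across channels
    return out / np.max(np.abs(out))

fs = 16000
speech = np.random.randn(fs)  # stand-in for a 1 s speech waveform
degraded = noise_vocoder(speech, fs)
```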

  20. Classification of cooperative and competitive overlaps in speech using cues from the context, overlapper, and overlappee

    NARCIS (Netherlands)

    Truong, Khiet Phuong

    One of the major properties of overlapping speech is that it can be perceived as competitive or cooperative. For the development of real-time spoken dialog systems and the analysis of affective and social human behavior in conversations, it is important to (automatically) distinguish between these

  1. Universal and language-specific sublexical cues in speech perception: a novel electroencephalography-lesion approach.

    Science.gov (United States)

    Obrig, Hellmuth; Mentzel, Julia; Rossi, Sonja

    2016-06-01

    See Cappa (doi:10.1093/brain/aww090) for a scientific commentary on this article. The phonological structure of speech supports the highly automatic mapping of sound to meaning. While it is uncontroversial that phonotactic knowledge acts upon lexical access, it is unclear at what stage these combinatorial rules, governing phonological well-formedness in a given language, shape speech comprehension. Moreover, few studies have investigated the neuronal network affording this important step in speech comprehension. We therefore asked 70 participants, half of whom suffered from a chronic left-hemispheric lesion, to listen to 252 different monosyllabic pseudowords. The material models universal preferences of phonotactic well-formedness by including naturally spoken pseudowords and digitally reversed exemplars. The latter partially violate the phonological structure of all human speech and are rich in universally dispreferred phoneme sequences while preserving basic auditory parameters. Language-specific constraints were modelled in that half of the naturally spoken pseudowords complied with the phonotactics of the native language of the monolingual participants (German) while the other half did not. To ensure universal well-formedness and naturalness, the latter stimuli comply with Slovak phonotactics and all stimuli were produced by an early bilingual speaker. To maximally attenuate lexico-semantic influences, transparent pseudowords were avoided and participants had to detect immediate repetitions, a task orthogonal to the contrasts of interest. The results show that phonological 'well-formedness' modulates implicit processing of speech at different levels: universally dispreferred phonological structure elicits early, medium and late latency differences in the evoked potential. On the contrary, the language-specific phonotactic contrast selectively modulates a medium-latency component of the event-related potentials around 400 ms. Using a novel event-related potential...

  2. The Relative Weight of Statistical and Prosodic Cues in Speech Segmentation: A Matter of Language-(In)dependency and of Signal Quality

    Directory of Open Access Journals (Sweden)

    Tânia Fernandes

    2011-06-01

    Full Text Available In an artificial language setting, we investigated the relative weight of statistical cues (transitional probabilities, TPs) in comparison to two prosodic cues, Intonational Phrases (IPs), a language-independent cue, and lexical stress (a language-dependent cue). The signal quality was also manipulated through white-noise superimposition. Both IPs and TPs were highly resilient to physical degradation of the signal. An overall performance gain was found when these cues were congruent, but when they were incongruent IPs prevailed over TPs (Experiment 1). After ensuring that duration is treated by Portuguese listeners as a correlate of lexical stress (Experiment 2A), the role of lexical stress and TPs in segmentation was evaluated in Experiment 2B. Lexical stress effects only emerged with a physically degraded signal, constraining the extraction of TP-words to the ones supported by both TPs and IPs. Speech segmentation does not seem to be the product of one preponderant cue acting as a filter of the outputs of another, lower-weighted cue. Instead, it mainly depends on the listening conditions, and the weighting of the cues according to their role in a particular language.
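
    For readers unfamiliar with the statistical cue manipulated above: the transitional probability of syllable B following syllable A is P(B | A) = count(AB) / count(A), so within-word transitions have higher TPs than transitions across word boundaries. A minimal sketch with a made-up syllable stream (not the study's stimuli) follows.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Return P(next | current) for each adjacent syllable pair in a stream."""
    pair_counts = Counter(zip(syllables[:-1], syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

# Toy stream built from two "words" (high within-word TPs, lower TPs across word boundaries).
stream = ["tu", "pi", "ro", "go", "la", "bu", "tu", "pi", "ro", "tu", "pi", "ro", "go", "la", "bu"]
for pair, tp in sorted(transitional_probabilities(stream).items()):
    print(pair, round(tp, 2))
```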

  3. The influence of infant-directed speech on 12-month-olds' intersensory perception of fluent speech.

    Science.gov (United States)

    Kubicek, Claudia; Gervain, Judit; Hillairet de Boisferon, Anne; Pascalis, Olivier; Lœvenbruck, Hélène; Schwarzer, Gudrun

    2014-11-01

    The present study examined whether infant-directed (ID) speech facilitates intersensory matching of audio-visual fluent speech in 12-month-old infants. German-learning infants' audio-visual matching ability of German and French fluent speech was assessed by using a variant of the intermodal matching procedure, with auditory and visual speech information presented sequentially. In Experiment 1, the sentences were spoken in an adult-directed (AD) manner. Results showed that 12-month-old infants did not exhibit a matching performance for either the native or the non-native language. However, Experiment 2 revealed that when ID speech stimuli were used, infants did perceive the relation between auditory and visual speech attributes, but only in response to their native language. Thus, the findings suggest that ID speech might have an influence on the intersensory perception of fluent speech and shed further light on multisensory perceptual narrowing. Copyright © 2014 Elsevier Inc. All rights reserved.

  4. Interactional convergence in conversational storytelling: when reported speech is a cue of alignment and/or affiliation.

    Science.gov (United States)

    Guardiola, Mathilde; Bertrand, Roxane

    2013-01-01

    This paper investigates how and when interactional convergence is established by participants in conversation. We analyze sequences of storytelling using an original method that combines Conversation Analysis and a corpus-based approach. In storytelling, the participant in the position of "listener" is expected to produce either generic or specific responses adapted to the storyteller's narrative. The listener's behavior produced within the current activity is a cue of his/her interactional alignment. We show here that the listener can produce a specific type of (aligned) response, which we term a reported speech utterance in echo. The participant who is not telling the story is nonetheless able to animate the characters, while reversing the usual asymmetric roles of storyteller and listener. The use of this device is a way for the listener to display his/her stance toward the events told by the storyteller. If the listener's stance is congruent with that of the storyteller, this reveals a high degree of affiliation between the participants. We present seventeen excerpts from a collection of 94 instances of Echo Reported Speech (ERS) which we examined using the concepts of alignment and affiliation in order to show how different kinds of convergent sequences are constructed. We demonstrate that this phenomenon is mainly used by the listener to align and affiliate with the storyteller by means of reformulative, enumerative, or overbidding ERS. We also show that in affiliative sequences, reported speech can be used by the listener in a humorous way in order to temporarily disalign. This disalignment constitutes a potential starting point for an oblique sequence, which, if accepted and continued by the storyteller, gives rise to a highly convergent sequence.

  5. Cues to speech segmentation: evidence from juncture misperceptions and word spotting.

    Science.gov (United States)

    Vroomen, J; van Zon, M; de Gelder, B

    1996-11-01

    The question of whether Dutch listeners rely on the rhythmic characteristics of their native language to segment speech was investigated in three experiments. In Experiment 1, listeners were induced to make missegmentations of continuous speech. The results showed that word boundaries were inserted before strong syllables and deleted before weak syllables. In Experiment 2, listeners were required to spot real CVC or CVCC words (C = consonant, V = vowel) embedded in bisyllabic nonsense strings. For CVCC words, fewer errors were made when the second syllable of the nonsense string was weak rather than strong, whereas for CVC words the effect was reversed. Experiment 3 ruled out an acoustic explanation for this effect. It is argued that these results are in line with an account in which both metrical segmentation and lexical competition play a role.

  6. Subcortical encoding of speech cues in children with congenital blindness.

    Science.gov (United States)

    Jafari, Zahra; Malayeri, Saeed

    2016-09-21

    Congenital visual deprivation underlies neural plasticity in different brain areas, and provides an outstanding opportunity to study the neuroplastic capabilities of the brain. The present study aimed to investigate the effect of congenital blindness on subcortical auditory processing using electrophysiological and behavioral assessments in children. A total of 47 children aged 8-12 years, including 22 congenitally blind (CB) children and 25 normal-sighted (NS) controls, were studied. All children were tested using an auditory brainstem response (ABR) test with both click and speech stimuli. Speech recognition and musical abilities were tested using standard tools. Significant differences were observed between the two groups in speech ABR wave latencies A, F and O (p≤0.043), wave F amplitude (p = 0.039), V-A slope (p = 0.026), and three spectral magnitudes F0, F1 and HF (p≤0.002). CB children showed a superior performance compared to NS peers in all the subtests and the total score of musical abilities (p≤0.003). Moreover, they had significantly higher scores on the nonsense syllable test in noise than the NS children (p = 0.034). Significant negative correlations were found only in CB children between the total music score and both wave A (p = 0.039) and wave F (p = 0.029) latencies, as well as between the nonsense-syllable test in noise and the wave A latency (p = 0.041). Our results suggest that neuroplasticity resulting from congenital blindness can be measured subcortically and has a heightened effect on temporal, musical and speech processing abilities. The findings are discussed based on models of plasticity and the influence of corticofugal modulation in synthesizing complex auditory stimuli.

  7. Self-organizing maps for measuring similarity of audiovisual speech percepts

    DEFF Research Database (Denmark)

    Bothe, Hans-Heinrich

    The goal of this work is to find a way to measure similarity of audiovisual speech percepts. Phoneme-related self-organizing maps (SOM) with a rectangular basis are trained with data material from a (labeled) video film. For the training, a combination of auditory speech features and corresponding ... Dependent on the training data, these other units may also be contextually immediate neighboring units. The poster demonstrates the idea with text material spoken by one individual subject using a set of simple audio-visual features. The data material for the training process consists of 44 labeled ... audio-visual speech percepts and to measure coarticulatory effects.
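
    As a hedged illustration of the self-organizing map training mentioned in this record (not the authors' implementation), the sketch below trains a small rectangular SOM on arbitrary feature vectors; the grid size, learning-rate schedule, and neighbourhood width are placeholder choices.

```python
import numpy as np

def train_som(data, rows=6, cols=6, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rectangular self-organizing map; returns the weight grid (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    weights = rng.normal(size=(rows, cols, dim))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            lr = lr0 * np.exp(-step / n_steps)       # decaying learning rate
            sigma = sigma0 * np.exp(-step / n_steps)  # shrinking neighbourhood
            # Best-matching unit (BMU) for this sample.
            d = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            # Gaussian neighbourhood on the map grid, centred on the BMU.
            g = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=-1) / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
            step += 1
    return weights

features = np.random.default_rng(1).normal(size=(200, 12))  # stand-in audio-visual feature vectors
som = train_som(features)
```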

  8. Pitch contour impairment in congenital amusia: New insights from the Self-paced Audio-visual Contour Task (SACT).

    Science.gov (United States)

    Lu, Xuejing; Sun, Yanan; Ho, Hao Tam; Thompson, William Forde

    2017-01-01

    Individuals with congenital amusia usually exhibit impairments in melodic contour processing when asked to compare pairs of melodies that may or may not be identical to one another. However, it is unclear whether the impairment observed in contour processing is caused by an impairment of pitch discrimination, or is a consequence of poor pitch memory. To help resolve this ambiguity, we designed a novel Self-paced Audio-visual Contour Task (SACT) that evaluates sensitivity to contour while placing minimal burden on memory. In this task, participants control the pace of an auditory contour that is simultaneously accompanied by a visual contour, and they are asked to judge whether the two contours are congruent or incongruent. In Experiment 1, melodic contours varying in pitch were presented with a series of dots that varied in spatial height. Amusics exhibited reduced sensitivity to audio-visual congruency in comparison to control participants. To exclude the possibility that the impairment arises from a general deficit in cross-modal mapping, Experiment 2 examined sensitivity to cross-modal mapping for two other auditory dimensions: timbral brightness and loudness. Amusics and controls were significantly more sensitive to large than small contour changes, and to changes in loudness than changes in timbre. However, there were no group differences in cross-modal mapping, suggesting that individuals with congenital amusia can comprehend spatial representations of acoustic information. Taken together, the findings indicate that pitch contour processing in congenital amusia remains impaired even when pitch memory is relatively unburdened.

  9. IMPROVING FORWARD ROLL LEARNING OUTCOMES THROUGH AUDIO-VISUAL LEARNING MEDIA IN CLASS VIII OF SMP NEGERI 5 SEMARANG IN 2013

    Directory of Open Access Journals (Sweden)

    Riski Amanuloh

    2015-05-01

    Full Text Available The purpose of this research is to improve student learning outcomes in the forward roll in floor gymnastics through audio-visual media in class VIII of SMP Negeri 5 Semarang. This study used descriptive quantitative methods, and the type of research is classroom action research consisting of four stages: planning, action, observation, and reflection. Data collection techniques used were tests, observation sheets and questionnaires. In the first cycle, learning outcomes reached a mean score of 70.2 (good category), taken from three assessment criteria: psychomotor (mean 71.8), affective (mean 68.5) and cognitive (mean 70.4); from these results it can be concluded that student learning outcomes were adequate but needed to be improved. In Cycle II, learning outcomes reached a mean score of 85.8 (good category), with psychomotor (mean 88.3), affective (mean 85.0) and cognitive (mean 84.2) criteria; from these results it can be concluded that student learning outcomes were good (complete), in keeping with the KKM set at 80.00. In sum, students' learning outcome scores increased. Thus the use of audio-visual media in teaching the forward roll in floor gymnastics can increase the motivation, comprehension and learning outcomes of eighth grade students of SMP Negeri 5 Semarang.

  10. Pitch contour impairment in congenital amusia: New insights from the Self-paced Audio-visual Contour Task (SACT).

    Directory of Open Access Journals (Sweden)

    Xuejing Lu

    Full Text Available Individuals with congenital amusia usually exhibit impairments in melodic contour processing when asked to compare pairs of melodies that may or may not be identical to one another. However, it is unclear whether the impairment observed in contour processing is caused by an impairment of pitch discrimination, or is a consequence of poor pitch memory. To help resolve this ambiguity, we designed a novel Self-paced Audio-visual Contour Task (SACT) that evaluates sensitivity to contour while placing minimal burden on memory. In this task, participants control the pace of an auditory contour that is simultaneously accompanied by a visual contour, and they are asked to judge whether the two contours are congruent or incongruent. In Experiment 1, melodic contours varying in pitch were presented with a series of dots that varied in spatial height. Amusics exhibited reduced sensitivity to audio-visual congruency in comparison to control participants. To exclude the possibility that the impairment arises from a general deficit in cross-modal mapping, Experiment 2 examined sensitivity to cross-modal mapping for two other auditory dimensions: timbral brightness and loudness. Amusics and controls were significantly more sensitive to large than small contour changes, and to changes in loudness than changes in timbre. However, there were no group differences in cross-modal mapping, suggesting that individuals with congenital amusia can comprehend spatial representations of acoustic information. Taken together, the findings indicate that pitch contour processing in congenital amusia remains impaired even when pitch memory is relatively unburdened.

  11. Audio-Visual and Autogenic Relaxation Alter Amplitude of Alpha EEG Band, Causing Improvements in Mental Work Performance in Athletes.

    Science.gov (United States)

    Mikicin, Mirosław; Kowalczyk, Marek

    2015-09-01

    The aim of the present study was to investigate the effect of regular audio-visual relaxation combined with Schultz's autogenic training on: (1) the results of behavioral tests that evaluate work performance during burdensome cognitive tasks (Kraepelin test), and (2) changes in the classical EEG alpha frequency band (7-12 Hz) during relaxation, across neocortical regions (frontal, temporal, occipital, parietal) and hemispheres (left, right). Both the experimental group (EG) and the age- and skill-matched control group (CG) consisted of eighteen athletes (ten males and eight females). After 7 months of training, the EG demonstrated changes in the amplitude of mean electrical activity in the EEG alpha band at rest and a significant improvement in almost all components of the Kraepelin test. The same examined variables in the CG were unchanged following the period without the intervention. Summing up, combining audio-visual relaxation with autogenic training significantly improves an athlete's ability to perform a prolonged mental effort. These changes are accompanied by a greater amplitude of waves in the alpha band in the relaxed state. The results suggest the usefulness of relaxation techniques during performance of mentally difficult sports tasks (sports based on speed and stamina, sports games, combat sports) and during athletes' relaxation.

  12. Influences of audio-visual environments on feelings of deliciousness during having sweet foods: an electroencephalogram frequency analysis study.

    Science.gov (United States)

    Yoshimura, Hiroshi; Honjo, Miho; Sugai, Tokio; Kawabe, Mamichi; Kaneyama, Keiseki; Segami, Natsuki; Kato, Nobuo

    2011-09-01

    Feelings of deliciousness while eating foods are mainly produced by perception of sensory information extracted from the foods themselves, such as taste and olfaction. However, environmental factors might modify the feeling of deliciousness. In the present study, we investigated how the condition of audio-visual environments affects the feeling of deliciousness while eating sweet foods. Electroencephalograms (EEGs) were recorded from the frontal region of the scalp of healthy participants under virtual scenes of a tearoom and of construction work, respectively. The participants were asked to rate deliciousness after the recordings. Frequency analyses were performed on the EEGs. While participants were eating the foods, occupancy rates of the beta frequency band differed markedly between the tearoom and construction-work scenes, but no differences emerged in the other frequency bands. When no food was being eaten, in contrast, there was no difference in occupancy rates in any frequency band between the two scenes. With regard to deliciousness while eating sweet foods, all participants gave higher ratings under the tearoom scenes than under the construction-work scenes. Interestingly, there was a positive correlation between occupancy rates of the beta frequency band and deliciousness ratings. These findings suggest that comfortable audio-visual environments play an important role in increasing the feeling of deliciousness while eating sweet foods, and that beta frequency rhythms may be involved in producing comprehensive feelings of deliciousness.
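
    One way to read the "occupancy rate" of a frequency band reported above is as the fraction of total spectral power that falls inside that band. The sketch below computes such a measure from a Welch power spectral density on synthetic data; the band edges and sampling rate are conventional but assumed values, not the authors' exact analysis.

```python
import numpy as np
from scipy.signal import welch

def band_occupancy(eeg, fs, band=(13.0, 30.0), total=(1.0, 45.0)):
    """Fraction of total EEG power (within `total`) that lies inside `band`."""
    freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs)
    in_band = (freqs >= band[0]) & (freqs < band[1])
    in_total = (freqs >= total[0]) & (freqs < total[1])
    # Frequency bins are uniform, so a ratio of summed PSD values equals the power ratio.
    return psd[in_band].sum() / psd[in_total].sum()

fs = 250                        # assumed sampling rate
eeg = np.random.randn(60 * fs)  # stand-in for a 60 s frontal EEG trace
print(f"beta occupancy: {band_occupancy(eeg, fs):.2f}")
```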

  13. Brain potentials indicate immediate use of prosodic cues in natural speech processing.

    Science.gov (United States)

    Steinhauer, K; Alter, K; Friederici, A D

    1999-02-01

    Spoken language, in contrast to written text, provides prosodic information such as rhythm, pauses, accents, amplitude and pitch variations. However, little is known about when and how these features are used by the listener to interpret the speech signal. Here we use event-related brain potentials (ERP) to demonstrate that intonational phrasing guides the initial analysis of sentence structure. Our finding of a positive shift in the ERP at intonational phrase boundaries suggests a specific on-line brain response to prosodic processing. Additional ERP components indicate that a false prosodic boundary is sufficient to mislead the listener's sentence processor. Thus, the application of ERP measures is a promising approach for revealing the time course and neural basis of prosodic information processing.

  14. An audio-visual dataset of human-human interactions in stressful situations

    NARCIS (Netherlands)

    Lefter, I.; Burghouts, G.J.; Rothkrantz, L.J.M.

    2014-01-01

    Stressful situations are likely to occur at human operated service desks, as well as at human-computer interfaces used in public domain. Automatic surveillance can help notifying when extra assistance is needed. Human communication is inherently multimodal e.g. speech, gestures, facial expressions.

  15. AN EXPERIMENTAL EVALUATION OF AUDIO-VISUAL METHODS--CHANGING ATTITUDES TOWARD EDUCATION.

    Science.gov (United States)

    LOWELL, EDGAR L.; AND OTHERS

    Audiovisual programs for parents of deaf children were developed and evaluated. Eighteen sound films and accompanying records presented information on hearing, lipreading and speech, and attempted to change parental attitudes toward children and spouses. Two versions of the films and records were narrated by (1) "stars" who were…

  16. Unimodal Learning Enhances Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2018-01-01

    Crossmodal sensory integration is a fundamental feature of the brain that aids in forming a coherent and unified representation of observed events in the world. Spatiotemporally correlated sensory stimuli brought about by rich sensorimotor experiences drive the development of crossmodal ... learning-based model that continuously adapts crossmodal combinations in response to dynamic changes in noisy sensory stimuli but does not require a priori knowledge of sensory noise. The model correlates sensory cues within a single modality as well as across modalities to independently update modality-specific neural weights. This model is instantiated as a neural circuit that continuously learns the best possible weights required for a weighted combination of noisy low-level auditory and visual spatial target direction cues. The combined sensory information is directly mapped to wheel velocities that orient...

  17. Unimodal Learning Enhances Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2017-01-01

    Crossmodal sensory integration is a fundamental feature of the brain that aids in forming a coherent and unified representation of observed events in the world. Spatiotemporally correlated sensory stimuli brought about by rich sensorimotor experiences drive the development of crossmodal ... learning-based model that continuously adapts crossmodal combinations in response to dynamic changes in noisy sensory stimuli but does not require a priori knowledge of sensory noise. The model correlates sensory cues within a single modality as well as across modalities to independently update modality-specific neural weights. This model is instantiated as a neural circuit that continuously learns the best possible weights required for a weighted combination of noisy low-level auditory and visual spatial target direction cues. The combined sensory information is directly mapped to wheel velocities that orient...
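
    A loose sketch of the idea in the two records above: combine noisy auditory and visual direction estimates with weights that adapt online to how reliable each cue appears, without assuming the noise levels in advance. The update rule and constants below are illustrative stand-ins (a running error-variance estimate rather than the published Hebbian circuit).

```python
import numpy as np

def adaptive_fusion(true_dir=0.2, noise_a=0.3, noise_v=0.1, n_steps=2000, alpha=0.02, seed=0):
    """Adaptively reweight two noisy direction cues from their observed reliability.

    Each cue's running error variance (relative to the fused estimate) is tracked
    online; weights are set inversely proportional to that variance, so the less
    noisy cue gradually dominates the fused estimate.
    """
    rng = np.random.default_rng(seed)
    var = np.array([1.0, 1.0])                      # running error-variance estimates
    for _ in range(n_steps):
        a = true_dir + rng.normal(scale=noise_a)    # auditory direction cue (rad)
        v = true_dir + rng.normal(scale=noise_v)    # visual direction cue (rad)
        w = (1.0 / var) / np.sum(1.0 / var)         # reliability-based weights
        fused = w[0] * a + w[1] * v
        # Update running variances from each cue's deviation from the fused estimate.
        var += alpha * (np.array([(a - fused) ** 2, (v - fused) ** 2]) - var)
    return w, fused

weights, estimate = adaptive_fusion()
print("weights (audio, visual):", np.round(weights, 2), "fused:", round(float(estimate), 3))
```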

  18. The categorisation of speech sounds by adults and children: a study of the categorical perception hypothesis and the development weighting of acoustic speech cues

    NARCIS (Netherlands)

    Gerrits, E.

    2001-01-01

    This thesis investigates the way adults and children perceive speech. With adult listeners, the question was whether speech is perceived categorically (categorical speech perception). With children, the question was whether there are age-related differences between the weights assigned to

  19. Hand gestures as visual prosody: BOLD responses to audio-visual alignment are modulated by the communicative nature of the stimuli.

    Science.gov (United States)

    Biau, Emmanuel; Morís Fernández, Luis; Holle, Henning; Avila, César; Soto-Faraco, Salvador

    2016-05-15

    During public addresses, speakers accompany their discourse with spontaneous hand gestures (beats) that are tightly synchronized with the prosodic contour of the discourse. It has been proposed that speech and beat gestures originate from a common underlying linguistic process whereby both speech prosody and beats serve to emphasize relevant information. We hypothesized that breaking the consistency between beats and prosody by temporal desynchronization, would modulate activity of brain areas sensitive to speech-gesture integration. To this aim, we measured BOLD responses as participants watched a natural discourse where the speaker used beat gestures. In order to identify brain areas specifically involved in processing hand gestures with communicative intention, beat synchrony was evaluated against arbitrary visual cues bearing equivalent rhythmic and spatial properties as the gestures. Our results revealed that left MTG and IFG were specifically sensitive to speech synchronized with beats, compared to the arbitrary vision-speech pairing. Our results suggest that listeners confer beats a function of visual prosody, complementary to the prosodic structure of speech. We conclude that the emphasizing function of beat gestures in speech perception is instantiated through a specialized brain network sensitive to the communicative intent conveyed by a speaker with his/her hands. Copyright © 2016. Published by Elsevier Inc.

  20. Twenty-Fifth Annual Audio-Visual Aids Conference, Wednesday 9th to Friday 11th July 1975, Whitelands College, Putney SW15. Conference Preprints.

    Science.gov (United States)

    National Committee for Audio-Visual Aids in Education, London (England).

    Preprints of papers to be presented at the 25th annual Audio-Visual Aids Conference are collected along with the conference program. Papers include official messages, a review of the conference's history, and presentations on photography in education, using school broadcasts, flexibility in the use of television, the "communications…

  1. Relationship between Audio-Visual Materials and Environmental Factors on Students Academic Performance in Senior Secondary Schools in Borno State: Implications for Counselling

    Science.gov (United States)

    Bello, S.; Goni, Umar

    2016-01-01

    This is a survey study, designed to determine the relationship between audio-visual materials and environmental factors on students' academic performance in Senior Secondary Schools in Borno State: Implications for Counselling. The study set two research objectives, and tested two research hypotheses. The population of this study is 1,987 students…

  2. Knitting Relational Documentary Networks: The Database Meta-Documentary Filming Revolution as a paradigm of bringing interactive audio-visual archives alive

    NARCIS (Netherlands)

    Wiehl, Anna

    2016-01-01

    One phenomenon in the emerging field of digital documentary is experimentation with rhizomatic interfaces and database logics to bring audio-visual archives 'alive'. A paradigm hereof is Filming Revolution (2015), an interactive platform which gathers and interlinks films of the uprisings in...

  3. IST BENOGO (IST – 2001-39184) Deliverable I-AAU-05-01: Role of sound in VR and Audio Visual Preferences

    DEFF Research Database (Denmark)

    Nordahl, Rolf

    This Periodic Progress Report (PPR) document reports on the studies conducted at Aalborg University in December 2004 concerning the role of sound in VR, audio-visual correlations and attention triggering. The report contains a description and evaluation of the experiments run, together with the analysis...

  4. Atypical rapid audio-visual temporal recalibration in autism spectrum disorders.

    Science.gov (United States)

    Noel, Jean-Paul; De Niear, Matthew A; Stevenson, Ryan; Alais, David; Wallace, Mark T

    2017-01-01

    Changes in sensory and multisensory function are increasingly recognized as a common phenotypic characteristic of Autism Spectrum Disorders (ASD). Furthermore, much recent evidence suggests that sensory disturbances likely play an important role in contributing to social communication weaknesses, one of the core diagnostic features of ASD. An established sensory disturbance observed in ASD is reduced audiovisual temporal acuity. In the current study, we substantially extend these explorations of multisensory temporal function within the framework that an inability to rapidly recalibrate to changes in audiovisual temporal relations may play an important and under-recognized role in ASD. In the paradigm, we present ASD and typically developing (TD) children and adolescents with asynchronous audiovisual stimuli of varying levels of complexity and ask them to perform a simultaneity judgment (SJ). In the critical analysis, we test audiovisual temporal processing on trial t as a condition of trial t - 1. The results demonstrate that individuals with ASD fail to rapidly recalibrate to audiovisual asynchronies in an equivalent manner to their TD counterparts for simple and non-linguistic stimuli (i.e., flashes and beeps, hand-held tools), but exhibit comparable rapid recalibration for speech stimuli. These results are discussed in terms of prior work showing a speech-specific deficit in audiovisual temporal function in ASD, and in light of current theories of autism focusing on sensory noise and stability of perceptual representations. Autism Res 2017, 10: 121-129. © 2016 International Society for Autism Research, Wiley Periodicals, Inc.

  5. A survey of affect recognition methods: audio, visual, and spontaneous expressions.

    Science.gov (United States)

    Zeng, Zhihong; Pantic, Maja; Roisman, Glenn I; Huang, Thomas S

    2009-01-01

    Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions despite the fact that deliberate behaviour differs in visual appearance, audio profile, and timing from spontaneously occurring behaviour. To address this problem, efforts to develop algorithms that can process naturally occurring human affective behaviour have recently emerged. Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis including audiovisual fusion, linguistic and paralinguistic fusion, and multi-cue visual fusion based on facial expressions, head movements, and body gestures. This paper introduces and surveys these recent advances. We first discuss human emotion perception from a psychological perspective. Next we examine available approaches to solving the problem of machine understanding of human affective behavior, and discuss important issues like the collection and availability of training and test data. We finally outline some of the scientific and engineering challenges to advancing human affect sensing technology.

  6. Speech entrainment enables patients with Broca's aphasia to produce fluent speech.

    Science.gov (United States)

    Fridriksson, Julius; Hubbard, H Isabel; Hudspeth, Sarah Grace; Holland, Audrey L; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

    2012-12-01

    A distinguishing feature of Broca's aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect 'speech entrainment' and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca's aphasia. In Experiment 1, 13 patients with Broca's aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca's area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. Behavioural and

  7. Audiovisual Speech Perception in Infancy: The Influence of Vowel Identity and Infants' Productive Abilities on Sensitivity to (Mis)Matches between Auditory and Visual Speech Cues

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias

    2016-01-01

    Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…

  8. A role for the inferior colliculus in multisensory speech integration.

    Science.gov (United States)

    Champoux, François; Tremblay, Corinne; Mercier, Claude; Lassonde, Maryse; Lepore, Franco; Gagné, Jean-Pierre; Théoret, Hugo

    2006-10-23

    Multisensory integration can occur at relatively low levels within the central nervous system. Recent evidence suggests that multisensory audio-visual integration for speech may have a subcortical component, as acoustic processing in the human brainstem is influenced by lipreading during speech perception. Here, stimuli depicting the McGurk illusion (a demonstration of auditory-visual integration using speech stimuli) were presented to a 12-year-old child (FX) with a circumscribed unilateral lesion of the right inferior colliculus. When McGurk-type stimuli were presented in the contralesional hemifield, illusory perception reflecting bimodal integration was significantly reduced compared with the ipsilesional hemifield and a group of age-matched controls. These data suggest a functional role for the inferior colliculus in the audio-visual integration of speech stimuli.

  9. Brain responses and looking behaviour during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life.

    Directory of Open Access Journals (Sweden)

    Elena V Kushnerenko

    2013-07-01

    Full Text Available The use of visual cues during the processing of audiovisual speech is known to be less efficient in children and adults with language difficulties and difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6-9 months to 14-16 months of age. We used eye-tracking to examine whether individual differences in visual attention during audiovisual processing of speech in 6 to 9 month old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6-9 month old infants also participated in an event-related potential (ERP) audiovisual task within the same experimental session. Language development was then followed up at the age of 14-16 months, using two measures of language development, the Preschool Language Scale (PLS) and the Oxford Communicative Development Inventory (CDI). The results show that those infants who were less efficient in auditory speech processing at the age of 6-9 months had lower receptive language scores at 14-16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audio-visually incongruent stimuli at 6-9 months were both significantly associated with language development at 14-16 months. These findings add to the understanding of individual differences in neural signatures of audiovisual processing and associated looking behaviour in infants.

  10. Development and testing of an audio-visual aid for improving infant oral health through primary caregiver education.

    Science.gov (United States)

    Alsada, Lisa H; Sigal, Michael J; Limeback, Hardy; Fiege, James; Kulkarni, Gajanan V

    2005-04-01

    To create and test an audio-visual (AV) aid for providing anticipatory guidance on infant oral health to caregivers. A DVD-video containing evidence-based information about infant oral health care and prevention in accordance with the American Academy of Pediatric Dentistry guidelines has been developed (www.utoronto.ca/dentistry/newsresources/kids/). It contains comprehensive anticipatory guidance in the areas of pregnancy, oral development, teething, diet and nutrition, oral hygiene, fluoride use, acquisition of oral bacteria, feeding and oral habits, causes and sequelae of early childhood caries, trauma prevention, early dental visits and regular dental visits. A questionnaire was developed to test the knowledge of expectant and young mothers (n = 11) and early childhood educators (n = 16) before and after viewing the video. A significant lack of knowledge about infant oral health was indicated by the proportion of "I don't know" (22%) and incorrect (19%) responses to the questionnaire before the viewing. A significant improvement in knowledge (32%; range -3% to 57%; p [...]) was observed after viewing the aid. This AV aid promises to be an effective tool in providing anticipatory guidance regarding infant oral health in high-risk populations. Unlike existing educational materials, this aid provides a comprehensive, self-directed, evidence-based approach to the promotion of infant oral health. Widespread application of this prevention protocol has the potential to result in greater awareness, increased use of dental services and reduced incidence of preventable oral disease in the target populations.

  11. Fused quad audio/visual and tracking data collection to enhance mobile robot and operator performance analyses

    Science.gov (United States)

    Weiss, Brian A.; Antonishek, Brian; Norcross, Richard

    2008-04-01

    Collecting accurate, adequate ground truth and experimental data to support technology evaluations is critical in formulating exact and methodical analyses of the system's performance. Personnel at the National Institute of Standards and Technology (NIST), tasked with developing performance measures and standards for both Urban Search and Rescue (US&R) and bomb disposal robots, have been designing advanced ground truth data collection methods to support these efforts. These new techniques fuse multiple real-time streams of video and robot tracking data to facilitate more complete human robot interaction (HRI) analyses following a robot's experiences. As a robot maneuvers through a test method, video and audio streams are simultaneously collected and fed into a quad compressor providing real-time display. This fused quad audio/visual data provides a complete picture of what the operators and robots are doing throughout their evaluation to not only enhance HRI analyses, but also provide valuable data that can be used to aid operator training, encourage implementation improvements by highlighting successes and failures to the developers/vendors, and demonstrate capabilities to end-users and buyers. Quad data collection system deployments to support US&R test methods/scenarios at the 2007 Robot Response Evaluation in Disaster City, Texas will be highlighted.

  12. Susceptibility to a multisensory speech illusion in older persons is driven by perceptual processes.

    Science.gov (United States)

    Setti, Annalisa; Burke, Kate E; Kenny, Roseanne; Newell, Fiona N

    2013-01-01

    Recent studies suggest that multisensory integration is enhanced in older adults but it is not known whether this enhancement is solely driven by perceptual processes or affected by cognitive processes. Using the "McGurk illusion," in Experiment 1 we found that audio-visual integration of incongruent audio-visual words was higher in older adults than in younger adults, although the recognition of either audio- or visual-only presented words was the same across groups. In Experiment 2 we tested recall of sentences within which an incongruent audio-visual speech word was embedded. The overall semantic meaning of the sentence was compatible with either one of the unisensory components of the target word and/or with the illusory percept. Older participants recalled more illusory audio-visual words in sentences than younger adults, however, there was no differential effect of word compatibility on recall for the two groups. Our findings suggest that the relatively high susceptibility to the audio-visual speech illusion in older participants is due more to perceptual than cognitive processing.

  13. Susceptibility to a multisensory speech illusion in older persons is driven by perceptual processes

    Directory of Open Access Journals (Sweden)

    Annalisa eSetti

    2013-09-01

    Full Text Available Recent studies suggest that multisensory integration is enhanced in older adults but it is not known whether this enhancement is solely driven by perceptual processes or affected by cognitive processes. Using the ‘McGurk illusion’, in Experiment 1 we found that audio-visual integration of incongruent audio-visual words was higher in older adults than in younger adults, although the recognition of either audio- or visual-only presented words was the same across groups. In Experiment 2 we tested recall of sentences within which an incongruent audio-visual speech word was embedded. The overall semantic meaning of the sentence was compatible with either one of the unisensory components of the target word and/or with the illusory percept. Older participants recalled more illusory audio-visual words in sentences than younger adults, however, there was no differential effect of word compatibility on recall for the two groups. Our findings suggest that the relatively high susceptibility to the audio-visual speech illusion in older participants is due more to perceptual than cognitive processing.

  14. Concurrent Unimodal Learning Enhances Multisensory Responses of Bi-Directional Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2018-01-01

    Crossmodal sensory cue integration is a fundamental process in the brain by which stimulus cues from different sensory modalities are combined together to form a coherent and unified representation of observed events in the world. Crossmodal integration is a developmental process involving learning, with neuroplasticity as its underlying mechanism. Bayesian models of crossmodal cue integration form a unified percept as a sum of stimulus cues weighted by their respective reliabilities. This approach, however, requires a priori knowledge of the underlying stimulus noise distributions, i.e. the reliabilities of the participating stimulus cues. We present a Hebbian-like temporal correlation learning-based adaptive neural circuit for crossmodal cue integration that does not require such a priori information. The circuit correlates stimulus cues within each modality as well as bidirectionally across...
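
    The Bayesian baseline mentioned in this record has a simple closed form for Gaussian cues: each cue is weighted by its inverse variance, and the fused estimate has lower variance than either cue alone. A minimal worked sketch with made-up numbers follows; the adaptive circuit the record proposes is precisely a way of avoiding the need to know these variances a priori.

```python
import numpy as np

def bayes_fuse(estimates, variances):
    """Inverse-variance (reliability) weighted combination of Gaussian cue estimates."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = (1.0 / variances) / np.sum(1.0 / variances)
    fused = np.sum(weights * estimates)
    fused_var = 1.0 / np.sum(1.0 / variances)  # fused estimate is more reliable than either cue
    return fused, fused_var

# Auditory cue: 12 deg with variance 9; visual cue: 8 deg with variance 1.
print(bayes_fuse([12.0, 8.0], [9.0, 1.0]))  # -> (8.4, 0.9): the reliable visual cue dominates
```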

  15. Multisensory Speech Perception in Children with Autism Spectrum Disorders

    OpenAIRE

    Woynaroski, Tiffany G.; Kwakye, Leslie D.; Foss-Feig, Jennifer H.; Stevenson, Ryan A.; Stone, Wendy L.; Wallace, Mark T.

    2013-01-01

    This study examined unisensory and multisensory speech perception in 8–17 year old children with autism spectrum disorders (ASD) and typically developing controls matched on chronological age, sex, and IQ. Consonant–vowel syllables were presented in visual only, auditory only, matched audio-visual, and mismatched audiovisual ("McGurk") conditions. Participants with ASD displayed deficits in visual only and matched audiovisual speech perception. Additionally, children with ASD reported a visu...

  16. Haptic and visual information speed up the neural processing of auditory speech in live dyadic interactions.

    Science.gov (United States)

    Treille, Avril; Cordeboeuf, Camille; Vilain, Coriandre; Sato, Marc

    2014-05-01

    Speech can be perceived not only by the ear and by the eye but also by the hand, with speech gestures felt from manual tactile contact with the speaker's face. In the present electro-encephalographic study, early cross-modal interactions were investigated by comparing auditory evoked potentials during auditory, audio-visual and audio-haptic speech perception in dyadic interactions between a listener and a speaker. In line with previous studies, early auditory evoked responses were attenuated and speeded up during audio-visual compared to auditory speech perception. Crucially, shortened latencies of early auditory evoked potentials were also observed during audio-haptic speech perception. Altogether, these results suggest early bimodal interactions during live face-to-face and hand-to-face speech perception in dyadic interactions. Copyright © 2014. Published by Elsevier Ltd.

  17. Interaction between electric and acoustic cues in diotic condition for speech perception in quiet and noise by cochlear implantees.

    Science.gov (United States)

    Richard, Céline; Ferrary, Evelyne; Borel, Stéphanie; Sterkers, Olivier; Grayeli, Alexis Bozorg

    2012-01-01

    This study aimed to evaluate the interaction of electric and acoustic cues in diotic condition in cochlear implantees. Five adult cochlear implantees with residual contralateral hearing were prospectively evaluated in hearing aid only (HA), cochlear implant only (CI), and HA + CI modes by audiometry (pure tone, dissyllabic words, and sentences), and sound quality questionnaires. CI electrodes corresponding to preserved frequencies in the contralateral ear (free-field aided thresholds, electric diotic cues but also some redundancy affecting the sound quality.

  18. Television and the Internet: The Role Digital Technologies Play in Adolescents’ Audio-Visual Media Consumption. Young Television Audiences in Catalonia (Spain

    Directory of Open Access Journals (Sweden)

    Meritxell Roca

    2014-03-01

    Full Text Available The aim of this study was to investigate adolescents' TV consumption habits and perceptions. Although there appears to be no general consensus on how the Internet affects TV consumption by teenagers, and data vary depending on the country, according to our study Spanish adolescents perceive television as a habit "of the past" and find the computer a device more suited to their recreational and audio-visual consumption needs. The data obtained from eight focus groups of teenagers aged between 12 and 18 and an online survey sent to their parents show that watching TV is an activity usually linked to the home's communal spaces. By contrast, online audio-visual consumption (understood as a wider term not limited to just TV shows) is perceived by adolescents as a more convenient activity, as it adapts to their own schedules and needs.

  19. N1 enhancement in synesthesia during visual and audio-visual perception in semantic cross-modal conflict situations: an ERP study

    Directory of Open Access Journals (Sweden)

    Christopher eSinke

    2014-01-01

    Full Text Available Synesthesia entails a special kind of sensory perception, where stimulation in one sensory modality leads to an internally generated perceptual experience of another, not stimulated sensory modality. This phenomenon can be viewed as an abnormal multisensory integration process as here the synesthetic percept is aberrantly fused with the stimulated modality. Indeed, recent synesthesia research has focused on multimodal processing even outside of the specific synesthesia-inducing context and has revealed changed multimodal integration, thus suggesting perceptual alterations at a global level. Here, we focused on audio-visual processing in synesthesia using a semantic classification task in combination with visually or auditory-visually presented animate and inanimate objects in an audio-visually congruent and incongruent manner. Fourteen subjects with auditory-visual and/or grapheme-color synesthesia and 14 control subjects participated in the experiment. During presentation of the stimuli, event-related potentials were recorded from 32 electrodes. The analysis of reaction times and error rates revealed no group differences with best performance for audio-visually congruent stimulation indicating the well-known multimodal facilitation effect. We found an enhanced amplitude of the N1 component over occipital electrode sites for synesthetes compared to controls. The differences occurred irrespective of the experimental condition and therefore suggest a global influence on early sensory processing in synesthetes.

  20. USE OF AUDIO-VISUAL MEDIA TO IMPROVE LEARNING OUTCOMES FOR THE CARTWHEEL IN FLOOR GYMNASTICS IN CLASS VIII OF SMP NEGERI 13 SEMARANG, 2013/2014

    Directory of Open Access Journals (Sweden)

    Sigit Budi Prastyyo

    2015-01-01

    Full Text Available The purpose of this study was to determine whether the use of audio-visual media aids improves learning outcomes for the cartwheel (meroda) in floor gymnastics among eighth grade students of SMP Negeri 13 Semarang. The study was designed as classroom action research (CAR) conducted in two cycles of action. Data were collected through documentation, observation and testing, and analysed descriptively from student learning outcomes after each action. The results show that the use of audio-visual media in teaching the cartwheel in floor gymnastics can improve the learning outcomes of the eighth grade at SMP Negeri 13 Semarang in 2013/2014, as evidenced by the increase in learning outcome scores from cycle to cycle: the average test score of students reached 70.51 in the first cycle and 84.72 in the second cycle, while classical completeness rose from 54.84% in the first cycle to 90.32% in the second cycle. From these results it can be concluded that teaching the cartwheel in floor gymnastics with audio-visual media can improve the learning outcomes of students of SMP Negeri 13 Semarang.

  1. N1 enhancement in synesthesia during visual and audio-visual perception in semantic cross-modal conflict situations: an ERP study.

    Science.gov (United States)

    Sinke, Christopher; Neufeld, Janina; Wiswede, Daniel; Emrich, Hinderk M; Bleich, Stefan; Münte, Thomas F; Szycik, Gregor R

    2014-01-01

    Synesthesia entails a special kind of sensory perception, where stimulation in one sensory modality leads to an internally generated perceptual experience of another, not stimulated sensory modality. This phenomenon can be viewed as an abnormal multisensory integration process as here the synesthetic percept is aberrantly fused with the stimulated modality. Indeed, recent synesthesia research has focused on multimodal processing even outside of the specific synesthesia-inducing context and has revealed changed multimodal integration, thus suggesting perceptual alterations at a global level. Here, we focused on audio-visual processing in synesthesia using a semantic classification task in combination with visually or auditory-visually presented animate and inanimate objects in an audio-visually congruent and incongruent manner. Fourteen subjects with auditory-visual and/or grapheme-color synesthesia and 14 control subjects participated in the experiment. During presentation of the stimuli, event-related potentials were recorded from 32 electrodes. The analysis of reaction times and error rates revealed no group differences with best performance for audio-visually congruent stimulation indicating the well-known multimodal facilitation effect. We found an enhanced amplitude of the N1 component over occipital electrode sites for synesthetes compared to controls. The differences occurred irrespective of the experimental condition and therefore suggest a global influence on early sensory processing in synesthetes.

  2. Perception of co-speech gestures in aphasic patients: a visual exploration study during the observation of dyadic conversations.

    Science.gov (United States)

    Preisig, Basil C; Eggenberger, Noëmi; Zito, Giuseppe; Vanbellingen, Tim; Schumacher, Rahel; Hopfner, Simone; Nyffeler, Thomas; Gutbrod, Klemens; Annoni, Jean-Marie; Bohlhalter, Stephan; Müri, René M

    2015-03-01

    Co-speech gestures are part of nonverbal communication during conversations. They either support the verbal message or provide the interlocutor with additional information. Furthermore, as nonverbal cues they prompt the cooperative process of turn taking. In the present study, we investigated the influence of co-speech gestures on the perception of dyadic dialogue in aphasic patients. In particular, we analysed the impact of co-speech gestures on gaze direction (towards speaker or listener) and fixation of body parts. We hypothesized that aphasic patients, who are restricted in verbal comprehension, adapt their visual exploration strategies. Sixteen aphasic patients and 23 healthy control subjects participated in the study. Visual exploration behaviour was measured by means of a contact-free infrared eye-tracker while subjects were watching videos depicting spontaneous dialogues between two individuals. Cumulative fixation duration and mean fixation duration were calculated for the factors co-speech gesture (present and absent), gaze direction (to the speaker or to the listener), and region of interest (ROI), including hands, face, and body. Both aphasic patients and healthy controls mainly fixated the speaker's face. We found a significant co-speech gesture × ROI interaction, indicating that the presence of a co-speech gesture encouraged subjects to look at the speaker. Further, there was a significant gaze direction × ROI × group interaction revealing that aphasic patients showed reduced cumulative fixation duration on the speaker's face compared to healthy controls. Co-speech gestures guide the observer's attention towards the speaker, the source of semantic input. It is discussed whether an underlying semantic processing deficit or a deficit in integrating audio-visual information may cause aphasic patients to explore the speaker's face less. Copyright © 2014 Elsevier Ltd. All rights reserved.

  3. Audio Visual Center

    Data.gov (United States)

    Federal Laboratory Consortium — The Audiovisual Services Center provides still photographic documentation with laboratory support, video documentation, video editing, video duplication, photo/video...

  4. Learning words' sounds before learning how words sound: 9-month-olds use distinct objects as cues to categorize speech information.

    Science.gov (United States)

    Yeung, H Henny; Werker, Janet F

    2009-11-01

    One of the central themes in the study of language acquisition is the gap between the linguistic knowledge that learners demonstrate, and the apparent inadequacy of linguistic input to support induction of this knowledge. One of the first linguistic abilities in the course of development to exemplify this problem is in speech perception: specifically, learning the sound system of one's native language. Native-language sound systems are defined by meaningful contrasts among words in a language, yet infants learn these sound patterns before any significant numbers of words are acquired. Previous approaches to this learning problem have suggested that infants can learn phonetic categories from statistical analysis of auditory input, without regard to word referents. Experimental evidence presented here suggests instead that young infants can use visual cues present in word-labeling situations to categorize phonetic information. In Experiment 1, 9-month-old English-learning infants failed to discriminate two non-native phonetic categories, establishing baseline performance in a perceptual discrimination task. In Experiment 2, these infants succeeded at discrimination after watching contrasting visual cues (i.e., videos of two novel objects) paired consistently with the two non-native phonetic categories. In Experiment 3, these infants failed at discrimination after watching the same visual cues, but paired inconsistently with the two phonetic categories. At an age before which memory of word labels is demonstrated in the laboratory, 9-month-old infants use contrastive pairings between objects and sounds to influence their phonetic sensitivity. Phonetic learning may have a more functional basis than previous statistical learning mechanisms assume: infants may use cross-modal associations inherent in social contexts to learn native-language phonetic categories.

  5. Non-fluent speech following stroke is caused by impaired efference copy.

    Science.gov (United States)

    Feenaughty, Lynda; Basilakos, Alexandra; Bonilha, Leonardo; den Ouden, Dirk-Bart; Rorden, Chris; Stark, Brielle; Fridriksson, Julius

    2017-09-01

    Efference copy is a cognitive mechanism argued to be critical for initiating and monitoring speech; however, the extent to which breakdown of efference copy mechanisms impacts speech production is unclear. This study examined the best mechanistic predictors of non-fluent speech among 88 stroke survivors. Objective speech fluency measures were subjected to a principal component analysis (PCA). The primary PCA factor was then entered into a multiple stepwise linear regression analysis as the dependent variable, with a set of independent mechanistic variables. Participants' ability to mimic audio-visual speech ("speech entrainment response") was the best independent predictor of non-fluent speech. We suggest that this "speech entrainment" factor reflects integrity of internal monitoring (i.e., efference copy) of speech production, which affects speech initiation and maintenance. Results support models of normal speech production and suggest that therapy focused on speech initiation and maintenance may improve speech fluency for individuals with chronic non-fluent aphasia post-stroke.
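
    The analysis pipeline described here, a PCA over objective fluency measures followed by a regression of the primary factor on mechanistic predictors, can be sketched roughly as follows. This is a minimal illustration using scikit-learn with invented variable names and random data; it is not the authors' code, and a plain multiple regression stands in for the stepwise selection.

      # Rough sketch of the reported pipeline: PCA on objective speech fluency
      # measures, then regressing the primary PCA factor on mechanistic predictors.
      # Variable names and data are illustrative only; stepwise selection is omitted.
      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(0)
      fluency_measures = rng.normal(size=(88, 5))   # 88 participants x 5 fluency measures
      predictors = rng.normal(size=(88, 3))         # e.g. entrainment, naming, repetition scores

      # Primary PCA factor of the fluency measures (the dependent variable)
      primary_factor = PCA(n_components=1).fit_transform(fluency_measures).ravel()

      # A plain multiple regression stands in for the stepwise procedure
      model = LinearRegression().fit(predictors, primary_factor)
      print("R^2:", model.score(predictors, primary_factor))
      print("coefficients:", model.coef_)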

  6. UPAYA MENINGKATKAN AKTIVITAS DAN HASIL BELAJAR MATERI APRESIASI TERHADAP KEUNIKAN SENI MUSIK DAERAH SETEMPAT DENGAN MENGGUNAKAN MEDIA AUDIO VISUAL PADA SISWA KELAS VII A SMP NEGERI 3 RANDUDONGKAL

    Directory of Open Access Journals (Sweden)

    Rina Muktinurasih

    2014-02-01

    Full Text Available Folk music is characterized by simplicity and regional identity. Activities to improve appreciation of works of art, especially folk music, were carried out by having students identify a variety of folk songs according to their personal views. Over the years, most students have only been able to enjoy music passively; an interest must first be cultivated before students can express music themselves. Learning music requires a great deal of practice, yet teachers often dominate the available classroom time, leaving students without adequate time to practice. The problems addressed in this study are: (1) whether the use of audio-visual media can improve students' learning activity in folk music appreciation, and (2) whether the use of audio-visual media can improve students' learning outcomes in folk music appreciation material. The method used was classroom action research with two cycles, each consisting of four phases: (1) planning, (2) implementation, (3) observation/evaluation, and (4) reflection. The results show improvements in both learning activity and learning outcomes from the use of audio-visual learning media in folk music appreciation. In the pre-cycle only 16 of 34 students passed (47.07%), in the first cycle 20 of 34 students passed (74.24%), and in the second cycle 28 of 34 students passed (82.35%). It can therefore be concluded that, by the end of the second cycle, the indicator of overall success had reached the required level.

  7. Audio-visual synchrony and spatial attention enhance processing of dynamic visual stimulation independently and in parallel: A frequency-tagging study.

    Science.gov (United States)

    Covic, Amra; Keitel, Christian; Porcu, Emanuele; Schröger, Erich; Müller, Matthias M

    2017-08-09

    The neural processing of a visual stimulus can be facilitated by attending to its position or by a co-occurring auditory tone. Using frequency-tagging, we investigated whether facilitation by spatial attention and audio-visual synchrony relies on similar neural processes. Participants attended to one of two flickering Gabor patches (14.17 and 17 Hz) located in opposite lower visual fields. Gabor patches further "pulsed" (i.e., showed smooth spatial frequency variations) at distinct rates (3.14 and 3.63 Hz). Frequency-modulating an auditory stimulus at the pulse rate of one of the visual stimuli established audio-visual synchrony. Flicker and pulsed stimulation elicited stimulus-locked rhythmic electrophysiological brain responses that allowed us to track the neural processing of simultaneously presented Gabor patches. These steady-state responses (SSRs) were quantified in the spectral domain to examine visual stimulus processing under conditions of synchronous vs. asynchronous tone presentation and when respective stimulus positions were attended vs. unattended. Strikingly, unique patterns of effects on pulse- and flicker-driven SSRs indicated that spatial attention and audio-visual synchrony facilitated early visual processing in parallel and via different cortical processes. We found attention effects to resemble the classical top-down gain effect, facilitating both flicker- and pulse-driven SSRs. Audio-visual synchrony, in turn, only amplified synchrony-producing stimulus aspects (i.e., pulse-driven SSRs), possibly highlighting the role of temporally co-occurring sights and sounds in bottom-up multisensory integration. Copyright © 2017 Elsevier Inc. All rights reserved.
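
    Quantifying steady-state responses "in the spectral domain" typically amounts to reading out the Fourier amplitude of the EEG at the tagged stimulation frequencies. The following sketch illustrates that step on a simulated single-channel epoch; the sampling rate and epoch length are assumptions, and only the tagged frequencies are taken from the abstract.

      # Sketch: amplitude of steady-state responses (SSRs) at the tagged frequencies.
      # The EEG epoch is simulated; sampling rate and epoch length are assumptions.
      import numpy as np

      fs = 500.0                          # assumed sampling rate (Hz)
      t = np.arange(0, 4.0, 1.0 / fs)     # assumed 4-s epoch
      tags = [14.17, 17.0, 3.14, 3.63]    # flicker and pulse frequencies from the abstract

      # Simulated epoch: two tagged responses plus noise
      eeg = (0.5 * np.sin(2 * np.pi * 14.17 * t)
             + 0.3 * np.sin(2 * np.pi * 3.14 * t)
             + np.random.default_rng(1).normal(0, 1, t.size))

      # Single-sided amplitude spectrum
      spectrum = 2.0 * np.abs(np.fft.rfft(eeg)) / t.size
      freqs = np.fft.rfftfreq(t.size, 1.0 / fs)

      for f in tags:
          idx = np.argmin(np.abs(freqs - f))   # nearest FFT bin to the tagged frequency
          print(f"SSR amplitude at {f:.2f} Hz ~ {spectrum[idx]:.3f}")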

  8. Pengembangan Media Pembelajaran Sejarah Berbasis Media Audio Visual Situs Purbakala Pugung Raharjo Untuk Meningkatkan Kesadaran Sejarah Siswa Kelas X SMA Negeri 1 Kotagajah

    OpenAIRE

    Samsi Haryanto, Aurora Nandia F , Herman J Waluyo

    2016-01-01

    The expected product of this research is a history-learning medium based on audio-visual media of the ancient site of Pugung Raharjo, packaged as a learning video to raise students' historical awareness and achievement. The purposes of this research are (1) to identify the history-teaching media used so far at SMA Negeri 1 Kotagajah, (2) to identify the history-teaching media that have been implemented at SMA Negeri 1 Kotagajah, (3) to know...

  9. Effects of congruent and incongruent visual cues on speech perception and brain activity in cochlear implant users.

    Science.gov (United States)

    Song, Jae-Jin; Lee, Hyo-Jeong; Kang, Hyejin; Lee, Dong Soo; Chang, Sun O; Oh, Seung Ha

    2015-03-01

    While deafness-induced plasticity has been investigated in the visual and auditory domains, not much is known about language processing in audiovisual multimodal environments for patients with restored hearing via cochlear implant (CI) devices. Here, we examined the effect of agreeing or conflicting visual inputs on auditory processing in deaf patients equipped with degraded artificial hearing. Ten post-lingually deafened CI users with good performance, along with matched control subjects, underwent H₂¹⁵O positron emission tomography scans while carrying out a behavioral task requiring the extraction of speech information from unimodal auditory stimuli, bimodal audiovisual congruent stimuli, and incongruent stimuli. Regardless of congruency, the control subjects demonstrated activation of the auditory and visual sensory cortices, as well as the superior temporal sulcus, the classical multisensory integration area, indicating a bottom-up multisensory processing strategy. Compared to CI users, the control subjects exhibited activation of the right ventral premotor-supramarginal pathway. In contrast, CI users activated primarily the visual cortices more in the congruent audiovisual condition than in the null condition. In addition, compared to controls, CI users displayed an activation focus in the right amygdala for congruent audiovisual stimuli. The most notable difference between the two groups was an activation focus in the left inferior frontal gyrus in CI users confronted with incongruent audiovisual stimuli, suggesting top-down cognitive modulation for audiovisual conflict. Correlation analysis revealed that good speech performance was positively correlated with right amygdala activity for the congruent condition, but negatively correlated with bilateral visual cortices regardless of congruency. Taken together, these results suggest that for multimodal inputs, cochlear implant users are more vision-reliant when processing congruent stimuli and are disturbed

  10. The sound of your lips: electrophysiological cross-modal interactions during hand-to-face and face-to-face speech perception

    OpenAIRE

    Avrill eTreille; Coriandre eVilain; Marc eSato

    2014-01-01

    Recent magneto-encephalographic and electro-encephalographic studies provide evidence for cross-modal integration during audio-visual and audio-haptic speech perception, with speech gestures viewed or felt from manual tactile contact with the speaker’s face. Given the temporal precedence of the haptic and visual signals on the acoustic signal in these studies, the observed modulation of N1/P2 auditory evoked responses during bimodal compared to unimodal speech perception suggest that relevant...

  11. Visual Distance Cues Amplify Neuromagnetic Auditory N1m Responses

    Directory of Open Access Journals (Sweden)

    Christian F Altmann

    2011-10-01

    Full Text Available Ranging of auditory objects relies on several acoustic cues and is possibly modulated by additional visual information. Sound pressure level can serve as a cue for distance perception because it decreases with increasing distance. In this magnetoencephalography (MEG) experiment, we tested whether psychophysical loudness judgment and N1m MEG responses are modulated by visual distance cues. To this end, we paired noise bursts at different sound pressure levels with synchronous visual cues at different distances. We hypothesized that noise bursts paired with far visual cues would be perceived as louder and result in increased N1m amplitudes compared to a pairing with close visual cues. The rationale behind this was that listeners might compensate for the visually induced object distance when processing loudness. Psychophysically, we observed no significant modulation of loudness judgments by visual cues. However, N1m MEG responses at about 100 ms after stimulus onset were significantly stronger for far versus close visual cues in the left auditory cortex. N1m responses in the right auditory cortex increased with increasing sound pressure level, but were not modulated by visual distance cues. Thus, our results suggest an audio-visual interaction in the left auditory cortex that is possibly related to cue integration for auditory distance processing.

  12. Reactivity to Cannabis Cues in Virtual Reality Environments†

    Science.gov (United States)

    Bordnick, Patrick S.; Copp, Hilary L.; Traylor, Amy; Graap, Ken M.; Carter, Brian L.; Walton, Alicia; Ferrer, Mirtha

    2014-01-01

    Virtual reality (VR) cue environments have been developed and successfully tested in nicotine, cocaine, and alcohol abusers. Aims in the current article include the development and testing of a novel VR cannabis cue reactivity assessment system. It was hypothesized that subjective craving levels and attention to cannabis cues would be higher in VR environments with cannabis cues than in VR neutral environments. Twenty nontreatment-seeking current cannabis smokers participated in the VR cue trial. During the VR cue trial, participants were exposed to four virtual environments that contained audio, visual, olfactory, and vibrotactile sensory stimuli. Two VR environments contained cannabis cues that consisted of a party room in which people were smoking cannabis and a room containing cannabis paraphernalia without people. Two VR neutral rooms without cannabis cues consisted of a digital art gallery with nature videos. Subjective craving and attention to cues were significantly higher in the VR cannabis environments compared to the VR neutral environments. These findings indicate that VR cannabis cue reactivity may offer a new technology-based method to advance addiction research and treatment. PMID:19705672

  13. Reactivity to cannabis cues in virtual reality environments.

    Science.gov (United States)

    Bordnick, Patrick S; Copp, Hilary L; Traylor, Amy; Graap, Ken M; Carter, Brian L; Walton, Alicia; Ferrer, Mirtha

    2009-06-01

    Virtual reality (VR) cue environments have been developed and successfully tested in nicotine, cocaine, and alcohol abusers. Aims in the current article include the development and testing of a novel VR cannabis cue reactivity assessment system. It was hypothesized that subjective craving levels and attention to cannabis cues would be higher in VR environments with cannabis cues compared to VR neutral environments. Twenty nontreatment-seeking current cannabis smokers participated in the VR cue trial. During the VR cue trial, participants were exposed to four virtual environments that contained audio, visual, olfactory, and vibrotactile sensory stimuli. Two VR environments contained cannabis cues that consisted of a party room in which people were smoking cannabis and a room containing cannabis paraphernalia without people. Two VR neutral rooms without cannabis cues consisted of a digital art gallery with nature videos. Subjective craving and attention to cues were significantly higher in the VR cannabis environments compared to the VR neutral environments. These findings indicate that VR cannabis cue reactivity may offer a new technology-based method to advance addiction research and treatment.

  14. Neural correlates of multisensory reliability and perceptual weights emerge at early latencies during audio-visual integration

    OpenAIRE

    Boyle, Stephanie Claire; Kayser, Stephanie J.; Kayser, Christoph

    2017-01-01

    To make accurate perceptual estimates observers must take the reliability of sensory information into account. Despite many behavioural studies showing that subjects weight individual sensory cues in proportion to their reliabilities, it is still unclear when during a trial neuronal responses are modulated by the reliability of sensory information, or when they reflect the perceptual weights attributed to each sensory input. We investigated these questions using a combination of psychophysics...

  15. Multisensory and modality specific processing of visual speech in different regions of the premotor cortex.

    Science.gov (United States)

    Callan, Daniel E; Jones, Jeffery A; Callan, Akiko

    2014-01-01

    Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex (PMC) has been shown to be active during both observation and execution of action ("Mirror System" properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker's articulating face and heard her voice), visual only (only saw the speaker's articulating face), and audio only (only heard the speaker's voice) conditions with varying audio signal-to-noise ratios in order to determine the regions of the PMC involved with multisensory and modality specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli to control for task difficulty and differences in intelligibility. The results of the functional magnetic resonance imaging (fMRI) analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and PMC. The left ventral inferior premotor cortex (PMvi) showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex (PMvs/PMd) did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas, more superior and dorsal regions of the PMC are involved with mapping unimodal (in this case visual) sensory features of the speech signal with

  16. Multisensory and Modality Specific Processing of Visual Speech in Different Regions of the Premotor Cortex

    Directory of Open Access Journals (Sweden)

    Daniel eCallan

    2014-05-01

    Full Text Available Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex has been shown to be active during both observation and execution of action (‘Mirror System’ properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker’s articulating face and heard her voice), visual only (only saw the speaker’s articulating face), and audio only (only heard the speaker’s voice) conditions with varying audio signal-to-noise ratios in order to determine the regions of the premotor cortex involved with multisensory and modality specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli to control for task difficulty and differences in intelligibility. The results of the fMRI analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and premotor cortex. The left ventral inferior premotor cortex showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas, more superior and dorsal regions of the premotor cortex are involved with mapping unimodal (in this case visual) sensory features of the speech signal with

  17. Audio-visual synchronization in reading while listening to texts: Effects on visual behavior and verbal learning

    OpenAIRE

    Gerbier, Emilie; Bailly, Gérard; Bosse, Marie-Line

    2018-01-01

    International audience; Reading while listening to texts (RWL) is a promising way to improve the learning benefits provided by a reading experience. In an exploratory study, we investigated the effect of synchronizing the highlighting of words (visual) with their auditory (speech) counterpart during a RWL task. Forty French children from 3rd to 5th grade read short stories in their native language while hearing the story spoken by a narrator. In the non-synchronized (S-) condition the text wa...

  18. Alternative Media Technologies for the Open University. A Research Report on Costed Alternatives to the Direct Transmission of Audio-Visual Materials. Final Report. I.E.T. Papers on Broadcasting No. 79.

    Science.gov (United States)

    Bates, Tony; Kern, Larry

    This study examines alternatives to direct transmission of television and radio programs for courses with low student enrollment at the Open University. Examined are cut-off points in terms of student numbers at which alternative means of distributing audio or audio-visual materials become more economical than direct television or radio…

  19. Speech entrainment enables patients with Broca’s aphasia to produce fluent speech

    Science.gov (United States)

    Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

    2012-01-01

    A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production

  20. Audio-Visual Biofeedback Does Not Improve the Reliability of Target Delineation Using Maximum Intensity Projection in 4-Dimensional Computed Tomography Radiation Therapy Planning

    Energy Technology Data Exchange (ETDEWEB)

    Lu, Wei, E-mail: wlu@umm.edu [Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, Maryland (United States); Neuner, Geoffrey A.; George, Rohini; Wang, Zhendong; Sasor, Sarah [Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, Maryland (United States); Huang, Xuan [Research and Development, Care Management Department, Johns Hopkins HealthCare LLC, Glen Burnie, Maryland (United States); Regine, William F.; Feigenberg, Steven J.; D'Souza, Warren D. [Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, Maryland (United States)

    2014-01-01

    Purpose: To investigate whether coaching patients' breathing would improve the match between ITV_MIP (internal target volume generated by contouring in the maximum intensity projection scan) and ITV_10 (generated by combining the gross tumor volumes contoured in 10 phases of a 4-dimensional CT [4DCT] scan). Methods and Materials: Eight patients with a thoracic tumor and 5 patients with an abdominal tumor were included in an institutional review board-approved prospective study. Patients underwent three 4DCT scans with: (1) free breathing (FB); (2) coaching using audio-visual (AV) biofeedback via the Real-Time Position Management system; and (3) coaching via a spirometer system (Active Breathing Coordinator or ABC). One physician contoured all scans to generate the ITV_10 and ITV_MIP. The match between ITV_MIP and ITV_10 was quantitatively assessed with volume ratio, centroid distance, root mean squared distance, and overlap/Dice coefficient. We investigated whether coaching (AV or ABC) or uniform expansions (1, 2, 3, or 5 mm) of ITV_MIP improved the match. Results: Although both AV and ABC coaching techniques improved frequency reproducibility and ABC improved displacement regularity, neither improved the match between ITV_MIP and ITV_10 over FB. On average, ITV_MIP underestimated ITV_10 by 19%, 19%, and 21%, with centroid distance of 1.9, 2.3, and 1.7 mm and Dice coefficient of 0.87, 0.86, and 0.88 for FB, AV, and ABC, respectively. Separate analyses indicated a better match for lung cancers or tumors not adjacent to high-intensity tissues. Uniform expansions of ITV_MIP did not correct for the mismatch between ITV_MIP and ITV_10. Conclusions: In this pilot study, audio-visual biofeedback did not improve the match between ITV_MIP and ITV_10. In general, ITV_MIP should be limited to lung cancers, and modification of ITV_MIP in each phase of the 4DCT data set is recommended.
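
    The match metrics named above (overlap/Dice coefficient and centroid distance between ITV_MIP and ITV_10) are straightforward to compute once the two targets are available as binary voxel masks. The sketch below is illustrative only: the masks, grid, and voxel spacing are invented, and the actual study used clinical contouring software rather than this kind of script.

      # Sketch: Dice coefficient and centroid distance between two binary target
      # volumes (e.g. ITV_MIP vs ITV_10) on the same voxel grid. Illustrative only.
      import numpy as np

      def dice(mask_a, mask_b):
          inter = np.logical_and(mask_a, mask_b).sum()
          return 2.0 * inter / (mask_a.sum() + mask_b.sum())

      def centroid_distance(mask_a, mask_b, spacing_mm=(1.0, 1.0, 1.0)):
          ca = np.array(np.nonzero(mask_a)).mean(axis=1) * np.array(spacing_mm)
          cb = np.array(np.nonzero(mask_b)).mean(axis=1) * np.array(spacing_mm)
          return float(np.linalg.norm(ca - cb))

      # Toy example: two overlapping spheres standing in for the two ITVs
      zz, yy, xx = np.mgrid[0:40, 0:40, 0:40]
      itv_10 = (zz - 20) ** 2 + (yy - 20) ** 2 + (xx - 20) ** 2 <= 10 ** 2
      itv_mip = (zz - 22) ** 2 + (yy - 20) ** 2 + (xx - 20) ** 2 <= 9 ** 2

      print("Dice:", round(dice(itv_mip, itv_10), 3))
      print("Centroid distance (mm):", round(centroid_distance(itv_mip, itv_10), 2))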

  1. Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language: Computational techniques are presented to analyze and model expressed and perceived human behavior-variedly characterized as typical, atypical, distressed, and disordered-from speech and language cues and their applications in health, commerce, education, and beyond.

    Science.gov (United States)

    Narayanan, Shrikanth; Georgiou, Panayiotis G

    2013-02-07

    The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion.
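
    As a concrete illustration of the cue-extraction step described above, low-level speech cues such as pausing and speaking-time statistics can be computed directly from an audio signal. The sketch below is a generic example under assumed parameters (frame length, silence threshold); it is not part of the BSP toolchain described by the authors.

      # Generic illustration of low-level behavioral cue extraction from speech audio:
      # frame energy, an energy-based pause ratio, and speaking-time statistics.
      # Frame length and silence threshold are assumptions, not values from the paper.
      import numpy as np

      def speech_cues(signal, fs, frame_ms=25, pause_db=-35.0):
          frame_len = int(fs * frame_ms / 1000)
          n_frames = len(signal) // frame_len
          frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
          energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
          energy_db -= energy_db.max()              # normalize to the loudest frame
          voiced = energy_db > pause_db             # crude speech/pause decision
          return {
              "pause_ratio": float(1.0 - voiced.mean()),
              "speaking_time_s": float(voiced.sum() * frame_ms / 1000),
          }

      fs = 16000
      t = np.arange(0, 2.0, 1.0 / fs)
      demo = np.where(t < 1.2, np.sin(2 * np.pi * 150 * t),
                      0.001 * np.random.default_rng(2).normal(size=t.size))
      print(speech_cues(demo, fs))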

  2. Recall and decay of consent information among parents of infants participating in a randomized controlled clinical trial using an audio-visual tool in The Gambia.

    Science.gov (United States)

    Mboizi, Robert B; Afolabi, Muhammed O; Okoye, Michael; Kampmann, Beate; Roca, Anna; Idoko, Olubukola T

    2017-09-02

    Communicating essential research information to low-literacy research participants in Africa is highly challenging, since this population is vulnerable to poor comprehension of consent information. Several supportive materials have been developed to aid participant comprehension in these settings. Within the framework of a pneumococcal vaccine trial in The Gambia, we evaluated the recall and decay of consent information during the trial, which used an audio-visual tool called the 'Speaking Book' to foster comprehension among parents of participating infants. The Speaking Book was developed in the two most widely spoken local languages. Four hundred and nine parents of trial infants gave consent to participate in this nested study and were included in the baseline assessment of their knowledge about trial participation. An additional assessment was conducted approximately 90 days later, following completion of the clinical trial protocol. All parents received a Speaking Book at the start of the trial. Trial knowledge was already high at the baseline assessment with no differences related to socio-economic status or education. Knowledge of key trial information was retained at the completion of the study follow-up. The Speaking Book (SB) was well received by the study participants. We hypothesize that the SB may have contributed to the retention of information over the trial follow-up. Further studies evaluating the impact of this innovative tool are thus warranted.

  3. The Use of Chitosan of Pomacea canaliculata Shell as Natural Preservation to Maintain Fruit Quality during Storage Process: Used as Audio Visual Media of Biotechnology Learning

    Directory of Open Access Journals (Sweden)

    Yurike Fransischa Trisnaningrum

    2016-11-01

    Full Text Available Chitosan is a chitin derivative formed through deacetylation that can be used as an effective natural preservative with antimicrobial activity. This study aimed to determine differences in vitamin C content, pH, and weight of oranges, strawberries, and bananas preserved with chitosan from golden apple snail (Pomacea canaliculata) shells during storage, and to determine which chitosan concentration is most effective as a fruit preservative. The study was a true experimental design carried out in the Chemistry Laboratory of Universitas Muhammadiyah Malang. A randomized block design with 6 treatments and 4 replications was used for oranges, strawberries, and bananas, with chitosan concentrations of 1.5%, 2%, 2.5%, 3%, and 3.5%. Data were analyzed using randomized block analysis and Duncan's multiple range test at a significance level of 0.05. The results showed an effect on the fruit: the smallest changes in vitamin C content and weight occurred in the 2.5% treatment and the largest in the control. The 2.5% golden apple snail shell chitosan concentration was the most effective in maintaining the vitamin C content and weight of oranges, strawberries, and bananas. The research results were applied as audio-visual media for biotechnology learning.

  4. The challenge of reducing scientific complexity for different target groups (without losing the essence) - experiences from interdisciplinary audio-visual media production

    Science.gov (United States)

    Hezel, Bernd; Broschkowski, Ephraim; Kropp, Jürgen

    2013-04-01

    The Climate Media Factory originates from an interdisciplinary media lab run by the Film and Television University "Konrad Wolf" Potsdam-Babelsberg (HFF) and the Potsdam Institute for Climate Impact Research (PIK). Climate scientists, authors, producers and media scholars work together to develop media products on climate change and sustainability. We strive towards communicating scientific content via different media platforms reconciling the communication needs of scientists and the audience's need to understand the complexity of topics that are relevant in their everyday life. By presenting four audio-visual examples, that have been designed for very different target groups, we show (i) the interdisciplinary challenges during the production process and the lessons learnt and (ii) possibilities to reach the required degree of simplification without the need for dumbing down the content. "We know enough about climate change" is a short animated film that was produced for the German Agency for International Cooperation (GIZ) for training programs and conferences on adaptation in the target countries including Indonesia, Tunisia and Mexico. "Earthbook" is a short animation produced for "The Year of Science" to raise awareness for the topics of sustainability among digital natives. "What is Climate Engineering?". Produced for the Institute for Advanced Sustainability Studies (IASS) the film is meant for an informed and interested public. "Wimmelwelt Energie!" is a prototype of an iPad application for children from 4-6 years of age to help them learn about different forms of energy and related greenhouse gas emissions.

  5. Contour identification with pitch and loudness cues using cochlear implants

    OpenAIRE

    Luo, Xin; Masterson, Megan E.; Wu, Ching-Chih

    2013-01-01

    Different from speech, pitch and loudness cues may or may not co-vary in music. Cochlear implant (CI) users with poor pitch perception may use loudness contour cues more than normal-hearing (NH) listeners. Contour identification was tested in CI users and NH listeners; the five-note contours contained either pitch cues alone, loudness cues alone, or both. Results showed that NH listeners' contour identification was better with pitch cues than with loudness cues; CI users performed similarly w...

  6. Talker Variability in Audiovisual Speech Perception

    Directory of Open Access Journals (Sweden)

    Shannon eHeald

    2014-07-01

    Full Text Available A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker-variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker’s face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target-word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to the audio-only condition. These results suggest that seeing a talker’s face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred.

  7. Gestión documental de la información audiovisual deportiva en las televisiones generalistas Documentary management of the sport audio-visual information in the generalist televisions

    Directory of Open Access Journals (Sweden)

    Jorge Caldera Serrano

    2005-01-01

    Full Text Available Se analiza la gestión de la información audiovisual deportiva en el marco de los Sistemas de Información Documental de las cadenas estatales, zonales y locales. Para ello se realiza un recorrido por la cadena documental que realiza la información audiovisual deportiva con el fin de ir analizando cada uno de los parámetros, mostrando así una serie de recomendaciones y normativas para la confección del registro audiovisual deportivo. Evidentemente la documentación deportiva audiovisual no se diferencia en exceso del análisis de otros tipos documentales televisivos por lo que se lleva a cabo una profundización y ampliación de su gestión y difusión, mostrando el flujo informacional dentro del Sistema. The management of sport audio-visual information within the documentary information systems of national, regional, and local television channels is analyzed. To this end, the documentary chain applied to sport audio-visual information is reviewed so that each of its parameters can be analyzed, offering a series of recommendations and guidelines for preparing the sport audio-visual record. Clearly, sport audio-visual documentation does not differ greatly from the analysis of other televised document types, which is why its management and dissemination are examined in greater depth, showing the information flow within the system.

  8. PENGGUNAAN MEDIA AUDIO VISUAL UNTUK MENINGKATKAN MOTIVASI DAN HASIL BELAJAR MATA PELAJARAN SISTEM BAHAN BAKAR MOTOR BENSIN SISWA KELAS XI TSM DI SMK BINA MANDIRI KLAMPOK BANJARNEGARA TAHUN AJARAN 2015/2016

    Directory of Open Access Journals (Sweden)

    Arifin Prasetya

    2016-06-01

    Full Text Available This study aimed to determine (1) whether the use of audio-visual media could improve learning motivation and (2) whether it could improve learning achievement in the motor fuel system subject among the eleventh-grade students of vocational school Bina Mandiri Klampok Banjarnegara in the 2015/2016 academic year. The study was a classroom action research. Data were collected with a questionnaire measuring learning motivation and a test measuring learning achievement in the motor fuel system subject; the research instruments were the questionnaire and the test, and the data were analyzed qualitatively. The study shows that (1) the use of audio-visual media could improve learning motivation among the students. The average learning motivation score in cycle I was 70.03, in the fair category (interval 65.85 < score ≤ 84.15); in cycle II it was 84.75, in the fair category (interval 84.15 < score ≤ 102.45); and in cycle III it was 100.34, in the fair category (interval 65.85 < score ≤ 84.15). (2) The use of audio-visual media could also improve learning achievement in the motor fuel system subject. The average test score was 61.29 in the pre-cycle, 71.39 in cycle I, 76.59 in cycle II, and 81.15 in cycle III. Based on these results, it can be concluded that the use of audio-visual media could improve both learning motivation and learning achievement in the motor fuel system subject among the eleventh-grade students of vocational school Bina Mandiri Klampok Banjarnegara in the 2015/2016 academic year.

  9. PEMBELAJARAN LAY UP SHOOT MENGGUNAKAN MEDIA AUDIO VISUAL BASIC LAY UP SHOOT UNTUK MENINGKATKAN HASILBELAJAR LAY UP SHOOT PADA SISWA KELAS VIIIA SMP KANISIUS PATI TAHUN 2013/2014

    Directory of Open Access Journals (Sweden)

    Frendy Nurochwan Febryanto

    2015-01-01

    Full Text Available The purpose of this study was to determine whether learning the lay-up shoot using basic lay-up shoot audio-visual media can improve lay-up shoot learning outcomes in class VIIIA of SMP Kanisius Pati in the 2013/2014 academic year. The study used classroom action research (CAR). Data were collected through observation and assessment of basketball lay-up shoot learning outcomes, and the data analysis technique was descriptive. At the end of the first cycle, teacher activity in teaching basic lay-up shoot techniques using audio-visual media reached 76.19%, while student activity during the lay-up shoot learning process reached 78.57%. At the end of the second cycle, teacher activity reached 85.71%, while student activity reached 92.86%. Based on these results it can be concluded that learning the lay-up shoot using basic lay-up shoot audio-visual media can improve student learning outcomes in class VIIIA of SMP Kanisius Pati in the 2013/2014 academic year.

  10. La regulación audiovisual: argumentos a favor y en contra The audio-visual regulation: the arguments for and against

    Directory of Open Access Journals (Sweden)

    Jordi Sopena Palomar

    2008-03-01

    Full Text Available El artículo analiza la efectividad de la regulación audiovisual y valora los diversos argumentos a favor y en contra de la existencia de consejos reguladores a nivel estatal. El debate sobre la necesidad de un organismo de este calado en España todavía persiste. La mayoría de los países comunitarios se han dotado de consejos competentes en esta materia, como es el caso del OFCOM en el Reino Unido o el CSA en Francia. En España, la regulación audiovisual se limita a organismos de alcance autonómico, como son el Consejo Audiovisual de Navarra, el de Andalucía y el Consell de l’Audiovisual de Catalunya (CAC, cuyo modelo también es abordado en este artículo. The article analyzes the effectiveness of the audio-visual regulation and assesses the different arguments for and against the existence of the broadcasting authorities at the state level. The debate of the necessity of a Spanish organism of regulation is still active. Most of the European countries have created some competent authorities, like the OFCOM in United Kingdom and the CSA in France. In Spain, the broadcasting regulation is developed by regional organisms, like the Consejo Audiovisual de Navarra, the Consejo Audiovisual de Andalucía and the Consell de l’Audiovisual de Catalunya (CAC, whose case is also studied in this article.

  11. The Audio-Visual Man.

    Science.gov (United States)

    Babin, Pierre, Ed.

    A series of twelve essays discuss the use of audiovisuals in religious education. The essays are divided into three sections: one which draws on the ideas of Marshall McLuhan and other educators to explore the newest ideas about audiovisual language and faith, one that describes how to learn and use the new language of audio and visual images, and…

  12. Audio-Visual Materials Catalog.

    Science.gov (United States)

    Anderson (M.D.) Hospital and Tumor Inst., Houston, TX.

    This catalog lists 27 audiovisual programs produced by the Department of Medical Communications of the University of Texas M. D. Anderson Hospital and Tumor Institute for public distribution. Video tapes, 16 mm. motion pictures and slide/audio series are presented dealing mostly with cancer and related subjects. The programs are intended for…

  13. Cued speech for enhancing speech perception and first language development of children with cochlear implants.

    Science.gov (United States)

    Leybaert, Jacqueline; LaSasso, Carol J

    2010-06-01

    Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We support our view that exposure to Cued Speech before or after the implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants.

  14. Cued Speech for Enhancing Speech Perception and First Language Development of Children With Cochlear Implants

    Science.gov (United States)

    Leybaert, Jacqueline; LaSasso, Carol J.

    2010-01-01

    Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We support our view that exposure to Cued Speech before or after the implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants. PMID:20724357

  15. An Evaluation of the Usefulness of Prosodic and Lexical Cues for Understanding Synthesized Speech of Mathematics. Research Report No. RR-16-33

    Science.gov (United States)

    Frankel, Lois; Brownstein, Beth

    2016-01-01

    The work described in this report is the second phase of a project to provide easy-to-use tools for authoring and rendering secondary-school algebra-level math expressions in synthesized speech that is useful for students with blindness or low vision. This report describes the development and results of the second feedback study performed for our project,…

  16. BILINGUAL MULTIMODAL SYSTEM FOR TEXT-TO-AUDIOVISUAL SPEECH AND SIGN LANGUAGE SYNTHESIS

    Directory of Open Access Journals (Sweden)

    A. A. Karpov

    2014-09-01

    Full Text Available We present a conceptual model, architecture and software of a multimodal system for audio-visual speech and sign language synthesis from input text. The main components of the developed multimodal synthesis system (signing avatar) are: an automatic text processor for input text analysis; a simulated 3D model of a human head; a computer text-to-speech synthesizer; a system for audio-visual speech synthesis; a simulated 3D model of human hands and upper body; and a multimodal user interface integrating all the components for the generation of audio, visual and signed speech. The proposed system performs automatic translation of input textual information into speech (audio information) and gestures (video information), information fusion and its output in the form of multimedia information. A user can input any grammatically correct text in Russian or Czech to the system; it is analyzed by the text processor to detect sentences, words and characters. Then this textual information is converted into symbols of the sign language notation. We apply the international «Hamburg Notation System» (HamNoSys), which describes the main differential features of each manual sign: hand shape, hand orientation, place and type of movement. On this basis the 3D signing avatar displays the elements of the sign language. The virtual 3D model of the human head and upper body has been created using the VRML virtual reality modeling language, and it is controlled by software based on the OpenGL graphics library. The developed multimodal synthesis system is a universal one, since it is oriented towards both regular users and disabled people (in particular, the hard-of-hearing and visually impaired), and it serves for multimedia output (by audio and visual modalities) of input textual information.

  17. Acoustic cues for emotions in vocal expression and music

    OpenAIRE

    Erixon, Pauline

    2015-01-01

    Previous research shows that emotional expressions in speech and music use similar patterns of acoustic cues to communicate discrete emotions. The aim of the present study was to experimentally test if manipulation of the acoustic cues; F0, F0 variability, loudness, loudness variability and speech rate/tempo, affects the identification of discrete emotions in speech and music. Forty recordings of actors and musicians expressing anger, fear, happiness, sadness and tenderness were manipulated t...

  18. Speech misperception: speaking and seeing interfere differently with hearing.

    Directory of Open Access Journals (Sweden)

    Takemi Mochida

    Full Text Available Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] were degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But they were not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.

  19. Speech misperception: speaking and seeing interfere differently with hearing.

    Science.gov (United States)

    Mochida, Takemi; Kimura, Toshitaka; Hiroya, Sadao; Kitagawa, Norimichi; Gomi, Hiroaki; Kondo, Tadahisa

    2013-01-01

    Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] were degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But they were not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.

  20. Visual speech influences speech perception immediately but not automatically.

    Science.gov (United States)

    Mitterer, Holger; Reinisch, Eva

    2017-02-01

    Two experiments examined the time course of the use of auditory and visual speech cues to spoken word recognition using an eye-tracking paradigm. Results of the first experiment showed that the use of visual speech cues from lipreading is reduced if concurrently presented pictures require a division of attentional resources. This reduction was evident even when listeners' eye gaze was on the speaker rather than the (static) pictures. Experiment 2 used a deictic hand gesture to foster attention to the speaker. At the same time, the visual processing load was reduced by keeping the visual display constant over a fixed number of successive trials. Under these conditions, the visual speech cues from lipreading were used. Moreover, the eye-tracking data indicated that visual information was used immediately and even earlier than auditory information. In combination, these data indicate that visual speech cues are not used automatically, but if they are used, they are used immediately.

  1. Colliding Cues in Word Segmentation: The Role of Cue Strength and General Cognitive Processes

    Science.gov (United States)

    Weiss, Daniel J.; Gerfen, Chip; Mitchel, Aaron D.

    2010-01-01

    The process of word segmentation is flexible, with many strategies potentially available to learners. This experiment explores how segmentation cues interact, and whether successful resolution of cue competition is related to general executive functioning. Participants listened to artificial speech streams that contained both statistical and…

  2. Theater, Speech, Light

    Directory of Open Access Journals (Sweden)

    Primož Vitez

    2011-07-01

    Full Text Available This paper considers a medium as a substantial translator: an intermediary between the producers and receivers of a communicational act. A medium is a material support to the spiritual potential of human sources. If the medium is a support to meaning, then the relations between different media can be interpreted as a space for making sense of these meanings, a generator of sense: it means that the interaction of substances creates an intermedial space that conceives of a contextualization of specific meaningful elements in order to combine them into the sense of a communicational intervention. The theater itself is multimedia. A theatrical event is a communicational act based on a combination of several autonomous structures: text, scenography, light design, sound, directing, literary interpretation, speech, and, of course, the one that contains all of these: the actor in a human body. The actor is a physical and symbolic, anatomic, and emblematic figure in the synesthetic theatrical act because he reunites in his body all the essential principles and components of theater itself. The actor is an audio-visual being, made of kinetic energy, speech, and human spirit. The actor’s body, as a source, instrument, and goal of the theater, becomes an intersection of sound and light. However, theater as intermedial art is no intermediate practice; it must be seen as interposing bodies between conceivers and receivers, between authors and auditors. The body is not self-evident; the body in contemporary art forms is being redefined as a privilege. The art needs bodily dimensions to explore the medial qualities of substances: because it is alive, it returns to studying biology. The fact that theater is an archaic art form is also the purest promise of its future.

  3. Estimating the relative weights of visual and auditory tau versus heuristic-based cues for time-to-contact judgments in realistic, familiar scenes by older and younger adults.

    Science.gov (United States)

    Keshavarz, Behrang; Campos, Jennifer L; DeLucia, Patricia R; Oberfeld, Daniel

    2017-04-01

    Estimating time to contact (TTC) involves multiple sensory systems, including vision and audition. Previous findings suggested that the ratio of an object's instantaneous optical size/sound intensity to its instantaneous rate of change in optical size/sound intensity (τ) drives TTC judgments. Other evidence has shown that heuristic-based cues are used, including final optical size or final sound pressure level. Most previous studies have used decontextualized and unfamiliar stimuli (e.g., geometric shapes on a blank background). Here we used a traffic scene with an approaching vehicle to evaluate the weights of visual and auditory TTC cues in TTC estimation under more realistic conditions. Younger (18-39 years) and older (65+ years) participants made TTC estimates in three sensory conditions: visual-only, auditory-only, and audio-visual. Stimuli were presented within an immersive virtual-reality environment, and cue weights were calculated for both visual cues (e.g., visual τ, final optical size) and auditory cues (e.g., auditory τ, final sound pressure level). The results demonstrated the use of visual τ as well as heuristic cues in the visual-only condition. TTC estimates in the auditory-only condition, however, were primarily based on an auditory heuristic cue (final sound pressure level), rather than on auditory τ. In the audio-visual condition, the visual cues dominated overall, with the highest weight being assigned to visual τ by younger adults, and a more equal weighting of visual τ and heuristic cues in older adults. Overall, better characterizing the effects of combined sensory inputs, stimulus characteristics, and age on the cues used to estimate TTC will provide important insights into how these factors may affect everyday behavior.
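    For reference, the τ mentioned above is the ratio of a cue's instantaneous value to its instantaneous rate of change, as the abstract states. Written out as standard background (not additional detail from this record), with θ(t) the optical size and I(t) the sound intensity:

    ```latex
    \tau_{\mathrm{vis}}(t) = \frac{\theta(t)}{\mathrm{d}\theta(t)/\mathrm{d}t},
    \qquad
    \tau_{\mathrm{aud}}(t) = \frac{I(t)}{\mathrm{d}I(t)/\mathrm{d}t}
    ```

    For an object approaching at constant velocity, the visual quantity approximates the remaining time to contact, which is why it is contrasted with heuristic cues such as final optical size or final sound pressure level.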

  4. Parametric Representation of the Speaker's Lips for Multimodal Sign Language and Speech Recognition

    Science.gov (United States)

    Ryumin, D.; Karpov, A. A.

    2017-05-01

    In this article, we propose a new method for parametric representation of the human lips region. The functional diagram of the method is described, and implementation details with an explanation of its key stages and features are given. The results of automatic detection of the regions of interest are illustrated. The processing speed of the method on several computers with different performance levels is reported. This universal method allows applying parametric representation of the speaker's lips to the tasks of biometrics, computer vision, machine learning, and automatic recognition of face, elements of sign languages, and audio-visual speech, including lip-reading.

  5. Neural entrainment to rhythmic speech in children with developmental dyslexia

    Science.gov (United States)

    Power, Alan J.; Mead, Natasha; Barnes, Lisa; Goswami, Usha

    2013-01-01

    A rhythmic paradigm based on repetition of the syllable “ba” was used to study auditory, visual, and audio-visual oscillatory entrainment to speech in children with and without dyslexia using EEG. Children pressed a button whenever they identified a delay in the isochronous stimulus delivery (500 ms; 2 Hz delta band rate). Response power, strength of entrainment and preferred phase of entrainment in the delta and theta frequency bands were compared between groups. The quality of stimulus representation was also measured using cross-correlation of the stimulus envelope with the neural response. The data showed a significant group difference in the preferred phase of entrainment in the delta band in response to the auditory and audio-visual stimulus streams. A different preferred phase has significant implications for the quality of speech information that is encoded neurally, as it implies enhanced neuronal processing (phase alignment) at less informative temporal points in the incoming signal. Consistent with this possibility, the cross-correlogram analysis revealed superior stimulus representation by the control children, who showed a trend for larger peak r-values and significantly later lags in peak r-values compared to participants with dyslexia. Significant relationships between both peak r-values and peak lags were found with behavioral measures of reading. The data indicate that the auditory temporal reference frame for speech processing is atypical in developmental dyslexia, with low frequency (delta) oscillations entraining to a different phase of the rhythmic syllabic input. This would affect the quality of encoding of speech, and could underlie the cognitive impairments in phonological representation that are the behavioral hallmark of this developmental disorder across languages. PMID:24376407
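    The cross-correlogram analysis described above (correlating the stimulus amplitude envelope with the neural response and reading off the peak r-value and its lag) can be sketched as follows. This is a generic illustration assuming NumPy, not the study's actual analysis code; filtering, epoching, and the exact lag window are omitted.

    ```python
    import numpy as np

    def xcorr_peak(stim_env, eeg, fs, max_lag_s=0.5):
        """Return (peak r, lag of peak in seconds) between a stimulus amplitude
        envelope and a neural response of equal length, over lags up to +/- max_lag_s."""
        x = (stim_env - stim_env.mean()) / stim_env.std()
        y = (eeg - eeg.mean()) / eeg.std()
        max_lag = int(max_lag_s * fs)
        lags = range(-max_lag, max_lag + 1)
        rs = []
        for lag in lags:
            if lag >= 0:
                a, b = x[:len(x) - lag], y[lag:]
            else:
                a, b = x[-lag:], y[:len(y) + lag]
            rs.append(np.corrcoef(a, b)[0, 1])  # Pearson r at this lag
        rs = np.asarray(rs)
        peak = int(np.argmax(rs))
        return rs[peak], list(lags)[peak] / fs
    ```

    Group differences of the kind reported (larger peak r-values and later peak lags in control children) would then be assessed on these two values per child.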

  6. Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus

    Directory of Open Access Journals (Sweden)

    Patterson Eric K

    2002-01-01

    Full Text Available Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties due to background noise and multiple speakers in an application environment are significantly reduced by the additional information provided by visual features. This paper presents information on a new audio-visual database, a feature study on moving speakers, and baseline results for the whole speaker group. Although a few databases have been collected in this area, none has emerged as a standard for comparison. Also, efforts to date have often been limited, focusing on cropped video or stationary speakers. This paper seeks to introduce a challenging audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The Clemson University Audio-Visual Experiments (CUAVE) database is a speaker-independent corpus of both connected and continuous digit strings totaling over 7000 utterances. It contains a wide variety of speakers and is designed to meet several goals discussed in this paper. One of these goals is to allow testing of adverse conditions such as moving talkers and speaker pairs. A feature study of connected digit strings is also discussed. It compares stationary and moving talkers in a speaker-independent grouping. An image-processing-based contour technique, an image transform method, and a deformable template scheme are used in this comparison to obtain visual features. This paper also presents methods and results in an attempt to make these techniques more robust to speaker movement. Finally, initial baseline speaker-independent results are included using all speakers, and conclusions as well as suggested areas of research are given.

  7. Congruent and Incongruent Cues in Highly Familiar Audiovisual Action Sequences: An ERP Study

    Directory of Open Access Journals (Sweden)

    SM Wuerger

    2012-07-01

    Full Text Available In a previous fMRI study we found significant differences in BOLD responses for congruent and incongruent semantic audio-visual action sequences (whole-body actions and speech actions) in bilateral pSTS, left SMA, left IFG, and IPL (Meyer, Greenlee, & Wuerger, JOCN, 2011). Here, we present results from a 128-channel ERP study that examined the time-course of these interactions using a one-back task. ERPs in response to congruent and incongruent audio-visual actions were compared to identify regions and latencies of differences. Responses to congruent and incongruent stimuli differed between 240–280 ms, 340–420 ms, and 460–660 ms after stimulus onset. A dipole analysis revealed that the difference around 250 ms can be partly explained by a modulation of sources in the vicinity of the superior temporal area, while the responses after 400 ms are consistent with sources in inferior frontal areas. Our results are in line with a model that postulates early recognition of congruent audiovisual actions in the pSTS, perhaps as a sensory memory buffer, and a later role of the IFG, perhaps in a generative capacity, in reconciling incongruent signals.

  8. Speech Problems

    Science.gov (United States)


  9. Distance Learning: Effectiveness of an Interdisciplinary Course in Speech Pathology and Dentistry

    Directory of Open Access Journals (Sweden)

    Janine S Ramos

    2015-08-01

    Full Text Available Objective: Evaluate the effectiveness of distance learning courses for the purpose of interdisciplinary continuing education in Speech Pathology and Dentistry. Methods: The online course was made available on the Moodle platform. A total of 30 undergraduates participated in the study (15 from the Dentistry course and 15 from the Speech Pathology course). Their knowledge was evaluated before and after the course, in addition to user satisfaction, by means of specific questionnaires. The course was evaluated by 6 specialists on the following aspects: presentation and quality of the content, audio-visual quality, adequacy to the target public, and information made available. To compare the results obtained in the pre- and post-course questionnaires, the Wilcoxon test was carried out, with a 5% significance level. Results: The teaching/learning process, including the theoretical/practical application for the interdisciplinary training, proved to be effective, as there was a statistically significant difference between the pre- and post-course evaluations (p<0.001); the users' satisfaction degree was favorable, and the specialists evaluated the material as adequate regarding the target public, the audio-visual information quality, and the strategies of content availability. Conclusion: The suggested distance-learning course proved to be effective for the purpose of Speech Pathology and Dentistry interdisciplinary education.

  10. The development of co-speech gesture in the communication of children with autism spectrum disorders.

    Science.gov (United States)

    Sowden, Hannah; Clegg, Judy; Perkins, Michael

    2013-12-01

    Co-speech gestures have a close semantic relationship to speech in adult conversation. In typically developing children co-speech gestures which give additional information to speech facilitate the emergence of multi-word speech. A difficulty with integrating audio-visual information is known to exist for individuals with Autism Spectrum Disorder (ASD), which may affect development of the speech-gesture system. A longitudinal observational study was conducted with four children with ASD, aged 2;4 to 3;5 years. Participants were video-recorded for 20 min every 2 weeks during their attendance on an intervention programme. Recording continued for up to 8 months, thus affording a rich analysis of gestural practices from pre-verbal to multi-word speech across the group. All participants combined gesture with either speech or vocalisations. Co-speech gestures providing additional information to speech were observed to be either absent or rare. Findings suggest that children with ASD do not make use of the facilitating communicative effects of gesture in the same way as typically developing children.

  11. Acoustic cues to Nehiyawewin constituency

    Science.gov (United States)

    Cook, Clare; Muehlbauer, Jeff

    2005-04-01

    This study examines how speakers use acoustic cues, e.g., pitch and pausing, to establish syntactic and semantic constituents in Nehiyawewin, an Algonquian language. Two Nehiyawewin speakers' autobiographies, which have been recorded, transcribed, and translated by H. C. Wolfart in collaboration with a native speaker of Nehiyawewin, provide natural-speech data for the study. Since it is difficult for a non-native speaker to reliably distinguish Nehiyawewin constituents, an intermediary is needed. The transcription provides this intermediary through punctuation marks (commas, semi-colons, em-dashes, periods), which have been shown to consistently mark constituency structure [Nunberg, CSLI 1990]. The acoustic cues are thus mapped onto the punctuated constituents, and then similar constituents are compared to see what acoustic cues they share. Preliminarily, the clearest acoustic signal to a constituent boundary is a pitch drop preceding the boundary and/or a pitch reset on the syllable following the boundary. Further, constituent boundaries marked by a period consistently end on a low pitch, are followed by a pitch reset of 30-90 Hz, and have an average pause of 1.9 seconds. I also discuss cross-speaker cues, and prosodic cues that do not correlate to punctuation, with implications for the transcriptional view of orthography [Marckwardt, Oxford 1942].

  12. Promoting smoke-free homes: a novel behavioral intervention using real-time audio-visual feedback on airborne particle levels.

    Directory of Open Access Journals (Sweden)

    Neil E Klepeis

    Full Text Available Interventions are needed to protect the health of children who live with smokers. We pilot-tested a real-time intervention for promoting behavior change in homes that reduces second hand tobacco smoke (SHS) levels. The intervention uses a monitor and feedback system to provide immediate auditory and visual signals triggered at defined thresholds of fine particle concentration. Dynamic graphs of real-time particle levels are also shown on a computer screen. We experimentally evaluated the system, field-tested it in homes with smokers, and conducted focus groups to obtain general opinions. Laboratory tests of the monitor demonstrated SHS sensitivity, stability, precision equivalent to at least 1 µg/m³, and low noise. A linear relationship (R² = 0.98) was observed between the monitor and average SHS mass concentrations up to 150 µg/m³. Focus groups and interviews with intervention participants showed in-home use to be acceptable and feasible. The intervention was evaluated in 3 homes with combined baseline and intervention periods lasting 9 to 15 full days. Two families modified their behavior by opening windows or doors, smoking outdoors, or smoking less. We observed evidence of lower SHS levels in these homes. The remaining household voiced reluctance to changing their smoking activity and did not exhibit lower SHS levels in main smoking areas or clear behavior change; however, family members expressed receptivity to smoking outdoors. This study established the feasibility of the real-time intervention, laying the groundwork for controlled trials with larger sample sizes. Visual and auditory cues may prompt family members to take immediate action to reduce SHS levels. Dynamic graphs of SHS levels may help families make decisions about specific mitigation approaches.
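    A minimal sketch of the kind of threshold-triggered feedback loop described above. The threshold values, polling interval, and the I/O hooks (read_particle_ugm3, play_alert, show_level) are hypothetical placeholders, not the study's monitoring software.

    ```python
    import time

    THRESHOLDS_UGM3 = [15, 35, 75]          # illustrative alert levels, micrograms/m^3

    def feedback_loop(read_particle_ugm3, play_alert, show_level, period_s=10):
        """Poll the particle monitor and trigger audio-visual feedback whenever
        the concentration crosses a defined threshold (illustrative outline)."""
        while True:
            level = read_particle_ugm3()    # current fine-particle concentration
            show_level(level)               # update the on-screen real-time graph
            exceeded = [t for t in THRESHOLDS_UGM3 if level >= t]
            if exceeded:
                play_alert(max(exceeded))   # e.g., a different cue per threshold
            time.sleep(period_s)
    ```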

  13. Conversation, speech acts, and memory.

    Science.gov (United States)

    Holtgraves, Thomas

    2008-03-01

    Speakers frequently have specific intentions that they want others to recognize (Grice, 1957). These specific intentions can be viewed as speech acts (Searle, 1969), and I argue that they play a role in long-term memory for conversation utterances. Five experiments were conducted to examine this idea. Participants in all experiments read scenarios ending with either a target utterance that performed a specific speech act (brag, beg, etc.) or a carefully matched control. Participants were more likely to falsely recall and recognize speech act verbs after having read the speech act version than after having read the control version, and the speech act verbs served as better recall cues for the speech act utterances than for the controls. Experiment 5 documented individual differences in the encoding of speech act verbs. The results suggest that people recognize and retain the actions that people perform with their utterances and that this is one of the organizing principles of conversation memory.

  14. Development in Children's Interpretation of Pitch Cues to Emotions

    Science.gov (United States)

    Quam, Carolyn; Swingley, Daniel

    2012-01-01

    Young infants respond to positive and negative speech prosody (A. Fernald, 1993), yet 4-year-olds rely on lexical information when it conflicts with paralinguistic cues to approval or disapproval (M. Friend, 2003). This article explores this surprising phenomenon, testing one hundred eighteen 2- to 5-year-olds' use of isolated pitch cues to…

  15. The Personalized Cueing Method: From the Laboratory to the Clinic

    Science.gov (United States)

    Marshall, Robert C.; Freed, Donald B.

    2006-01-01

    Purpose: The personalized cueing method is a novel procedure for treating naming deficits of persons with aphasia that is relatively unfamiliar to most speech-language pathologists. The goal of this article is to introduce the personalized cueing method to clinicians so that it might be expanded and improved upon. It is also hoped that this…

  16. Temporal visual cues aid speech recognition

    DEFF Research Database (Denmark)

    Zhou, Xiang; Ross, Lars; Lehn-Schiøler, Tue

    2006-01-01

    BACKGROUND: It is well known that under noisy conditions, viewing a speaker's articulatory movement aids the recognition of spoken words. Conventionally it is thought that the visual input disambiguates otherwise confusing auditory input. HYPOTHESIS: In contrast, we hypothesize that it is the temporal synchronicity of the visual input that aids parsing of the auditory stream. More specifically, we expected that purely temporal information, which does not convey information such as place of articulation, may facilitate word recognition. METHODS: To test this prediction we used temporal features of audio to generate an artificial talking-face video and measured word recognition performance on simple monosyllabic words. RESULTS: When presenting words together with the artificial video we find that word recognition is improved over purely auditory presentation. The effect is significant (p ...).

  17. Individual Sensitivity to Spectral and Temporal Cues in Listeners with Hearing Impairment

    Science.gov (United States)

    Souza, Pamela E.; Wright, Richard A.; Blackburn, Michael C.; Tatman, Rachael; Gallun, Frederick J.

    2015-01-01

    Purpose: The present study was designed to evaluate use of spectral and temporal cues under conditions in which both types of cues were available. Method: Participants included adults with normal hearing and hearing loss. We focused on 3 categories of speech cues: static spectral (spectral shape), dynamic spectral (formant change), and temporal…

  18. The Function of Consciousness in Multisensory Integration

    Science.gov (United States)

    Palmer, Terry D.; Ramsey, Ashley K.

    2012-01-01

    The function of consciousness was explored in two contexts of audio-visual speech, cross-modal visual attention guidance and McGurk cross-modal integration. Experiments 1, 2, and 3 utilized a novel cueing paradigm in which two different flash-suppressed lip-streams co-occurred with speech sounds matching one of these streams. A visual target was…

  19. Automatic discrimination between laughter and speech

    NARCIS (Netherlands)

    Truong, K.; Leeuwen, D. van

    2007-01-01

    Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues that can consist of laughter, a trembling voice, coughs, changes in the intonation contour etc., information about the speaker’s state and emotion can be revealed. This paper describes the

  20. Effects of Multisensory Speech Training and Visual Phonics on Speech Production of a Hearing-Impaired Child.

    Science.gov (United States)

    Zaccagnini, Cindy M.; Antia, Shirin D.

    1993-01-01

    This study of the effects of intensive multisensory speech training on the speech production of a profoundly hearing-impaired child (age nine) found that the addition of Visual Phonics hand cues did not result in speech production gains. All six target phonemes were generalized to new words and maintained after the intervention was discontinued.…

  1. Expectations and speech intelligibility.

    Science.gov (United States)

    Babel, Molly; Russell, Jamie

    2015-05-01

    Socio-indexical cues and paralinguistic information are often beneficial to speech processing as this information assists listeners in parsing the speech stream. Associations that particular populations speak in a certain speech style can, however, make it such that socio-indexical cues have a cost. In this study, native speakers of Canadian English who identify as Chinese Canadian and White Canadian read sentences that were presented to listeners in noise. Half of the sentences were presented with a visual-prime in the form of a photo of the speaker and half were presented in control trials with fixation crosses. Sentences produced by Chinese Canadians showed an intelligibility cost in the face-prime condition, whereas sentences produced by White Canadians did not. In an accentedness rating task, listeners rated White Canadians as less accented in the face-prime trials, but Chinese Canadians showed no such change in perceived accentedness. These results suggest a misalignment between an expected and an observed speech signal for the face-prime trials, which indicates that social information about a speaker can trigger linguistic associations that come with processing benefits and costs.

  2. Audio-Visual Peripheral Localization Disparity

    Directory of Open Access Journals (Sweden)

    Ryota Miyauchi

    2011-10-01

    Full Text Available In localizing simultaneous auditory and visual events, the brain should map the audiovisual events onto a unified perceptual space in a subsequent spatial process for integrating and/or comparing multisensory information. However, there is little qualitative and quantitative psychological data for estimating multisensory localization in peripheral visual fields. We measured the relative perceptual direction of a sound to a flash when they were simultaneously presented in peripheral visual fields. The results demonstrated that the sound and flash were perceptually located at the same position when the sound was presented 5 deg peripheral to the flash. This phenomenon occurred even when trials in which the participants' eyes moved were excluded. The measurement of the location of each sound and flash in a pointing task showed that the perceptual location of the sound shifted toward the frontal direction and, conversely, the perceptual location of the flash shifted toward the periphery. Our findings suggest that unisensory perceptual spaces of audition and vision have deviations in peripheral visual fields and, when the brain remaps unisensory locations of auditory and visual events into a unified perceptual space, the unisensory spatial information of the events can be suitably maintained.

  3. Music in Audio-Visual Materials.

    Science.gov (United States)

    Jaspers, Fons

    1991-01-01

    Reviews literature on music as a component of instructional materials. The relationship between music and emotion is examined; the use and effects of music are discussed; music as nonverbal communication is considered; effects on cognitive and attitudinal learning results are described; and emotional, cognitive, and structural needs are discussed.…

  4. Audio-Visual Materials for Chinese Studies.

    Science.gov (United States)

    Ching, Eugene, Comp.; Ching, Nora C., Comp.

    This publication is designed for teachers of Chinese language and culture who are interested in using audiovisual materials to supplement classroom instruction. The listings objectively present materials which are available; the compilers have not attempted to evaluate them. Content includes historical studies, techniques of brush painting, myths,…

  5. P300 audio-visual speller

    NARCIS (Netherlands)

    Belitski, A.; Farquhar, J.D.R.; Desain, P.W.M.

    2011-01-01

    The Farwell and Donchin matrix speller is well known as one of the highest performing brain-computer interfaces (BCIs) currently available. However, its use of visual stimulation limits its applicability to users with normal eyesight. Alternative BCI spelling systems which rely on non-visual

  6. Audio-Visual Classification of Sports Types

    DEFF Research Database (Denmark)

    Gade, Rikke; Abou-Zleikha, Mohamed; Christensen, Mads Græsbøll

    2015-01-01

    In this work we propose a method for classification of sports types from combined audio and visual features extracted from thermal video. From the audio, Mel Frequency Cepstral Coefficients (MFCC) are extracted, and PCA is applied to reduce the feature space to 10 dimensions. From the visual modality, short trajectories are constructed to represent the motion of players. From these, four motion features are extracted and combined directly with the audio features for classification. A k-nearest neighbour classifier is applied for classification of 180 1-minute video sequences from three sports types...
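    A minimal sketch of the audio half of such a pipeline (MFCC extraction, PCA to 10 dimensions, k-NN classification), assuming librosa and scikit-learn are available. The mean-pooling of MFCC frames, the number of neighbours, and the clip_paths/labels inputs are illustrative assumptions, not the authors' exact configuration.

    ```python
    import numpy as np
    import librosa
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    def audio_feature(path, n_mfcc=13):
        """Mean-pooled MFCC vector for one clip (pooling choice is illustrative)."""
        y, sr = librosa.load(path, sr=None)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
        return mfcc.mean(axis=1)

    def train_audio_knn(clip_paths, labels, n_neighbors=3):
        """clip_paths: audio files for the 1-minute clips; labels: sport types."""
        X = np.array([audio_feature(p) for p in clip_paths])
        pca = PCA(n_components=10)            # reduce to 10 dims, as in the abstract
        X10 = pca.fit_transform(X)
        clf = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X10, labels)
        return pca, clf
    ```

    In the study, four visual motion features would be concatenated with the reduced audio features before classification; only the audio branch is shown here.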

  7. Visual face-movement sensitive cortex is relevant for auditory-only speech recognition.

    Science.gov (United States)

    Riedel, Philipp; Ragert, Patrick; Schelinski, Stefanie; Kiebel, Stefan J; von Kriegstein, Katharina

    2015-07-01

    It is commonly assumed that the recruitment of visual areas during audition is not relevant for performing auditory tasks ('auditory-only view'). According to an alternative view, however, the recruitment of visual cortices is thought to optimize auditory-only task performance ('auditory-visual view'). This alternative view is based on functional magnetic resonance imaging (fMRI) studies. These studies have shown, for example, that even if there is only auditory input available, face-movement sensitive areas within the posterior superior temporal sulcus (pSTS) are involved in understanding what is said (auditory-only speech recognition). This is particularly the case when speakers are known audio-visually, that is, after brief voice-face learning. Here we tested whether the left pSTS involvement is causally related to performance in auditory-only speech recognition when speakers are known by face. To test this hypothesis, we applied cathodal transcranial direct current stimulation (tDCS) to the pSTS during (i) visual-only speech recognition of a speaker known only visually to participants and (ii) auditory-only speech recognition of speakers they learned by voice and face. We defined the cathode as active electrode to down-regulate cortical excitability by hyperpolarization of neurons. tDCS to the pSTS interfered with visual-only speech recognition performance compared to a control group without pSTS stimulation (tDCS to BA6/44 or sham). Critically, compared to controls, pSTS stimulation additionally decreased auditory-only speech recognition performance selectively for voice-face learned speakers. These results are important in two ways. First, they provide direct evidence that the pSTS is causally involved in visual-only speech recognition; this confirms a long-standing prediction of current face-processing models. Secondly, they show that visual face-sensitive pSTS is causally involved in optimizing auditory-only speech recognition. These results are in line

  8. Plasticity in bilateral superior temporal cortex: Effects of deafness and cochlear implantation on auditory and visual speech processing.

    Science.gov (United States)

    Anderson, Carly A; Lazard, Diane S; Hartley, Douglas E H

    2017-01-01

    While many individuals can benefit substantially from cochlear implantation, the ability to perceive and understand auditory speech with a cochlear implant (CI) remains highly variable amongst adult recipients. Importantly, auditory performance with a CI cannot be reliably predicted based solely on routinely obtained information regarding clinical characteristics of the CI candidate. This review argues that central factors, notably cortical function and plasticity, should also be considered as important contributors to the observed individual variability in CI outcome. Superior temporal cortex (STC), including auditory association areas, plays a crucial role in the processing of auditory and visual speech information. The current review considers evidence of cortical plasticity within bilateral STC, and how these effects may explain variability in CI outcome. Furthermore, evidence of audio-visual interactions in temporal and occipital cortices is examined, and relation to CI outcome is discussed. To date, longitudinal examination of changes in cortical function and plasticity over the period of rehabilitation with a CI has been restricted by methodological challenges. The application of functional near-infrared spectroscopy (fNIRS) in studying cortical function in CI users is becoming increasingly recognised as a potential solution to these problems. Here we suggest that fNIRS offers a powerful neuroimaging tool to elucidate the relationship between audio-visual interactions, cortical plasticity during deafness and following cochlear implantation, and individual variability in auditory performance with a CI. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  9. Phonetic matching of auditory and visual speech develops during childhood : Evidence from sine-wave speech

    NARCIS (Netherlands)

    Baart, M.; Bortfeld, H.; Vroomen, J.

    2015-01-01

    The correspondence between auditory speech and lip-read information can be detected based on a combination of temporal and phonetic cross-modal cues. Here, we determined the point in developmental time at which children start to effectively use phonetic information to match a speech sound with one

  10. Enhancing Speech Intelligibility: Interactions among Context, Modality, Speech Style, and Masker

    Science.gov (United States)

    Van Engen, Kristin J.; Phelps, Jasmine E. B.; Smiljanic, Rajka; Chandrasekaran, Bharath

    2014-01-01

    Purpose: The authors sought to investigate interactions among intelligibility-enhancing speech cues (i.e., semantic context, clearly produced speech, and visual information) across a range of masking conditions. Method: Sentence recognition in noise was assessed for 29 normal-hearing listeners. Testing included semantically normal and anomalous…

  11. Exploring the role of brain oscillations in speech perception in noise: Intelligibility of isochronously retimed speech

    Directory of Open Access Journals (Sweden)

    Vincent Aubanel

    2016-08-01

    Full Text Available A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
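    The retiming manipulation can be illustrated with a small sketch: given detected anchor points (syllable onsets or envelope peaks), each anchor is mapped onto an isochronous grid at the target rate. This is an illustrative outline only; the actual time-stretching of each inter-anchor segment (e.g., with a phase vocoder or PSOLA) is not shown, and the anchor values below are made up.

    ```python
    import numpy as np

    def isochronous_targets(anchors_s, rate_hz):
        """Map detected anchor times (seconds) onto an isochronous grid at rate_hz.
        Returns the target anchor times; each inter-anchor speech segment would
        then be time-stretched to meet these targets."""
        period = 1.0 / rate_hz
        return anchors_s[0] + period * np.arange(len(anchors_s))

    # Example: anchors near 2.5 Hz retimed to exactly 2.5 Hz (period = 0.4 s)
    anchors = np.array([0.10, 0.52, 0.88, 1.33, 1.70])
    print(isochronous_targets(anchors, 2.5))  # [0.1 0.5 0.9 1.3 1.7]
    ```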

  12. Children use visual speech to compensate for non-intact auditory speech.

    Science.gov (United States)

    Jerger, Susan; Damian, Markus F; Tye-Murray, Nancy; Abdi, Hervé

    2014-10-01

    We investigated whether visual speech fills in non-intact auditory speech (excised consonant onsets) in typically developing children from 4 to 14 years of age. Stimuli with the excised auditory onsets were presented in the audiovisual (AV) and auditory-only (AO) modes. A visual speech fill-in effect occurs when listeners experience hearing the same non-intact auditory stimulus (e.g., /-b/ag) as different depending on the presence/absence of visual speech such as hearing /bag/ in the AV mode but hearing /ag/ in the AO mode. We quantified the visual speech fill-in effect by the difference in the number of correct consonant onset responses between the modes. We found that easy visual speech cues /b/ provided greater filling in than difficult cues /g/. Only older children benefited from difficult visual speech cues, whereas all children benefited from easy visual speech cues, although 4- and 5-year-olds did not benefit as much as older children. To explore task demands, we compared results on our new task with those on the McGurk task. The influence of visual speech was uniquely associated with age and vocabulary abilities for the visual speech fill-in effect but was uniquely associated with speechreading skills for the McGurk effect. This dissociation implies that visual speech, as processed by children, is a complicated and multifaceted phenomenon underpinned by heterogeneous abilities. These results emphasize that children perceive a speaker's utterance rather than the auditory stimulus per se. In children, as in adults, there is more to speech perception than meets the ear. Copyright © 2014 Elsevier Inc. All rights reserved.

  13. Speech Compression

    Directory of Open Access Journals (Sweden)

    Jerry D. Gibson

    2016-06-01

    Full Text Available Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.
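    Since the article traces speech coding based on the linear prediction model, a textbook sketch of how LP coefficients are obtained for one frame (autocorrelation method) may be useful context. This is generic background assuming NumPy and SciPy, not any particular codec or the article's own material.

    ```python
    import numpy as np
    from scipy.linalg import solve_toeplitz

    def lpc_coefficients(frame, order=10):
        """Estimate linear-prediction coefficients for one speech frame via the
        autocorrelation method (textbook sketch; no quantization or excitation
        modelling, as a real codec would require)."""
        x = frame * np.hamming(len(frame))                  # windowed frame
        r = np.correlate(x, x, mode="full")[len(x) - 1:]    # autocorrelation r[0..]
        a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
        return a  # predictor: x[n] ~= sum_k a[k] * x[n-1-k]
    ```

    In a codec, these coefficients (or a transformed version of them) are quantized and transmitted, and the decoder re-synthesizes speech from them plus an excitation signal.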

  14. Treating speech subsystems in childhood apraxia of speech with tactual input: the PROMPT approach.

    Science.gov (United States)

    Dale, Philip S; Hayden, Deborah A

    2013-11-01

    Prompts for Restructuring Oral Muscular Phonetic Targets (PROMPT; Hayden, 2004; Hayden, Eigen, Walker, & Olsen, 2010), a treatment approach for the improvement of speech sound disorders in children, uses tactile-kinesthetic-proprioceptive (TKP) cues to support and shape movements of the oral articulators. No research to date has systematically examined the efficacy of PROMPT for children with childhood apraxia of speech (CAS). Four children (ages 3;6 [years;months] to 4;8), all meeting the American Speech-Language-Hearing Association (2007) criteria for CAS, were treated using PROMPT. All children received 8 weeks of 2 × per week treatment, including at least 4 weeks of full PROMPT treatment that included TKP cues. During the first 4 weeks, 2 of the 4 children received treatment that included all PROMPT components except TKP cues. This design permitted both between-subjects and within-subjects comparisons to evaluate the effect of TKP cues. Gains in treatment were measured by standardized tests and by criterion-referenced measures based on the production of untreated probe words, reflecting change in speech movements and auditory perceptual accuracy. All 4 children made significant gains during treatment, but measures of motor speech control and untreated word probes provided evidence for more gain when TKP cues were included. PROMPT as a whole appears to be effective for treating children with CAS, and the inclusion of TKP cues appears to facilitate greater effect.

  15. Tuning Neural Phase Entrainment to Speech.

    Science.gov (United States)

    Falk, Simone; Lanzilotti, Cosima; Schön, Daniele

    2017-08-01

    Musical rhythm positively impacts on subsequent speech processing. However, the neural mechanisms underlying this phenomenon are so far unclear. We investigated whether carryover effects from a preceding musical cue to a speech stimulus result from a continuation of neural phase entrainment to periodicities that are present in both music and speech. Participants listened and memorized French metrical sentences that contained (quasi-)periodic recurrences of accents and syllables. Speech stimuli were preceded by a rhythmically regular or irregular musical cue. Our results show that the presence of a regular cue modulates neural response as estimated by EEG power spectral density, intertrial coherence, and source analyses at critical frequencies during speech processing compared with the irregular condition. Importantly, intertrial coherences for regular cues were indicative of the participants' success in memorizing the subsequent speech stimuli. These findings underscore the highly adaptive nature of neural phase entrainment across fundamentally different auditory stimuli. They also support current models of neural phase entrainment as a tool of predictive timing and attentional selection across cognitive domains.

  16. Prediction and constraint in audiovisual speech perception.

    Science.gov (United States)

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration

  17. Prediction and constraint in audiovisual speech perception

    Science.gov (United States)

    Peelle, Jonathan E.; Sommers, Mitchell S.

    2015-01-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported

  18. The Effect of Dynamic Pitch on Speech Recognition in Temporally Modulated Noise

    Science.gov (United States)

    Shen, Jung; Souza, Pamela E.

    2017-01-01

    Purpose: This study investigated the effect of dynamic pitch in target speech on older and younger listeners' speech recognition in temporally modulated noise. First, we examined whether the benefit from dynamic-pitch cues depends on the temporal modulation of noise. Second, we tested whether older listeners can benefit from dynamic-pitch cues for…

  19. Speech Development

    Science.gov (United States)


  20. Speech Matters

    DEFF Research Database (Denmark)

    Hasse Jørgensen, Stina

    2011-01-01

    About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavillion, the Venice Biannual 2011.

  1. Speech-to-Speech Relay Service

    Science.gov (United States)

    Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...

  2. A causal test of the motor theory of speech perception: a case of impaired speech production and spared speech perception.

    Science.gov (United States)

    Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z

    2015-01-01

    The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.

  3. Phonetic matching of auditory and visual speech develops during childhood: evidence from sine-wave speech.

    Science.gov (United States)

    Baart, Martijn; Bortfeld, Heather; Vroomen, Jean

    2015-01-01

    The correspondence between auditory speech and lip-read information can be detected based on a combination of temporal and phonetic cross-modal cues. Here, we determined the point in developmental time at which children start to effectively use phonetic information to match a speech sound with one of two articulating faces. We presented 4- to 11-year-olds (N=77) with three-syllabic sine-wave speech replicas of two pseudo-words that were perceived as non-speech and asked them to match the sounds with the corresponding lip-read video. At first, children had no phonetic knowledge about the sounds, and matching was thus based on the temporal cues that are fully retained in sine-wave speech. Next, we trained all children to perceive the phonetic identity of the sine-wave speech and repeated the audiovisual (AV) matching task. Only at around 6.5 years of age did the benefit of having phonetic knowledge about the stimuli become apparent, thereby indicating that AV matching based on phonetic cues presumably develops more slowly than AV matching based on temporal cues. Copyright © 2014 Elsevier Inc. All rights reserved.

  4. Articulatory control parameters of phonological contrasts: the case of cue-weighting for Dutch /ɑ/ - /a/

    NARCIS (Netherlands)

    Terband, H.R.; van Montfort, Manou; Bax, Lydia; Sehgal, Sapna; Smorenburg, Laura; Versteeg, Fleur; Lentz, Tom

    2016-01-01

    Speech-language acquisition involves learning the speech sounds of the language at hand as well as which acoustic cues are relevant to differentiate them. For example, the Dutch vowels /ɑ/ and /a/ in the words 'man' (man) and 'maan' (moon) differ both in their spectral properties (F1 and F2 are both

  5. Acoustic Markers of Prosodic Boundaries in Spanish Spontaneous Alaryngeal Speech

    Science.gov (United States)

    Cuenca, M. H.; Barrio, M. M.

    2010-01-01

    Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, being as a result noisy…

  6. Predicting Intelligibility Gains in Dysarthria through Automated Speech Feature Analysis

    Science.gov (United States)

    Fletcher, Annalise R.; Wisler, Alan A.; McAuliffe, Megan J.; Lansford, Kaitlin L.; Liss, Julie M.

    2017-01-01

    Purpose: Behavioral speech modifications have variable effects on the intelligibility of speakers with dysarthria. In the companion article, a significant relationship was found between measures of speakers' baseline speech and their intelligibility gains following cues to speak louder and reduce rate (Fletcher, McAuliffe, Lansford, Sinex, &…

  7. Causal inference of asynchronous audiovisual speech

    Directory of Open Access Journals (Sweden)

    John F Magnotti

    2013-11-01

    Full Text Available During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
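    The core causal-inference step (how likely is it that voice and face share a common cause, given the measured asynchrony and cue reliability?) can be sketched generically as below. This follows the general causal-inference framework rather than the paper's fitted model; the parameter values and the diffuse likelihood under separate causes are illustrative assumptions.

    ```python
    import numpy as np
    from scipy.stats import norm

    def p_common_cause(asynchrony_ms, sigma_ms=80.0, mu_ms=0.0,
                       prior_common=0.5, spread_ms=500.0):
        """Posterior probability that the auditory and visual speech streams share
        a common cause, given a measured asynchrony (illustrative parameters)."""
        like_c1 = norm.pdf(asynchrony_ms, loc=mu_ms, scale=sigma_ms)  # common cause
        like_c2 = 1.0 / spread_ms          # diffuse likelihood if causes differ
        num = like_c1 * prior_common
        return num / (num + like_c2 * (1.0 - prior_common))

    print(p_common_cause(50.0))    # small asynchrony -> high p(common cause)
    print(p_common_cause(400.0))   # large asynchrony -> low p(common cause)
    ```

    Lowering sigma_ms (more reliable cues) sharpens the synchrony window, which is the kind of dependence on cue reliability the model is designed to capture.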

  8. Speech Perception Results: Audition and Lipreading Enhancement.

    Science.gov (United States)

    Geers, Ann; Brenner, Chris

    1994-01-01

    This paper describes changes in speech perception performance of deaf children using cochlear implants, tactile aids, or conventional hearing aids over a three-year period. Eleven of the 13 children with cochlear implants were able to identify words on the basis of auditory consonant cues. Significant lipreading enhancement was also achieved with…

  9. Phonological and Phonetic Biases in Speech Perception

    Science.gov (United States)

    Key, Michael Parrish

    2012-01-01

    This dissertation investigates how knowledge of phonological generalizations influences speech perception, with a particular focus on evidence that phonological processing is autonomous from (rather than interactive with) auditory processing. A model is proposed in which auditory cue constraints and markedness constraints interact to determine a…

  10. Fishing for meaningful units in connected speech

    DEFF Research Database (Denmark)

    Henrichsen, Peter Juel; Christiansen, Thomas Ulrich

    2009-01-01

    was far lower than for phonemic recognition. Our findings show that it is possible to automatically characterize a linguistic message, without detailed spectral information or presumptions about the target units. Further, fishing for simple meaningful cues and enhancing these selectively would potentially be a more effective way of achieving intelligibility transfer, which is the end goal for speech transducing technologies.

  11. The influence of visual speech information on the intelligibility of English consonants produced by non-native speakers.

    Science.gov (United States)

    Kawase, Saya; Hannah, Beverly; Wang, Yue

    2014-09-01

    This study examines how visual speech information affects native judgments of the intelligibility of speech sounds produced by non-native (L2) speakers. Native Canadian English perceivers as judges perceived three English phonemic contrasts (/b-v, θ-s, l-ɹ/) produced by native Japanese speakers as well as native Canadian English speakers as controls. These stimuli were presented under audio-visual (AV, with speaker voice and face), audio-only (AO), and visual-only (VO) conditions. The results showed that, across conditions, the overall intelligibility of Japanese productions of the native (Japanese)-like phonemes (/b, s, l/) was significantly higher than the non-Japanese phonemes (/v, θ, ɹ/). In terms of visual effects, the more visually salient non-Japanese phonemes /v, θ/ were perceived as significantly more intelligible when presented in the AV compared to the AO condition, indicating enhanced intelligibility when visual speech information is available. However, the non-Japanese phoneme /ɹ/ was perceived as less intelligible in the AV compared to the AO condition. Further analysis revealed that, unlike the native English productions, the Japanese speakers produced /ɹ/ without visible lip-rounding, indicating that non-native speakers' incorrect articulatory configurations may decrease the degree of intelligibility. These results suggest that visual speech information may either positively or negatively affect L2 speech intelligibility.

  12. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk, and distortion. Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk, and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. From a transmission point of view, then, digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques, and that is often used interchangeably with speech coding, is voice coding. This term is more generic in the sense that the

  13. Reactivity to nicotine cues over repeated cue reactivity sessions

    Science.gov (United States)

    LaRowe, Steven D.; Saladin, Michael E.; Carpenter, Matthew J.; Upadhyaya, Himanshu P.

    2009-01-01

    The present study investigated whether reactivity to nicotine-related cues would attenuate across four experimental sessions held one week apart. Participants were nineteen non-treatment seeking, nicotine-dependent males. Cue reactivity sessions were performed in an outpatient research center using in vivo cues consisting of standardized smoking-related paraphernalia (e.g., cigarettes) and neutral comparison paraphernalia (e.g., pencils). Craving ratings were collected before and after both cue presentations while physiological measures (heart rate, skin conductance) were collected before and during the cue presentations. Although craving levels decreased across sessions, smoking-related cues consistently evoked significantly greater increases in craving relative to neutral cues over all four experimental sessions. Skin conductance was higher in response to smoking cues, though this effect was not as robust as that observed for craving. Results suggest that, under the described experimental parameters, craving can be reliably elicited over repeated cue reactivity sessions. PMID:17537583

  14. Cues and expressions

    Directory of Open Access Journals (Sweden)

    Thorbjörg Hróarsdóttir

    2005-02-01

    Full Text Available A number of European languages have undergone a change from object-verb to verb-object order. We focus on the change in English and Icelandic, showing that while the structural change was the same, it took place at different times and in different ways in the two languages, triggered by different E-language changes. As seen from the English viewpoint, low-level facts of inflectional morphology may express the relevant cue for parameters, and so the loss of inflection may lead to a grammar change. This analysis does not carry over to Icelandic, as the loss of OV there took place despite rich case morphology. We aim to show how this can be explained within a cue-style approach, arguing for a universal set of cues. However, the relevant cue may be expressed differently among languages: while it may have been expressed through morphology in English, it was expressed through information structure in Icelandic. In both cases, external effects led to fewer expressions of the relevant (universal) cue, and a grammar change took place.

  15. Infant directed speech and the development of speech perception: enhancing development or an unintended consequence?

    Science.gov (United States)

    McMurray, Bob; Kovack-Lesh, Kristine A; Goodwin, Dresden; McEchron, William

    2013-11-01

    Infant directed speech (IDS) is a speech register characterized by simpler sentences, a slower rate, and more variable prosody. Recent work has implicated it in more subtle aspects of language development. Kuhl et al. (1997) demonstrated that segmental cues for vowels are affected by IDS in a way that may enhance development: the average locations of the extreme "point" vowels (/a/, /i/ and /u/) are further apart in acoustic space. If infants learn speech categories, in part, from the statistical distributions of such cues, these changes may specifically enhance speech category learning. We revisited this by asking (1) if these findings extend to a new cue (Voice Onset Time, a cue for voicing); (2) whether they extend to the interior vowels which are much harder to learn and/or discriminate; and (3) whether these changes may be an unintended phonetic consequence of factors like speaking rate or prosodic changes associated with IDS. Eighteen caregivers were recorded reading a picture book including minimal pairs for voicing (e.g., beach/peach) and a variety of vowels to either an adult or their infant. Acoustic measurements suggested that VOT was different in IDS, but not in a way that necessarily supports better development, and that these changes are almost entirely due to slower rate of speech of IDS. Measurements of the vowel suggested that in addition to changes in the mean, there was also an increase in variance, and statistical modeling suggests that this may counteract the benefit of any expansion of the vowel space. As a whole this suggests that changes in segmental cues associated with IDS may be an unintended by-product of the slower rate of speech and different prosodic structure, and do not necessarily derive from a motivation to enhance development. Copyright © 2013 Elsevier B.V. All rights reserved.

  16. A configural dominant account of contextual cueing: Configural cues are stronger than colour cues.

    Science.gov (United States)

    Kunar, Melina A; John, Rebecca; Sweetman, Hollie

    2014-01-01

    Previous work has shown that reaction times to find a target in displays that have been repeated are faster than those for displays that have never been seen before. This learning effect, termed "contextual cueing" (CC), has been shown using contexts such as the configuration of the distractors in the display and the background colour. However, it is not clear how these two contexts interact to facilitate search. We investigated this here by comparing the strengths of these two cues when they appeared together. In Experiment 1, participants searched for a target that was cued by both colour and distractor configural cues, compared with when the target was only predicted by configural information. The results showed that the addition of a colour cue did not increase contextual cueing. In Experiment 2, participants searched for a target that was cued by both colour and distractor configuration compared with when the target was only cued by colour. The results showed that adding a predictive configural cue led to a stronger CC benefit. Experiments 3 and 4 tested the disruptive effects of removing either a learned colour cue or a learned configural cue and whether there was cue competition when colour and configural cues were presented together. Removing the configural cue was more disruptive to CC than removing colour, and configural learning was shown to overshadow the learning of colour cues. The data support a configural dominant account of CC, where configural cues act as the stronger cue in comparison to colour when they are presented together.

  17. Composition: Cue Wheel

    DEFF Research Database (Denmark)

    Bergstrøm-Nielsen, Carl

    2014-01-01

    Cue Rondo is an open composition to be realised by improvising musicians. See more about my composition practise in the entry "Composition - General Introduction". This work is licensed under a Creative Commons "by-nc" License. You may for non-commercial purposes use and distribute it, performanc...

  18. Neural entrainment to speech modulates speech intelligibility

    NARCIS (Netherlands)

    Riecke, Lars; Formisano, Elia; Sorger, Bettina; Başkent, Deniz; Gaudrain, Etienne

    2018-01-01

    Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and

  19. Acoustic and perceptual effects of magnifying interaural difference cues in a simulated "binaural" hearing aid.

    Science.gov (United States)

    de Taillez, Tobias; Grimm, Giso; Kollmeier, Birger; Neher, Tobias

    2017-04-10

    To investigate the influence of an algorithm designed to enhance or magnify interaural difference cues on speech signals in noisy, spatially complex conditions using both technical and perceptual measurements. To also investigate the combination of interaural magnification (IM), monaural microphone directionality (DIR), and binaural coherence-based noise reduction (BC). Speech-in-noise stimuli were generated using virtual acoustics. A computational model of binaural hearing was used to analyse the spatial effects of IM. Predicted speech quality changes and signal-to-noise-ratio (SNR) improvements were also considered. Additionally, a listening test was carried out to assess speech intelligibility and quality. Listeners aged 65-79 years with and without sensorineural hearing loss (N = 10 each). IM increased the horizontal separation of concurrent directional sound sources without introducing any major artefacts. In situations with diffuse noise, however, the interaural difference cues were distorted. Preprocessing the binaural input signals with DIR reduced distortion. IM influenced neither speech intelligibility nor speech quality. The IM algorithm tested here failed to improve speech perception in noise, probably because of the dispersion and inconsistent magnification of interaural difference cues in complex environments.
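
    The central idea of interaural magnification, expanding the interaural differences a listener uses to separate sources in azimuth, can be sketched in the short-time Fourier domain. The code below is a simplified illustration of that principle only and is not the algorithm evaluated in the study; the magnification factor `alpha`, the STFT settings, and the choice to expand only the level differences are assumptions made for brevity.

```python
import numpy as np
from scipy.signal import stft, istft

def magnify_ild(left, right, fs, alpha=2.0, nperseg=512):
    """Toy interaural-magnification sketch: expand the interaural level
    difference (ILD) of each time-frequency bin by a factor alpha,
    keeping the per-bin phases and the geometric-mean level unchanged."""
    f, t, L = stft(left, fs=fs, nperseg=nperseg)
    _, _, R = stft(right, fs=fs, nperseg=nperseg)
    eps = 1e-12
    aL, aR = np.abs(L) + eps, np.abs(R) + eps
    gm = np.sqrt(aL * aR)                        # per-bin geometric-mean level
    ild_db = 20.0 * np.log10(aL / aR)            # original ILD in dB
    aL_new = gm * 10.0 ** (alpha * ild_db / 40.0)
    aR_new = gm * 10.0 ** (-alpha * ild_db / 40.0)
    L_out = aL_new * np.exp(1j * np.angle(L))    # phases (and thus ITD cues) untouched
    R_out = aR_new * np.exp(1j * np.angle(R))
    _, left_out = istft(L_out, fs=fs, nperseg=nperseg)
    _, right_out = istft(R_out, fs=fs, nperseg=nperseg)
    return left_out, right_out
```

    With `alpha` equal to 1 the signals pass through unchanged; values above 1 push concurrent directional sources further apart, at the risk of distorting the cues in diffuse noise, as the study reports.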

  20. Psychophysics of complex auditory and speech stimuli

    Science.gov (United States)

    Pastore, Richard E.

    1993-10-01

    A major focus of the primary project is the use of different procedures to provide converging evidence on the nature of perceptual spaces for speech categories. Completed research examined initial voiced consonants, with results providing strong evidence that different stimulus properties may cue a phoneme category in different vowel contexts. Thus, /b/ is cued by a rising second formant (F2) with the vowel /a/, requires both F2 and F3 to be rising with /i/, and is independent of the release burst for these vowels. Furthermore, cues for phonetic contrasts are not necessarily symmetric, and the strong dependence of prior speech research on classification procedures may have led to errors. Thus, the opposite (falling F2 and F3) transitions lead to somewhat ambiguous percepts (i.e., not /b/) which may be labeled consistently (as /d/ or /g/) but require a release burst to achieve high category quality and similarity to category exemplars. Ongoing research is examining cues in other vowel contexts and using procedures to evaluate the nature of the interaction between cues for categories of both speech and music.

  1. Voice quality in affect cueing: does loudness matter?

    OpenAIRE

    Irena eYanushevskaya; Christer eGobl; Ailbhe eNí Chasaide

    2013-01-01

    In emotional speech research, it has been suggested that loudness, along with other prosodic features, may be an important cue in communicating high activation affects. In earlier studies, we found different voice quality stimuli to be consistently associated with certain affective states. In these stimuli, as in typical human productions, the different voice qualities entailed differences in loudness. To examine the extent to which the loudness differences among these voice qual...

  2. Sensitivity to structure in the speech signal by children with speech sound disorder and reading disability.

    Science.gov (United States)

    Johnson, Erin Phinney; Pennington, Bruce F; Lowenstein, Joanna H; Nittrouer, Susan

    2011-01-01

    Children with speech sound disorder (SSD) and reading disability (RD) have poor phonological awareness, a problem believed to arise largely from deficits in processing the sensory information in speech, specifically individual acoustic cues. However, such cues are details of acoustic structure. Recent theories suggest that listeners also need to be able to integrate those details to perceive linguistically relevant form. This study examined abilities of children with SSD, RD, and SSD+RD not only to process acoustic cues but also to recover linguistically relevant form from the speech signal. Ten- to 11-year-olds with SSD (n=17), RD (n=16), SSD+RD (n=17), and Controls (n=16) were tested to examine their sensitivity to (1) voice onset times (VOT); (2) spectral structure in fricative-vowel syllables; and (3) vocoded sentences. Children in all groups performed similarly with VOT stimuli, but children with disorders showed delays on other tasks, although the specifics of their performance varied. Children with poor phonemic awareness not only lack sensitivity to acoustic details, but are also less able to recover linguistically relevant forms. This is contrary to one of the main current theories of the relation between spoken and written language development. Readers will be able to (1) understand the role speech perception plays in phonological awareness, (2) distinguish between segmental and global structure analysis of speech perception, (3) describe differences and similarities in speech perception among children with speech sound disorder and/or reading disability, and (4) recognize the importance of broadening clinical interventions to focus on recognizing structure at all levels of speech analysis. Copyright © 2011 Elsevier Inc. All rights reserved.

  3. Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion

    Directory of Open Access Journals (Sweden)

    Michael Schutz

    2017-11-01

    Full Text Available Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad ones. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor), a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally “happy”) pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015). Although this provides useful evidence for parallels in the use of emotional cues between these domains, it raises questions about how composers “trade off” cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music—widely recognized for its artistic significance—complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech.

  4. Performance evaluation of a motor-imagery-based EEG-Brain computer interface using a combined cue with heterogeneous training data in BCI-Naive subjects

    Directory of Open Access Journals (Sweden)

    Lee Youngbum

    2011-10-01

    Full Text Available Abstract Background The subjects in an EEG-Brain computer interface (BCI) system experience difficulties when attempting to obtain the consistent performance of the actual movement by motor imagery alone. It is necessary to find the optimal conditions and stimuli combinations that affect the performance factors of the EEG-BCI system to guarantee equipment safety and trust through the performance evaluation of using motor imagery characteristics that can be utilized in the EEG-BCI testing environment. Methods The experiment was carried out with 10 experienced subjects and 32 naive subjects on an EEG-BCI system. There were 3 experiments: the experienced homogeneous experiment, the naive homogeneous experiment and the naive heterogeneous experiment. Each experiment was compared in terms of the six audio-visual cue combinations and consisted of 50 trials. For the naive subjects, the EEG data were classified with a least-squares linear classifier after common spatial pattern filtering. The accuracy was calculated using the training and test data set. The p-value of the accuracy was obtained through the statistical significance test. Results In the case in which a naive subject was trained by a heterogeneous combined cue and tested by a visual cue, the result was not only the highest accuracy (p Conclusions We propose the use of this measuring methodology of a heterogeneous combined cue for training data and a visual cue for test data by the typical EEG-BCI algorithm on the EEG-BCI system to achieve effectiveness in terms of consistency, stability, cost, time, and resources management without the need for a trial and error process.
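
    The analysis pipeline named in the Methods, common spatial pattern (CSP) filtering followed by a least-squares linear classifier, can be sketched as follows. This is a generic sketch on synthetic two-class motor-imagery data, not the study's code; the channel count, trial length, and number of CSP filter pairs are assumed values.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_pairs=3):
    """Common spatial pattern filters from two classes of EEG trials,
    each of shape (n_trials, n_channels, n_samples)."""
    cov = lambda X: np.mean([t @ t.T / np.trace(t @ t.T) for t in X], axis=0)
    Ca, Cb = cov(trials_a), cov(trials_b)
    evals, evecs = eigh(Ca, Ca + Cb)          # generalized eigendecomposition
    order = np.argsort(evals)
    picks = np.r_[order[:n_pairs], order[-n_pairs:]]
    return evecs[:, picks].T                  # (2 * n_pairs, n_channels)

def log_var_features(trials, W):
    """Log-variance of each CSP component, a standard motor-imagery feature."""
    filtered = np.einsum('fc,ncs->nfs', W, trials)
    return np.log(filtered.var(axis=2))

def train_least_squares(X, y):
    """Least-squares linear classifier: append a bias column and solve Xw = y."""
    Xb = np.c_[X, np.ones(len(X))]
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(X, w):
    return np.sign(np.c_[X, np.ones(len(X))] @ w)

# Synthetic illustration (dimensions are assumptions, not the study's montage).
rng = np.random.default_rng(0)
left  = rng.standard_normal((50, 16, 256))
right = rng.standard_normal((50, 16, 256))
W = csp_filters(left, right)
X = np.vstack([log_var_features(left, W), log_var_features(right, W)])
y = np.r_[np.ones(50), -np.ones(50)]
w = train_least_squares(X, y)
accuracy = np.mean(predict(X, w) == y)
```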

  5. Antecedent Stimulus Control: Using Orienting Cues to Facilitate First-Word Acquisition for Nonresponders with Autism

    OpenAIRE

    Koegel, Robert L.; Shirotova, Larisa; Koegel, Lynn Kern

    2009-01-01

    Although considerable progress has been made in improving the acquisition of expressive verbal communication in children with autism, research has documented that a subpopulation of children still fail to acquire speech even with intensive intervention. One variable that might be important in facilitating responding for this nonverbal subgroup of children is the use of antecedent orienting cues. Using a multiple baseline design, this study examined whether individualized orienting cues could ...

  6. Improving Decision-Making Skills Using a Problem-Based Learning Model Assisted by Audio-Visual Media for Students of SMK PGRI Batang (A Study of Class X Marketing 1, 2013/2014 Academic Year)

    Directory of Open Access Journals (Sweden)

    Lilis Septiarini

    2014-06-01

    Full Text Available This study aims to determine how the problem-based learning model can be applied and whether it can improve students' decision-making skills. The subjects were the students of Class X Marketing 1 at SMK PGRI Batang. The study was motivated by the students' poor decision-making skills and by the use of an inappropriate teaching method, namely unvaried lecturing, even though the material is analytical and applied in character. The study is a classroom action research carried out in two cycles. The results show that the percentage of student activity was in the good category in cycle I and rose to the very good category in cycle II, the percentage of teacher activity was in the good category in cycle I and rose to the very good category in cycle II, and the class average was in the good category in cycle I and rose to the very good category in cycle II.

  7. Cue conflicts in context

    DEFF Research Database (Denmark)

    Boeg Thomsen, Ditte; Poulsen, Mads

    2015-01-01

    When learning their first language, children develop strategies for assigning semantic roles to sentence structures, depending on morphosyntactic cues such as case and word order. Traditionally, comprehension experiments have presented transitive clauses in isolation, and crosslinguistically...... preschoolers. However, object-first clauses may be context-sensitive structures, which are infelicitous in isolation. In a second act-out study we presented OVS clauses in supportive and unsupportive discourse contexts and in isolation and found that five-to-six-year-olds’ OVS comprehension was enhanced...... in discourse-pragmatically felicitous contexts. Our results extend previous findings of preschoolers’ sensitivity to discourse-contextual cues in sentence comprehension (Hurewitz, 2001; Song & Fisher, 2005) to the basic task of assigning agent and patient roles....

  8. How to Tell Beans from Farmers: Cues to the Perception of Pitch Accent in Whispered Norwegian

    Directory of Open Access Journals (Sweden)

    Hannele Nicholson

    2004-01-01

    Full Text Available East Norwegian employs pitch accent contours in order to make lexical distinctions. This paper investigates listeners' ability to make lexical distinctions in the absence of f0 (i.e. in whispered speech) as the listener attempts to determine which pitch accent word token best fits into a whispered ambiguous utterance in spoken Norwegian. The results confirm that local syntactic context alone is not a reliable cue to assist in lexical selection and concur with Fintoft (1970) in suggesting that listeners utilise a separate prosodic cue, possibly syllable duration or intensity, to make the pitch accent distinction in whispered speech.

  9. Mind your pricing cues.

    Science.gov (United States)

    Anderson, Eric; Simester, Duncan

    2003-09-01

    For most of the items they buy, consumers don't have an accurate sense of what the price should be. Ask them to guess how much a four-pack of 35-mm film costs, and you'll get a variety of wrong answers: Most people will underestimate; many will only shrug. Research shows that consumers' knowledge of the market is so far from perfect that it hardly deserves to be called knowledge at all. Yet people happily buy film and other products every day. Is this because they don't care what kind of deal they're getting? No. Remarkably, it's because they rely on retailers to tell them whether they're getting a good price. In subtle and not-so-subtle ways, retailers send signals to customers, telling them whether a given price is relatively high or low. In this article, the authors review several common pricing cues retailers use--"sale" signs, prices that end in 9, signpost items, and price-matching guarantees. They also offer some surprising facts about how--and how well--those cues work. For instance, the authors' tests with several mail-order catalogs reveal that including the word "sale" beside a price can increase demand by more than 50%. The practice of using a 9 at the end of a price to denote a bargain is so common, you'd think customers would be numb to it. Yet in a study the authors did involving a women's clothing catalog, they increased demand by a third just by changing the price of a dress from $34 to $39. Pricing cues are powerful tools for guiding customers' purchasing decisions, but they must be applied judiciously. Used inappropriately, the cues may breach customers' trust, reduce brand equity, and give rise to lawsuits.

  10. Hate speech

    Directory of Open Access Journals (Sweden)

    Anne Birgitta Nilsen

    2014-12-01

    Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance.The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience.The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the

  11. Role of Binaural Temporal Fine Structure and Envelope Cues in Cocktail-Party Listening.

    Science.gov (United States)

    Swaminathan, Jayaganesh; Mason, Christine R; Streeter, Timothy M; Best, Virginia; Roverud, Elin; Kidd, Gerald

    2016-08-03

    While conversing in a crowded social setting, a listener is often required to follow a target speech signal amid multiple competing speech signals (the so-called "cocktail party" problem). In such situations, separation of the target speech signal in azimuth from the interfering masker signals can lead to an improvement in target intelligibility, an effect known as spatial release from masking (SRM). This study assessed the contributions of two stimulus properties that vary with separation of sound sources, binaural envelope (ENV) and temporal fine structure (TFS), to SRM in normal-hearing (NH) human listeners. Target speech was presented from the front and speech maskers were either colocated with or symmetrically separated from the target in azimuth. The target and maskers were presented either as natural speech or as "noise-vocoded" speech in which the intelligibility was conveyed only by the speech ENVs from several frequency bands; the speech TFS within each band was replaced with noise carriers. The experiments were designed to preserve the spatial cues in the speech ENVs while retaining/eliminating them from the TFS. This was achieved by using the same/different noise carriers in the two ears. A phenomenological auditory-nerve model was used to verify that the interaural correlations in TFS differed across conditions, whereas the ENVs retained a high degree of correlation, as intended. Overall, the results from this study revealed that binaural TFS cues, especially for frequency regions below 1500 Hz, are critical for achieving SRM in NH listeners. Potential implications for studying SRM in hearing-impaired listeners are discussed. Acoustic signals received by the auditory system pass first through an array of physiologically based band-pass filters. Conceptually, at the output of each filter, there are two principal forms of temporal information: slowly varying fluctuations in the envelope (ENV) and rapidly varying fluctuations in the temporal fine
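
    The envelope/fine-structure split and the noise-vocoding manipulation described here can be sketched with the analytic signal: within a frequency band, the Hilbert envelope gives ENV and the cosine of the instantaneous phase gives TFS, and a noise-vocoded band keeps the speech ENV while replacing the TFS with a band-limited noise carrier. The sketch below is illustrative only; the band edges, filter order, and single-channel setup are assumptions, and a full vocoder would sum many such bands.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def env_tfs(band_signal):
    """Split one band-limited signal into its envelope (ENV) and
    temporal fine structure (TFS) via the analytic signal."""
    analytic = hilbert(band_signal)
    env = np.abs(analytic)                 # slowly varying envelope
    tfs = np.cos(np.angle(analytic))       # rapidly varying fine structure
    return env, tfs

def noise_vocode_band(x, fs, lo, hi, rng):
    """One channel of a noise vocoder: keep the speech ENV of the band but
    replace its TFS with a noise carrier filtered to the same band."""
    sos = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
    band = sosfiltfilt(sos, x)
    env, _ = env_tfs(band)
    carrier = sosfiltfilt(sos, rng.standard_normal(len(x)))
    return env * carrier

# Example with assumed band edges; a full vocoder would sum several such bands.
fs = 16000
rng = np.random.default_rng(1)
speech = rng.standard_normal(fs)           # stand-in for a 1 s speech signal
vocoded = noise_vocode_band(speech, fs, 500.0, 1500.0, rng)
```

    Using the same noise carrier in both ears preserves the interaural correlation of the TFS, while using independent carriers removes it, which is how the study separated binaural ENV from binaural TFS cues.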

  12. Speech enhancement

    CERN Document Server

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red
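
    As one concrete single-microphone example of the kind of noise-reduction method covered by books in this area, the sketch below implements plain magnitude spectral subtraction. It is a minimal illustration, not a method taken from this book; the assumption that the first half second of the recording contains noise only, and the spectral-floor value, are arbitrary choices.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, noise_seconds=0.5, nperseg=512, floor=0.05):
    """Minimal magnitude spectral subtraction.  The noise magnitude spectrum is
    estimated from an (assumed) speech-free lead-in and subtracted from every
    frame; a spectral floor limits musical-noise artefacts."""
    f, t, X = stft(noisy, fs=fs, nperseg=nperseg)
    n_noise_frames = max(1, int(noise_seconds * fs / (nperseg // 2)))
    noise_mag = np.mean(np.abs(X[:, :n_noise_frames]), axis=1, keepdims=True)
    clean_mag = np.maximum(np.abs(X) - noise_mag, floor * np.abs(X))
    X_clean = clean_mag * np.exp(1j * np.angle(X))      # keep the noisy phase
    _, enhanced = istft(X_clean, fs=fs, nperseg=nperseg)
    return enhanced
```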

  13. Speech Intelligibility

    Science.gov (United States)

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena like the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, benefit using hearing aids or combinations of these things.

  14. Speech dynamics

    NARCIS (Netherlands)

    Pols, L.C.W.

    2011-01-01

    In order for speech to be informative and communicative, segmental and suprasegmental variation is mandatory. Only this leads to meaningful words and sentences. The building blocks are no stable entities put next to each other (like beads on a string or like printed text), but there are gradual

  15. Speech Enhancement

    DEFF Research Database (Denmark)

    Benesty, Jacob; Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    and their performance bounded and assessed in terms of noise reduction and speech distortion. The book shows how various filter designs can be obtained in this framework, including the maximum SNR, Wiener, LCMV, and MVDR filters, and how these can be applied in various contexts, like in single-channel and multichannel...
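
    Of the filter designs listed above, the MVDR beamformer has a particularly compact closed form, w = R_n^{-1} d / (d^H R_n^{-1} d), where R_n is the noise covariance matrix and d the steering (propagation) vector of the target. The sketch below evaluates this expression for a single frequency bin; the two-microphone covariance and steering vector are made-up values used only for illustration.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """Narrowband MVDR beamformer weights: w = R_n^{-1} d / (d^H R_n^{-1} d).
    noise_cov is the (n_mics x n_mics) noise covariance for one frequency bin,
    steering the propagation vector from the target to the microphones."""
    Rn_inv_d = np.linalg.solve(noise_cov, steering)
    return Rn_inv_d / (steering.conj() @ Rn_inv_d)

# Toy two-microphone example with an assumed steering vector.
Rn = np.array([[1.0, 0.3], [0.3, 1.0]], dtype=complex)
d = np.array([1.0, np.exp(-1j * 0.4)])
w = mvdr_weights(Rn, d)
# The distortionless constraint w^H d = 1 holds by construction.
assert np.isclose(w.conj() @ d, 1.0)
```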

  16. Zebra finches are sensitive to prosodic features of human speech.

    Science.gov (United States)

    Spierings, Michelle J; ten Cate, Carel

    2014-07-22

    Variation in pitch, amplitude and rhythm adds crucial paralinguistic information to human speech. Such prosodic cues can reveal information about the meaning or emphasis of a sentence or the emotional state of the speaker. To examine the hypothesis that sensitivity to prosodic cues is language independent and not human specific, we tested prosody perception in a controlled experiment with zebra finches. Using a go/no-go procedure, subjects were trained to discriminate between speech syllables arranged in XYXY patterns with prosodic stress on the first syllable and XXYY patterns with prosodic stress on the final syllable. To systematically determine the salience of the various prosodic cues (pitch, duration and amplitude) to the zebra finches, they were subjected to five tests with different combinations of these cues. The zebra finches generalized the prosodic pattern to sequences that consisted of new syllables and used prosodic features over structural ones to discriminate between stimuli. This strong sensitivity to the prosodic pattern was maintained when only a single prosodic cue was available. The change in pitch was treated as more salient than changes in the other prosodic features. These results show that zebra finches are sensitive to the same prosodic cues known to affect human speech perception. © 2014 The Author(s) Published by the Royal Society. All rights reserved.

  17. The Effects of Nonverbal Involvement and Communication Apprehension on State Anxiety, Interpersonal Attraction, and Speech Duration.

    Science.gov (United States)

    Remland, Martin S.; Jones, Tricia S.

    1989-01-01

    Examines whether communication apprehension mediates the effect of nonverbal involvement cues (head nods, eye contact, body orientation, etc.) on state anxiety, interpersonal attraction, and speech duration in information gathering interviews. Finds that nonverbal cues affect loquacity and liking, but that a speaker's communication apprehension…

  18. Visual Phonetic Processing Localized Using Speech and Non-Speech Face Gestures in Video and Point-Light Displays

    Science.gov (United States)

    Bernstein, Lynne E.; Jiang, Jintao; Pantazis, Dimitrios; Lu, Zhong-Lin; Joshi, Anand

    2011-01-01

    The talking face affords multiple types of information. To isolate cortical sites with responsibility for integrating linguistically relevant visual speech cues, speech and non-speech face gestures were presented in natural video and point-light displays during fMRI scanning at 3.0T. Participants with normal hearing viewed the stimuli and also viewed localizers for the fusiform face area (FFA), the lateral occipital complex (LOC), and the visual motion (V5/MT) regions of interest (ROIs). The FFA, the LOC, and V5/MT were significantly less activated for speech relative to non-speech and control stimuli. Distinct activation of the posterior superior temporal sulcus and the adjacent middle temporal gyrus to speech, independent of media, was obtained in group analyses. Individual analyses showed that speech and non-speech stimuli were associated with adjacent but different activations, with the speech activations more anterior. We suggest that the speech activation area is the temporal visual speech area (TVSA), and that it can be localized with the combination of stimuli used in this study. PMID:20853377

  19. How each prosodic boundary cue matters: Evidence from German infants

    Directory of Open Access Journals (Sweden)

    Caroline eWellmann

    2012-12-01

    Full Text Available Previous studies have revealed that infants aged six to ten months are able to use the acoustic correlates of major prosodic boundaries, that is, pitch change, preboundary lengthening, and pause, for the segmentation of the continuous speech signal. Moreover, investigations with American-English- and Dutch-learning infants suggest that processing prosodic boundary markings involves a weighting of these cues. This weighting seems to develop with increasing exposure to the native language and to underlie crosslinguistic variation. In the following, we report the results of four experiments using the headturn preference procedure to explore the perception of prosodic boundary cues in German infants. We presented eight-month-old infants with a sequence of names in two different prosodic groupings, with or without boundary markers. Infants discriminated both sequences when the boundary was marked by all three cues (Experiment 1) and when it was marked by a pitch change and preboundary lengthening in combination (Experiment 2). The presence of a pitch change (Experiment 3) or preboundary lengthening (Experiment 4) as single cues did not lead to a successful discrimination. Our results indicate that pause is not a necessary cue for German infants. Pitch and preboundary lengthening in combination, but not as single cues, are sufficient. Hence, by eight months infants only rely on a convergence of boundary markers. Comparisons with adults' performance on the same stimulus materials suggest that the pattern observed with the eight-month-olds is already consistent with that of adults. We discuss our findings with respect to crosslinguistic variation and the development of a language-specific prosodic cue weighting.

  20. Effect of F0 contours on top-down repair of interrupted speech

    NARCIS (Netherlands)

    Clarke, Jeanne; Kazanoglu, Deniz; Baskent, Deniz; Gaudrain, Etienne

    Top-down repair of interrupted speech can be influenced by bottom-up acoustic cues such as voice pitch (F0). This study aims to investigate the role of the dynamic information of pitch, i.e., F0 contours, in top-down repair of speech. Intelligibility of sentences interrupted with silence or noise

  1. A Computational Model of Word Segmentation from Continuous Speech Using Transitional Probabilities of Atomic Acoustic Events

    Science.gov (United States)

    Rasanen, Okko

    2011-01-01

    Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this…

  2. Helmets: conventional to cueing

    Science.gov (United States)

    Sedillo, Michael R.; Dixon, Sharon A.

    2003-09-01

    Aviation helmets have always served as an interface between technology and flyers. The functional evolution of helmets continued with the advent of radio when helmets were modified to accept communication components and later, oxygen masks. As development matured, interest in safety increased as evident in more robust designs. Designing helmets became a balance between adding new capabilities and reducing the helmet's weight. As the research community better defined acceptable limits of weight-tolerances with tools such as the "Knox Box" criteria, system developers added and subtracted technologies while remaining within these limits. With most helmet-mounted technologies being independent of each other, the level of precision in mounting these technologies was not as significant a concern as it is today. The attachment of new components was acceptable as long as the components served their purpose. However this independent concept has become obsolete with the dawn of modern helmet mounted displays. These complex systems are interrelated and demand precision in their attachment to the helmet. The helmets' role now extends beyond serving as a means to mount the technologies to the head, but is now instrumental in critical visual alignment of complex night vision and missile cueing technologies. These new technologies demand a level of helmet fit and component alignment previously not seen in past helmet designs. This paper presents some of the design, integration and logistical issues gleaned during the development of the Joint Helmet Mounted Cueing System (JHMCS) to include the application of head-track technologies in forensic investigations.

  3. Learnability of prosodic boundaries: Is infant-directed speech easier?

    Science.gov (United States)

    Ludusan, Bogdan; Cristia, Alejandrina; Martin, Andrew; Mazuka, Reiko; Dupoux, Emmanuel

    2016-08-01

    This study explores the long-standing hypothesis that the acoustic cues to prosodic boundaries in infant-directed speech (IDS) make those boundaries easier to learn than those in adult-directed speech (ADS). Three cues (pause duration, nucleus duration, and pitch change) were investigated, by means of a systematic review of the literature, statistical analyses of a corpus of Japanese, and machine learning experiments. The review of previous work revealed that the effect of register on boundary cues is less well established than previously thought, and that results often vary across studies for certain cues. Statistical analyses run on a large database of mother-child and mother-interviewer interactions showed that the duration of a pause and the duration of the syllable nucleus preceding the boundary are two cues which are enhanced in IDS, while f0 change is actually degraded in IDS. Supervised and unsupervised machine learning techniques applied to these acoustic cues revealed that IDS boundaries were consistently better classified than ADS ones, regardless of the learning method used. The role of the cues examined in this study and the importance of these findings in the more general context of early linguistic structure acquisition is discussed.
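
    The supervised machine-learning component described here, predicting whether a prosodic boundary is present from the three acoustic cues (pause duration, nucleus duration, pitch change), can be sketched with a simple classifier. The data below are synthetic stand-ins rather than the Japanese corpus, and logistic regression is used merely as one representative supervised learner.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: one row per word boundary, columns are the three cues
# examined in the study (pause duration, nucleus duration, pitch change).
rng = np.random.default_rng(0)
n = 1000
is_boundary = rng.integers(0, 2, n)
pause   = rng.exponential(0.05, n) + 0.15 * is_boundary        # seconds
nucleus = rng.normal(0.08, 0.02, n) + 0.03 * is_boundary       # seconds
pitch   = rng.normal(0.0, 1.0, n) + 0.5 * is_boundary          # semitones
X = np.c_[pause, nucleus, pitch]

# Supervised classification of prosodic boundaries from the three cues.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, is_boundary, cv=5)
print("mean boundary-classification accuracy:", scores.mean())
```

    Comparing such classification accuracy between IDS-like and ADS-like feature distributions is one way to quantify the claim that IDS boundaries are easier to learn.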

  4. Cue reactivity towards shopping cues in female participants.

    Science.gov (United States)

    Starcke, Katrin; Schlereth, Berenike; Domass, Debora; Schöler, Tobias; Brand, Matthias

    2013-03-01

    Background and aims It is currently under debate whether pathological buying can be considered as a behavioural addiction. Addictions have often been investigated with cue-reactivity paradigms to assess subjective, physiological and neural craving reactions. The current study aims at testing whether cue reactivity towards shopping cues is related to pathological buying tendencies. Methods A sample of 66 non-clinical female participants rated shopping related pictures concerning valence, arousal, and subjective craving. In a subgroup of 26 participants, electrodermal reactions towards those pictures were additionally assessed. Furthermore, all participants were screened concerning pathological buying tendencies and baseline craving for shopping. Results Results indicate a relationship between the subjective ratings of the shopping cues and pathological buying tendencies, even if baseline craving for shopping was controlled for. Electrodermal reactions were partly related to the subjective ratings of the cues. Conclusions Cue reactivity may be a potential correlate of pathological buying tendencies. Thus, pathological buying may be accompanied by craving reactions towards shopping cues. Results support the assumption that pathological buying can be considered as a behavioural addiction. From a methodological point of view, results support the view that the cue-reactivity paradigm is suited for the investigation of craving reactions in pathological buying and future studies should implement this paradigm in clinical samples.

  5. Brain-Computer Interfaces for Speech Communication.

    Science.gov (United States)

    Brumberg, Jonathan S; Nieto-Castanon, Alfonso; Kennedy, Philip R; Guenther, Frank H

    2010-04-01

    This paper briefly reviews current silent speech methodologies for normal and disabled individuals. Current techniques utilizing electromyographic (EMG) recordings of vocal tract movements are useful for physically healthy individuals but fail for tetraplegic individuals who do not have accurate voluntary control over the speech articulators. Alternative methods utilizing EMG from other body parts (e.g., hand, arm, or facial muscles) or electroencephalography (EEG) can provide capable silent communication to severely paralyzed users, though current interfaces are extremely slow relative to normal conversation rates and require constant attention to a computer screen that provides visual feedback and/or cueing. We present a novel approach to the problem of silent speech via an intracortical microelectrode brain computer interface (BCI) to predict intended speech information directly from the activity of neurons involved in speech production. The predicted speech is synthesized and acoustically fed back to the user with a delay under 50 ms. We demonstrate that the Neurotrophic Electrode used in the BCI is capable of providing useful neural recordings for over 4 years, a necessary property for BCIs that need to remain viable over the lifespan of the user. Other design considerations include neural decoding techniques based on previous research involving BCIs for computer cursor or robotic arm control via prediction of intended movement kinematics from motor cortical signals in monkeys and humans. Initial results from a study of continuous speech production with instantaneous acoustic feedback show the BCI user was able to improve his control over an artificial speech synthesizer both within and across recording sessions. The success of this initial trial validates the potential of the intracortical microelectrode-based approach for providing a speech prosthesis that can allow much more rapid communication rates.
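
    The decoding step described, predicting intended speech parameters from neural activity in the spirit of earlier kinematic decoders, can be illustrated with a regularized linear decoder from binned firing rates to formant frequencies. Everything below (unit count, bin size, formant targets, ridge penalty) is a synthetic assumption; this is not the decoder used in the study.

```python
import numpy as np

def fit_ridge_decoder(firing_rates, formants, lam=1.0):
    """Linear decoder mapping binned firing rates (n_bins x n_units) to
    formant frequencies (n_bins x 2), fitted with ridge regression."""
    X = np.c_[firing_rates, np.ones(len(firing_rates))]       # add bias column
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ formants)

def decode(firing_rates, W):
    return np.c_[firing_rates, np.ones(len(firing_rates))] @ W

# Synthetic example: 2000 bins of 50 ms, 40 units, two formant targets.
rng = np.random.default_rng(0)
rates = rng.poisson(5.0, (2000, 40)).astype(float)
true_W = rng.normal(0, 20, (40, 2))
formants = np.array([500.0, 1500.0]) + rates @ true_W + rng.normal(0, 10, (2000, 2))
W = fit_ridge_decoder(rates, formants)
predicted = decode(rates, W)
```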

  6. Availability of binaural cues for pediatric bilateral cochlear implant recipients.

    Science.gov (United States)

    Sheffield, Sterling W; Haynes, David S; Wanna, George B; Labadie, Robert F; Gifford, René H

    2015-03-01

    Bilateral implant recipients theoretically have access to binaural cues. Research in postlingually deafened adults with cochlear implants (CIs) indicates minimal evidence for true binaural hearing. Congenitally deafened children who experience spatial hearing with bilateral CIs, however, might perceive binaural cues in the CI signal differently. There is limited research examining binaural hearing in children with CIs, and the few published studies are limited by the use of unrealistic speech stimuli and background noise. The purposes of this study were to (1) replicate our previous study of binaural hearing in postlingually deafened adults with AzBio sentences in prelingually deafened children with the pediatric version of the AzBio sentences, and (2) replicate previous studies of binaural hearing in children with CIs using more open-set sentences and more realistic background noise (i.e., multitalker babble). The study was a within-participant, repeated-measures design. The study sample consisted of 14 children with bilateral CIs with at least 25 mo of listening experience. Speech recognition was assessed using sentences presented in multitalker babble at a fixed signal-to-noise ratio. Test conditions included speech at 0° with noise presented at 0° (S0N0), on the side of the first CI (90° or 270°) (S0N1stCI), and on the side of the second CI (S0N2ndCI) as well as speech presented at 0° with noise presented semidiffusely from eight speakers at 45° intervals. Estimates of summation, head shadow, squelch, and spatial release from masking were calculated. Results of test conditions commonly reported in the literature (S0N0, S0N1stCI, S0N2ndCI) are consistent with results from previous research in adults and children with bilateral CIs, showing minimal summation and squelch but typical head shadow and spatial release from masking. However, bilateral benefit over the better CI with speech at 0° was much larger with semidiffuse noise. Congenitally deafened

  7. MEG evidence that the central auditory system simultaneously encodes multiple temporal cues

    NARCIS (Netherlands)

    Simpson, M.I.G.; Barnes, G.R.; Johnson, S.R.; Hillebrand, A.; Singh, K.D.; Green, G.G.R.

    2009-01-01

    Speech contains complex amplitude modulations that have envelopes with multiple temporal cues. The processing of these complex envelopes is not well explained by the classical models of amplitude modulation processing. This may be because the evidence for the models typically comes from the use of

  8. Influences of Semantic and Prosodic Cues on Word Repetition and Categorization in Autism

    Science.gov (United States)

    Singh, Leher; Harrow, MariLouise S.

    2014-01-01

    Purpose: To investigate sensitivity to prosodic and semantic cues to emotion in individuals with high-functioning autism (HFA). Method: Emotional prosody and semantics were independently manipulated to assess the relative influence of prosody versus semantics on speech processing. A sample of 10-year-old typically developing children (n = 10) and…

  9. Modeling the Contribution of Phonotactic Cues to the Problem of Word Segmentation

    Science.gov (United States)

    Blanchard, Daniel; Heinz, Jeffrey; Golinkoff, Roberta

    2010-01-01

    How do infants find the words in the speech stream? Computational models help us understand this feat by revealing the advantages and disadvantages of different strategies that infants might use. Here, we outline a computational model of word segmentation that aims both to incorporate cues proposed by language acquisition researchers and to…

  10. Word Order, Referential Expression, and Case Cues to the Acquisition of Transitive Sentences in Italian

    Science.gov (United States)

    Abbot-Smith, Kirsten; Serratrice, Ludovica

    2015-01-01

    In Study 1 we analyzed Italian child-directed-speech (CDS) and selected the three most frequent active transitive sentence frames used with overt subjects. In Study 2 we experimentally investigated how Italian-speaking children aged 2;6, 3;6, and 4;6 comprehended these orders with novel verbs when the cues of animacy, gender, and subject-verb…

  11. Weighting of Acoustic Cues to a Manner Distinction by Children with and without Hearing Loss

    Science.gov (United States)

    Nittrouer, Susan; Lowenstein, Joanna H.

    2015-01-01

    Purpose: Children must develop optimal perceptual weighting strategies for processing speech in their first language. Hearing loss can interfere with that development, especially if cochlear implants are required. The three goals of this study were to measure, for children with and without hearing loss: (a) cue weighting for a manner distinction,…

  12. Perception of Acoustic Emotion Cues in Normal Hearing Listeners and Cochlear Implant Users

    NARCIS (Netherlands)

    Gilbers, Steven; Fuller, Christina; Broersma, M.; Goudbeek, M.B.; Free, Rolien; Başkent, Deniz

    2014-01-01

    Due to the limitations in sound transmission in the electrode-nerve interface, cochlear implant users are unable to fully perceive the acoustic emotion cues in speech. Therefore, it has been suggested that they use different perceptual strategies than normal-hearing listeners, namely by adapting the

  13. Tactile perception by the profoundly deaf. Speech and environmental sounds.

    Science.gov (United States)

    Plant, G L

    1982-11-01

    Four subjects fitted with single-channel vibrotactile aids and provided with training in their use took part in a testing programme aimed at assessing their aided and unaided lipreading performance, their ability to detect segmental and suprasegmental features of speech, and the discrimination of common environmental sounds. The results showed that the vibrotactile aid provided very useful information as to speech and non-speech stimuli with the subjects performing best on those tasks where time/intensity cues provided sufficient information to enable identification. The implications of the study are discussed and a comparison made with those results reported for subjects using cochlear implants.

  14. Improving Understanding of Emotional Speech Acoustic Content

    Science.gov (United States)

    Tinnemore, Anna

    Children with cochlear implants show deficits in identifying emotional intent of utterances without facial or body language cues. A known limitation to cochlear implants is the inability to accurately portray the fundamental frequency contour of speech which carries the majority of information needed to identify emotional intent. Without reliable access to the fundamental frequency, other methods of identifying vocal emotion, if identifiable, could be used to guide therapies for training children with cochlear implants to better identify vocal emotion. The current study analyzed recordings of adults speaking neutral sentences with a set array of emotions in a child-directed and adult-directed manner. The goal was to identify acoustic cues that contribute to emotion identification that may be enhanced in child-directed speech, but are also present in adult-directed speech. Results of this study showed that there were significant differences in the variation of the fundamental frequency, the variation of intensity, and the rate of speech among emotions and between intended audiences.
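
    The three acoustic measures analysed in the study (variation of the fundamental frequency, variation of intensity, and rate of speech) can be approximated from a recording roughly as sketched below. The sketch assumes a hypothetical audio file path, uses librosa's pYIN tracker for F0, and approximates speaking rate by counting energy peaks, which is only a crude stand-in for a true syllable-rate measure.

```python
import numpy as np
import librosa
from scipy.signal import find_peaks

def emotion_cue_features(path):
    """Rough per-utterance measures of the cues analysed in such studies:
    F0 variability, intensity variability, and a speaking-rate proxy."""
    y, sr = librosa.load(path, sr=16000)
    # F0 variation: standard deviation of voiced F0, in semitones re: the median.
    f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
    f0 = f0[voiced & np.isfinite(f0)]
    f0_var = np.std(12 * np.log2(f0 / np.median(f0)))
    # Intensity variation: standard deviation of the frame RMS level in dB.
    rms = librosa.feature.rms(y=y)[0]
    intensity_var = np.std(20 * np.log10(rms + 1e-10))
    # Speaking-rate proxy: energy peaks per second (a crude syllable-rate stand-in).
    frame_rate = sr / 512                       # default hop length of rms()
    peaks, _ = find_peaks(rms, prominence=0.5 * rms.std())
    rate = len(peaks) * frame_rate / len(rms)
    return {"f0_var_semitones": f0_var,
            "intensity_var_db": intensity_var,
            "peaks_per_second": rate}
```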

  15. Temporal aspects of cue combination

    NARCIS (Netherlands)

    van Mierlo, C.M.; Brenner, E.; Smeets, J.B.J.

    2007-01-01

    The human brain processes different kinds of information (or cues) independently with different neural latencies. How does the brain deal with these differences in neural latency when it combines cues into one estimate? To find out, we introduced artificial asynchronies between the moments that

  16. Speech impairment (adult)

    Science.gov (United States)

    Language impairment; Impairment of speech; Inability to speak; Aphasia; Dysarthria; Slurred speech; Dysphonia voice disorders ... but anyone can develop a speech and language impairment suddenly, usually in a trauma. APHASIA Alzheimer disease ...

  17. Speech and Swallowing

    Science.gov (United States)

    ... People with Parkinson’s may notice ... How do I know if I have a speech or voice problem? My voice makes it difficult ...

  18. Speech and Language Impairments

    Science.gov (United States)

    Speech and Language Impairments (Jun 16, 2010). A legacy disability fact ... Development of Speech and Language Skills in Childhood: Speech and language skills develop ...

  19. Development of a test battery for evaluating speech perception in complex listening environments.

    Science.gov (United States)

    Brungart, Douglas S; Sheffield, Benjamin M; Kubli, Lina R

    2014-08-01

    In the real world, spoken communication occurs in complex environments that involve audiovisual speech cues, spatially separated sound sources, reverberant listening spaces, and other complicating factors that influence speech understanding. However, most clinical tools for assessing speech perception are based on simplified listening environments that do not reflect the complexities of real-world listening. In this study, speech materials from the QuickSIN speech-in-noise test by Killion, Niquette, Gudmundsen, Revit, and Banerjee [J. Acoust. Soc. Am. 116, 2395-2405 (2004)] were modified to simulate eight listening conditions spanning the range of auditory environments listeners encounter in everyday life. The standard QuickSIN test method was used to estimate 50% speech reception thresholds (SRT50) in each condition. A method of adjustment procedure was also used to obtain subjective estimates of the lowest signal-to-noise ratio (SNR) where the listeners were able to understand 100% of the speech (SRT100) and the highest SNR where they could detect the speech but could not understand any of the words (SRT0). The results show that the modified materials maintained most of the efficiency of the QuickSIN test procedure while capturing performance differences across listening conditions comparable to those reported in previous studies that have examined the effects of audiovisual cues, binaural cues, room reverberation, and time compression on the intelligibility of speech.

  20. Nonnative audiovisual speech perception in noise: dissociable effects of the speaker and listener.

    Science.gov (United States)

    Xie, Zilong; Yi, Han-Gyol; Chandrasekaran, Bharath

    2014-01-01

    Nonnative speech poses a challenge to speech perception, especially in challenging listening environments. Audiovisual (AV) cues are known to improve native speech perception in noise. The extent to which AV cues benefit nonnative speech perception in noise, however, is much less well-understood. Here, we examined native American English-speaking and native Korean-speaking listeners' perception of English sentences produced by a native American English speaker and a native Korean speaker across a range of signal-to-noise ratios (SNRs; -4 to -20 dB) in audio-only and audiovisual conditions. We employed psychometric function analyses to characterize the pattern of AV benefit across SNRs. For native English speech, the largest AV benefit occurred at intermediate SNR (i.e. -12 dB); but for nonnative English speech, the largest AV benefit occurred at a higher SNR (-4 dB). The psychometric function analyses demonstrated that the AV benefit patterns were different between native and nonnative English speech. The nativeness of the listener exerted negligible effects on the AV benefit across SNRs. However, the nonnative listeners' ability to gain AV benefit in native English speech was related to their proficiency in English. These findings suggest that the native language background of both the speaker and listener clearly modulate the optimal use of AV cues in speech recognition.
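
    The psychometric-function analysis referred to here can be sketched by fitting a logistic function of SNR to the proportion of keywords recognized in each condition and reading off the AV benefit as the difference between the two fitted curves. The numbers below are invented for illustration and are not the study's data.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(snr, midpoint, slope):
    """Two-parameter logistic psychometric function (proportion correct vs. SNR)."""
    return 1.0 / (1.0 + np.exp(-slope * (snr - midpoint)))

# Hypothetical group means (proportion of keywords correct) at the SNRs used.
snrs = np.array([-20, -16, -12, -8, -4], dtype=float)
audio_only  = np.array([0.05, 0.15, 0.40, 0.75, 0.90])
audiovisual = np.array([0.10, 0.35, 0.70, 0.90, 0.95])

(ao_mid, ao_slope), _ = curve_fit(logistic, snrs, audio_only,  p0=[-10, 0.5])
(av_mid, av_slope), _ = curve_fit(logistic, snrs, audiovisual, p0=[-10, 0.5])

fine_snr = np.linspace(-20, -4, 100)
av_benefit = logistic(fine_snr, av_mid, av_slope) - logistic(fine_snr, ao_mid, ao_slope)
print("SNR of largest AV benefit:", fine_snr[np.argmax(av_benefit)])
```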

  1. 78 FR 49717 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    ... COMMISSION 47 CFR Part 64 Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With... Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay...

  2. Spoken Word Recognition of Chinese Words in Continuous Speech

    Science.gov (United States)

    Yip, Michael C. W.

    2015-01-01

    The present study examined the role of positional probability of syllables played in recognition of spoken word in continuous Cantonese speech. Because some sounds occur more frequently at the beginning position or ending position of Cantonese syllables than the others, so these kinds of probabilistic information of syllables may cue the locations…

  3. Vocabulary influences older and younger listeners' processing of dysarthric speech.

    Science.gov (United States)

    McAuliffe, Megan J; Gibson, Elizabeth M R; Kerr, Sarah E; Anderson, Tim; LaShell, Patrick J

    2013-08-01

    This study examined younger (n = 16) and older (n = 16) listeners' processing of dysarthric speech, a naturally occurring form of signal degradation. It aimed to determine how age, hearing acuity, memory, and vocabulary knowledge interacted in speech recognition and lexical segmentation. Listener transcripts were coded for accuracy and pattern of lexical boundary errors. For younger listeners, transcription accuracy was predicted by receptive vocabulary. For older listeners, this same effect existed but was moderated by pure-tone hearing thresholds. While both groups employed syllabic stress cues to inform lexical segmentation, older listeners were less reliant on this perceptual strategy. The results were interpreted to suggest that individuals with larger receptive vocabularies, with their presumed greater language familiarity, were better able to leverage cue redundancies within the speech signal to form lexical hypotheses, leading to an improved ability to comprehend dysarthric speech. This advantage was minimized as hearing thresholds increased. While the differing levels of reliance on stress cues across the listener groups could not be attributed to specific individual differences, it was hypothesized that some combination of larger vocabularies and reduced hearing thresholds in the older participant group led them to prioritize lexical cues as a segmentation frame.

  4. The Influence of Direct and Indirect Speech on Mental Representations

    NARCIS (Netherlands)

    A. Eerland (Anita); J.A.A. Engelen (Jan A.A.); R.A. Zwaan (Rolf)

    2013-01-01

    Language can be viewed as a set of cues that modulate the comprehender's thought processes. It is a very subtle instrument. For example, the literature suggests that people perceive direct speech (e.g., Joanne said: 'I went out for dinner last night') as more vivid and perceptually…

  5. Visual Speech Perception in Children with Language Learning Impairments

    Science.gov (United States)

    Knowland, Victoria C. P.; Evans, Sam; Snell, Caroline; Rosen, Stuart

    2016-01-01

    Purpose: The purpose of the study was to assess the ability of children with developmental language learning impairments (LLIs) to use visual speech cues from the talking face. Method: In this cross-sectional study, 41 typically developing children (mean age: 8 years 0 months, range: 4 years 5 months to 11 years 10 months) and 27 children with…

  6. Speech Recognition

    Directory of Open Access Journals (Sweden)

    Adrian Morariu

    2009-01-01

    Full Text Available This paper presents a method of speech recognition by pattern recognition techniques. Learning consists in determining the unique characteristics of a word (cepstral coefficients) by eliminating those characteristics that are different from one word to another. For learning and recognition, the system will build a dictionary of words by determining the characteristics of each word to be used in the recognition. Determining the characteristics of an audio signal consists in the following steps: noise removal, sampling, applying a Hamming window, switching to the frequency domain through the Fourier transform, calculating the magnitude spectrum, filtering the data, and determining the cepstral coefficients.
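
    The feature-extraction steps listed above follow a standard cepstral recipe. The following sketch shows one way such frame-level cepstral coefficients could be computed with NumPy; the frame length, hop size, and number of coefficients are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def frame_cepstrum(frame, n_coeffs=13):
    """Real cepstral coefficients for one speech frame, following the steps named in
    the abstract: Hamming window -> FFT -> magnitude spectrum -> log -> inverse FFT."""
    windowed = frame * np.hamming(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    log_magnitude = np.log(magnitude + 1e-10)       # avoid log(0)
    return np.fft.irfft(log_magnitude)[:n_coeffs]

# Hypothetical framing: 25 ms frames with a 10 ms hop at 16 kHz.
fs = 16000
signal = np.random.randn(fs)                        # stand-in for a recorded word
frame_len, hop = int(0.025 * fs), int(0.010 * fs)
features = np.array([frame_cepstrum(signal[i:i + frame_len])
                     for i in range(0, len(signal) - frame_len, hop)])
print(features.shape)                               # (n_frames, n_coeffs) word template
```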

  7. Predicting speech release from masking through spatial separation in distance

    DEFF Research Database (Denmark)

    Chabot-Leclerc, Alexandre; Dau, Torsten

    2014-01-01

    Speech intelligibility models typically consist of a preprocessing part that transforms stimuli into some internal (auditory) representation and a decision metric that relates the internal representation to speech intelligibility. This study investigated speech intelligibility in conditions......-term monaural model based on the SNRenv metric predicted a small SRM only in the noise-masker condition. The results suggest that true binaural processing is not always crucial to account for speech intelligibility in spatial conditions and that an SNR metric in the envelope domain appears to be more...... appropriate in conditions of on-axis spatial speech segregation than the conventional SNR. Additionally, none of the models considered grouping cues, which seem to play an important role in the conditions studied....

  8. Review of 50 years Research About Speech Reading

    Directory of Open Access Journals (Sweden)

    Abdollah Mousavi

    2003-08-01

    Full Text Available Watching a speaker's lips is like hearing speech by eye instead of by ear and markedly improves speech perception. In this review I summarise studies over the last sixty years about lip reading, its issues, methodological problems, experimental and correlational studies, and issues of cerebral lateralization, localization, and cognitive and neuropsychological function. Several studies on speech reading in general suggest that hearing-impaired groups do not actually possess superior speech-reading skills compared to normal controls. With functional magnetic resonance imaging (fMRI) it was also found that linguistic visual cues are sufficient to activate auditory cortex in the absence of auditory speech sounds. Here I present data and arguments about all aspects of the phenomenon of lip reading and its use in rehabilitative audiology.

  9. Integration of Pragmatic and Phonetic Cues in Spoken Word Recognition

    Science.gov (United States)

    Rohde, Hannah; Ettlinger, Marc

    2015-01-01

    Although previous research has established that multiple top-down factors guide the identification of words during speech processing, the ultimate range of information sources that listeners integrate from different levels of linguistic structure is still unknown. In a set of experiments, we investigate whether comprehenders can integrate information from the two most disparate domains: pragmatic inference and phonetic perception. Using contexts that trigger pragmatic expectations regarding upcoming coreference (expectations for either he or she), we test listeners' identification of phonetic category boundaries (using acoustically ambiguous words on the /hi/~/ʃi/ continuum). The results indicate that, in addition to phonetic cues, word recognition also reflects pragmatic inference. These findings are consistent with evidence for top-down contextual effects from lexical, syntactic, and semantic cues, but they extend this previous work by testing cues at the pragmatic level and by eliminating a statistical-frequency confound that might otherwise explain the previously reported results. We conclude by exploring the time-course of this interaction and discussing how different models of cue integration could be adapted to account for our results. PMID:22250908

  10. Fundamental frequency and speech intelligibility in background noise.

    Science.gov (United States)

    Brown, Christopher A; Bacon, Sid P

    2010-07-01

    Speech reception in noise is an especially difficult problem for listeners with hearing impairment as well as for users of cochlear implants (CIs). One likely cause of this is an inability to 'glimpse' a target talker in a fluctuating background, which has been linked to deficits in temporal fine-structure processing. A fine-structure cue that has the potential to be beneficial for speech reception in noise is fundamental frequency (F0). A challenging problem, however, is delivering the cue to these individuals. The benefits to speech intelligibility of F0 for both listeners with hearing impairment and users of CIs are reviewed, as well as various methods of delivering F0 to these listeners.

  11. Sound frequency affects speech emotion perception: Results from congenital amusia

    Directory of Open Access Journals (Sweden)

    Sydney eLolli

    2015-09-01

    Full Text Available Congenital amusics, or tone-deaf individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying band-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody (MBEP) were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task and an emotion identification task under band-pass and unfiltered speech conditions. Results showed a significant correlation between pitch discrimination threshold and emotion identification accuracy for band-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold > 16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between band-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out by this manipulation.
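
    The abstract does not state the filter design, but the band-pass manipulation it describes can be approximated with a zero-phase Butterworth filter, as in the sketch below; the cutoff frequencies, filter order, and sampling rate are placeholders rather than the study's actual settings.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_speech(signal, fs, low_hz=300.0, high_hz=3400.0, order=4):
    """Zero-phase Butterworth band-pass filter; the cutoffs here are placeholders,
    not the values used in the study."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

fs = 44100
speech = np.random.randn(2 * fs)       # stand-in for a recorded emotional statement
filtered = bandpass_speech(speech, fs)
```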

  12. Brief Report: Using Individualized Orienting Cues to Facilitate First-Word Acquisition in Non-Responders with Autism

    OpenAIRE

    Koegel, Robert L.; Shirotova, Larisa; Koegel, Lynn K.

    2009-01-01

    Though considerable progress has been made in developing techniques for improving the acquisition of expressive verbal communication in children with autism, research has documented that 10–25% still fail to develop speech. One possible technique that could be significant in facilitating responding for this nonverbal subgroup of children is the use of orienting cues. Using a multiple baseline design, this study examined whether individualized orienting cues could be identified, and whether th...

  13. La pedagogía del Audio Visual

    Directory of Open Access Journals (Sweden)

    José Tavares de Barros

    2015-01-01

    Full Text Available The training of film and video technicians and makers has requirements of its own that distinguish it from the other specialties in communications. The author of the article explores some background of the Brazilian experience and the challenges faced by teachers and students of audiovisual production.

  14. Real-Time Audio-Visual Analysis for Multiperson Videoconferencing

    Directory of Open Access Journals (Sweden)

    Petr Motlicek

    2013-01-01

    Full Text Available We describe the design of a system consisting of several state-of-the-art real-time audio and video processing components enabling multimodal stream manipulation (e.g., automatic online editing) for multiparty videoconferencing applications in open, unconstrained environments. The underlying algorithms are designed to allow multiple people to enter, interact, and leave the observable scene with no constraints. They comprise continuous localisation of audio objects and its application for spatial audio object coding, detection and tracking of faces, estimation of head poses and visual focus of attention, detection and localisation of verbal and paralinguistic events, and the association and fusion of these different events. Combined, they represent multimodal streams with audio objects and semantic video objects and provide semantic information for stream manipulation systems (such as a virtual director). Various experiments have been performed to evaluate the performance of the system. The obtained results demonstrate the effectiveness of the proposed design, the various algorithms, and the benefit of fusing different modalities in this scenario.

  15. Audio-visual interactions in product sound design

    NARCIS (Netherlands)

    Özcan, E.; Van Egmond, R.

    2010-01-01

    Consistent product experience requires congruity between product properties such as visual appearance and sound. Therefore, for designing appropriate product sounds by manipulating their spectral-temporal structure, product sounds should preferably not be considered in isolation but as an integral

  16. Preattentive processing of audio-visual emotional signals

    DEFF Research Database (Denmark)

    Föcker, J.; Gondan, Matthias; Röder, B.

    2011-01-01

    to a response conflict rather than interference at earlier, e.g. perceptual processing stages. In Experiment 1, participants had to categorize the valence and rate the intensity of happy, sad, angry and neutral unimodal or bimodal face-voice stimuli. They were asked to rate either the facial or vocal expression...

  17. Audio-visual active speaker tracking in cluttered indoors environments.

    Science.gov (United States)

    Talantzis, Fotios; Pnevmatikakis, Aristodemos; Constantinides, Anthony G

    2009-02-01

    We propose a system for detecting the active speaker in cluttered and reverberant environments where more than one person speaks and moves. Rather than using only audio information, the system utilizes audiovisual information from multiple acoustic and video sensors that feed separate audio and video tracking modules. The audio module operates using a particle filter (PF) and an information-theoretic framework to provide accurate acoustic source location under reverberant conditions. The video subsystem combines in 3-D a number of 2-D trackers based on a variation of Stauffer's adaptive background algorithm with spatiotemporal adaptation of the learning parameters and a Kalman tracker in a feedback configuration. Extensive experiments show that gains are to be expected when fusion of the separate modalities is performed to detect the active speaker.

  18. Audio-visual Training for Lip–reading

    DEFF Research Database (Denmark)

    Gebert, Hermann; Bothe, Hans-Heinrich

    2011-01-01

    This new edited book aims to bring together researchers and developers from various related areas to share their knowledge and experience, to describe current state of the art in mobile and wireless-based adaptive e-learning and to present innovative techniques and solutions that support a person....... The book is an excellent source of comprehensive knowledge and literature on the topic of Learning-Oriented Technologies, Devices and Networks....

  19. Joint Audio-Visual Tracking Using Particle Filters

    Directory of Open Access Journals (Sweden)

    Dmitry N. Zotkin

    2002-11-01

    Full Text Available It is often advantageous to track objects in a scene using multimodal information when such information is available. We use audio as a complementary modality to video data, which, in comparison to vision, can provide faster localization over a wider field of view. We present a particle-filter based tracking framework for performing multimodal sensor fusion for tracking people in a videoconferencing environment using multiple cameras and multiple microphone arrays. One advantage of our proposed tracker is its ability to seamlessly handle temporary absence of some measurements (e.g., camera occlusion or silence. Another advantage is the possibility of self-calibration of the joint system to compensate for imprecision in the knowledge of array or camera parameters by treating them as containing an unknown statistical component that can be determined using the particle filter framework during tracking. We implement the algorithm in the context of a videoconferencing and meeting recording system. The system also performs high-level semantic analysis of the scene by keeping participant tracks, recognizing turn-taking events and recording an annotated transcript of the meeting. Experimental results are presented. Our system operates in real-time and is shown to be robust and reliable.
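
    As a rough illustration of the particle-filter fusion idea described above (including the ability to cope with a temporarily missing modality such as camera occlusion or silence), the sketch below implements one bootstrap particle-filter step over a 2-D speaker position with independent Gaussian likelihoods for an audio-derived and a video-derived measurement. The motion and noise parameters are invented for the example and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, audio_obs=None, video_obs=None,
                         motion_std=0.05, audio_std=0.3, video_std=0.1):
    """One bootstrap particle-filter step for a 2-D speaker position.
    Either observation may be None (e.g. occlusion or silence); its likelihood
    term is then simply skipped, so tracking degrades gracefully."""
    # Propagate particles with a random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)

    # Reweight by the likelihood of whatever measurements actually arrived.
    for obs, std in ((audio_obs, audio_std), (video_obs, video_std)):
        if obs is not None:
            d2 = np.sum((particles - obs) ** 2, axis=1)
            weights = weights * np.exp(-0.5 * d2 / std ** 2)
    weights = weights / np.sum(weights)

    # Systematic resampling to avoid weight degeneracy.
    positions = (np.arange(len(weights)) + rng.random()) / len(weights)
    indices = np.minimum(np.searchsorted(np.cumsum(weights), positions),
                         len(weights) - 1)
    return particles[indices], np.full(len(weights), 1.0 / len(weights))

# Hypothetical usage: 500 particles, one step with both modalities available.
particles = rng.uniform(-1.0, 1.0, (500, 2))
weights = np.full(500, 1.0 / 500)
particles, weights = particle_filter_step(particles, weights,
                                          audio_obs=np.array([0.2, 0.4]),
                                          video_obs=np.array([0.25, 0.35]))
print("position estimate:", particles.mean(axis=0))
```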

  20. Audio-Visual Language--Verbal and Visual Codes.

    Science.gov (United States)

    Doelker, Christian

    1980-01-01

    Figurative (visual representation) and commentator (verbal representation) functions and their use in audiovisual media are discussed. Three categories each of visual and aural media are established: real images, artificial forms, and graphic signs; and sound effects, music, and the spoken language. (RAO)

  1. In Search of an Audio Visual Composing Process.

    Science.gov (United States)

    Lorac, Carol

    Rules for the development and application of audiovisual material are constantly being redesigned whether one is concerned with technological aspects, economic and policy structures, social impact, or media practice. This paper outlines the work being done by the International Media Literacy Project at the Royal University of London. The project…

  2. A Joint Audio-Visual Approach to Audio Localization

    DEFF Research Database (Denmark)

    Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2015-01-01

    Localization of audio sources is an important research problem, e.g., to facilitate noise reduction. In the recent years, the problem has been tackled using distributed microphone arrays (DMA). A common approach is to apply direction-of-arrival (DOA) estimation on each array (denoted as nodes...... time-of-flight cameras. Moreover, we propose an optimal method for weighting such DOA and range information for audio localization. Our experiments on both synthetic and real data show that there is a clear, potential advantage of using the joint audiovisual localization framework....

  3. Influence of Audio-Visual Presentations on Learning Abstract Concepts.

    Science.gov (United States)

    Lai, Shu-Ling

    2000-01-01

    Describes a study of college students that investigated whether various types of visual illustrations influenced abstract concept learning when combined with audio instruction. Discusses results of analysis of variance and pretest posttest scores in relation to learning performance, attitudes toward the computer-based program, and differences in…

  4. Effect of Audio-Visual Intervention Program on Cognitive ...

    African Journals Online (AJOL)

  5. Audio-Visual Equipment Depreciation. RDU-75-07.

    Science.gov (United States)

    Drake, Miriam A.; Baker, Martha

    A study was conducted at Purdue University to gather operational and budgetary planning data for the Libraries and Audiovisual Center. The objectives were: (1) to complete a current inventory of equipment including year of purchase, costs, and salvage value; (2) to determine useful life data for general classes of equipment; and (3) to determine…

  6. An Audio-Visual Presentation of Black Francophone Poetry.

    Science.gov (United States)

    Bruner, Charlotte H.

    1982-01-01

    A college class project to develop a videocassette presentation of African, Caribbean, and Afro-American French poetry is described from its inception through the processes of obtaining copyright and translation permissions, arranging scripts, presenting at various functions, and reception by Francophone and non-Francophone audiences. (MSE)

  7. Audio-Visual Integration Modifies Emotional Judgment in Music

    OpenAIRE

    Shen-Yuan Su; Su-Ling Yeh

    2011-01-01

    The conventional view that perceived emotion in music is derived mainly from auditory signals has led to neglect of the contribution of visual image. In this study, we manipulated mode (major vs. minor) and examined the influence of a video image on emotional judgment in music. Melodies in either major or minor mode were controlled for tempo and rhythm and played to the participants. We found that Taiwanese participants, like Westerners, judged major melodies as expressing positive, and minor...

  8. Recommended Audio-Visual Materials on South Africa.

    Science.gov (United States)

    Crofts, Marylee

    1984-01-01

    Presents a descriptive list of films, videocassettes, and slide sets available and recommended for teaching about South Africa and Namibia. Organizes cited materials according to the subjects they cover, including resistance to apartheid, the police state, homelands and Bantustans, the struggle of women, labor, the United States role, white rule,…

  9. Audio-Visual Preferences and Tranquillity Ratings in Urban Areas

    Directory of Open Access Journals (Sweden)

    Luca Cassina

    2017-12-01

    Full Text Available During a survey on the acoustic and visual perception of users of urban areas, 614 people were interviewed in Pisa (Italy). The work aims to identify and quantify the effects of parameters influencing the perception of tranquillity, in order to understand the soundscape and to propose a method, based on the perception of tranquillity, for the detection of quiet areas within urban ones. A linear model that predicts the tranquillity perceived in different environments, based on their visual and acoustic characteristics, is proposed. Users were interviewed by operators inside the areas, using a direct approach of standardized questionnaires and oral questions. Simultaneous noise measurements and soundwalks were performed, together with visual registrations. The linear model obtained predicts the perceived tranquillity based on the statistical level LA10 (the A-weighted noise level exceeded for 10% of the measurement time), the sound sources, and the visual elements. The perceived tranquillity is negatively correlated with LA10 and with the presence of sound sources or negative visual elements. The presence of beneficial sound sources is positively correlated with the perceived tranquillity. However, the effect of the noise level is regulated by environmental characteristics. Perceived tranquillity is proposed as an indicator to identify quiet areas in the urban environment, according to European Directive 2002/49/EC. The obtained model identifies the areas that would receive a tranquillity value higher than a fixed threshold and would therefore be perceived as quiet. The model can be used as a cost-benefit analysis support tool to identify the best solution between the reduction of noise levels and the regeneration of urban areas, with reference to the tranquillity perceived by the users.
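
    The abstract describes a linear model but not its estimation details. The sketch below shows how such a model could be fitted by ordinary least squares on a survey-style table with an LA10 column and indicator variables for sound sources and visual elements; all variable names, coefficients, and data are simulated placeholders, not the survey data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical survey table: one row per interviewed person.
la10 = rng.uniform(45.0, 75.0, n)            # measured LA10 in dB(A)
natural_sound = rng.integers(0, 2, n)        # 1 if a beneficial sound source is present
negative_visual = rng.integers(0, 2, n)      # 1 if a negative visual element is present

# Simulated tranquillity ratings so the example runs; coefficients are made up.
tranquillity = (10.0 - 0.1 * la10 + 1.2 * natural_sound
                - 1.5 * negative_visual + rng.normal(0.0, 0.5, n))

# Ordinary least squares fit with an intercept column.
X = np.column_stack([np.ones(n), la10, natural_sound, negative_visual])
coeffs, *_ = np.linalg.lstsq(X, tranquillity, rcond=None)
print(dict(zip(["intercept", "LA10", "natural_sound", "negative_visual"],
               np.round(coeffs, 3))))
```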

  10. Sound of mind : electrophysiological and behavioural evidence for the role of context, variation and informativity in human speech processing

    NARCIS (Netherlands)

    Nixon, Jessie Sophia

    2014-01-01

    Spoken communication involves transmission of a message which takes physical form in acoustic waves. Within any given language, acoustic cues pattern in language-specific ways along language-specific acoustic dimensions to create speech sound contrasts. These cues are utilized by listeners to

  11. Dual-carrier processing to convey temporal fine structure cues: Implications for cochlear implants

    Science.gov (United States)

    Apoux, Frédéric; Youngdahl, Carla L.; Yoho, Sarah E.; Healy, Eric W.

    2015-01-01

    Speech intelligibility in noise can be degraded by using vocoder processing to alter the temporal fine structure (TFS). Here it is argued that this degradation is not attributable to the loss of speech information potentially present in the TFS. Instead it is proposed that the degradation results from the loss of sound-source segregation information when two or more carriers (i.e., TFS) are substituted with only one as a consequence of vocoder processing. To demonstrate this segregation role, vocoder processing involving two carriers, one for the target and one for the background, was implemented. Because this approach does not preserve the speech TFS, it may be assumed that any improvement in intelligibility can only be a consequence of the preserved carrier duality and associated segregation cues. Three experiments were conducted using this “dual-carrier” approach. All experiments showed substantial sentence intelligibility in noise improvements compared to traditional single-carrier conditions. In several conditions, the improvement was so substantial that intelligibility approximated that for unprocessed speech in noise. A foreseeable and potentially promising implication for the dual-carrier approach involves implementation into cochlear implant speech processors, where it may provide the TFS cues necessary to segregate speech from noise. PMID:26428784

  12. Development in Children’s Interpretation of Pitch Cues to Emotions

    Science.gov (United States)

    Quam, Carolyn; Swingley, Daniel

    2012-01-01

    Young infants respond to positive and negative speech prosody (Fernald, 1993), yet 4-year-olds rely on lexical information when it conflicts with paralinguistic cues to approval or disapproval (Friend, 2003). This article explores this surprising phenomenon, testing 118 2- to 5-year-olds’ use of isolated pitch cues to emotions in interactive tasks. Only 4- to 5-year-olds consistently interpreted exaggerated, stereotypically happy or sad pitch contours as evidence that a puppet had succeeded or failed to find his toy (Experiment 1) or was happy or sad (Experiments 2, 3). Two- and three-year-olds exploited facial and body-language cues in the same task. The authors discuss the implications of this late-developing use of pitch cues to emotions, relating them to other functions of pitch. PMID:22181680

  13. Gaze Cueing by Pareidolia Faces

    Directory of Open Access Journals (Sweden)

    Kohske Takahashi

    2013-12-01

    Full Text Available Visual images that are not faces are sometimes perceived as faces (the pareidolia phenomenon. While the pareidolia phenomenon provides people with a strong impression that a face is present, it is unclear how deeply pareidolia faces are processed as faces. In the present study, we examined whether a shift in spatial attention would be produced by gaze cueing of face-like objects. A robust cueing effect was observed when the face-like objects were perceived as faces. The magnitude of the cueing effect was comparable between the face-like objects and a cartoon face. However, the cueing effect was eliminated when the observer did not perceive the objects as faces. These results demonstrated that pareidolia faces do more than give the impression of the presence of faces; indeed, they trigger an additional face-specific attentional process.

  14. A speech reception in noise test for preschool children (the Galker-test): Validity, reliability and acceptance.

    Science.gov (United States)

    Lauritsen, Maj-Britt Glenn; Kreiner, Svend; Söderström, Margareta; Dørup, Jens; Lous, Jørgen

    2015-10-01

    This study evaluates the initial validity and reliability of the "Galker test of speech reception in noise", developed for Danish preschool children suspected of having problems with hearing or understanding speech, against strict psychometric standards, and assesses its acceptance by the children. The Galker test is an audio-visual, computerised, word discrimination test in background noise, originally comprising 50 word pairs. Three hundred and eighty-eight children attending ordinary day care centres and aged 3-5 years were included. With multiple regression and the Rasch item response model, it was examined whether the total score of the Galker test validly reflected item responses across subgroups defined by sex, age, bilingualism, tympanometry, audiometry and verbal comprehension. A total of 370 children (95%) accepted testing and 339 (87%) completed all 50 items. The analysis showed that 35 items fitted the Rasch model. Reliability was 0.75 both before and after exclusion of the 15 non-fitting items. In the stepwise linear regression model, the children's age group explained 20% of the variation in the Galker-35 score, sex 1%, a second language at home 4%, tympanometry in the best ear 2%, and parental education another 2%. Other variables did not reach significance. The Galker-35 was well accepted by children down to the age of 3 years, and the results indicate that the scale represents construct-valid and reliable measurement. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
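
    The Rasch analysis mentioned above models the probability of a correct item response from a person ability parameter and an item difficulty parameter. The sketch below shows the Rasch probability and a simple Newton-Raphson ability estimate for one child, assuming pre-calibrated item difficulties; the difficulties and responses are simulated and not taken from the Galker data.

```python
import numpy as np

def rasch_probability(theta, difficulty):
    """Rasch model: P(correct) given person ability theta and item difficulty (logits)."""
    return 1.0 / (1.0 + np.exp(-(theta - difficulty)))

def estimate_ability(responses, difficulties, n_iter=20):
    """Maximum-likelihood ability estimate for one child by Newton-Raphson,
    assuming the item difficulties have already been calibrated."""
    theta = 0.0
    for _ in range(n_iter):
        p = rasch_probability(theta, difficulties)
        gradient = np.sum(responses - p)        # d logL / d theta
        hessian = -np.sum(p * (1.0 - p))        # d^2 logL / d theta^2
        theta -= gradient / hessian
    return theta

# Hypothetical: 35 items with difficulties spread over [-2, 2] logits,
# and one simulated child with a true ability of 0.8 logits.
rng = np.random.default_rng(2)
difficulties = np.linspace(-2.0, 2.0, 35)
responses = (rng.random(35) < rasch_probability(0.8, difficulties)).astype(float)
print(round(estimate_ability(responses, difficulties), 2))
```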

  15. Discrepant visual speech facilitates covert selective listening in "cocktail party" conditions.

    Science.gov (United States)

    Williams, Jason A

    2012-06-01

    The presence of congruent visual speech information facilitates the identification of auditory speech, while the addition of incongruent visual speech information often impairs accuracy. This latter arrangement occurs naturally when one is being directly addressed in conversation but listens to a different speaker. Under these conditions, performance may diminish since: (a) one is bereft of the facilitative effects of the corresponding lip motion and (b) one becomes subject to visual distortion by incongruent visual speech; by contrast, speech intelligibility may be improved due to (c) bimodal localization of the central unattended stimulus. Participants were exposed to centrally presented visual and auditory speech while attending to a peripheral speech stream. In some trials, the lip movements of the central visual stimulus matched the unattended speech stream; in others, the lip movements matched the attended peripheral speech. Accuracy for the peripheral stimulus was nearly one standard deviation greater with incongruent visual information, compared to the congruent condition which provided bimodal pattern recognition cues. Likely, the bimodal localization of the central stimulus further differentiated the stimuli and thus facilitated intelligibility. Results are discussed with regard to similar findings in an investigation of the ventriloquist effect, and the relative strength of localization and speech cues in covert listening.

  16. Effects of Visual Speech on Early Auditory Evoked Fields - From the Viewpoint of Individual Variance.

    Directory of Open Access Journals (Sweden)

    Izumi Yahata

    Full Text Available The effects of visual speech (the moving image of the speaker's face uttering a speech sound) on early auditory evoked fields (AEFs) were examined using a helmet-shaped magnetoencephalography system in 12 healthy volunteers (9 males, mean age 35.5 years). AEFs (N100m) in response to the monosyllabic sound /be/ were recorded and analyzed under three different visual stimulus conditions: the moving image of the same speaker's face uttering /be/ (congruent visual stimuli) or uttering /ge/ (incongruent visual stimuli), and visual noise (a still image processed from the speaker's face using a strong Gaussian filter; control condition). On average, the latency of N100m was significantly shortened in the bilateral hemispheres for both congruent and incongruent auditory/visual (A/V) stimuli, compared to the control A/V condition. However, the degree of N100m shortening was not significantly different between the congruent and incongruent A/V conditions, despite the significant differences in psychophysical responses between these two A/V conditions. Moreover, analysis of the magnitudes of these visual effects on AEFs in individuals showed that the lip-reading effects on AEFs tended to be well correlated between the two different audio-visual conditions (congruent vs. incongruent visual stimuli) in the bilateral hemispheres but were not significantly correlated between the right and left hemispheres. On the other hand, no significant correlation was observed between the magnitudes of visual speech effects and psychophysical responses. These results may indicate that the auditory-visual interaction observed on the N100m is a fundamental process that does not depend on the congruency of the visual information.

  17. Dog-directed speech: why do we use it and do dogs pay attention to it?

    Science.gov (United States)

    Ben-Aderet, Tobey; Gallego-Abenza, Mario; Reby, David; Mathevon, Nicolas

    2017-01-11

    Pet-directed speech is strikingly similar to infant-directed speech, a peculiar speaking pattern with higher pitch and slower tempo known to engage infants' attention and promote language learning. Here, we report the first investigation of potential factors modulating the use of dog-directed speech, as well as its immediate impact on dogs' behaviour. We recorded adult participants speaking in front of pictures of puppies, adult and old dogs, and analysed the quality of their speech. We then performed playback experiments to assess dogs' reaction to dog-directed speech compared with normal speech. We found that human speakers used dog-directed speech with dogs of all ages and that the acoustic structure of dog-directed speech was mostly independent of dog age, except for sound pitch which was relatively higher when communicating with puppies. Playback demonstrated that, in the absence of other non-auditory cues, puppies were highly reactive to dog-directed speech, and that the pitch was a key factor modulating their behaviour, suggesting that this specific speech register has a functional value in young dogs. Conversely, older dogs did not react differentially to dog-directed speech compared with normal speech. The fact that speakers continue to use dog-directed speech with older dogs therefore suggests that this speech pattern may mainly be a spontaneous attempt to facilitate interactions with non-verbal listeners. © 2017 The Author(s).

  18. Visual form Cues, Biological Motions, Auditory Cues, and Even Olfactory Cues Interact to Affect Visual Sex Discriminations

    Directory of Open Access Journals (Sweden)

    Rick Van Der Zwan

    2011-05-01

    Full Text Available Johnson and Tassinary (2005) proposed that visually perceived sex is signalled by structural or form cues. They suggested also that biological motion cues signal sex, but do so indirectly. We previously have shown that auditory cues can mediate visual sex perceptions (van der Zwan et al., 2009). Here we demonstrate that structural cues to body shape are alone sufficient for visual sex discriminations but that biological motion cues alone are not. Interestingly, biological motions can resolve ambiguous structural cues to sex, but so can olfactory cues even when those cues are not salient. To accommodate these findings we propose an alternative model of the processes mediating visual sex discriminations: Form cues can be used directly if they are available and unambiguous. If there is any ambiguity other sensory cues are used to resolve it, suggesting there may exist sex-detectors that are stimulus independent.

  19. Music and speech prosody: a common rhythm.

    Science.gov (United States)

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).

  20. Music and speech prosody: A common rhythm

    Directory of Open Access Journals (Sweden)

    Maija eHausen

    2013-09-01

    Full Text Available Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).

  1. Music and speech prosody: a common rhythm

    Science.gov (United States)

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R.; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress). PMID:24032022

  2. Orienting asymmetries in dogs' responses to different communicatory components of human speech.

    Science.gov (United States)

    Ratcliffe, Victoria F; Reby, David

    2014-12-15

    It is well established that in human speech perception the left hemisphere (LH) of the brain is specialized for processing intelligible phonemic (segmental) content (e.g., [1-3]), whereas the right hemisphere (RH) is more sensitive to prosodic (suprasegmental) cues. Despite evidence that a range of mammal species show LH specialization when processing conspecific vocalizations, the presence of hemispheric biases in domesticated animals' responses to the communicative components of human speech has never been investigated. Human speech is familiar and relevant to domestic dogs (Canis familiaris), who are known to perceive both segmental phonemic cues and suprasegmental speaker-related and emotional prosodic cues. Using the head-orienting paradigm, we presented dogs with manipulated speech and tones differing in segmental or suprasegmental content and recorded their orienting responses. We found that dogs showed a significant LH bias when presented with a familiar spoken command in which the salience of meaningful phonemic (segmental) cues was artificially increased but a significant RH bias in response to commands in which the salience of intonational or speaker-related (suprasegmental) vocal cues was increased. Our results provide insights into mechanisms of interspecific vocal perception in a domesticated mammal and suggest that dogs may share ancestral or convergent hemispheric specializations for processing the different functional communicative components of speech with human listeners. Copyright © 2014 Elsevier Ltd. All rights reserved.

  3. Development of speech glimpsing in synchronously and asynchronously modulated noise.

    Science.gov (United States)

    Hall, Joseph W; Buss, Emily; Grose, John H

    2014-06-01

    This study investigated development of the ability to integrate glimpses of speech in modulated noise. Noise was modulated synchronously across frequency or asynchronously such that when noise below 1300 Hz was "off," noise above 1300 Hz was "on," and vice versa. Asynchronous masking was used to examine the ability of listeners to integrate speech glimpses separated across time and frequency. The study used the Word Intelligibility by Picture Identification (WIPI) test and included adults, older children (age 8-10 yr) and younger children (5-7 yr). Results showed poorer masking release for the children than the adults for synchronous modulation but not for asynchronous modulation. It is possible that children can integrate cues relatively well when all intervals provide at least partial speech information (asynchronous modulation) but less well when some intervals provide little or no information (synchronous modulation). Control conditions indicated that children appeared to derive less benefit than adults from speech cues below 1300 Hz. This frequency effect was supported by supplementary conditions where the noise was unmodulated and the speech was low- or high-pass filtered. Possible sources of the developmental frequency effect include differences in frequency weighting, effective speech bandwidth, and the signal-to-noise ratio in the unmodulated noise condition.
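
    A masker of the kind described above can be approximated by splitting broadband noise at 1300 Hz and gating the two bands either with the same on/off envelope (synchronous) or with complementary envelopes (asynchronous). The sketch below does this with SciPy; the modulation rate, filter order, duration, and sampling rate are illustrative assumptions, not the study's parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 22050
dur = 2.0
mod_rate = 10.0                                   # Hz; illustrative gating rate
t = np.arange(int(dur * fs)) / fs
noise = np.random.randn(len(t))

# Split the noise at 1300 Hz, as in the study.
low_band = sosfiltfilt(butter(6, 1300.0, btype="lowpass", fs=fs, output="sos"), noise)
high_band = sosfiltfilt(butter(6, 1300.0, btype="highpass", fs=fs, output="sos"), noise)

# Square-wave gating: in the synchronous masker both bands share one on/off gate;
# in the asynchronous masker the high band is on exactly when the low band is off.
gate = (np.floor(t * mod_rate * 2.0) % 2).astype(float)
synchronous = (low_band + high_band) * gate
asynchronous = low_band * gate + high_band * (1.0 - gate)
```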

  4. Modulation of sensory and motor cortex activity during speech preparation.

    Science.gov (United States)

    Mock, Jeffrey R; Foundas, Anne L; Golob, Edward J

    2011-03-01

    Previous studies have shown that speaking affects auditory and motor cortex responsiveness, which may reflect the influence of motor efference copy. If motor efference copy is involved, it would also likely influence auditory and motor cortical activity when preparing to speak. We tested this hypothesis by using auditory event-related potentials and transcranial magnetic stimulation (TMS) of the motor cortex. In the speech condition subjects were visually cued to prepare a vocal response to a subsequent target, which was compared to a control condition without speech preparation. Auditory and motor cortex responsiveness at variable times between the cue and target was probed with an acoustic stimulus (Experiment 1, tone or consonant-vowels) or motor cortical TMS (Experiment 2). Acoustic probes delivered shortly before targets elicited a fronto-central negative potential in the speech condition. Current density analysis showed that auditory cortical activity was attenuated at the beginning of the slow potential in the speech condition. Sensory potentials in response to probes had shorter latencies (N100) and larger amplitudes (P200) when consonant-vowels matched the sound of cue words. Motor cortex excitability was greater in the speech than in the control condition at all time points before picture onset. The results suggest that speech preparation induces top-down regulation of sensory and motor cortex responsiveness, with different time courses for auditory and motor systems. © 2011 The Authors. European Journal of Neuroscience © 2011 Federation of European Neuroscience Societies and Blackwell Publishing Ltd.

  5. Pet-directed speech draws adult dogs' attention more efficiently than Adult-directed speech.

    Science.gov (United States)

    Jeannin, Sarah; Gilbert, Caroline; Amy, Mathieu; Leboucher, Gérard

    2017-07-10

    Humans speak to dogs using a special speech register called Pet-Directed Speech (PDS), which is very similar to the Infant-Directed Speech (IDS) used by parents when talking to young infants. These two types of speech share prosodic features that are distinct from typical Adult-Directed Speech (ADS): a high-pitched voice and increased pitch variation. So far, only one study has investigated the effect of PDS on dogs' attention. We video recorded 44 adult pet dogs and 19 puppies while they listened to the same phrase spoken either in ADS, in PDS or in IDS. The phrases were previously recorded and were broadcast via a loudspeaker placed in front of the dog. The total gaze duration of the dogs toward the loudspeaker was used as a proxy of attention. Results show that adult dogs are significantly more attentive to PDS than to ADS and that their attention significantly increases along with the rise of the fundamental frequency of human speech. It is likely that the exaggerated prosody of PDS is used by owners as an ostensive cue for dogs that facilitates the effectiveness of their communication, and it may represent an evolutionarily determined adaptation that benefits the regulation and maintenance of their relationships.

  6. Neural Oscillations Carry Speech Rhythm through to Comprehension.

    Science.gov (United States)

    Peelle, Jonathan E; Davis, Matthew H

    2012-01-01

    A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners' processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging - particularly electroencephalography (EEG) and magnetoencephalography (MEG) - point to phase locking by ongoing cortical oscillations to low-frequency information (~4-8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain.
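
    The envelope-tracking idea reviewed above is often quantified by extracting the speech amplitude envelope, restricting it to the 4-8 Hz range, and taking its instantaneous phase, against which neural phase locking can then be assessed. The sketch below shows that signal-processing step only (not the EEG/MEG analysis); the sampling rates and filter settings are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, resample_poly

def theta_band_envelope_phase(speech, fs, env_fs=100):
    """Broadband amplitude envelope of a speech signal, downsampled and band-passed
    to the 4-8 Hz (theta) range; returns its instantaneous phase in radians."""
    envelope = np.abs(hilbert(speech))                     # broadband envelope
    envelope = resample_poly(envelope, up=1, down=fs // env_fs)
    sos = butter(4, [4.0, 8.0], btype="bandpass", fs=env_fs, output="sos")
    theta_env = sosfiltfilt(sos, envelope)
    return np.angle(hilbert(theta_env))

fs = 16000
speech = np.random.randn(5 * fs)      # stand-in for a recorded sentence
phase = theta_band_envelope_phase(speech, fs)
```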

  7. Speech intelligibility for normal hearing and hearing-impaired listeners in simulated room acoustic conditions

    DEFF Research Database (Denmark)

    Arweiler, Iris; Dau, Torsten; Poulsen, Torben

    Speech intelligibility depends on many factors such as room acoustics, the acoustical properties and location of the signal and the interferers, and the ability of the (normal and impaired) auditory system to process monaural and binaural sounds. In the present study, the effect of reverberation...... on spatial release from masking was investigated in normal hearing and hearing impaired listeners using three types of interferers: speech shaped noise, an interfering female talker and speech-modulated noise. Speech reception thresholds (SRT) were obtained in three simulated environments: a listening room...... intelligibility and when binaural cues are effective. (Poster). Partly from HEARCOM project....

  8. Perception of co-speech gestures in aphasic patients: A visual exploration study during the observation of dyadic conversations

    OpenAIRE

    Preisig, Basil

    2015-01-01

    Background: Co-speech gestures are part of nonverbal communication during conversations. They either support the verbal message or provide the interlocutor with additional information. Furthermore, as nonverbal cues they prompt the cooperative process of turn taking. In the present study, we investigated the influence of co-speech gestures on the perception of dyadic dialogue in aphasic patients. In particular, we analysed the impact of co-speech gestures on gaze direction (towards speaker or...

  9. Tolerable hearing aid delays. II. Estimation of limits imposed during speech production.

    Science.gov (United States)

    Stone, Michael A; Moore, Brian C J

    2002-08-01

    introduced by digital hearing aids is primarily determined by aspects of the perception of self-generated speech. Speech production, on average, is hardly affected unless the processing delay exceeds 30 msec. The permissible limit of 20 to 30 msec is smaller than the delays at which audio-visual integration is disrupted.

  10. Effects of deafness on acoustic characteristics of American English tense/lax vowels in maternal speech to infants

    Science.gov (United States)

    Kondaurova, Maria V.; Bergeson, Tonya R.; Dilley, Laura C.

    2012-01-01

    Recent studies have demonstrated that mothers exaggerate phonetic properties of infant-directed (ID) speech. However, these studies focused on a single acoustic dimension (frequency), whereas speech sounds are composed of multiple acoustic cues. Moreover, little is known about how mothers adjust phonetic properties of speech to children with hearing loss. This study examined mothers' production of frequency and duration cues to the American English tense/lax vowel contrast in speech to profoundly deaf (N = 14) and normal-hearing (N = 14) infants, and to an adult experimenter. First and second formant frequencies and vowel duration of tense (/i/, /u/) and lax (/I/, /ʊ/) vowels were measured. Results demonstrated that for both infant groups mothers hyperarticulated the acoustic vowel space and increased vowel duration in ID speech relative to adult-directed speech. Mean F2 values were decreased for the /u/ vowel and increased for the /I/ vowel, and vowel duration was longer for the /i/, /u/, and /I/ vowels in ID speech. However, neither acoustic cue differed in speech to hearing-impaired or normal-hearing infants. These results suggest that both the formant frequencies and the vowel duration that differentiate American English tense/lax vowel contrasts are modified in ID speech regardless of the hearing status of the addressee. PMID:22894224
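
    Formant measurements like the F1/F2 values discussed above are commonly obtained from linear predictive coding (LPC). The sketch below is a generic, textbook LPC formant estimate for a steady-state vowel slice, not the measurement procedure used in the study; the pre-emphasis coefficient, LPC order rule, and frequency threshold are conventional defaults chosen for illustration.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.linalg import solve_toeplitz

def estimate_formants(vowel, fs, n_formants=2):
    """Rough LPC-based formant estimate for a steady-state vowel segment
    (autocorrelation method), returning the lowest resonance frequencies in Hz."""
    order = 2 + fs // 1000                              # rule-of-thumb LPC order
    x = lfilter([1.0, -0.97], [1.0], vowel) * np.hamming(len(vowel))  # pre-emphasis + window
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]    # autocorrelation lags 0..order
    alpha = solve_toeplitz(r[:-1], r[1:])               # prediction coefficients
    a = np.concatenate(([1.0], -alpha))                 # LPC polynomial A(z)
    roots = [z for z in np.roots(a) if np.imag(z) > 0]  # one root per resonance pair
    freqs = sorted(np.angle(roots) * fs / (2.0 * np.pi))
    return [f for f in freqs if f > 90.0][:n_formants]  # drop near-DC roots

fs = 16000
vowel = np.random.randn(int(0.05 * fs))   # stand-in for a 50 ms vowel slice
print(estimate_formants(vowel, fs))       # hypothetical F1 and F2 estimates (Hz)
```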

  11. Mistaking minds and machines: How speech affects dehumanization and anthropomorphism.

    Science.gov (United States)

    Schroeder, Juliana; Epley, Nicholas

    2016-11-01

    Treating a human mind like a machine is an essential component of dehumanization, whereas attributing a humanlike mind to a machine is an essential component of anthropomorphism. Here we tested how a cue closely connected to a person's actual mental experience-a humanlike voice-affects the likelihood of mistaking a person for a machine, or a machine for a person. We predicted that paralinguistic cues in speech are particularly likely to convey the presence of a humanlike mind, such that removing voice from communication (leaving only text) would increase the likelihood of mistaking the text's creator for a machine. Conversely, adding voice to a computer-generated script (resulting in speech) would increase the likelihood of mistaking the text's creator for a human. Four experiments confirmed these hypotheses, demonstrating that people are more likely to infer a human (vs. computer) creator when they hear a voice expressing thoughts than when they read the same thoughts in text. Adding human visual cues to text (i.e., seeing a person perform a script in a subtitled video clip), did not increase the likelihood of inferring a human creator compared with only reading text, suggesting that defining features of personhood may be conveyed more clearly in speech (Experiments 1 and 2). Removing the naturalistic paralinguistic cues that convey humanlike capacity for thinking and feeling, such as varied pace and intonation, eliminates the humanizing effect of speech (Experiment 4). We discuss implications for dehumanizing others through text-based media, and for anthropomorphizing machines through speech-based media. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  12. High-frequency neural activity predicts word parsing in ambiguous speech streams.

    Science.gov (United States)

    Kösem, Anne; Basirat, Anahita; Azizi, Leila; van Wassenhove, Virginie

    2016-12-01

    During speech listening, the brain parses a continuous acoustic stream of information into computational units (e.g., syllables or words) necessary for speech comprehension. Recent neuroscientific hypotheses have proposed that neural oscillations contribute to speech parsing, but whether they do so on the basis of acoustic cues (bottom-up acoustic parsing) or as a function of available linguistic representations (top-down linguistic parsing) is unknown. In this magnetoencephalography study, we contrasted acoustic and linguistic parsing using bistable speech sequences. While listening to the speech sequences, participants were asked to maintain one of the two possible speech percepts through volitional control. We predicted that the tracking of speech dynamics by neural oscillations would not only follow the acoustic properties but also shift in time according to the participant's conscious speech percept. Our results show that the latency of high-frequency activity (specifically, beta and gamma bands) varied as a function of the perceptual report. In contrast, the phase of low-frequency oscillations was not strongly affected by top-down control. Whereas changes in low-frequency neural oscillations were compatible with the encoding of prelexical segmentation cues, high-frequency activity specifically informed on an individual's conscious speech percept. Copyright © 2016 the American Physiological Society.

  13. Susceptibility to a multisensory speech illusion in older persons is driven by perceptual processes.

    OpenAIRE

    Setti, Annalisa; Burke, Kate E.; Kenny, RoseAnne; Newell, Fiona N.

    2013-01-01

    Recent studies suggest that multisensory integration is enhanced in older adults but it is not known whether this enhancement is solely driven by perceptual processes or affected by cognitive processes. Using the "McGurk illusion," in Experiment 1 we found that audio-visual integration of incongruent audio-visual words was higher in older adults than in younger adults, although the recognition of either audio- or visual-only presented words was the sa...

  14. The Generalization of Auditory Accommodation to Altered Spectral Cues.

    Science.gov (United States)

    Watson, Christopher J G; Carlile, Simon; Kelly, Heather; Balachandar, Kapilesh

    2017-09-14

    The capacity of healthy adult listeners to accommodate to altered spectral cues to the source locations of broadband sounds has now been well documented. In recent years we have demonstrated that the degree and speed of accommodation are improved by using an integrated sensory-motor training protocol under anechoic conditions. Here we demonstrate that the learning which underpins the localization performance gains during the accommodation process using anechoic broadband training stimuli generalize to environmentally relevant scenarios. As previously, alterations to monaural spectral cues were produced by fitting participants with custom-made outer ear molds, worn during waking hours. Following acute degradations in localization performance, participants then underwent daily sensory-motor training to improve localization accuracy using broadband noise stimuli over ten days. Participants not only demonstrated post-training improvements in localization accuracy for broadband noises presented in the same set of positions used during training, but also for stimuli presented in untrained locations, for monosyllabic speech sounds, and for stimuli presented in reverberant conditions. These findings shed further light on the neuroplastic capacity of healthy listeners, and represent the next step in the development of training programs for users of assistive listening devices which degrade localization acuity by distorting or bypassing monaural cues.

  15. Speech-Language Pathologists

    Science.gov (United States)

    ... Explore resources for employment and wages by state and area for speech-language pathologists. Compare the job duties, education, job growth, and pay of speech-language pathologists with those of similar occupations. ...

  16. Speech disorders - children

    Science.gov (United States)

    ... page: //medlineplus.gov/ency/article/001430.htm Speech disorders - children ... Voice disorders Speech disorders are different from language disorders in children. Language disorders refer to someone having difficulty with: ...

  17. Apraxia of Speech

    Science.gov (United States)

    ... children, such as in a classroom. Therefore, speech-language therapy is necessary for children with AOS as well ... with AOS. Frequent, intensive, one-on-one speech-language therapy sessions are needed for both children and adults ...

  18. Speech perception as categorization

    National Research Council Canada - National Science Library

    Holt, Lori L; Lotto, Andrew J

    2010-01-01

    Speech perception (SP) most commonly refers to the perceptual mapping from the highly variable acoustic speech signal to a linguistic representation, whether it be phonemes, diphones, syllables, or words...

  19. Mathematical modeling of vowel perception by users of analog multichannel cochlear implants: temporal and channel-amplitude cues.

    Science.gov (United States)

    Svirsky, M A

    2000-03-01

    A "multidimensional phoneme identification" (MPI) model is proposed to account for vowel perception by cochlear implant users. A multidimensional extension of the Durlach-Braida model of intensity perception, this model incorporates an internal noise model and a decision model to account separately for errors due to poor sensitivity and response bias. The MPI model provides a complete quantitative description of how listeners encode and combine acoustic cues, and how they use this information to determine which sound they heard. Thus, it allows for testing specific hypotheses about phoneme identification in a very stringent fashion. As an example of the model's application, vowel identification matrices obtained with synthetic speech stimuli (including "conflicting cue" conditions [Dorman et al., J. Acoust. Soc. Am. 92, 3428-3432 (1992)] were examined. The listeners were users of the "compressed-analog" stimulation strategy, which filters the speech spectrum into four partly overlapping frequency bands and delivers each signal to one of four electrodes in the cochlea. It was found that a simple model incorporating one temporal cue (i.e., an acoustic cue based only on the time waveforms delivered to the most basal channel) and spectral cues (based on the distribution of amplitudes among channels) can be quite successful in explaining listener responses. The new approach represented by the MPI model may be used to obtain useful insights about speech perception by cochlear implant users in particular, and by all kinds of listeners in general.

  20. The effects of speech motor preparation on auditory perception

    Science.gov (United States)

    Myers, John

    Perception and action are coupled via bidirectional relationships between sensory and motor systems. Motor systems influence sensory areas by imparting a feedforward influence on sensory processing termed "motor efference copy" (MEC). MEC is suggested to occur in humans because speech preparation and production modulate neural measures of auditory cortical activity. However, it is not known if MEC can affect auditory perception. We tested the hypothesis that during speech preparation auditory thresholds will increase relative to a control condition, and that the increase would be most evident for frequencies that match the upcoming vocal response. Participants performed trials in a speech condition that contained a visual cue indicating a vocal response to prepare (one of two frequencies), followed by a go signal to speak. To determine threshold shifts, voice-matched or -mismatched pure tones were presented at one of three time points between the cue and target. The control condition was the same except the visual cues did not specify a response and subjects did not speak. For each participant, we measured f0 thresholds in isolation from the task in order to establish baselines. Results indicated that auditory thresholds were highest during speech preparation, relative to baselines and a non-speech control condition, especially at suprathreshold levels. Thresholds for tones that matched the frequency of planned responses gradually increased over time, but sharply declined for the mismatched tones shortly before targets. Findings support the hypothesis that MEC influences auditory perception by modulating thresholds during speech preparation, with some specificity relative to the planned response. The threshold increase in tasks vs. baseline may reflect attentional demands of the tasks.

  1. The effects of noise vocoding on speech quality perception.

    Science.gov (United States)

    Anderson, Melinda C; Arehart, Kathryn H; Kates, James M

    2014-03-01

    Speech perception depends on access to spectral and temporal acoustic cues. Temporal cues include slowly varying amplitude changes (i.e. temporal envelope, TE) and quickly varying amplitude changes associated with the center frequency of the auditory filter (i.e. temporal fine structure, TFS). This study quantifies the effects of TFS randomization through noise vocoding on the perception of speech quality by parametrically varying the amount of original TFS available above 1500 Hz. The two research aims were: 1) to establish the role of TFS in quality perception, and 2) to determine if the role of TFS in quality perception differs between subjects with normal hearing and subjects with sensorineural hearing loss. Ratings were obtained from 20 subjects (10 with normal hearing and 10 with hearing loss) using an 11-point quality scale. Stimuli were processed in three different ways: 1) A 32-channel noise-excited vocoder with random envelope fluctuations in the noise carrier, 2) a 32-channel noise-excited vocoder with the noise-carrier envelope smoothed, and 3) removal of high-frequency bands. Stimuli were presented in quiet and in babble noise at 18 dB and 12 dB signal-to-noise ratios. TFS randomization had a measurable detrimental effect on quality ratings for speech in quiet and a smaller effect for speech in background babble. Subjects with normal hearing and subjects with sensorineural hearing loss provided similar quality ratings for noise-vocoded speech. Copyright © 2013 Elsevier B.V. All rights reserved.
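
    The sketch below shows, under simplifying assumptions, how a noise-excited vocoder of the general type described here can be built: band-pass analysis, temporal-envelope extraction, and modulation of band-limited noise carriers. It is not the study's 32-channel processor; the channel edges, filter orders, and envelope cutoff are placeholders.

```python
# Rough noise-excited channel vocoder: per-channel envelopes modulate noise carriers.
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(x, lo, hi, fs, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def lowpass(x, cutoff, fs, order=4):
    b, a = butter(order, cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, x)

def noise_vocode(speech, fs, edges=(100, 400, 1000, 2500, 6000), env_lp=50.0):
    rng = np.random.default_rng(0)
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = bandpass(speech, lo, hi, fs)
        env = lowpass(np.abs(band), env_lp, fs)        # temporal envelope (TE)
        carrier = bandpass(rng.standard_normal(speech.size), lo, hi, fs)
        out += env * carrier                           # original TFS replaced by noise
    return out

fs = 16000
speech = np.random.randn(fs)   # stand-in for one second of recorded speech
vocoded = noise_vocode(speech, fs)
```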

  2. Speech perception studies using a multichannel electrotactile speech processor, residual hearing, and lipreading.

    Science.gov (United States)

    Cowan, R S; Alcantara, J I; Whitford, L A; Blamey, P J; Clark, G M

    1989-06-01

    Three studies are reported on the speech perception of normally hearing and hearing-impaired adults using combinations of visual, auditory, and tactile input. In study 1, mean scores for four normally hearing subjects showed that addition of tactile information, provided through the multichannel electrotactile speech processor, to either audition alone (300-Hz low-pass-filtered speech) or lipreading plus audition resulted in significant improvements in phoneme and word discrimination scores. Information transmission analyses demonstrated the effectiveness of the tactile aid in providing cues to duration, F1 and F2 features for vowels, and manner of articulation features for consonants, especially features requiring detection and discrimination of high-frequency information. In study 2, six different cutoff frequencies were used for a low-pass-filtered auditory signal. Mean scores for vowel and consonant identification were significantly higher with the addition of tactile input to audition alone at each cutoff frequency up to 1500 Hz. The mean speech-tracking rate was also significantly increased by the additional tactile input up to 1500 Hz. Study 3 examined speech discrimination of three hearing-impaired adults. Additional information available through the tactile aid was shown to improve speech discrimination scores; however, the degree of increase was inversely related to the level of residual hearing. Results indicate that the electrotactile aid may be useful for patients with little residual hearing and for the severely to profoundly hearing impaired, who could benefit from the high-frequency information presented through the tactile modality, but unavailable through hearing aids.

  3. Infant-directed speech: Final syllable lengthening and rate of speech

    Science.gov (United States)

    Church, Robyn; Bernhardt, Barbara; Shi, Rushen; Pichora-Fuller, Kathleen

    2005-04-01

    Speech rate has been reported to be slower in infant-directed speech (IDS) than in adult-directed speech (ADS). Studies have also found phrase-final lengthening to be more exaggerated in IDS compared with ADS. In our study we asked whether the observed overall slower rate of IDS is due to exaggerated utterance-final syllable lengthening. Two mothers of preverbal English-learning infants each participated in two recording sessions, one with her child, and another with an adult friend. The results showed an overall slower rate in IDS compared to ADS. However, when utterance-final syllables were excluded from the calculation, the speech rate in IDS and ADS did not differ significantly. The duration of utterance-final syllables differed significantly for IDS versus ADS. Thus, the overall slower rate of IDS was due to the extra-long final syllable occurring in relatively short utterances. The comparable pre-final speech rate for IDS and ADS further accentuates the final syllable lengthening in IDS. As utterances in IDS are typically phrases or clauses, the particularly strong final-lengthening cue could potentially facilitate infants' segmentation of these syntactic units. These findings are consistent with the existing evidence that pre-boundary lengthening is important in the processing of major syntactic units in English-learning infants.
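
    A toy calculation of the rate comparison reported here, i.e., syllables per second computed with and without the utterance-final syllable. The durations are invented for illustration; real values would come from annotated recordings.

```python
# Overall speech rate vs. rate with the utterance-final syllable excluded.
def rates(syllable_durations):
    overall = len(syllable_durations) / sum(syllable_durations)
    pre_final = (len(syllable_durations) - 1) / sum(syllable_durations[:-1])
    return overall, pre_final

ids_utterance = [0.18, 0.20, 0.19, 0.45]   # long final syllable (IDS-like), seconds
ads_utterance = [0.18, 0.20, 0.19, 0.22]   # shorter final syllable (ADS-like)
print(rates(ids_utterance))  # slower overall rate, similar pre-final rate
print(rates(ads_utterance))
```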

  4. Individual differences in speech-in-noise perception parallel neural speech processing and attention in preschoolers.

    Science.gov (United States)

    Thompson, Elaine C; Woodruff Carr, Kali; White-Schwoch, Travis; Otto-Meyer, Sebastian; Kraus, Nina

    2017-02-01

    From bustling classrooms to unruly lunchrooms, school settings are noisy. To learn effectively in the unwelcome company of numerous distractions, children must clearly perceive speech in noise. In older children and adults, speech-in-noise perception is supported by sensory and cognitive processes, but the correlates underlying this critical listening skill in young children (3-5 year olds) remain undetermined. Employing a longitudinal design (two evaluations separated by ∼12 months), we followed a cohort of 59 preschoolers, ages 3.0-4.9, assessing word-in-noise perception, cognitive abilities (intelligence, short-term memory, attention), and neural responses to speech. Results reveal changes in word-in-noise perception parallel changes in processing of the fundamental frequency (F0), an acoustic cue known to play a central role in speaker identification and auditory scene analysis. Four unique developmental trajectories (speech-in-noise perception groups) confirm this relationship, in that improvements and declines in word-in-noise perception couple with enhancements and diminishments of F0 encoding, respectively. Improvements in word-in-noise perception also pair with gains in attention. Word-in-noise perception does not relate to strength of neural harmonic representation or short-term memory. These findings reinforce previously reported roles of F0 and attention in hearing speech in noise in older children and adults, and extend this relationship to preschool children. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Speech segregation based on sound localization

    Science.gov (United States)

    Roman, Nicoleta; Wang, Deliang; Brown, Guy J.

    2003-10-01

    At a cocktail party, one can selectively attend to a single voice and filter out all the other acoustical interferences. How to simulate this perceptual ability remains a great challenge. This paper describes a novel, supervised learning approach to speech segregation, in which a target speech signal is separated from interfering sounds using spatial localization cues: interaural time differences (ITD) and interaural intensity differences (IID). Motivated by the auditory masking effect, the notion of an "ideal" time-frequency binary mask is suggested, which selects the target if it is stronger than the interference in a local time-frequency (T-F) unit. It is observed that within a narrow frequency band, modifications to the relative strength of the target source with respect to the interference trigger systematic changes for estimated ITD and IID. For a given spatial configuration, this interaction produces characteristic clustering in the binaural feature space. Consequently, pattern classification is performed in order to estimate ideal binary masks. A systematic evaluation in terms of signal-to-noise ratio as well as automatic speech recognition performance shows that the resulting system produces masks very close to ideal binary ones. A quantitative comparison shows that the model yields significant improvement in performance over an existing approach. Furthermore, under certain conditions the model produces large speech intelligibility improvements with normal listeners.
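
    As a rough sketch of the "ideal" binary mask concept, the code below keeps each time-frequency unit of the mixture only where the target exceeds the interference. It computes the mask directly from the clean signals, whereas the paper estimates it from binaural ITD/IID cues; the STFT parameters and the signals themselves are placeholders.

```python
# Ideal binary mask: keep T-F units where the target dominates the interference.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
target = np.random.randn(fs)        # stand-in for the target speech
interference = np.random.randn(fs)  # stand-in for the interfering source
mixture = target + interference

_, _, T = stft(target, fs, nperseg=512)
_, _, N = stft(interference, fs, nperseg=512)
_, _, M = stft(mixture, fs, nperseg=512)

ibm = (np.abs(T) > np.abs(N)).astype(float)   # 1 where target is stronger
_, separated = istft(ibm * M, fs, nperseg=512)
```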

  6. Lip movements entrain the observers' low-frequency brain oscillations to facilitate speech intelligibility

    OpenAIRE

    Park, Hyojin; Kayser, Christoph; Thut, Gregor; Gross, Joachim

    2016-01-01

    eLife digest People are able to communicate effectively with each other even in very noisy places where it is difficult to actually hear what others are saying. In a face-to-face conversation, people detect and respond to many physical cues, including body posture, facial expressions, head and eye movements, and gestures, alongside the sound cues. Lip movements are particularly important and contain enough information to allow trained observers to understand speech even if they cannot hear the ...

  7. Evaluation of multimodal ground cues

    DEFF Research Database (Denmark)

    Nordahl, Rolf; Lecuyer, Anatole; Serafin, Stefania

    2012-01-01

    This chapter presents an array of results on the perception of ground surfaces via multiple sensory modalities, with special attention to non-visual perceptual cues, notably those arising from audition and haptics, as well as interactions between them. It also reviews approaches to combining...

  8. Optimal assessment of multiple cues

    NARCIS (Netherlands)

    Fawcett, Tim W; Johnstone, Rufus A

    2003-01-01

    In a wide range of contexts from mate choice to foraging, animals are required to discriminate between alternative options on the basis of multiple cues. How should they best assess such complex multicomponent stimuli? Here, we construct a model to investigate this problem, focusing on a simple case

  9. Perceived gender in clear and conversational speech

    Science.gov (United States)

    Booz, Jaime A.

    Although many studies have examined acoustic and sociolinguistic differences between male and female speech, the relationship between talker speaking style and perceived gender has not yet been explored. The present study attempts to determine whether clear speech, a style adopted by talkers who perceive some barrier to effective communication, shifts perceptions of femininity for male and female talkers. Much of our understanding of gender perception in voice and speech is based on sustained vowels or single words, eliminating temporal, prosodic, and articulatory cues available in more naturalistic, connected speech. Thus, clear and conversational sentence stimuli, selected from the 41 talkers of the Ferguson Clear Speech Database (Ferguson, 2004), were presented to 17 normal-hearing listeners, aged 18 to 30. They rated the talkers' gender using a visual analog scale with "masculine" and "feminine" endpoints. This response method was chosen to account for within-category shifts of gender perception by allowing nonbinary responses. Mixed-effects regression analysis of listener responses revealed a small but significant effect of speaking style, and this effect was larger for male talkers than female talkers. Because of the high degree of talker variability observed for talker gender, acoustic analyses of these sentences were undertaken to determine the relationship between acoustic changes in clear and conversational speech and perceived femininity. Results of these analyses showed that mean fundamental frequency (f0) and f0 standard deviation were significantly correlated to perceived gender for both male and female talkers, and vowel space was significantly correlated only for male talkers. Speaking rate and breathiness measures (CPPS) were not significantly related for either group. Outcomes of this study indicate that adopting a clear speaking style is correlated with increases in perceived femininity. Although the increase was small, some changes associated

  10. The relative cueing power of F0 and duration in German prominence perception

    DEFF Research Database (Denmark)

    Niebuhr, Oliver; Winkler, Jana

    2017-01-01

    Previous studies showed for German and other (West) Germanic languages, including English, that perceived syllable prominence is primarily controlled by changes in duration and F0, with the latter cue being more powerful than the former. Our study is an initial approach to develop this prominence ...... effect of a 30% increase in duration of a neighboring syllable. These numbers are fairly stable across a large range of absolute F0 and duration levels and hence useful in speech technology....

  11. Switching of auditory attention in "cocktail-party" listening: ERP evidence of cueing effects in younger and older adults.

    Science.gov (United States)

    Getzmann, Stephan; Jasny, Julian; Falkenstein, Michael

    2017-02-01

    Verbal communication in a "cocktail-party situation" is a major challenge for the auditory system. In particular, changes in target speaker usually result in poorer speech perception. Here, we investigated whether speech cues indicating a subsequent change in target speaker reduce the costs of switching in younger and older adults. We employed event-related potential (ERP) measures and a speech perception task, in which sequences of short words were simultaneously presented by four speakers. Changes in target speaker were either unpredictable or semantically cued by a word within the target stream. Cued changes resulted in a smaller drop in performance than uncued changes in both age groups. The ERP analysis revealed shorter latencies in the change-related N400 and late positive complex (LPC) after cued changes, suggesting an acceleration in context updating and attention switching. Thus, both younger and older listeners used semantic cues to prepare for changes in speaker setting. Copyright © 2016 Elsevier Inc. All rights reserved.

  12. Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients

    Directory of Open Access Journals (Sweden)

    Qiaotong Su

    2016-06-01

    Full Text Available Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users.

  13. Speech in spinocerebellar ataxia.

    Science.gov (United States)

    Schalling, Ellika; Hartelius, Lena

    2013-12-01

    Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between genotype and differences in speech and voice symptoms. More studies of speech and voice phenotypes are warranted, as they may aid clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.

  14. Eye'm talking to you: speakers' gaze direction modulates co-speech gesture processing in the right MTG.

    Science.gov (United States)

    Holler, Judith; Kokal, Idil; Toni, Ivan; Hagoort, Peter; Kelly, Spencer D; Özyürek, Aslı

    2015-02-01

    Recipients process information from speech and co-speech gestures, but it is currently unknown how this processing is influenced by the presence of other important social cues, especially gaze direction, a marker of communicative intent. Such cues may modulate neural activity in regions associated either with the processing of ostensive cues, such as eye gaze, or with the processing of semantic information, provided by speech and gesture. Participants were scanned (fMRI) while taking part in triadic communication involving two recipients and a speaker. The speaker uttered sentences that were and were not accompanied by complementary iconic gestures. Crucially, the speaker alternated her gaze direction, thus creating two recipient roles: addressed (direct gaze) vs unaddressed (averted gaze) recipient. The comprehension of Speech&Gesture relative to SpeechOnly utterances recruited middle occipital, middle temporal and inferior frontal gyri, bilaterally. The calcarine sulcus and posterior cingulate cortex were sensitive to differences between direct and averted gaze. Most importantly, Speech&Gesture utterances, but not SpeechOnly utterances, produced additional activity in the right middle temporal gyrus when participants were addressed. Marking communicative intent with gaze direction modulates the processing of speech-gesture utterances in cerebral areas typically associated with the semantic processing of multi-modal communicative acts. © The Author (2014). Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  15. Digital speech processing using Matlab

    CERN Document Server

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.
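
    Since the book's examples are in Matlab, the following is only a loose Python analogue of one standard preprocessing chain it covers: framing and windowing the signal, taking short-time log-spectral features, and reducing dimensionality with PCA. The frame length, hop size, and component count are arbitrary choices for illustration.

```python
# Frame a signal, extract log-spectral features, and reduce them with PCA.
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx] * np.hamming(frame_len)            # windowed frames, shape (n, frame_len)

def log_spectra(frames, n_fft=512):
    mag = np.abs(np.fft.rfft(frames, n_fft, axis=1))
    return np.log(mag + 1e-10)                       # log-magnitude spectra

def pca(features, n_components=12):
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T            # project onto top components

speech = np.random.randn(16000)                      # placeholder for a speech signal
features = pca(log_spectra(frame_signal(speech)))
print(features.shape)
```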

  16. Phonetic categorisation and cue weighting in adolescents with Specific Language Impairment (SLI).

    Science.gov (United States)

    Tuomainen, Outi; Stuart, Nichola J; van der Lely, Heather K J

    2015-07-01

    This study investigates phonetic categorisation and cue weighting in adolescents and young adults with Specific Language Impairment (SLI). We manipulated two acoustic cues, vowel duration and F1 offset frequency, that signal word-final stop consonant voicing ([t] and [d]) in English. Ten individuals with SLI (14.0-21.4 years), 10 age-matched controls (CA; 14.6-21.9 years) and 10 non-matched adult controls (23.3-36.0 years) labelled synthetic CVC non-words in an identification task. The results showed that the adolescents and young adults with SLI were less consistent than controls in the identification of the good category representatives. The group with SLI also assigned less weight to vowel duration than the adult controls. However, no direct relationship between phonetic categorisation, cue weighting and language skills was found. These findings indicate that some individuals with SLI have speech perception deficits but they are not necessarily associated with oral language skills.
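
    One common way to quantify cue weighting in an identification task like this is to regress voicing responses on the standardized cues; the sketch below does so with a plain logistic-regression fit on simulated data for a listener who relies mainly on vowel duration. The data, coefficients, and learning-rate settings are all invented and are not the study's.

```python
# Estimate relative cue weights (vowel duration vs. F1 offset) from simulated responses.
import numpy as np

rng = np.random.default_rng(1)
n = 500
duration = rng.uniform(100, 300, n)      # vowel duration, ms
f1_offset = rng.uniform(200, 500, n)     # F1 offset frequency, Hz
# Simulated listener: relies mostly on duration, slightly on F1 offset.
logit = 0.03 * (duration - 200) - 0.004 * (f1_offset - 350)
voiced = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

X = np.column_stack([np.ones(n),
                     (duration - duration.mean()) / duration.std(),
                     (f1_offset - f1_offset.mean()) / f1_offset.std()])
w = np.zeros(3)
for _ in range(5000):                    # plain gradient ascent on the log-likelihood
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.1 * X.T @ (voiced - p) / n
print("standardized cue weights (duration, F1 offset):", w[1], w[2])
```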

  17. Prosody cues word order in 7-month-old bilingual infants.

    Science.gov (United States)

    Gervain, Judit; Werker, Janet F

    2013-01-01

    A central problem in language acquisition is how children effortlessly acquire the grammar of their native language even though speech provides no direct information about underlying structure. This learning problem is even more challenging for dual language learners, yet bilingual infants master their mother tongues as efficiently as monolinguals do. Here we ask how bilingual infants succeed, investigating the particularly challenging task of learning two languages with conflicting word orders (English: eat an apple versus Japanese: ringo-wo taberu 'apple.acc eat'). We show that 7-month-old bilinguals use the characteristic prosodic cues (pitch and duration) associated with different word orders to solve this problem. Thus, the complexity of bilingual acquisition is countered by bilinguals' ability to exploit relevant cues. Moreover, the finding that perceptually available cues like prosody can bootstrap grammatical structure adds to our understanding of how and why infants acquire grammar so early and effortlessly.
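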

  18. Zebra finches can use positional and transitional cues to distinguish vocal element strings.

    Science.gov (United States)

    Chen, Jiani; Ten Cate, Carel

    2015-08-01

    Learning sequences is of great importance to humans and non-human animals. Many motor and mental actions, such as singing in birds and speech processing in humans, rely on sequential learning. At least two mechanisms are considered to be involved in such learning. The chaining theory proposes that learning of sequences relies on memorizing the transitions between adjacent items, while the positional theory suggests that learners encode the items according to their ordinal position in the sequence. Positional learning is assumed to dominate sequential learning. However, human infants exposed to a string of speech sounds can learn transitional (chaining) cues. So far, it is not clear whether birds, an increasingly important model for examining vocal processing, can do this. In this study we use a Go-Nogo design to examine whether zebra finches can use transitional cues to distinguish artificially constructed strings of song elements. Zebra finches were trained with sequences differing in transitional and positional information and next tested with novel strings sharing positional and transitional similarities with the training strings. The results show that they can attend to both transitional and positional cues and that their sequential coding strategies can be biased toward transitional cues depending on the learning context. This article is part of a Special Issue entitled: In Honor of Jerry Hogan. Copyright © 2014 Elsevier B.V. All rights reserved.

  19. Managing the reaction effects of speech disorders on speech ...

    African Journals Online (AJOL)

    Managing the reaction effects of speech disorders on speech defectives. ... Unfortunately, it is the speech defectives that bear the consequences resulting from penalizing speech disorders. Consequences for punishing speech ...

  20. Stop identity cue as a cue to language identity

    Science.gov (United States)

    Castonguay, Paula Lisa

    The purpose of the present study was to determine whether language membership could potentially be cued by the acoustic-phonetic detail of word-initial stops and retained all the way through the process of lexical access to aid in language identification. Of particular interest were language-specific differences in Canadian English (CE) and Canadian French (CF) word-initial stops. Experiment 1 consisted of an interlingual homophone production task. The purpose of this study was to examine how word-initial stop consonants differ in terms of acoustic properties in CE and CF interlingual homophones. The analyses from the bilingual speakers in Experiment 1 indicate that bilinguals do produce language-specific differences in CE and CF word-initial stops, and that closure duration, voice onset time (VOT), and burst spectral SD may provide cues to language identity in CE and CF stops. Experiment 2 consisted of a Phoneme and Language Categorization task. The purpose of this study was to examine how stop identity cues, such as VOT and closure duration, influence a listener to identify word-initial stop consonants as belonging to CE or CF. The RTs from the bilingual listeners in this study indicate that bilinguals do perceive language-specific differences in CE and CF word-initial stops, and that voice onset time may provide cues to phoneme and language membership in CE and CF stops. Experiment 3 consisted of a Phonological-Semantic priming task. The purpose of this study was to examine how subphonetic variations, such as changes in the VOT, affect lexical access. The results of Experiment 3 suggest that language-specific cues, such as VOT, affect the composition of the bilingual cohort and that the extent to which English and/or French words are activated is dependent on the language-specific cues present in a word. The findings of this study enhanced our theoretical understanding of lexical structure and lexical access in bilingual speakers.
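
    A toy illustration of the acoustic measures at issue (closure duration and VOT) computed from hand-labeled landmark times, with a deliberately crude language guess from VOT alone. The landmark times and the 35 ms cutoff are assumptions for illustration, not values from the dissertation.

```python
# Closure duration and voice onset time from annotated landmarks, plus a crude language guess.
def stop_measures(closure_onset, burst_onset, voicing_onset):
    closure_duration = burst_onset - closure_onset   # seconds
    vot = voicing_onset - burst_onset                # seconds; negative = prevoiced
    return closure_duration, vot

def guess_language(vot):
    # Assumption for illustration only: CF voiceless stops tend toward short-lag VOT,
    # CE voiceless stops toward long-lag (aspirated) VOT.
    return "CF-like" if vot < 0.035 else "CE-like"

closure, vot = stop_measures(0.120, 0.200, 0.265)
print(closure, vot, guess_language(vot))
```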