WorldWideScience

Sample records for audio-visual speech cue

  1. [Intermodal timing cues for audio-visual speech recognition].

    Science.gov (United States)

    Hashimoto, Masahiro; Kumashiro, Masaharu

    2004-06-01

    The purpose of this study was to investigate the limits of the lip-reading advantage in Japanese young adults by desynchronizing the visual and auditory information in speech. Audio-visual speech stimuli were presented under six test conditions: audio-alone, and audio-visual with 0, 60, 120, 240 or 480 ms of audio delay. The stimuli were video recordings of the face of a female Japanese speaker producing long and short Japanese sentences. The intelligibility of the audio-visual stimuli was measured as a function of audio delay in sixteen untrained young subjects. Speech intelligibility with audio delays of less than 120 ms was significantly better than in the audio-alone condition. Notably, 120 ms corresponded to the mean mora duration measured for the audio stimuli. The results imply that audio delays of up to 120 ms do not disrupt the lip-reading advantage, because the visual and auditory information in speech appears to be integrated on a syllabic time scale. Potential applications of this research include noisy workplaces in which a worker must extract relevant speech from competing noise.
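
    As an illustration of how such desynchronized stimuli can be constructed, the sketch below prepends silence to an audio track; the `delay_audio` helper and the 16 kHz sample rate are illustrative assumptions, not details from the study.

```python
import numpy as np

def delay_audio(audio, delay_ms, sample_rate=16000):
    """Delay an audio track relative to its video by prepending silence.

    Illustrates how audio-visual desynchrony conditions
    (e.g. 0, 60, 120, 240, 480 ms audio delay) can be constructed.
    """
    pad = np.zeros(int(sample_rate * delay_ms / 1000.0), dtype=audio.dtype)
    # Prepend silence and trim the tail so the track length is unchanged.
    return np.concatenate([pad, audio])[:len(audio)]

# Example: build the five audio-delay conditions used in the study.
audio = np.random.randn(16000 * 3)  # placeholder 3-second signal
conditions = {ms: delay_audio(audio, ms) for ms in (0, 60, 120, 240, 480)}
```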

  2. Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?

    Directory of Open Access Journals (Sweden)

    Magnus Alm

    2015-07-01

    Full Text Available Gender and age have been found to affect adults' audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood of cognitive and sensory decline, which may confound positive effects of age-related AV experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently, both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20-30 years) and middle-aged (50-60 years) adults, with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. In contrast, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females' general AV perceptual strategy. Although young females' speech-reading proficiency may not readily contribute to greater visual influence, recurrent confirmation of the contribution of visual cues between young and middle adulthood, induced by speech-reading proficiency, may gradually shift females' AV perceptual strategy towards more visually dominated responses.

  3. Audio-Visual Speech Perception: A Developmental ERP Investigation

    Science.gov (United States)

    Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.

    2014-01-01

    Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…

  4. Audio-visual temporal recalibration can be constrained by content cues regardless of spatial overlap

    Directory of Open Access Journals (Sweden)

    Warrick Roseboom

    2013-04-01

    Full Text Available It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated, and opposing, estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated whether this is necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex stimuli (audio-visual speech; Experiment 1) and simple stimuli (high- and low-pitch audio matched with either vertically or horizontally oriented Gabors; Experiment 2), we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap.

  5. Effects of Audio-Visual Information on the Intelligibility of Alaryngeal Speech

    Science.gov (United States)

    Evitts, Paul M.; Portugal, Lindsay; Van Dine, Ami; Holler, Aline

    2010-01-01

    Background: There is minimal research on the contribution of visual information on speech intelligibility for individuals with a laryngectomy (IWL). Aims: The purpose of this project was to determine the effects of mode of presentation (audio-only, audio-visual) on alaryngeal speech intelligibility. Method: Twenty-three naive listeners were…

  6. Classifying laughter and speech using audio-visual feature prediction

    NARCIS (Netherlands)

    Petridis, Stavros; Asghar, Ali; Pantic, Maja

    2010-01-01

    In this study, a system that discriminates laughter from speech by modelling the relationship between audio and visual features is presented. The underlying assumption is that this relationship is different between speech and laughter. Neural networks are trained which learn the audio-to-visual and
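
    The record above is truncated, but one plausible reading of the described approach is to train one audio-to-visual predictor per class and then label a segment by which predictor reconstructs its visual features with lower error. The sketch below is a toy version of that idea; the feature dimensions, network sizes, and `classify` helper are all hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy data: rows are frames; X = audio features, Y = visual features.
rng = np.random.default_rng(0)
Xs, Ys = rng.normal(size=(200, 13)), rng.normal(size=(200, 4))    # "speech"
Xl, Yl = rng.normal(1, 1, (200, 13)), rng.normal(1, 1, (200, 4))  # "laughter"

# One audio-to-visual regressor per class.
speech_net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000).fit(Xs, Ys)
laugh_net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000).fit(Xl, Yl)

def classify(x_audio, y_visual):
    """Label a segment by which class model predicts its visual features best."""
    err_s = np.mean((speech_net.predict(x_audio) - y_visual) ** 2)
    err_l = np.mean((laugh_net.predict(x_audio) - y_visual) ** 2)
    return "speech" if err_s < err_l else "laughter"
```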

  7. Audio-Visual Speech in Noise Perception in Dyslexia

    Science.gov (United States)

    van Laarhoven, Thijs; Keetels, Mirjam; Schakel, Lemmy; Vroomen, Jean

    2018-01-01

    Individuals with developmental dyslexia (DD) may experience, besides reading problems, other speech-related processing deficits. Here, we examined the influence of visual articulatory information (lip-read speech) at various levels of background noise on auditory word recognition in children and adults with DD. We found that children with a…

  8. Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

    Directory of Open Access Journals (Sweden)

    Yue Zhao

    2012-12-01

    Full Text Available Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi-stream Dynamic Bayesian Networks and coupled HMMs are widely used for audio-visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) to perform unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and build an accurate audio-visual speech recognition model without a frame-independence assumption. Experimental results on Tibetan speech data from real-world environments show that the proposed DDBN outperforms state-of-the-art methods in word recognition accuracy.

  9. Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

    Directory of Open Access Journals (Sweden)

    Petar S. Aleksic

    2002-11-01

    Full Text Available We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs), supported by the MPEG-4 standard, for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. Principal component analysis (PCA) was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR) experiments. Both single-stream and multistream hidden Markov models (HMMs) were used to model the ASR system, integrate audio and visual information, and perform relatively large-vocabulary (approximately 1000 words) speech recognition experiments. The experiments were performed using clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER) by 20% to 23% relative to audio-only speech recognition WERs at various SNRs (0-30 dB) with additive white Gaussian noise, and by 19% relative to the audio-only speech recognition WER under clean audio conditions.
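
    A minimal sketch of the PCA step described above, assuming FAP trajectories have already been extracted as one row per video frame; the array shapes and component count are illustrative, not the paper's values.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy stand-in for extracted FAP trajectories: one row per video frame,
# one column per facial animation parameter (dimension chosen arbitrarily).
fap_frames = np.random.randn(5000, 20)

# Reduce dimensionality; the projection weights serve as the visual
# feature vector for each frame, as described in the abstract.
pca = PCA(n_components=6).fit(fap_frames)
visual_features = pca.transform(fap_frames)  # shape (5000, 6)
print(pca.explained_variance_ratio_.cumsum())
```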

  10. ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION

    Directory of Open Access Journals (Sweden)

    D.V. Ivanko

    2016-05-01

    Full Text Available The paper provides an analytical review covering the latest achievements in the field of audio-visual (AV) fusion, i.e., the integration of multimodal information. We discuss the main challenges and report on approaches to address them. One of the most important tasks of AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review we set out the basic principles of AV speech recognition and give a classification of the audio and visual features of speech. Special attention is paid to the systematization of existing techniques and AV data fusion methods. In the second part, based on the analysis carried out, we provide a consolidated list of tasks and applications that use AV fusion, and indicate the methods, techniques, and audio and video features used. We propose a classification of AV integration approaches and discuss the advantages and disadvantages of each. We draw conclusions and offer our assessment of the future of the field. In further research we plan to implement a system for audio-visual Russian continuous speech recognition using advanced methods of multimodal fusion.

  12. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Koji Iwano

    2007-03-01

    Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images, as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected-digit speech contaminated with white noise at various SNR conditions show the effectiveness of the proposed method: recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMMs were adapted to noise by the MLLR method.

  13. Audio-visual speech timing sensitivity is enhanced in cluttered conditions.

    Directory of Open Access Journals (Sweden)

    Warrick Roseboom

    2011-04-01

    Full Text Available Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.

  14. The effect of combined sensory and semantic components on audio-visual speech perception in older adults

    Directory of Open Access Journals (Sweden)

    Corrina Maguinness

    2011-12-01

    Full Text Available Previous studies have found that perception in older people benefits from multisensory over uni-sensory information. As normal speech recognition is affected by both the auditory input and the visual lip movements of the speaker, we investigated the efficiency of audio and visual integration in an older population by manipulating the relative reliability of the auditory and visual information in speech. We also investigated the role of the semantic context of the sentence, to assess whether audio-visual integration is affected by top-down semantic processing. We presented participants with audio-visual sentences in which the visual component was either blurred or not blurred. We found a greater cost in recall performance for semantically meaningless speech in the audio-visual blur condition compared to the audio-visual no-blur condition, and this effect was specific to the older group. Our findings have implications for understanding how aging affects efficient multisensory integration for the perception of speech, and suggest that multisensory inputs may benefit speech perception in older adults when the semantic content of the speech is unpredictable.

  15. Robust audio-visual speech recognition under noisy audio-video conditions.

    Science.gov (United States)

    Stewart, Darryl; Seymour, Rowan; Pass, Adrian; Ming, Ji

    2014-02-01

    This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements, and can be used alongside, many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances, with corruption added to the video and/or audio streams using a variety of noise types (e.g., MPEG-4 video compression) and levels. The experiments show that this approach gives excellent performance in comparison to a well-known dynamic stream weighting approach and to any fixed-weight integration approach, both in clean conditions and when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams, and according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
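
    The abstract does not give the exact MWSP formulation; the sketch below only illustrates the general idea of frame-wise weighted stream combination, taking a per-frame maximum over a grid of stream weights. The function name and weight grid are assumptions, not the authors' method.

```python
import numpy as np

def weighted_stream_log_score(log_p_audio, log_p_video,
                              weights=np.linspace(0.0, 1.0, 11)):
    """Frame-wise weighted stream combination (illustrative sketch).

    log_p_audio, log_p_video: arrays of shape (frames, states) holding
    per-stream log-posteriors. For each frame and state, the score is the
    best exponent-weighted combination over a grid of stream weights.
    """
    scores = [lam * log_p_audio + (1.0 - lam) * log_p_video
              for lam in weights]
    # Maximum over the weight grid, independently per frame and state.
    return np.max(np.stack(scores, axis=0), axis=0)
```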

  16. Audio-visual speech perception in prelingually deafened Japanese children following sequential bilateral cochlear implantation.

    Science.gov (United States)

    Yamamoto, Ryosuke; Naito, Yasushi; Tona, Risa; Moroto, Saburo; Tamaya, Rinko; Fujiwara, Keizo; Shinohara, Shogo; Takebayashi, Shinji; Kikuchi, Masahiro; Michida, Tetsuhiko

    2017-11-01

    An effect of audio-visual (AV) integration is observed when the auditory and visual stimuli are incongruent (the McGurk effect). In general, AV integration is helpful, especially for people wearing hearing aids or cochlear implants (CIs). However, the influence of AV integration on spoken word recognition in individuals with bilateral CIs (Bi-CIs) has not been fully investigated so far. In this study, we investigated AV integration in children with Bi-CIs. The study sample included thirty-one prelingually deafened children who underwent sequential bilateral cochlear implantation. We assessed their responses to congruent and incongruent AV stimuli with three CI-listening modes: only the 1st CI, only the 2nd CI, and Bi-CIs. The responses were assessed in the whole group as well as in two sub-groups: a proficient group (syllable intelligibility ≥80% with the 1st CI) and a non-proficient group (syllable intelligibility <80% with the 1st CI), in each of the three CI-listening modes. AV integration responses were observed for a subset of incongruent AV stimuli, and the patterns observed with the 1st CI and with Bi-CIs were similar. In the proficient group, responses with the 2nd CI were not significantly different from those with the 1st CI, whereas in the non-proficient group responses with the 2nd CI were driven by visual stimuli more than those with the 1st CI. Our results suggest that prelingually deafened Japanese children who underwent sequential bilateral cochlear implantation exhibit AV integration abilities, in monaural as well as binaural listening. We also observed a greater influence of visual stimuli on speech perception with the 2nd CI in the non-proficient group, suggesting that Bi-CI listeners with poorer speech recognition rely more on visual information than proficient subjects to compensate for poorer auditory input. Nevertheless, poorer quality auditory input with the 2nd CI did not interfere with AV integration with binaural

  17. Neural entrainment to rhythmically-presented auditory, visual and audio-visual speech in children

    Directory of Open Access Journals (Sweden)

    Alan James Power

    2012-07-01

    Full Text Available Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal 'samples' of information from the speech stream at different rates, phase-resetting ongoing oscillations so that they are aligned with similar frequency bands in the input ('phase locking'). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically-developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable "ba", presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a talking head). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the "ba" stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a "ba" in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling
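
    Entrainment of this kind is commonly quantified with an inter-trial phase-coherence measure; the sketch below computes such a value at a single target frequency (e.g. the 2 Hz syllable rate). This is a generic analysis sketch, not the authors' pipeline, and the function name is illustrative.

```python
import numpy as np

def phase_locking_value(trials, sample_rate, target_hz):
    """Inter-trial phase coherence at one frequency (illustrative sketch).

    trials: array of shape (n_trials, n_samples) of EEG from one channel.
    Returns a value in [0, 1]; 1 means identical phase on every trial.
    """
    n = trials.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    bin_idx = np.argmin(np.abs(freqs - target_hz))
    spectra = np.fft.rfft(trials, axis=1)[:, bin_idx]
    phases = spectra / np.abs(spectra)  # unit phasors
    return np.abs(phases.mean())

# e.g. entrainment to a 2 Hz syllable stream:
# plv = phase_locking_value(eeg_trials, sample_rate=500, target_hz=2.0)
```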

  18. Large Scale Functional Brain Networks Underlying Temporal Integration of Audio-Visual Speech Perception: An EEG Study.

    Science.gov (United States)

    Kumar, G Vinodh; Halder, Tamesh; Jaiswal, Amit K; Mukherjee, Abhishek; Roy, Dipanjan; Banerjee, Arpan

    2016-01-01

    Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (the McGurk effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal and parietal areas and of integrative brain sites in the vicinity of the superior temporal sulcus (STS) in multisensory speech perception. However, whether and how the network across the whole brain participates in multisensory perception remains an open question. We posit that large-scale functional connectivity among neural populations situated in distributed brain sites may provide valuable insights into the processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent AV speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs were computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha- and beta-band coherence underlying cross-modal (illusory) perception, compared to unisensory perception, around a temporal window of 300-600 ms following stimulus onset. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times, along with pre-stimulus decreases of lower-frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus, our
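
    As a rough, simplified stand-in for the time-frequency global coherence described above (the actual measure is a vector sum of pairwise coherence changes over time), one can average pairwise magnitude-squared coherence across all sensor pairs within a frequency band. The function below is a sketch under that simplifying assumption.

```python
import numpy as np
from itertools import combinations
from scipy.signal import coherence

def global_band_coherence(eeg, fs, band):
    """Mean pairwise magnitude-squared coherence within a frequency band.

    eeg: array (channels, samples). A simplified stand-in for the paper's
    time-frequency global coherence, which also tracks change over time.
    """
    lo, hi = band
    vals = []
    for i, j in combinations(range(eeg.shape[0]), 2):
        f, cxy = coherence(eeg[i], eeg[j], fs=fs, nperseg=fs)
        vals.append(cxy[(f >= lo) & (f <= hi)].mean())
    return float(np.mean(vals))

# e.g. gamma-band global coherence:
# gamma = global_band_coherence(eeg, fs=250, band=(30, 45))
```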

  19. The Effect of Onset Asynchrony in Audio Visual Speech and the Uncanny Valley in Virtual Characters

    DEFF Research Database (Denmark)

    Tinwell, Angela; Grimshaw, Mark; Abdel Nabi, Deborah

    2015-01-01

    This study investigates whether the Uncanny Valley phenomenon is increased for realistic, human-like characters with an asynchrony of lip movement during speech. An experiment was conducted in which 113 participants rated a human and a realistic, talking-head, human-like virtual character over a ran

  20. Audio-visual speech perception in infants and toddlers with Down syndrome, fragile X syndrome, and Williams syndrome.

    Science.gov (United States)

    D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette

    2016-08-01

    Typically-developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on Chronological and Mental Age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs.

    Science.gov (United States)

    Ten Oever, Sanne; Sack, Alexander T; Wheat, Katherine L; Bien, Nina; van Atteveldt, Nienke

    2013-01-01

    Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We revealed that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, that seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception.

  2. Effects of Audio-Visual Integration on the Detection of Masked Speech and Non-Speech Sounds

    Science.gov (United States)

    Eramudugolla, Ranmalee; Henderson, Rachel; Mattingley, Jason B.

    2011-01-01

    Integration of simultaneous auditory and visual information about an event can enhance our ability to detect that event. This is particularly evident in the perception of speech, where the articulatory gestures of the speaker's lips and face can significantly improve the listener's detection and identification of the message, especially when that…

  3. Atypical audio-visual speech perception and McGurk effects in children with specific language impairment.

    Science.gov (United States)

    Leybaert, Jacqueline; Macchi, Lucie; Huyse, Aurélie; Champoux, François; Bayard, Clémence; Colin, Cécile; Berthommier, Frédéric

    2014-01-01

    Audiovisual speech perception of children with specific language impairment (SLI) and children with typical language development (TLD) was compared in two experiments using /aCa/ syllables presented in the context of a masking release paradigm. Children had to repeat syllables presented in auditory-alone, visual-alone (speechreading), audiovisual congruent, and incongruent (McGurk) conditions. Stimuli were masked by either stationary (ST) or amplitude-modulated (AM) noise. Although children with SLI were less accurate in auditory and audiovisual speech perception, they showed an auditory masking release effect similar to that of children with TLD. Children with SLI also gave fewer correct responses in speechreading than children with TLD, indicating impairment in phonemic processing of visual speech information. In response to McGurk stimuli, children with TLD showed more fusions in AM noise than in ST noise, a consequence of the auditory masking release effect and of the influence of visual information. Children with SLI did not show this effect systematically, suggesting they were less influenced by visual speech. However, when the visual cues were easily identified, the profile of responses to McGurk stimuli was similar in both groups, suggesting that children with SLI do not suffer from an impairment of audiovisual integration. An analysis of the percentage of information transmitted revealed a deficit in the children with SLI, particularly for the place-of-articulation feature. Taken together, the data support the hypothesis of intact peripheral processing of auditory speech information, coupled with a supramodal deficit of phonemic categorization, in children with SLI. Clinical implications are discussed.

  4. Perception of audio-visual speech synchrony in Spanish-speaking children with and without specific language impairment.

    Science.gov (United States)

    Pons, Ferran; Andreu, Llorenç; Sanz-Torrent, Monica; Buil-Legaz, Lucía; Lewkowicz, David J

    2013-06-01

    Speech perception involves the integration of auditory and visual articulatory information, and thus requires the perception of temporal synchrony between this information. There is evidence that children with specific language impairment (SLI) have difficulty with auditory speech perception but it is not known if this is also true for the integration of auditory and visual speech. Twenty Spanish-speaking children with SLI, twenty typically developing age-matched Spanish-speaking children, and twenty Spanish-speaking children matched for MLU-w participated in an eye-tracking study to investigate the perception of audiovisual speech synchrony. Results revealed that children with typical language development perceived an audiovisual asynchrony of 666 ms regardless of whether the auditory or visual speech attribute led the other one. Children with SLI only detected the 666 ms asynchrony when the auditory component preceded the visual component. None of the groups perceived an audiovisual asynchrony of 366 ms. These results suggest that the difficulty of speech processing by children with SLI would also involve difficulties in integrating auditory and visual aspects of speech perception.

  5. Audio-Visual and Meaningful Semantic Context Enhancements in Older and Younger Adults.

    Directory of Open Access Journals (Sweden)

    Kirsten E Smayda

    Full Text Available Speech perception is critical to everyday life. Oftentimes noise can degrade a speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18-35) and thirty-three older adults (ages 60-90) to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audio-visual conditions. We hypothesized that young adults would outperform older adults across SNRs, modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audio-visual relative to the audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts. However, we predicted that older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audio-visual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and meaningful benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audio-visual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger

  6. Basic to Applied Research: The Benefits of Audio-Visual Speech Perception Research in Teaching Foreign Languages

    Science.gov (United States)

    Erdener, Dogu

    2016-01-01

    Traditionally, second language (L2) instruction has emphasised auditory-based instruction methods. However, this approach is restrictive in the sense that speech perception by humans is not just an auditory phenomenon but a multimodal one, and specifically, a visual one as well. In the past decade, experimental studies have shown that the…

  7. Cortical Integration of Audio-Visual Information

    Science.gov (United States)

    Vander Wyk, Brent C.; Ramsay, Gordon J.; Hudac, Caitlin M.; Jones, Warren; Lin, David; Klin, Ami; Lee, Su Mei; Pelphrey, Kevin A.

    2013-01-01

    We investigated the neural basis of audio-visual processing in speech and non-speech stimuli. Physically identical auditory stimuli (speech and sinusoidal tones) and visual stimuli (animated circles and ellipses) were used in this fMRI experiment. Relative to unimodal stimuli, each of the multimodal conjunctions showed increased activation in largely non-overlapping areas. The conjunction of Ellipse and Speech, which most resembles naturalistic audiovisual speech, showed higher activation in the right inferior frontal gyrus, fusiform gyri, left posterior superior temporal sulcus, and lateral occipital cortex. The conjunction of Circle and Tone, an arbitrary audio-visual pairing with no speech association, activated middle temporal gyri and lateral occipital cortex. The conjunction of Circle and Speech showed activation in lateral occipital cortex, and the conjunction of Ellipse and Tone did not show increased activation relative to unimodal stimuli. Further analysis revealed that middle temporal regions, although identified as multimodal only in the Circle-Tone condition, were more strongly active to Ellipse-Speech or Circle-Speech, but regions that were identified as multimodal for Ellipse-Speech were always strongest for Ellipse-Speech. Our results suggest that combinations of auditory and visual stimuli may together be processed by different cortical networks, depending on the extent to which speech or non-speech percepts are evoked. PMID:20709442

  8. Spatio-temporal distribution of brain activity associated with audio-visually congruent and incongruent speech and the McGurk Effect.

    Science.gov (United States)

    Pratt, Hillel; Bleich, Naomi; Mittelman, Nomi

    2015-11-01

    Spatio-temporal distributions of cortical activity to audio-visual presentations of meaningless vowel-consonant-vowels, and the effects of audio-visual congruence/incongruence, were studied with emphasis on the McGurk effect. The McGurk effect occurs when a clearly audible syllable with one consonant is presented simultaneously with a visual presentation of a face articulating a syllable with a different consonant, and the resulting percept is a syllable with a consonant other than the auditorily presented one. Twenty subjects listened to pairs of audio-visually congruent or incongruent utterances and indicated whether pair members were the same or not. Source current densities of event-related potentials to the first utterance in the pair were estimated, and effects of stimulus-response combinations, brain area, hemisphere, and clarity of visual articulation were assessed. Auditory cortex, superior parietal cortex, and middle temporal cortex were the most consistently involved areas across experimental conditions. Early activity also involved visual cortex. Clarity of visual articulation impacted activity in secondary visual cortex and Wernicke's area. McGurk perception was associated with decreased activity in primary and secondary auditory cortices and Wernicke's area before 100 msec, and increased activity around 100 msec which decreased again around 180 msec. Activity in Broca's area was unaffected by McGurk perception and was increased only to congruent audio-visual stimuli 30-70 msec following consonant onset. The results suggest left-hemisphere prominence in the effects of stimulus and response conditions on eight brain areas involved in dynamically distributed parallel processing of audio-visual integration. Initially (30-70 msec), subcortical contributions to auditory cortex, superior parietal cortex, and middle temporal cortex occur. During 100-140 msec, peristriate visual influences and Wernicke's area join in the processing. Resolution of incongruent audio-visual inputs is then

  9. Speech cues contribute to audiovisual spatial integration.

    Directory of Open Access Journals (Sweden)

    Christopher W Bishop

    Full Text Available Speech is the most important form of human communication but ambient sounds and competing talkers often degrade its acoustics. Fortunately the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech-cues interact with audiovisual spatial integration mechanisms. Here, we combine two well established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech-cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral 'what' and dorsal 'where' pathways.

  10. Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.

    Science.gov (United States)

    Gebru, Israel D; Ba, Sileye; Li, Xiaofei; Horaud, Radu

    2018-05-01

    Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes in a principled way speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset, that contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the-art diarization algorithms.
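
    The full model is a latent-variable temporal graphical model; the toy sketch below illustrates only the flavor of the speech-to-person association step, assigning a binaurally localized speech source to the nearest visually tracked person. The function, the azimuth representation, and the threshold are hypothetical simplifications, not the paper's method.

```python
def associate_speech_to_person(speech_azimuth, person_azimuths, max_gap=15.0):
    """Assign a localized speech source to the nearest tracked person.

    speech_azimuth: angle in degrees estimated from binaural features.
    person_azimuths: dict mapping person IDs to tracked azimuths (degrees).
    Returns a person ID, or None if no one is within max_gap degrees.
    """
    best_id, best_gap = None, max_gap
    for pid, azimuth in person_azimuths.items():
        gap = abs(speech_azimuth - azimuth)
        if gap < best_gap:
            best_id, best_gap = pid, gap
    return best_id

print(associate_speech_to_person(12.0, {"A": -30.0, "B": 10.0, "C": 45.0}))
```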

  11. Preschoolers Benefit from Visually Salient Speech Cues

    Science.gov (United States)

    Lalonde, Kaylah; Holt, Rachael Frush

    2015-01-01

    Purpose: This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. They also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method: Twelve adults and 27 typically developing 3-…

  12. Alfasecuencialización: la enseñanza del cine en la era del audiovisual [Sequential literacy: the teaching of cinema in the age of audio-visual speech]

    Directory of Open Access Journals (Sweden)

    José Antonio Palao Errando

    2007-10-01

    Full Text Available In the so-called «information society», film studies have been diluted in the pragmatic and technological approach to audio-visual discourse, just as the enjoyment of cinema itself has been caught in the net of DVD and hypertext. Cinema itself reacts to this through complex narrative structures that distance it from standard audio-visual discourse. The function of film studies, and of their teaching at university, should be the reintroduction of the subject rejected by informative knowledge, by means of the interpretation of the film text.

  13. Audio-visual identification of place of articulation and voicing in white and babble noise.

    Science.gov (United States)

    Alm, Magnus; Behne, Dawn M; Wang, Yue; Eg, Ragnhild

    2009-07-01

    Research shows that noise and phonetic attributes influence the degree to which auditory and visual modalities are used in audio-visual speech perception (AVSP). Research has, however, mainly focused on white noise and single phonetic attributes, thus neglecting the more common babble noise and possible interactions between phonetic attributes. This study explores whether white and babble noise differentially influence AVSP and whether these differences depend on phonetic attributes. White and babble noise of 0 and -12 dB signal-to-noise ratio were added to congruent and incongruent audio-visual stop consonant-vowel stimuli. The audio (A) and video (V) of incongruent stimuli differed either in place of articulation (POA) or voicing. Responses from 15 young adults show that, compared to white noise, babble resulted in more audio responses for POA stimuli, and fewer for voicing stimuli. Voiced syllables received more audio responses than voiceless syllables. Results can be attributed to discrepancies in the acoustic spectra of both the noise and speech target. Voiced consonants may be more auditorily salient than voiceless consonants which are more spectrally similar to white noise. Visual cues contribute to identification of voicing, but only if the POA is visually salient and auditorily susceptible to the noise type.
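
    Adding noise at a fixed signal-to-noise ratio, as in the 0 and -12 dB conditions above, follows a standard recipe; below is a minimal sketch (the function name and sample data are illustrative).

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Mix noise into speech at a target SNR in dB.

    Assumes equal-length 1-D float arrays. The noise is scaled so that
    10 * log10(speech_power / noise_power) equals snr_db after mixing.
    """
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise

# Example: white-noise masking at -12 dB SNR.
speech = np.random.randn(16000)                 # placeholder speech signal
white = np.random.randn(len(speech))            # white noise
mixed = add_noise_at_snr(speech, white, -12.0)  # -12 dB SNR condition
```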

  14. Fusion for Audio-Visual Laughter Detection

    NARCIS (Netherlands)

    Reuderink, B.

    2007-01-01

    Laughter is a highly variable signal, and can express a spectrum of emotions. This makes the automatic detection of laughter a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is performed

  15. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English

    Science.gov (United States)

    Russo, Frank A.

    2018-01-01

    The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender-balanced, consisting of 24 professional actors vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. Each of the 7356 recordings was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity and test-retest intra-rater reliability were reported. Corrected accuracy and composite "goodness" measures are presented to assist researchers in the selection of stimuli. All recordings are made freely available under a Creative Commons license and can be downloaded at https://doi.org/10.5281/zenodo.1188976. PMID:29768426
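
    RAVDESS file names are commonly documented as seven two-digit fields encoding modality, vocal channel, emotion, intensity, statement, repetition, and actor; the parser below assumes that convention, which is not stated in the abstract, so verify it against the database documentation before relying on it.

```python
# Hypothetical helper based on the commonly documented RAVDESS filename
# convention (seven two-digit fields, e.g. "03-01-06-01-02-01-12.wav").
EMOTIONS = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
            "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised"}

def parse_ravdess_name(filename):
    modality, channel, emotion, intensity, statement, repetition, actor = (
        filename.removesuffix(".wav").split("-"))
    return {
        "modality": {"01": "full-AV", "02": "video-only",
                     "03": "audio-only"}[modality],
        "channel": {"01": "speech", "02": "song"}[channel],
        "emotion": EMOTIONS[emotion],
        "intensity": {"01": "normal", "02": "strong"}[intensity],
        "statement": statement,
        "repetition": repetition,
        "actor": int(actor),  # odd = male, even = female (per convention)
    }

print(parse_ravdess_name("03-01-06-01-02-01-12.wav"))
```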

  16. Computationally Efficient Clustering of Audio-Visual Meeting Data

    Science.gov (United States)

    Hung, Hayley; Friedland, Gerald; Yeo, Chuohao

    This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors, comprising a limited number of cameras and microphones. We first demonstrate computationally efficient algorithms that can identify who spoke and when, a problem in speech processing known as speaker diarization. We also extract visual activity features efficiently from MPEG4 video by taking advantage of the processing that was already done for video compression. Then, we present a method of associating the audio-visual data together so that the content of each participant can be managed individually. The methods presented in this article can be used as a principal component that enables many higher-level semantic analysis tasks needed in search, retrieval, and navigation.

  17. Audiovisual Cues and Perceptual Learning of Spectrally Distorted Speech

    Science.gov (United States)

    Pilling, Michael; Thomas, Sharon

    2011-01-01

    Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…

  18. Joint evaluation of communication quality and user experience in an audio-visual virtual reality meeting

    DEFF Research Database (Denmark)

    Møller, Anders Kalsgaard; Hoffmann, Pablo F.; Carrozzino, Marcello

    2013-01-01

    The state-of-the-art speech intelligibility tests are created with the purpose of evaluating acoustic communication devices, not audio-visual virtual reality systems. This paper presents a novel method to evaluate a communication situation based on both the speech intelligibility

  19. Audio-visual Classification and Fusion of Spontaneous Affect Data in Likelihood Space

    NARCIS (Netherlands)

    Nicolaou, Mihalis A.; Gunes, Hatice; Pantic, Maja

    2010-01-01

    This paper focuses on audio-visual (using facial expression, shoulder and audio cues) classification of spontaneous affect, utilising generative models for classification (i) in terms of Maximum Likelihood Classification with the assumption that the generative model structure in the classifier is

  20. Consequence of audio visual collection in school libraries

    OpenAIRE

    Kuri, Ramesh

    2016-01-01

    An audio-visual collection in a library plays an important role in teaching and learning. The importance of audio-visual (AV) technology in education should not be underestimated. If a library's audio-visual collection is carefully planned and designed, it can provide a rich learning environment. In this article, the author discusses the consequences of audio-visual collections in libraries, especially for students served by school libraries.

  1. Acoustic cues identifying phonetic transitions for speech segmentation

    CSIR Research Space (South Africa)

    Van Niekerk, DR

    2008-11-01

    Full Text Available The quality of corpus-based text-to-speech (TTS) systems depends strongly on the consistency of boundary placements during phonetic alignments. Expert human transcribers use visually represented acoustic cues in order to consistently place...

  2. Segmentation cues in conversational speech: Robust semantics and fragile phonotactics

    Directory of Open Access Journals (Sweden)

    Laurence eWhite

    2012-10-01

    Full Text Available Multiple cues influence listeners' segmentation of connected speech into words, but most previous studies have used stimuli elicited in careful readings rather than natural conversation. Discerning word boundaries in conversational speech may differ from the laboratory setting. In particular, a speaker's articulatory effort (hyperarticulation vs hypoarticulation, H&H) may vary according to communicative demands, suggesting a compensatory relationship whereby acoustic-phonetic cues are attenuated when other information sources strongly guide segmentation. We examined how listeners' interpretation of segmentation cues is affected by speech style (spontaneous conversation vs read), using cross-modal identity priming. To elicit spontaneous stimuli, we used a map task in which speakers discussed routes around stylised landmarks. These landmarks were two-word phrases in which the strength of potential segmentation cues (semantic likelihood and cross-boundary diphone phonotactics) was systematically varied. Landmark-carrying utterances were transcribed and later re-recorded as read speech. Independent of speech style, we found an interaction between cue valence (favourable/unfavourable) and cue type (phonotactics/semantics). Thus, there was an effect of semantic plausibility, but no effect of cross-boundary phonotactics, indicating that the importance of phonotactic segmentation may have been overstated in studies where lexical information was artificially suppressed. These patterns were unaffected by whether the stimuli were elicited in a spontaneous or read context, even though the difference in speech styles was evident in a main effect. Durational analyses suggested speaker-driven cue trade-offs congruent with an H&H account, but these modulations did not impact on listener behaviour. We conclude that previous research exploiting read speech is reliable in indicating the primacy of lexically-based cues in the segmentation of natural

  3. Audio visual information materials for risk communication

    International Nuclear Information System (INIS)

    Gunji, Ikuko; Tabata, Rimiko; Ohuchi, Naomi

    2005-07-01

    Japan Nuclear Cycle Development Institute (JNC), Tokai Works set up the Risk Communication Study Team in January 2001 to promote mutual understanding between local residents and JNC. The Team has studied risk communication from various viewpoints and developed new methods of public relations that address local residents' risk perception of nuclear issues. We aim to develop more effective risk communication, promoting better mutual understanding with local residents, by providing risk information about nuclear fuel facilities such as the reprocessing plant and other research and development facilities. We explain the development process of audio-visual information materials that describe our actual activities and devices for risk management in nuclear fuel facilities, and discuss the measurement of their effectiveness. (author)

  4. Modeling the Development of Audiovisual Cue Integration in Speech Perception.

    Science.gov (United States)

    Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C

    2017-03-21

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.
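
    As a toy illustration of the GMM approach described above (not the authors' code), one can fit an unsupervised mixture over a joint audio-visual cue space and read off posterior category probabilities, including for mismatched cue combinations; all data and cue labels below are synthetic assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy two-category data: each token has an auditory cue (e.g. a VOT-like
# value) and a visual cue (e.g. degree of lip aperture), both 1-D here.
rng = np.random.default_rng(1)
cat_a = rng.normal([0.0, 0.0], 0.5, size=(500, 2))
cat_b = rng.normal([2.0, 1.5], 0.5, size=(500, 2))
tokens = np.vstack([cat_a, cat_b])

# Unsupervised GMM over the joint audio-visual cue space: the model
# discovers two categories and implicitly weights each cue by how well
# it separates them (via the fitted covariances).
gmm = GaussianMixture(n_components=2, covariance_type="full").fit(tokens)

# Posterior category probabilities for a mismatched token (audio cue
# from category A, visual cue from B): a crude analogue of a
# McGurk-type stimulus.
print(gmm.predict_proba([[0.0, 1.5]]))
```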

  5. Decision-level fusion for audio-visual laughter detection

    NARCIS (Netherlands)

    Reuderink, B.; Poel, M.; Truong, K.; Poppe, R.; Pantic, M.

    2008-01-01

    Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is

  6. Decision-Level Fusion for Audio-Visual Laughter Detection

    NARCIS (Netherlands)

    Reuderink, B.; Poel, Mannes; Truong, Khiet Phuong; Poppe, Ronald Walter; Pantic, Maja; Popescu-Belis, Andrei; Stiefelhagen, Rainer

    Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is
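    The two records above describe decision-level (late) fusion, in which unimodal classifiers are trained separately and only their output probabilities are combined. A minimal sketch under assumed toy features and an assumed fixed audio weight, not the authors' setup:

```python
# A minimal sketch of decision-level fusion for a binary detection task.
# The toy features and the fusion weight alpha are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 400
y = rng.integers(0, 2, n)                    # 1 = laughter frame, 0 = other
audio_feat = y[:, None] * 1.0 + rng.normal(0, 1.0, (n, 4))  # toy audio features
video_feat = y[:, None] * 0.5 + rng.normal(0, 1.0, (n, 3))  # toy facial features

audio_clf = LogisticRegression().fit(audio_feat, y)
video_clf = LogisticRegression().fit(video_feat, y)

alpha = 0.7  # assumed audio weight; tuned on held-out data in practice
p_fused = (alpha * audio_clf.predict_proba(audio_feat)[:, 1]
           + (1 - alpha) * video_clf.predict_proba(video_feat)[:, 1])
fused_label = (p_fused > 0.5).astype(int)
print("fused accuracy (training frames, for illustration):",
      (fused_label == y).mean())
```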

  7. Haptic and Audio-visual Stimuli: Enhancing Experiences and Interaction

    NARCIS (Netherlands)

    Nijholt, Antinus; Dijk, Esko O.; Lemmens, Paul M.C.; Luitjens, S.B.

    2010-01-01

    The intention of the symposium on Haptic and Audio-visual stimuli at the EuroHaptics 2010 conference is to deepen the understanding of the effect of combined Haptic and Audio-visual stimuli. The knowledge gained will be used to enhance experiences and interactions in daily life. To this end, a

  8. The contribution of dynamic visual cues to audiovisual speech perception.

    Science.gov (United States)

    Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador

    2015-08-01

    Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues; two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this end, we measured word identification performance in noise using unimodal auditory stimuli and audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point light displays achieved via motion capture of the original talker. Point light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Cross-language differences in cue use for speech segmentation

    NARCIS (Netherlands)

    Tyler, M.D.; Cutler, A.

    2009-01-01

    Two artificial-language learning experiments directly compared English, French, and Dutch listeners' use of suprasegmental cues for continuous-speech segmentation. In both experiments, listeners heard unbroken sequences of consonant-vowel syllables, composed of recurring three- and four-syllable

  10. Audio-Visual Perception System for a Humanoid Robotic Head

    Directory of Open Access Journals (Sweden)

    Raquel Viciana-Abad

    2014-05-01

    Full Text Available One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, there is a lack of evaluation of the benefits of audio-visual attention mechanisms, compared to audio-only or visual-only approaches, in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with Bayesian inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared by considering the technical limitations of unimodal systems. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.
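    A minimal sketch of the kind of Bayesian fusion described above, assuming independent Gaussian-shaped audio and visual likelihoods over a grid of candidate azimuths; the widths and peak locations are illustrative, not the paper's values:

```python
# A minimal sketch of Bayesian audio-visual speaker localization:
# posterior over azimuth = prior x audio likelihood x visual likelihood.
import numpy as np

azimuth = np.linspace(-90, 90, 181)            # candidate directions (deg)

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Hypothetical unimodal estimates: audio localisation is noisier than vision.
lik_audio = gaussian(azimuth, mu=18.0, sigma=15.0)
lik_video = gaussian(azimuth, mu=10.0, sigma=4.0)
prior = np.ones_like(azimuth)                  # flat prior over directions

posterior = prior * lik_audio * lik_video
posterior /= posterior.sum()
print("fused estimate (deg):", azimuth[posterior.argmax()])
```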

  11. Should visual speech cues (speechreading) be considered when fitting hearing aids?

    Science.gov (United States)

    Grant, Ken

    2002-05-01

    When talker and listener are face-to-face, visual speech cues become an important part of the communication environment, and yet, these cues are seldom considered when designing hearing aids. Models of auditory-visual speech recognition highlight the importance of complementary versus redundant speech information for predicting auditory-visual recognition performance. Thus, for hearing aids to work optimally when visual speech cues are present, it is important to know whether the cues provided by amplification and the cues provided by speechreading complement each other. In this talk, data will be reviewed that show nonmonotonicity between auditory-alone speech recognition and auditory-visual speech recognition, suggesting that efforts designed solely to improve auditory-alone recognition may not always result in improved auditory-visual recognition. Data will also be presented showing that one of the most important speech cues for enhancing auditory-visual speech recognition performance, voicing, is often the cue that benefits least from amplification.

  12. Proper Use of Audio-Visual Aids: Essential for Educators.

    Science.gov (United States)

    Dejardin, Conrad

    1989-01-01

    Criticizes educators as the worst users of audio-visual aids and among the worst public speakers. Offers guidelines for the proper use of an overhead projector and the development of transparencies. (DMM)

  13. Auditory and audio-visual processing in patients with cochlear, auditory brainstem, and auditory midbrain implants: An EEG study.

    Science.gov (United States)

    Schierholz, Irina; Finke, Mareike; Kral, Andrej; Büchner, Andreas; Rach, Stefan; Lenarz, Thomas; Dengler, Reinhard; Sandmann, Pascale

    2017-04-01

    There is substantial variability in speech recognition ability across patients with cochlear implants (CIs), auditory brainstem implants (ABIs), and auditory midbrain implants (AMIs). To better understand how this variability is related to central processing differences, the current electroencephalography (EEG) study compared hearing abilities and auditory-cortex activation in patients with electrical stimulation at different sites of the auditory pathway. Three different groups of patients with auditory implants (Hannover Medical School; ABI: n = 6, CI: n = 6; AMI: n = 2) performed a speeded response task and a speech recognition test with auditory, visual, and audio-visual stimuli. Behavioral performance and cortical processing of auditory and audio-visual stimuli were compared between groups. ABI and AMI patients showed prolonged response times on auditory and audio-visual stimuli compared with normal-hearing (NH) listeners and CI patients. This was confirmed by prolonged N1 latencies and reduced N1 amplitudes in ABI and AMI patients. However, patients with central auditory implants showed a remarkable gain in performance when visual and auditory input was combined, in both speech and non-speech conditions, which was reflected by a strong visual modulation of auditory-cortex activation in these individuals. In sum, the results suggest that the behavioral improvement for audio-visual conditions in central auditory implant patients is based on enhanced audio-visual interactions in the auditory cortex. These findings may provide important implications for the optimization of electrical stimulation and rehabilitation strategies in patients with central auditory prostheses. Hum Brain Mapp 38:2206-2225, 2017. © 2017 Wiley Periodicals, Inc.

  14. Discriminating individually considerate and authoritarian leaders by speech activity cues

    OpenAIRE

    Feese, Sebastian; Muaremi, Amir; Arnrich, Bert; Tröster, Gerhard; Meyer, Bertolt; Jonas, Klaus

    2011-01-01

    Effective leadership can increase team performance; however, up to now the influence of specific micro-level behavioral patterns on team performance is unclear. At the same time, current behavior observation methods in social psychology mostly rely on manual video annotations that impede research. In our work, we follow a sensor-based approach to automatically extract speech activity cues to discriminate individually considerate from authoritarian leadership. On a subset of 35 selected...

  15. Imagination and Modern Audio Visual Form

    Directory of Open Access Journals (Sweden)

    Ana Đurković

    2017-09-01

    Full Text Available Through three episodes, Archetype of Modern Fairy Tales: The Mysterious World of Fantasy and Reality tells a serious story about archetypes, symbols, and the knowledge of good and evil. RTS editor: Natasa Neskovic. Written and directed by: Suncica Jergovic. Editing: Ana Djurkovic. How can one illuminate the concept of fantasy and the affective factors in our imagination through something a priori so imaginary by its genetic provenance, such as a movie scene or a digital picture and sound? One cannot always avoid the association with Arnheim's phrase: mass age – massage: the medium is the message. In the elementary and terse definition of "the shot" from Plaževsky's film language there is the term "le cadre"; these are selected bits of reality, an immanent frame that contains the individual act of images, divided from the continuous view of reality and handled by a specific code of semantic value – imaginative, of course, by aesthetic categories and evaluations. In this kind of positive simulacrum, there can be no better vantage point for current thinking about the limits of imagination and truth in contemporary media and the contemporary global environment than original audio-visual forms, through whose prism we explore the fairy tale – at once myth and imagination – as well as its overall impact on the personality. Anything can be a fairy tale, even the false, amoral platitudes politicized by political lobbies within existing power systems, but there is no fairy-tale authenticity in them, no creative act, and none of the humanity and historical identity of man that is always present in the ethical effort of a true artist. We therefore investigate the conditions of creative images and the modalities of audiovisual media in film language; it is the archetype of the fairy tale which, with its psychodynamics, still exists and is invoked when modern man grows tired of lies and simulations during his global

  16. The Fungible Audio-Visual Mapping and its Experience

    Directory of Open Access Journals (Sweden)

    Adriana Sa

    2014-12-01

    Full Text Available This article takes a perceptual approach to audio-visual mapping. Clearly perceivable cause and effect relationships can be problematic if one desires the audience to experience the music. Indeed, perception would bias those sonic qualities that fit previous concepts of causation, subordinating other sonic qualities, which may form the relations between the sounds themselves. The question is, how can an audio-visual mapping produce a sense of causation, and simultaneously confound the actual cause-effect relationships? We call this a fungible audio-visual mapping. Our aim here is to glean its constitution and aspect. We report a study which draws upon methods from experimental psychology to inform audio-visual instrument design and composition. The participants were shown several audio-visual mapping prototypes, after which we posed quantitative and qualitative questions regarding their sense of causation and their sense of understanding the cause-effect relationships. The study shows that a fungible mapping requires both synchronized and seemingly non-related components – sufficient complexity to be confusing. As the specific cause-effect concepts remain inconclusive, the sense of causation embraces the whole.

  17. Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition

    NARCIS (Netherlands)

    Jesse, A.; McQueen, J.M.

    2014-01-01

    Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes

  18. Use of amplitude modulation cues recovered from frequency modulation for cochlear implant users when original speech cues are severely degraded.

    Science.gov (United States)

    Won, Jong Ho; Shim, Hyun Joon; Lorenzi, Christian; Rubinstein, Jay T

    2014-06-01

    Won et al. (J Acoust Soc Am 132:1113-1119, 2012) reported that cochlear implant (CI) speech processors generate amplitude-modulation (AM) cues recovered from broadband speech frequency modulation (FM) and that CI users can use these cues for speech identification in quiet. The present study was designed to extend this finding for a wide range of listening conditions, where the original speech cues were severely degraded by manipulating either the acoustic signals or the speech processor. The manipulation of the acoustic signals included the presentation of background noise, simulation of reverberation, and amplitude compression. The manipulation of the speech processor included changing the input dynamic range and the number of channels. For each of these conditions, multiple levels of speech degradation were tested. Speech identification was measured for CI users and compared for stimuli having both AM and FM information (intact condition) or FM information only (FM condition). Each manipulation degraded speech identification performance for both intact and FM conditions. Performance for the intact and FM conditions became similar for stimuli having the most severe degradations. Identification performance generally overlapped for the intact and FM conditions. Moreover, identification performance for the FM condition was better than chance performance even at the maximum level of distortion. Finally, significant correlations were found between speech identification scores for the intact and FM conditions. Altogether, these results suggest that despite poor frequency selectivity, CI users can make efficient use of AM cues recovered from speech FM in difficult listening situations.
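    The effect the study builds on, FM-to-AM conversion at the edges of a band-pass filter, can be illustrated with a short sketch. The FM test tone and filter band below are assumptions for illustration, not the study's stimuli or processor settings:

```python
# A minimal sketch of FM-to-AM conversion: a constant-amplitude FM tone
# acquires an amplitude envelope after band-pass filtering on its skirt.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
# FM tone: 1 kHz carrier, +/-200 Hz excursion at a 4 Hz modulation rate.
phase = 2 * np.pi * 1000 * t + 50.0 * (1 - np.cos(2 * np.pi * 4 * t))
fm = np.cos(phase)                               # no AM in the input signal

# Band-pass filter placed on the upper edge of the frequency sweep,
# loosely analogous to one channel of a CI processor's filterbank.
sos = butter(4, [1100, 1300], btype="bandpass", fs=fs, output="sos")
channel = sosfiltfilt(sos, fm)

envelope = np.abs(hilbert(channel))              # the "recovered" AM cue
print("envelope modulation depth:", envelope.max() - envelope.min())
```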

  19. A conceptual framework for audio-visual museum media

    DEFF Research Database (Denmark)

    Kirkedahl Lysholm Nielsen, Mikkel

    2017-01-01

    In today's history museums, the past is communicated through many other means than original artefacts. This interdisciplinary and theoretical article suggests a new approach to studying the use of audio-visual media, such as film, video and related media types, in a museum context. The centre...... and museum studies, existing case studies, and real life observations, the suggested framework instead stresses particular characteristics of contextual use of audio-visual media in history museums, such as authenticity, virtuality, interactivity, social context and spatial attributes of the communication...

  20. Auditory cross-modal reorganization in cochlear implant users indicates audio-visual integration.

    Science.gov (United States)

    Stropahl, Maren; Debener, Stefan

    2017-01-01

    There is clear evidence for cross-modal cortical reorganization in the auditory system of post-lingually deafened cochlear implant (CI) users. A recent report suggests that moderate sensorineural hearing loss is already sufficient to initiate corresponding cortical changes. To what extent these changes are deprivation-induced or related to sensory recovery is still debated. Moreover, the influence of cross-modal reorganization on CI benefit is also still unclear. While reorganization during deafness may impede speech recovery, reorganization also has beneficial influences on face recognition and lip-reading. As CI users were observed to show differences in multisensory integration, the question arises if cross-modal reorganization is related to audio-visual integration skills. The current electroencephalography study investigated cortical reorganization in experienced post-lingually deafened CI users (n = 18), untreated mild to moderately hearing impaired individuals (n = 18) and normal hearing controls (n = 17). Cross-modal activation of the auditory cortex, by means of EEG source localization in response to human faces, and audio-visual integration, quantified with the McGurk illusion, were measured. CI users revealed stronger cross-modal activations compared to age-matched normal hearing individuals. Furthermore, CI users showed a relationship between cross-modal activation and audio-visual integration strength. This may further support a beneficial relationship between cross-modal activation and daily-life communication skills that may not be fully captured by laboratory-based speech perception tests. Interestingly, hearing impaired individuals showed behavioral and neurophysiological results that were numerically between the other two groups, and they showed a moderate relationship between cross-modal activation and the degree of hearing loss. This further supports the notion that auditory deprivation evokes a reorganization of the auditory system.

  1. Auditory cross-modal reorganization in cochlear implant users indicates audio-visual integration

    Directory of Open Access Journals (Sweden)

    Maren Stropahl

    2017-01-01

    Full Text Available There is clear evidence for cross-modal cortical reorganization in the auditory system of post-lingually deafened cochlear implant (CI) users. A recent report suggests that moderate sensorineural hearing loss is already sufficient to initiate corresponding cortical changes. To what extent these changes are deprivation-induced or related to sensory recovery is still debated. Moreover, the influence of cross-modal reorganization on CI benefit is also still unclear. While reorganization during deafness may impede speech recovery, reorganization also has beneficial influences on face recognition and lip-reading. As CI users were observed to show differences in multisensory integration, the question arises if cross-modal reorganization is related to audio-visual integration skills. The current electroencephalography study investigated cortical reorganization in experienced post-lingually deafened CI users (n = 18), untreated mild to moderately hearing impaired individuals (n = 18) and normal hearing controls (n = 17). Cross-modal activation of the auditory cortex, by means of EEG source localization in response to human faces, and audio-visual integration, quantified with the McGurk illusion, were measured. CI users revealed stronger cross-modal activations compared to age-matched normal hearing individuals. Furthermore, CI users showed a relationship between cross-modal activation and audio-visual integration strength. This may further support a beneficial relationship between cross-modal activation and daily-life communication skills that may not be fully captured by laboratory-based speech perception tests. Interestingly, hearing impaired individuals showed behavioral and neurophysiological results that were numerically between the other two groups, and they showed a moderate relationship between cross-modal activation and the degree of hearing loss. This further supports the notion that auditory deprivation evokes a reorganization of the auditory system.

  2. Experience with speech sounds is not necessary for cue trading by budgerigars (Melopsittacus undulatus).

    Directory of Open Access Journals (Sweden)

    Mary Flaherty

    Full Text Available The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with "d" or "t" and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal.

  3. Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition

    OpenAIRE

    Jesse, A.; McQueen, J.

    2014-01-01

    Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes in duration, amplitude, and pitch) in spoken-word recognition. We asked here whether they can also use visual suprasegmental cues. In two categorization experiments, Dutch participants saw a speaker...

  4. Crossmodal and incremental perception of audiovisual cues to emotional speech.

    Science.gov (United States)

    Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc

    2010-01-01

    In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: (1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests with video clips of emotional utterances collected via a variant of the well-known Velten method. More specifically, we recorded speakers who displayed positive or negative emotions, which were congruent or incongruent with the (emotional) lexical content of the uttered sentence. In order to test this, we conducted two experiments. The first experiment is a perception experiment in which Czech participants, who do not speak Dutch, rated the perceived emotional state of Dutch speakers in a bimodal (audiovisual) or a unimodal (audio- or vision-only) condition. It was found that incongruent emotional speech leads to significantly more extreme perceived emotion scores than congruent emotional speech, where the difference between congruent and incongruent emotional speech is larger for the negative than for the positive conditions. Interestingly, the largest overall differences between congruent and incongruent emotions were found for the audio-only condition, which suggests that posing an incongruent emotion has a particularly strong effect on the spoken realization of emotions. The second experiment uses a gating paradigm to test the recognition speed for various emotional expressions from a speaker's face. In this experiment participants were presented with the same clips as in Experiment I, but this time vision-only. The clips were shown in successive segments (gates) of increasing duration. Results show that participants are surprisingly accurate in their recognition of the various emotions, as they already reach high recognition scores in the first gate (after only 160 ms). Interestingly, the recognition scores

  5. Audio-visual materials usage preference among agricultural ...

    African Journals Online (AJOL)

    It was found that respondents preferred radio, television, poster, advert, photographs, specimen, bulletin, magazine, cinema, videotape, chalkboard, and bulletin board as audio-visual materials for extension work. These are the materials that can easily be manipulated and utilized for extension work. Nigerian Journal of ...

  6. Audio-Visual Aid in Teaching "Fatty Liver"

    Science.gov (United States)

    Dash, Sambit; Kamath, Ullas; Rao, Guruprasad; Prakash, Jay; Mishra, Snigdha

    2016-01-01

    Use of audio-visual tools to aid in medical education is ever on the rise. Our study intends to find the efficacy of a video prepared on "fatty liver," a topic that is often a challenge for pre-clinical teachers, in enhancing cognitive processing and ultimately learning. We prepared a video presentation of 11:36 min, incorporating various…

  7. Market potential for interactive audio-visual media

    NARCIS (Netherlands)

    Leurdijk, A.; Limonard, S.

    2005-01-01

    NM2 (New Media for a New Millennium) develops tools for interactive, personalised and non-linear audio-visual content that will be tested in seven pilot productions. This paper looks at the market potential for these productions from a technological, a business and a users' perspective. It shows

  8. Computationally efficient clustering of audio-visual meeting data

    NARCIS (Netherlands)

    Hung, H.; Friedland, G.; Yeo, C.; Shao, L.; Shan, C.; Luo, J.; Etoh, M.

    2010-01-01

    This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors,

  9. Voice activity detection using audio-visual information

    DEFF Research Database (Denmark)

    Petsatodis, Theodore; Pnevmatikakis, Aristodemos; Boukis, Christos

    2009-01-01

    An audio-visual voice activity detector that uses sensors positioned distantly from the speaker is presented. Its constituting unimodal detectors are based on the modeling of the temporal variation of audio and visual features using Hidden Markov Models; their outcomes are fused using a post...
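    A minimal sketch of one unimodal stage of such a detector, assuming the hmmlearn package: a two-state Gaussian HMM is fitted to a toy frame-level log-energy stream and the higher-energy state is read as voice activity. The feature stream and the labeling rule are assumptions, not the authors' setup:

```python
# A minimal sketch of an HMM-based unimodal voice activity detector.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(2)
# Toy log-energy stream: alternating silence and speech segments.
frames = np.concatenate([rng.normal(-8, 1, 200), rng.normal(-2, 1, 200),
                         rng.normal(-8, 1, 200)]).reshape(-1, 1)

hmm = GaussianHMM(n_components=2, covariance_type="diag", n_iter=50,
                  random_state=0).fit(frames)
states = hmm.predict(frames)
# Interpret the state with the higher mean log-energy as "voice active".
active = states == hmm.means_.ravel().argmax()
print("fraction of frames flagged active:", active.mean())
```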

  10. Selected Audio-Visual Materials for Consumer Education. [New Version.

    Science.gov (United States)

    Johnston, William L.

    Ninety-two films, filmstrips, multi-media kits, slides, and audio cassettes, produced between 1964 and 1974, are listed in this selective annotated bibliography on consumer education. The major portion of the bibliography is devoted to films and filmstrips. The main topics of the audio-visual materials include purchasing, advertising, money…

  11. Learning to Match Auditory and Visual Speech Cues: Social Influences on Acquisition of Phonological Categories

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Grossmann, Tobias

    2015-01-01

    Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential…

  12. Documentary management of the sport audio-visual information in the generalist televisions

    OpenAIRE

    Jorge Caldera Serrano; Felipe Alonso

    2007-01-01

    The management of sports audio-visual documentation in the information systems of national, zonal and local television chains is analyzed. To this end, the documentary chain through which sports audio-visual information passes is traced, with the purpose of analyzing each of its parameters, thereby presenting a series of recommendations and norms for the preparation of the sports audio-visual record. Evidently the audio-visual sport documentation difference i...

  13. Audio-visual feedback improves the BCI performance in the navigational control of a humanoid robot

    Directory of Open Access Journals (Sweden)

    Emmanuele eTidoni

    2014-06-01

    Full Text Available Advancement in brain-computer interface (BCI) technology allows people to actively interact in the world through surrogates. Controlling real humanoid robots using a BCI as intuitively as we control our body represents a challenge for current research in robotics and neuroscience. In order to successfully interact with the environment the brain integrates multiple sensory cues to form a coherent representation of the world. Cognitive neuroscience studies demonstrate that multisensory integration may imply a gain with respect to a single modality and ultimately improve the overall sensorimotor performance. For example, reactivity to simultaneous visual and auditory stimuli may be higher than to the sum of the same stimuli delivered in isolation or in temporal sequence. Yet, knowledge about whether audio-visual integration may improve the control of a surrogate is meager. To explore this issue, we provided human footstep sounds as audio feedback to BCI users while they controlled a humanoid robot. Participants were asked to steer their robot surrogate and perform a pick-and-place task through BCI-SSVEPs. We found that audio-visual synchrony between the footstep sounds and the humanoid's actual walk reduces the time required for steering the robot. Thus, auditory feedback congruent with the humanoid's actions may improve the motor decisions of the BCI user and support the feeling of control over the robot. Our results shed light on the possibility of increasing control over a robot through the combination of multisensory feedback to a BCI user.

  14. Perception of Speech Modulation Cues by 6-Month-Old Infants

    Science.gov (United States)

    Cabrera, Laurianne; Bertoncini, Josiane; Lorenzi, Christian

    2013-01-01

    Purpose: The capacity of 6-month-old infants to discriminate a voicing contrast (/aba/--/apa/) on the basis of "amplitude modulation (AM) cues" and "frequency modulation (FM) cues" was evaluated. Method: Several vocoded speech conditions were designed to either degrade FM cues in 4 or 32 bands or degrade AM in 32 bands. Infants…

  15. Modular Sensor Environment : Audio Visual Industry Monitoring Applications

    OpenAIRE

    Guillot, Calvin

    2017-01-01

    This work was carried out for Electro Waves Oy. The company specializes in audio-visual services and interactive systems. The purpose of this work is to design and implement a modular sensor environment for the company, which will be used for developing automated systems. This thesis begins with an introduction to sensor systems and their different topologies, followed by an introduction to the technologies used in this project. The system is divided into three parts. The client, tha...

  16. Voice over: Audio-visual congruency and content recall in the gallery setting.

    Science.gov (United States)

    Fairhurst, Merle T; Scott, Minnie; Deroy, Ophelia

    2017-01-01

    Experimental research has shown that pairs of stimuli which are congruent and assumed to 'go together' are recalled more effectively than an item presented in isolation. Will this multisensory memory benefit occur when stimuli are richer and longer, in an ecological setting? In the present study, we focused on an everyday situation of audio-visual learning and manipulated the relationship between audio guide tracks and viewed portraits in the galleries of the Tate Britain. By varying the gender and narrative style of the voice-over, we examined how the perceived congruency and assumed unity of the audio guide track with painted portraits affected subsequent recall. We show that tracks perceived as best matching the viewed portraits led to greater recall of both sensory and linguistic content. We provide the first evidence that manipulating crossmodal congruence and unity assumptions can effectively impact memory in a multisensory ecological setting, even in the absence of precise temporal alignment between sensory cues.

  17. The role of reverberation-related binaural cues in the externalization of speech

    DEFF Research Database (Denmark)

    Catic, Jasmina; Santurette, Sébastien; Dau, Torsten

    2015-01-01

    The perception of externalization of speech sounds was investigated with respect to the monaural and binaural cues available at the listeners’ ears in a reverberant environment. Individualized binaural room impulse responses (BRIRs) were used to simulate externalized sound sources via headphones. The measured BRIRs were subsequently modified such that the proportion of the response containing binaural vs monaural information was varied. Normal-hearing listeners were presented with speech sounds convolved with such modified BRIRs. Monaural reverberation cues were found to be sufficient...

  18. Extraction of Information of Audio-Visual Contents

    Directory of Open Access Journals (Sweden)

    Carlos Aguilar

    2011-10-01

    Full Text Available In this article we show how it is possible to use Channel Theory (Barwise and Seligman, 1997) for modeling the process of information extraction realized by audiences of audio-visual contents. To do this, we rely on the concepts proposed by Channel Theory and, especially, its treatment of representational systems. We then show how the information that an agent is capable of extracting from the content depends on the number of channels he is able to establish between the content and the set of classifications he is able to discriminate. The agent can attempt to extract information through these channels from the totality of the content; however, we discuss the advantages of extracting from its constituents in order to obtain a greater number of informational items that represent it. After showing how the extraction process proceeds for each channel, we propose a method for representing all the informative values an agent can obtain from a content, using a matrix constituted by the channels the agent is able to establish on the content (source classifications) and the ones he can understand as individual (destination classifications). We finally show how this representation allows us to reflect the evolution of the informative items through the evolution of the audio-visual content.

  19. Automatic summarization of soccer highlights using audio-visual descriptors.

    Science.gov (United States)

    Raventós, A; Quijada, R; Torres, Luis; Tarrés, Francesc

    2015-01-01

    Automatic summarization of sports video content has been an object of great interest for many years. Although semantic description techniques have been proposed, many approaches still rely on low-level video descriptors that render quite limited results due to the complexity of the problem and the low capability of the descriptors to represent semantic content. In this paper, a new approach for automatic highlights summarization of soccer videos using audio-visual descriptors is presented. The approach is based on the segmentation of the video sequence into shots that are further analyzed to determine their relevance and interest. Of special interest in the approach is the use of audio information, which provides additional robustness to the overall performance of the summarization system. For every video shot a set of low- and mid-level audio-visual descriptors is computed and later adequately combined in order to obtain different relevance measures based on empirical knowledge rules. The final summary is generated by selecting those shots with the highest interest according to the specifications of the user and the results of the relevance measures. A variety of results are presented with real soccer video sequences that prove the validity of the approach.
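    A minimal sketch of the selection step described above: each shot receives a relevance score from a rule-based weighting of audio-visual descriptors, and the top-scoring shots form the summary. Descriptor names, weights, and the number of selected shots are assumptions for illustration:

```python
# A minimal sketch of rule-based shot selection for highlight summaries.
import numpy as np

# One row per shot: [audio energy, commentator pitch rise, motion activity]
descriptors = np.array([
    [0.2, 0.1, 0.3],
    [0.9, 0.8, 0.7],   # likely a goal: loud crowd, excited commentary
    [0.4, 0.3, 0.5],
    [0.8, 0.6, 0.9],
])
weights = np.array([0.5, 0.3, 0.2])   # empirical-knowledge weighting (assumed)

relevance = descriptors @ weights
summary_shots = np.argsort(relevance)[::-1][:2]   # keep the top-2 shots
print("shots selected for the highlight summary:",
      sorted(summary_shots.tolist()))
```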

  20. Treatment of Speech Anxiety by Cue-Controlled Relaxation and Desensitization with Professional and Paraprofessional Counselors

    Science.gov (United States)

    Russell, Richard K.; Wise, Fred

    1976-01-01

    This investigation compared the relative effectiveness of group-administered cue-controlled relaxation and group systematic desensitization in the treatment of speech anxiety. Also examined was the role of professional versus paraprofessional counselors in implementing the treatment program. A description of the cue-controlled relaxation technique…

  1. Effects of virtual speaker density and room reverberation on spatiotemporal thresholds of audio-visual motion coherence.

    Directory of Open Access Journals (Sweden)

    Narayan Sankaran

    Full Text Available The present study examined the effects of spatial sound-source density and reverberation on the spatiotemporal window for audio-visual motion coherence. Three different acoustic stimuli were generated in Virtual Auditory Space: two acoustically "dry" stimuli via the measurement of anechoic head-related impulse responses recorded at either 1° or 5° spatial intervals (Experiment 1), and a reverberant stimulus rendered from binaural room impulse responses recorded at 5° intervals in situ in order to capture reverberant acoustics in addition to head-related cues (Experiment 2). A moving visual stimulus with invariant localization cues was generated by sequentially activating LEDs along the same radial path as the virtual auditory motion. Stimuli were presented at 25°/s, 50°/s and 100°/s with a random spatial offset between audition and vision. In a 2AFC task, subjects made a judgment of the leading modality (auditory or visual). No significant differences were observed in the spatial threshold based on the point of subjective equivalence (PSE) or the slope of psychometric functions (β) across all three acoustic conditions. Additionally, both the PSE and β did not significantly differ across velocity, suggesting a fixed spatial window of audio-visual separation. Findings suggest that there was no loss in spatial information accompanying the reduction in spatial cues and reverberation levels tested, and establish a perceptual measure for assessing the veracity of motion generated from discrete locations and in echoic environments.
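    The PSE and slope reported above come from psychometric function fits. A minimal sketch of such a fit on fabricated illustrative 2AFC data (not the study's results):

```python
# A minimal sketch of estimating the PSE and slope of a psychometric function.
import numpy as np
from scipy.optimize import curve_fit

offsets = np.array([-40, -20, -10, 0, 10, 20, 40])   # audio-visual offset (deg)
p_visual_lead = np.array([0.05, 0.20, 0.35, 0.50, 0.68, 0.82, 0.97])

def logistic(x, pse, beta):
    # pse: point of subjective equivalence; beta: slope of the function
    return 1.0 / (1.0 + np.exp(-beta * (x - pse)))

(pse, beta), _ = curve_fit(logistic, offsets, p_visual_lead, p0=[0.0, 0.1])
print(f"PSE = {pse:.1f} deg, slope = {beta:.3f}")
```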

  2. Training the Brain to Weight Speech Cues Differently: A Study of Finnish Second-language Users of English

    Science.gov (United States)

    Ylinen, Sari; Uther, Maria; Latvala, Antti; Vepsalainen, Sara; Iverson, Paul; Akahane-Yamada, Reiko; Naatanen, Risto

    2010-01-01

    Foreign-language learning is a prime example of a task that entails perceptual learning. The correct comprehension of foreign-language speech requires the correct recognition of speech sounds. The most difficult speech-sound contrasts for foreign-language learners often are the ones that have multiple phonetic cues, especially if the cues are…

  3. Phonetic Category Cues in Adult-Directed Speech: Evidence from Three Languages with Distinct Vowel Characteristics

    Science.gov (United States)

    Pons, Ferran; Biesanz, Jeremy C.; Kajikawa, Sachiyo; Fais, Laurel; Narayan, Chandan R.; Amano, Shigeaki; Werker, Janet F.

    2012-01-01

    Using an artificial language learning manipulation, Maye, Werker, and Gerken (2002) demonstrated that infants' speech sound categories change as a function of the distributional properties of the input. In a recent study, Werker et al. (2007) showed that Infant-directed Speech (IDS) input contains reliable acoustic cues that support distributional…

  4. Contribution of Prosody in Audio-Visual Integration to Emotional Perception of Virtual Characters

    Directory of Open Access Journals (Sweden)

    Ekaterina Volkova

    2011-10-01

    Full Text Available Recent technology provides us with realistic looking virtual characters. Motion capture and elaborate mathematical models supply data for natural looking, controllable facial and bodily animations. With the help of computational linguistics and artificial intelligence, we can automatically assign emotional categories to appropriate stretches of text to simulate social scenarios where verbal communication is important. All this makes virtual characters a valuable tool for the creation of versatile stimuli for research on the integration of emotion information from different modalities. We conducted an audio-visual experiment to investigate the differential contributions of emotional speech and facial expressions to emotion identification. We used recorded and synthesized speech as well as dynamic virtual faces, all enhanced for seven emotional categories. The participants were asked to recognize the prevalent emotion of paired faces and audio. Results showed that when the voice was recorded, the vocalized emotion influenced participants' emotion identification more than the facial expression. However, when the voice was synthesized, the facial expression influenced participants' emotion identification more than the vocalized emotion. Additionally, individuals performed worse at identifying either the facial expression or the vocalized emotion when the voice was synthesized. Our experimental method can help to determine how to improve synthesized emotional speech.

  5. The Neural Basis of Speech Perception through Lipreading and Manual Cues: Evidence from Deaf Native Users of Cued Speech

    Science.gov (United States)

    Aparicio, Mario; Peigneux, Philippe; Charlier, Brigitte; Balériaux, Danielle; Kavec, Martin; Leybaert, Jacqueline

    2017-01-01

    We present here the first neuroimaging data for perception of Cued Speech (CS) by deaf adults who are native users of CS. CS is a visual mode of communicating a spoken language through a set of manual cues which accompany lipreading and disambiguate it. With CS, sublexical units of the oral language are conveyed clearly and completely through the visual modality without requiring hearing. The comparison of neural processing of CS in deaf individuals with processing of audiovisual (AV) speech in normally hearing individuals represents a unique opportunity to explore the similarities and differences in neural processing of an oral language delivered in a visuo-manual vs. an AV modality. The study included deaf adult participants who were early CS users and native hearing users of French who process speech audiovisually. Words were presented in an event-related fMRI design. Three conditions were presented to each group of participants. The deaf participants saw CS words (manual + lipread), words presented as manual cues alone, and words presented to be lipread without manual cues. The hearing group saw AV spoken words, audio-alone and lipread-alone. Three findings are highlighted. First, the middle and superior temporal gyrus (excluding Heschl’s gyrus) and left inferior frontal gyrus pars triangularis constituted a common, amodal neural basis for AV and CS perception. Second, integration was inferred in posterior parts of superior temporal sulcus for audio and lipread information in AV speech, but in the occipito-temporal junction, including MT/V5, for the manual cues and lipreading in CS. Third, the perception of manual cues showed a much greater overlap with the regions activated by CS (manual + lipreading) than lipreading alone did. This supports the notion that manual cues play a larger role than lipreading for CS processing. The present study contributes to a better understanding of the role of manual cues as support of visual speech perception in the framework

  6. The role of continuous low-frequency harmonicity cues for interrupted speech perception in bimodal hearing.

    Science.gov (United States)

    Oh, Soo Hee; Donaldson, Gail S; Kong, Ying-Yee

    2016-04-01

    Low-frequency acoustic cues have been shown to enhance speech perception by cochlear-implant users, particularly when target speech occurs in a competing background. The present study examined the extent to which a continuous representation of low-frequency harmonicity cues contributes to bimodal benefit in simulated bimodal listeners. Experiment 1 examined the benefit of restoring a continuous temporal envelope to the low-frequency ear while the vocoder ear received a temporally interrupted stimulus. Experiment 2 examined the effect of providing continuous harmonicity cues in the low-frequency ear as compared to restoring a continuous temporal envelope in the vocoder ear. Findings indicate that bimodal benefit for temporally interrupted speech increases when continuity is restored to either or both ears. The primary benefit appears to stem from the continuous temporal envelope in the low-frequency region providing additional phonetic cues related to manner and F1 frequency; a secondary contribution is provided by low-frequency harmonicity cues when a continuous representation of the temporal envelope is present in the low-frequency ear, or in both ears. The continuous temporal envelope and harmonicity cues of low-frequency speech are thought to support bimodal benefit by facilitating identification of word and syllable boundaries, and by restoring partial phonetic cues that occur during gaps in the temporally interrupted stimulus.
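    Studies like this one typically simulate the implant ear with a noise vocoder. A minimal sketch of such a vocoder, with band edges and filter orders chosen for illustration rather than taken from the paper:

```python
# A minimal sketch of a noise vocoder: split into bands, extract each band's
# envelope, and use it to modulate band-limited noise.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, band_edges):
    rng = np.random.default_rng(0)
    out = np.zeros_like(x)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))           # channel envelope
        carrier = sosfiltfilt(sos, rng.normal(size=x.size))  # band-limited noise
        out += env * carrier
    return out

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
tone = np.sin(2 * np.pi * 440 * t)          # placeholder for a speech signal
vocoded = noise_vocode(tone, fs, band_edges=[100, 400, 1000, 2400, 6000])
print(vocoded.shape)
```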

  7. Audio Visual Media Components in Educational Game for Elementary Students

    Directory of Open Access Journals (Sweden)

    Meilani Hartono

    2016-12-01

    Full Text Available The purpose of this research was to review and implement interactive audio-visual media used in an educational game to improve elementary students’ interest in learning mathematics. The game was developed for the desktop platform. The art of the game was set as 2D cartoon art with animation and audio in order to make it more interesting to students. Four mini games were developed based on research on mathematics learning. The development method used was the Multimedia Development Life Cycle (MDLC), which consists of requirement, design, development, testing, and implementation phases. Data collection methods used were questionnaires, literature study, and interviews. The conclusion is that elementary students are interested in an educational game that is fun and active (moving objects), with fast-tempo music and a carefree color like blue. This educational game is hoped to be an alternative teaching tool combined with conventional teaching methods.

  8. Changes of the Prefrontal EEG (Electroencephalogram) Activities According to the Repetition of Audio-Visual Learning.

    Science.gov (United States)

    Kim, Yong-Jin; Chang, Nam-Kee

    2001-01-01

    Investigates the changes of neuronal response according to a four-time repetition of audio-visual learning. Obtains EEG data from the prefrontal (Fp1, Fp2) lobe from 20 subjects at the 8th grade level. Concludes that the habituation of neuronal response shows up in repetitive audio-visual learning and brain hemisphericity can be changed by…

  9. Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation.

    Science.gov (United States)

    Banks, Briony; Gowen, Emma; Munro, Kevin J; Adank, Patti

    2015-01-01

    Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants' eye gaze was recorded to verify that they looked at the speaker's face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation.

  10. Perception of the Multisensory Coherence of Fluent Audiovisual Speech in Infancy: Its Emergence & the Role of Experience

    Science.gov (United States)

    Lewkowicz, David J.; Minar, Nicholas J.; Tift, Amy H.; Brandon, Melissa

    2014-01-01

    To investigate the developmental emergence of the ability to perceive the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8–10, and 12–14 month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor the 8–10 month-old infants exhibited audio-visual matching in that neither group exhibited greater looking at the matching monologue. In contrast, the 12–14 month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, they perceived the multisensory coherence of native-language monologues earlier in the test trials than of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12–14 month olds did not depend on audio-visual synchrony whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audio-visual synchrony cues are more important in the perception of the multisensory coherence of non-native than native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing. PMID:25462038

  11. Audio-visual assistance in co-creating transition knowledge

    Science.gov (United States)

    Hezel, Bernd; Broschkowski, Ephraim; Kropp, Jürgen P.

    2013-04-01

    Earth system and climate impact research results point to the tremendous ecologic, economic and societal implications of climate change. Specifically, people will have to adopt lifestyles that are very different from those they currently strive for in order to mitigate severe changes of our known environment. It will most likely not suffice to transfer the scientific findings into international agreements and appropriate legislation. A transition relies rather on pioneers that define new role models, on change agents that mainstream the concept of sufficiency and on narratives that make different futures appealing. In order for the research community to be able to provide sustainable transition pathways that are viable, an integration of the physical constraints and the societal dynamics is needed. Hence the necessary transition knowledge is to be co-created by social and natural science and society. To this end, the Climate Media Factory - in itself a massively transdisciplinary venture - strives to provide an audio-visual connection between the different scientific cultures and a bi-directional link to stakeholders and society. Since the methodologies, terminologies and knowledge levels of those involved are not the same, we develop new entertaining formats on the basis of a "complexity on demand" approach. They present scientific information in an integrated and entertaining way with different levels of detail that provide entry points to users with different requirements.

  12. Audio-Visual Integration Modifies Emotional Judgment in Music

    Directory of Open Access Journals (Sweden)

    Shen-Yuan Su

    2011-10-01

    Full Text Available The conventional view that perceived emotion in music is derived mainly from auditory signals has led to neglect of the contribution of visual image. In this study, we manipulated mode (major vs. minor and examined the influence of a video image on emotional judgment in music. Melodies in either major or minor mode were controlled for tempo and rhythm and played to the participants. We found that Taiwanese participants, like Westerners, judged major melodies as expressing positive, and minor melodies negative, emotions. The major or minor melodies were then paired with video images of the singers, which were either emotionally congruent or incongruent with their modes. Results showed that participants perceived stronger positive or negative emotions with congruent audio-visual stimuli. Compared to listening to music alone, stronger emotions were perceived when an emotionally congruent video image was added and weaker emotions were perceived when an incongruent image was added. We therefore demonstrate that mode is important to perceive the emotional valence in music and that treating musical art as a purely auditory event might lose the enhanced emotional strength perceived in music, since going to a concert may lead to stronger perceived emotion than listening to the CD at home.

  13. Audio-visual aid in teaching "fatty liver".

    Science.gov (United States)

    Dash, Sambit; Kamath, Ullas; Rao, Guruprasad; Prakash, Jay; Mishra, Snigdha

    2016-05-06

    Use of audio-visual tools to aid in medical education is ever on the rise. Our study intends to find the efficacy of a video prepared on "fatty liver," a topic that is often a challenge for pre-clinical teachers, in enhancing cognitive processing and ultimately learning. We prepared a video presentation of 11:36 min, incorporating various concepts of the topic, while keeping in view Mayer's and Ellaway's guidelines for multimedia presentation. A pre-post test study on subject knowledge was conducted for 100 students with the video shown as the intervention. A retrospective pre-test was conducted as a survey which inquired about students' understanding of the key concepts of the topic, and feedback on our video was taken. Students performed significantly better in the post-test (mean score 8.52 vs. 5.45 in the pre-test), responded positively in the retrospective pre-test and gave positive feedback on our video presentation. Well-designed multimedia tools can aid in cognitive processing and enhance working memory capacity, as shown in our study. In times when "smart" device penetration is high, information and communication tools in medical education, which can act as an essential aid and not as a replacement for traditional curriculums, can be beneficial to students. © 2015 by The International Union of Biochemistry and Molecular Biology, 44:241-245, 2016.

  14. The effect of guidance and counseling information services assisted by audio-visual media on student empathy

    Directory of Open Access Journals (Sweden)

    Rita Kumalasari

    2017-05-01

    The research produced a model of audio-visual media-assisted counseling that is effective and practical for increasing student empathy, comprising a rationale, key concepts, definitions, purpose, model content, the expected role and qualifications of the tutor (counselor), procedures or steps for implementing the audio-visual sessions, evaluation, follow-up, and a support system. The model proved effective in improving student behavior: empathy increased by 28.9 percentage points, from 45.08% to 73.98%, and this increase occurred across all aspects of empathy. Keywords: Effective, Audio visual, Empathy

  15. Real-time decreased sensitivity to an audio-visual illusion during goal-directed reaching.

    Directory of Open Access Journals (Sweden)

    Luc Tremblay

    Full Text Available In humans, sensory afferences are combined and integrated by the central nervous system (Ernst MO, Bülthoff HH (2004) Trends Cogn. Sci. 8: 162-169) and appear to provide a holistic representation of the environment. Empirical studies have repeatedly shown that vision dominates the other senses, especially for tasks with spatial demands. In contrast, it has also been observed that sound can strongly alter the perception of visual events. For example, when presented with 2 flashes and 1 beep in a very brief period of time, humans often report seeing 1 flash (i.e. the fusion illusion; Andersen TS, Tiippana K, Sams M (2004) Brain Res. Cogn. Brain Res. 21: 301-308). However, it is not known how an unfolding movement modulates the contribution of vision to perception. Here, we used the audio-visual illusion to demonstrate that goal-directed movements can alter visual information processing in real-time. Specifically, the fusion illusion was linearly reduced as a function of limb velocity. These results suggest that cue combination and integration can be modulated in real-time by goal-directed behaviors; perhaps through sensory gating (Chapman CE, Beauchamp E (2006) J. Neurophysiol. 96: 1664-1675) and/or altered sensory noise (Ernst MO, Bülthoff HH (2004) Trends Cogn. Sci. 8: 162-169) during limb movements.

  16. The role of reverberation-related binaural cues in the externalization of speech.

    Science.gov (United States)

    Catic, Jasmina; Santurette, Sébastien; Dau, Torsten

    2015-08-01

    The perception of externalization of speech sounds was investigated with respect to the monaural and binaural cues available at the listeners' ears in a reverberant environment. Individualized binaural room impulse responses (BRIRs) were used to simulate externalized sound sources via headphones. The measured BRIRs were subsequently modified such that the proportion of the response containing binaural vs monaural information was varied. Normal-hearing listeners were presented with speech sounds convolved with such modified BRIRs. Monaural reverberation cues were found to be sufficient for the externalization of a lateral sound source. In contrast, for a frontal source, an increased amount of binaural cues from reflections was required in order to obtain well externalized sound images. It was demonstrated that the interaction between the interaural cues of the direct sound and the reverberation strongly affects the perception of externalization. An analysis of the short-term binaural cues showed that the amount of fluctuations of the binaural cues corresponded well to the externalization ratings obtained in the listening tests. The results further suggested that the precedence effect is involved in the auditory processing of the dynamic binaural cues that are utilized for externalization perception.
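
    The core manipulation here, rendering an externalized source by convolving dry speech with a left-ear and right-ear BRIR pair, can be sketched as follows. Function and array names are illustrative assumptions, not the study's actual processing chain.

```python
# Minimal sketch of binaural rendering via BRIR convolution, assuming a dry
# mono speech signal and a measured left/right BRIR pair (illustrative only).
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(speech: np.ndarray, brir_left: np.ndarray, brir_right: np.ndarray) -> np.ndarray:
    """Convolve mono speech with each ear's room impulse response."""
    left = fftconvolve(speech, brir_left)
    right = fftconvolve(speech, brir_right)
    n = max(len(left), len(right))
    out = np.zeros((n, 2))
    out[:len(left), 0] = left
    out[:len(right), 1] = right
    # Normalize to avoid clipping when played back over headphones.
    return out / np.max(np.abs(out))
```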

  17. Psychoacoustic cues to emotion in speech prosody and music.

    Science.gov (United States)

    Coutinho, Eduardo; Dibben, Nicola

    2013-01-01

    There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain.
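
    A sketch of how several of the seven features named above might be extracted with an off-the-shelf audio library. librosa is our assumed tool (the paper's own extractor is not specified here), and sharpness and roughness are omitted because they would require a dedicated psychoacoustics implementation.

```python
# Hedged sketch: extracting a subset of the seven features with librosa
# (loudness ~ RMS energy, tempo, spectral centroid, spectral flux).
# "excerpt.wav" is a placeholder file name.
import numpy as np
import librosa

y, sr = librosa.load("excerpt.wav", sr=None)

rms = librosa.feature.rms(y=y)[0]                        # frame-wise loudness proxy
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
tempo, _ = librosa.beat.beat_track(y=y, sr=sr)           # global tempo estimate (BPM)

S = np.abs(librosa.stft(y))                              # magnitude spectrogram
flux = np.sqrt((np.diff(S, axis=1) ** 2).sum(axis=0))    # frame-to-frame spectral flux

print(f"tempo={float(tempo):.1f} BPM, centroid={centroid.mean():.0f} Hz, "
      f"RMS={rms.mean():.4f}, flux={flux.mean():.3f}")
```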

  18. Audio/visual analysis for high-speed TV advertisement detection from MPEG bitstream

    OpenAIRE

    Sadlier, David A.

    2002-01-01

    Advertisement breaks during or between television programmes are typically flagged by series of black-and-silent video frames, which recurrently occur in order to audio-visually separate individual advertisement spots from one another. It is the regular prevalence of these flags that enables automatic differentiation between what is programme content and what is advertisement break. Detection of these audio-visual depressions within broadcast television content provides a basis on which advertise...

  19. Common cues to emotion in the dynamic facial expressions of speech and song.

    Science.gov (United States)

    Livingstone, Steven R; Thompson, William F; Wanderley, Marcelo M; Palmer, Caroline

    2015-01-01

    Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech-song differences. Vocalists' jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech-song. Vocalists' emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists' facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotions in voice-only singing were poorly identified, yet were accurately identified in all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet reveal differences in perception and acoustic-motor production.

  20. Audio-visual biofeedback for respiratory-gated radiotherapy: Impact of audio instruction and audio-visual biofeedback on respiratory-gated radiotherapy

    International Nuclear Information System (INIS)

    George, Rohini; Chung, Theodore D.; Vedam, Sastry S.; Ramakrishnan, Viswanathan; Mohan, Radhe; Weiss, Elisabeth; Keall, Paul J.

    2006-01-01

    Purpose: Respiratory gating is a commercially available technology for reducing the deleterious effects of motion during imaging and treatment. The efficacy of gating is dependent on the reproducibility within and between respiratory cycles during imaging and treatment. The aim of this study was to determine whether audio-visual biofeedback can improve respiratory reproducibility by decreasing residual motion and therefore increasing the accuracy of gated radiotherapy. Methods and Materials: A total of 331 respiratory traces were collected from 24 lung cancer patients. The protocol consisted of five breathing training sessions spaced about a week apart. Within each session the patients initially breathed without any instruction (free breathing), then with audio instructions, and then with audio-visual biofeedback. Residual motion was quantified by the standard deviation of the respiratory signal within the gating window. Results: Audio-visual biofeedback significantly reduced residual motion compared with free breathing and audio instruction. Displacement-based gating had lower residual motion than phase-based gating. Little reduction in residual motion was found for duty cycles less than 30%; for duty cycles above 50% there was a sharp increase in residual motion. Conclusions: The efficiency and reproducibility of gating can be improved by incorporating audio-visual biofeedback, using a 30-50% duty cycle, gating during exhalation, and using displacement-based gating.
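
    The residual-motion metric used above, the standard deviation of the respiratory displacement signal inside the gating window, is easy to sketch. The window definition below (an exhale-anchored displacement window sized to a given duty cycle) and all names are our assumptions, not the study's code.

```python
# Sketch: residual motion for displacement-based gating at a given duty cycle.
# The gating window accepts the fraction of samples nearest exhale (the signal
# minimum), so that `duty` of the breathing trace is "beam on" (assumption).
import numpy as np

def residual_motion(trace: np.ndarray, duty: float) -> float:
    """SD of the respiratory signal within a displacement gating window."""
    threshold = np.quantile(trace, duty)   # displacement level passing `duty` of samples
    in_window = trace[trace <= threshold]  # samples near exhale are gated "on"
    return float(in_window.std())

t = np.linspace(0, 60, 1800)                       # 60 s trace sampled at 30 Hz
trace = 1 - np.cos(2 * np.pi * t / 4) ** 4         # idealized breathing, 4 s period
print(residual_motion(trace, duty=0.3), residual_motion(trace, duty=0.6))
```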

  1. When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.

    Science.gov (United States)

    Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola

    2017-11-01

    Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition.

    Science.gov (United States)

    Jesse, Alexandra; McQueen, James M

    2014-01-01

    Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes in duration, amplitude, and pitch) in spoken-word recognition. We asked here whether they can also use visual suprasegmental cues. In two categorization experiments, Dutch participants saw a speaker say fragments of word pairs that were segmentally identical but differed in their stress realization (e.g., 'ca-vi from cavia "guinea pig" vs. 'ka-vi from kaviaar "caviar"). Participants were able to distinguish between these pairs from seeing a speaker alone. Only the presence of primary stress in the fragment, not its absence, was informative. Participants were able to distinguish visually primary from secondary stress on first syllables, but only when the fragment-bearing target word carried phrase-level emphasis. Furthermore, participants distinguished fragments with primary stress on their second syllable from those with secondary stress on their first syllable (e.g., pro-'jec from projector "projector" vs. 'pro-jec from projectiel "projectile"), independently of phrase-level emphasis. Seeing a speaker thus contributes to spoken-word recognition by providing suprasegmental information about the presence of primary lexical stress.

  3. Comparison of Cue-Controlled Desensitization, Rational Restructuring, and a Credible Placebo in the Treatment of Speech Anxiety.

    Science.gov (United States)

    Lent, Robert W.; And Others

    1981-01-01

    The efficacy of cue-controlled desensitization and systematic rational restructuring was compared with a placebo method and a waiting-list control in reducing public speaking and nontargeted anxieties. Cue-controlled desensitization was generally more effective than the other groups in reducing subjective speech anxiety. (Author)

  4. CREATING AUDIO VISUAL DIALOGUE TASK AS STUDENTS’ SELF ASSESSMENT TO ENHANCE THEIR SPEAKING ABILITY

    Directory of Open Access Journals (Sweden)

    Novia Trisanti

    2017-04-01

    Full Text Available The study gives an overview of employing an audio-visual dialogue task as a student creativity task and self-assessment in an EFL speaking class in tertiary education, with the aim of enhancing students' speaking ability. The qualitative research was done in one of the speaking classes at the English Department, Semarang State University, Central Java, Indonesia. The results, drawn from the self-assessment rubric, show that the oral performances in the audio-visual recorded tasks that the students completed as their self-assessment gave positive evidence. The audio-visual dialogue task can be very beneficial, since it can motivate students' learning and increase their learning experiences. Self-assessment can be a valuable additional means of improving speaking ability, since it is one of the motives that drive self-evaluation, along with self-verification and self-enhancement.

  5. Sensitivity to audio-visual synchrony and its relation to language abilities in children with and without ASD.

    Science.gov (United States)

    Righi, Giulia; Tenenbaum, Elena J; McCormick, Carolyn; Blossom, Megan; Amso, Dima; Sheinkopf, Stephen J

    2018-04-01

    Autism Spectrum Disorder (ASD) is often accompanied by deficits in speech and language processing. Speech processing relies heavily on the integration of auditory and visual information, and it has been suggested that the ability to detect correspondence between auditory and visual signals helps to lay the foundation for successful language development. The goal of the present study was to examine whether young children with ASD show reduced sensitivity to temporal asynchronies in a speech processing task when compared to typically developing controls, and to examine how this sensitivity might relate to language proficiency. Using automated eye tracking methods, we found that children with ASD failed to demonstrate sensitivity to asynchronies of 0.3 s, 0.6 s, or 1.0 s between a video of a woman speaking and the corresponding audio track. In contrast, typically developing children who were language-matched to the ASD group were sensitive to both 0.6 s and 1.0 s asynchronies. We also demonstrated that individual differences in sensitivity to audiovisual asynchronies and individual differences in orientation to relevant facial features were both correlated with scores on a standardized measure of language abilities. Results are discussed in the context of attention to visual language and audio-visual processing as potential precursors to language impairment in ASD. Autism Res 2018, 11: 645-653. © 2018 International Society for Autism Research, Wiley Periodicals, Inc.

  6. Rehabilitation of balance-impaired stroke patients through audio-visual biofeedback

    DEFF Research Database (Denmark)

    Gheorghe, Cristina; Nissen, Thomas; Juul Rosengreen Christensen, Daniel

    2015-01-01

    This study explored how audio-visual biofeedback influences the physical balance of seven balance-impaired stroke patients, between 33-70 years of age. The setup included a bespoke balance board and a music rhythm game. The procedure was designed as follows: (1) a control group who performed a balance training exercise without any technological input, (2) a visual biofeedback group, performing via visual input, and (3) an audio-visual biofeedback group, performing via audio and visual input. Results retrieved from comparisons between the data sets (2) and (3) suggested superior postural stability...

  7. When Meaning Is Not Enough: Distributional and Semantic Cues to Word Categorization in Child Directed Speech.

    Science.gov (United States)

    Feijoo, Sara; Muñoz, Carmen; Amadó, Anna; Serrat, Elisabet

    2017-01-01

    One of the most important tasks in first language development is assigning words to their grammatical category. The Semantic Bootstrapping Hypothesis postulates that, in order to accomplish this task, children are guided by a neat correspondence between semantic and grammatical categories, since nouns typically refer to objects and verbs to actions. It is this correspondence that guides children's initial word categorization. Other approaches, on the other hand, suggest that children might make use of distributional cues and word contexts to accomplish the word categorization task. According to such approaches, the Semantic Bootstrapping assumption offers an important limitation, as it might not be true that all the nouns that children hear refer to specific objects or people. In order to explore that, we carried out two studies based on analyses of children's linguistic input. We analyzed child-directed speech addressed to four children under the age of 2;6, taken from the CHILDES database. The corpora were selected from the Manchester corpus. The corpora from the four selected children contained a total of 10,681 word types and 364,196 word tokens. In our first study, discriminant analyses were performed using semantic cues alone. The results show that many of the nouns found in parents' speech do not relate to specific objects and that semantic information alone might not be sufficient for successful word categorization. Given that there must be an additional source of information which, alongside with semantics, might assist young learners in word categorization, our second study explores the availability of both distributional and semantic cues in child-directed speech. Our results confirm that this combination might yield better results for word categorization. These results are in line with theories that suggest the need for an integration of multiple cues from different sources in language development.
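
    The discriminant analyses mentioned above, predicting a word's grammatical category from coded cues, can be sketched with standard tooling. The sketch below is a hypothetical stand-in: scikit-learn is our assumed library, and the feature vectors are random placeholders for semantic or distributional cues coded from child-directed speech.

```python
# Sketch: discriminant analysis classifying noun vs. verb from cue vectors.
# Feature vectors are random placeholders for semantic/distributional cues;
# labels are the words' true categories (also placeholders here).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 6))            # 500 word types x 6 coded cues (assumed)
y = rng.integers(0, 2, 500)                  # 0 = noun, 1 = verb (placeholder labels)

acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
print(f"cross-validated categorization accuracy: {acc:.2f}")  # ~chance for random data
```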

  8. A chimpanzee recognizes synthetic speech with significantly reduced acoustic cues to phonetic content.

    Science.gov (United States)

    Heimbauer, Lisa A; Beran, Michael J; Owren, Michael J

    2011-07-26

    A long-standing debate concerns whether humans are specialized for speech perception, which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words, asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuographic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users. Experiment 2 tested "impossibly unspeechlike" sine-wave (SW) synthesis, which reduces speech to just three moving tones. Although receiving only intermittent and noncontingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate but improved in experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human. Copyright © 2011 Elsevier Ltd. All rights reserved.

  9. Audio-visual training-aid for speechreading

    DEFF Research Database (Denmark)

    Bothe, Hans-Heinrich; Gebert, H.

    2011-01-01

    People with decreasing hearing ability are more dependent on alternative personal communication channels. To 'read and understand' visible articulatory movements of the conversation partner, as done in the process of speechreading, is one possible solution for understanding verbal statements... on the employment of computer-based communication aids for hearing-impaired, deaf and deaf-blind people [6]. This paper presents the complete system, which is composed of a 3D facial animation with synchronized speech synthesis, a natural language dialogue unit and a student-teacher training module. Due to the very modular structure of the software package and the centralized event manager, it is possible to add or replace specific modules when needed. The present version of our teacher-student module uses a hierarchically structured composition of important single words and short phrases, supplemented by easy...

  10. Teacher’s Voice on Metacognitive Strategy Based Instruction Using Audio Visual Aids for Listening

    Directory of Open Access Journals (Sweden)

    Salasiah Salasiah

    2018-02-01

    Full Text Available The paper explores the teacher's voice toward the application of a metacognitive strategy with audio-visual aids in improving listening comprehension. The metacognitive strategy model applied in the study was inspired by Vandergrift and Tafaghodtari's (2010) instructional model; its procedure was modified and applied with audio-visual aids for improving listening comprehension. The study's setting was SMA Negeri 2 Parepare, South Sulawesi Province, Indonesia. The research population was the English teacher of the tenth grade at SMAN 2, and the sample was taken using a random sampling technique. The data were collected by in-depth interview during the research, recorded, and analyzed using qualitative analysis. The study explored the teacher's response, both positive and negative, toward the modified model of metacognitive strategy with audio-visual aids in the listening class. The results showed that this strategy helped the teacher a great deal in teaching listening comprehension, as the procedure has systematic steps toward students' listening comprehension. It also makes listening easier to teach by drawing on audio-visual aids such as videos taken from YouTube.

  11. Audio-Visual Processing in Meetings: Seven Questions and Current AMI Answers

    NARCIS (Netherlands)

    Al-Hamas, Marc; Hain, Thomas; Cernocky, Jan; Schreiber, Sascha; Poel, Mannes; Rienks, R.J.

    2007-01-01

    The project Augmented Multi-party Interaction (AMI) is concerned with the development of meeting browsers and remote meeting assistants for instrumented meeting rooms, and with the required component technologies. R&D themes: group dynamics; audio, visual, and multimodal processing; content abstraction, ...

  12. Primary School Pupils' Response to Audio-Visual Learning Process in Port-Harcourt

    Science.gov (United States)

    Olube, Friday K.

    2015-01-01

    The purpose of this study is to examine primary school children's response to the use of audio-visual learning processes--a case study of Chokhmah International Academy, Port-Harcourt (owned by Salvation Ministries). It looked at the elements that enhance pupils' response to educational television programmes and the hindrances to these…

  13. Designing between Pedagogies and Cultures: Audio-Visual Chinese Language Resources for Australian Schools

    Science.gov (United States)

    Yuan, Yifeng; Shen, Huizhong

    2016-01-01

    This design-based study examines the creation and development of audio-visual Chinese language teaching and learning materials for Australian schools by incorporating users' feedback and content writers' input that emerged in the designing process. Data were collected from workshop feedback of two groups of Chinese-language teachers from primary…

  14. Multimodal indexing of digital audio-visual documents: A case study for cultural heritage data

    NARCIS (Netherlands)

    Carmichael, J.; Larson, M.; Marlow, J.; Newman, E.; Clough, P.; Oomen, J.; Sav, S.

    2008-01-01

    This paper describes a multimedia multimodal information access sub-system (MIAS) for digital audio-visual documents, typically presented in streaming media format. The system is designed to provide both professional and general users with entry points into video documents that are relevant to their

  15. Selective attention modulates the direction of audio-visual temporal recalibration.

    Science.gov (United States)

    Ikumi, Nara; Soto-Faraco, Salvador

    2014-01-01

    Temporal recalibration of cross-modal synchrony has been proposed as a mechanism to compensate for timing differences between sensory modalities. However, far from the rich complexity of everyday life sensory environments, most studies to date have examined recalibration on isolated cross-modal pairings. Here, we hypothesize that selective attention might provide an effective filter to help resolve which stimuli are selected when multiple events compete for recalibration. We addressed this question by testing audio-visual recalibration following an adaptation phase where two opposing audio-visual asynchronies were present. The direction of voluntary visual attention, and therefore attention to one of the two possible asynchronies (flash leading or flash lagging), was manipulated using colour as a selection criterion. We found a shift in the point of subjective audio-visual simultaneity as a function of whether the observer had focused attention on audio-then-flash or on flash-then-audio groupings during the adaptation phase. A baseline adaptation condition revealed that this effect of endogenous attention was only effective toward the lagging flash. This hints at the role of exogenous capture and/or additional endogenous effects producing an asymmetry toward the leading flash. We conclude that selective attention helps promote selected audio-visual pairings to be combined and subsequently adjusted in time, but stimulus organization exerts a strong impact on recalibration. We tentatively hypothesize that the resolution of recalibration in complex scenarios involves the orchestration of top-down selection mechanisms and stimulus-driven processes.

  16. Selective attention modulates the direction of audio-visual temporal recalibration.

    Directory of Open Access Journals (Sweden)

    Nara Ikumi

    Full Text Available Temporal recalibration of cross-modal synchrony has been proposed as a mechanism to compensate for timing differences between sensory modalities. However, far from the rich complexity of everyday life sensory environments, most studies to date have examined recalibration on isolated cross-modal pairings. Here, we hypothesize that selective attention might provide an effective filter to help resolve which stimuli are selected when multiple events compete for recalibration. We addressed this question by testing audio-visual recalibration following an adaptation phase where two opposing audio-visual asynchronies were present. The direction of voluntary visual attention, and therefore attention to one of the two possible asynchronies (flash leading or flash lagging), was manipulated using colour as a selection criterion. We found a shift in the point of subjective audio-visual simultaneity as a function of whether the observer had focused attention on audio-then-flash or on flash-then-audio groupings during the adaptation phase. A baseline adaptation condition revealed that this effect of endogenous attention was only effective toward the lagging flash. This hints at the role of exogenous capture and/or additional endogenous effects producing an asymmetry toward the leading flash. We conclude that selective attention helps promote selected audio-visual pairings to be combined and subsequently adjusted in time, but stimulus organization exerts a strong impact on recalibration. We tentatively hypothesize that the resolution of recalibration in complex scenarios involves the orchestration of top-down selection mechanisms and stimulus-driven processes.

  17. An Annotated Guide to Audio-Visual Materials for Teaching Shakespeare.

    Science.gov (United States)

    Albert, Richard N.

    Audio-visual materials, found in a variety of periodicals, catalogs, and reference works, are listed in this guide to expedite the process of finding appropriate materials for classroom study of William Shakespeare. Separate listings of films, filmstrips, and recordings are provided, with subdivisions for "The Plays"…

  18. Audio-visual synchrony and feature-selective attention co-amplify early visual processing.

    Science.gov (United States)

    Keitel, Christian; Müller, Matthias M

    2016-05-01

    Our brain relies on neural mechanisms of selective attention and converging sensory processing to efficiently cope with rich and unceasing multisensory inputs. One prominent assumption holds that audio-visual synchrony can act as a strong attractor for spatial attention. Here, we tested for a similar effect of audio-visual synchrony on feature-selective attention. We presented two superimposed Gabor patches that differed in colour and orientation. On each trial, participants were cued to selectively attend to one of the two patches. Over time, spatial frequencies of both patches varied sinusoidally at distinct rates (3.14 and 3.63 Hz), giving rise to pulse-like percepts. A simultaneously presented pure tone carried a frequency modulation at the pulse rate of one of the two visual stimuli to introduce audio-visual synchrony. Pulsed stimulation elicited distinct time-locked oscillatory electrophysiological brain responses. These steady-state responses were quantified in the spectral domain to examine individual stimulus processing under conditions of synchronous versus asynchronous tone presentation and when respective stimuli were attended versus unattended. We found that both, attending to the colour of a stimulus and its synchrony with the tone, enhanced its processing. Moreover, both gain effects combined linearly for attended in-sync stimuli. Our results suggest that audio-visual synchrony can attract attention to specific stimulus features when stimuli overlap in space.
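
    The spectral quantification described above, measuring steady-state response amplitude at each stimulation rate, can be sketched as a simple FFT readout at the nearest frequency bin. The two frequencies come from the abstract; the EEG epoch, its length, and the sampling rate are placeholders.

```python
# Sketch: amplitude of steady-state responses at the two tagging frequencies
# (3.14 Hz and 3.63 Hz) from an EEG epoch (shape and rate are assumptions).
import numpy as np

def ssr_amplitude(epoch: np.ndarray, fs: float, freq: float) -> float:
    """Spectral amplitude at the bin nearest `freq` for a 1-D epoch at `fs` Hz."""
    spectrum = np.abs(np.fft.rfft(epoch)) / len(epoch)
    freqs = np.fft.rfftfreq(len(epoch), d=1.0 / fs)
    return float(spectrum[np.argmin(np.abs(freqs - freq))])

fs = 500.0                                   # assumed sampling rate
epoch = np.random.randn(int(10 * fs))        # placeholder 10-s epoch
for f in (3.14, 3.63):                       # the two pulse rates from the study
    print(f, ssr_amplitude(epoch, fs, f))
```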

  19. Interactive Football-Training Based on Rebounders with Hit Position Sensing and Audio-Visual Feedback

    DEFF Research Database (Denmark)

    Jensen, Mads Møller; Grønbæk, Kaj; Thomassen, Nikolaj

    2014-01-01

    However, most of these tools are created with a single goal, either to measure or to train, and are often used and tested in very controlled settings. In this paper, we present an interactive football-training platform, called Football Lab, featuring sensor-mounted rebounders as well as audio-visual...

  20. Online Dissection Audio-Visual Resources for Human Anatomy: Undergraduate Medical Students' Usage and Learning Outcomes

    Science.gov (United States)

    Choi-Lundberg, Derek L.; Cuellar, William A.; Williams, Anne-Marie M.

    2016-01-01

    In an attempt to improve undergraduate medical student preparation for and learning from dissection sessions, dissection audio-visual resources (DAVR) were developed. Data from e-learning management systems indicated DAVR were accessed by 28% ± 10 (mean ± SD for nine DAVR across three years) of students prior to the corresponding dissection…

  1. Attention to affective audio-visual information: Comparison between musicians and non-musicians

    NARCIS (Netherlands)

    Weijkamp, J.; Sadakata, M.

    2017-01-01

    Individuals with more musical training repeatedly demonstrate enhanced auditory perception abilities. The current study examined how these enhanced auditory skills interact with attention to affective audio-visual stimuli. A total of 16 participants with more than 5 years of musical training

  2. Acceptance of online audio-visual cultural heritage archive services: a study of the general public

    NARCIS (Netherlands)

    Ongena, G.; van de Wijngaert, Lidwien; Huizer, E.

    2013-01-01

    Introduction. This study examines the antecedents of user acceptance of an audio-visual heritage archive for a wider audience (i.e., the general public) by extending the technology acceptance model with the concepts of perceived enjoyment, nostalgia proneness and personal innovativeness. Method. A

  3. Assessment of Spectral and Temporal Resolution in Cochlear Implant Users Using Psychoacoustic Discrimination and Speech Cue Categorization.

    Science.gov (United States)

    Winn, Matthew B; Won, Jong Ho; Moon, Il Joon

    This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). The authors hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. The authors further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Nineteen cochlear implant listeners and 10 listeners with normal hearing participated in a suite of tasks that included spectral ripple discrimination, temporal modulation detection, and syllable categorization, which was split into a spectral cue-based task (targeting the /ba/-/da/ contrast) and a timing cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for cochlear implant listeners. Cochlear implant users were generally less successful at utilizing both spectral and temporal cues for categorization compared with listeners with normal hearing. For the cochlear implant listener group, spectral ripple discrimination was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. Temporal modulation detection using 100- and 10-Hz-modulated noise was not correlated either with the cochlear implant subjects' categorization of
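
    Quantifying categorization responses with logistic regression, as described above, amounts to fitting a psychometric function over the manipulated cue. A minimal sketch under assumed data follows; the cue continuum and responses are invented placeholders, with scikit-learn as the assumed library.

```python
# Sketch: logistic fit of /ba/-/da/ categorization against a spectral cue
# (a formant-transition continuum). Cue steps and responses are placeholders,
# not the study's data; a steeper fitted slope indexes sharper sensitivity.
import numpy as np
from sklearn.linear_model import LogisticRegression

cue = np.linspace(-1, 1, 9).reshape(-1, 1)            # normalized F2-onset continuum
resp = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1])          # 1 = "da" response per step

model = LogisticRegression().fit(cue, resp)
slope = model.coef_[0][0]                             # perceptual sensitivity to the cue
boundary = -model.intercept_[0] / slope               # category boundary on the continuum
print(f"slope = {slope:.2f}, boundary = {boundary:.2f}")
```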

  4. Enhanced audio-visual interactions in the auditory cortex of elderly cochlear-implant users.

    Science.gov (United States)

    Schierholz, Irina; Finke, Mareike; Schulte, Svenja; Hauthal, Nadine; Kantzke, Christoph; Rach, Stefan; Büchner, Andreas; Dengler, Reinhard; Sandmann, Pascale

    2015-10-01

    Auditory deprivation and the restoration of hearing via a cochlear implant (CI) can induce functional plasticity in auditory cortical areas. How these plastic changes affect the ability to integrate combined auditory (A) and visual (V) information is not yet well understood. In the present study, we used electroencephalography (EEG) to examine whether age, temporary deafness and altered sensory experience with a CI can affect audio-visual (AV) interactions in post-lingually deafened CI users. Young and elderly CI users and age-matched normal-hearing (NH) listeners performed a speeded response task on basic auditory, visual and audio-visual stimuli. Regarding the behavioral results, a redundant signals effect, that is, faster response times to cross-modal (AV) than to both of the two modality-specific stimuli (A, V), was revealed for all groups of participants. Moreover, in all four groups, we found evidence for audio-visual integration. Regarding event-related responses (ERPs), we observed a more pronounced visual modulation of the cortical auditory response at N1 latency (approximately 100 ms after stimulus onset) in the elderly CI users when compared with young CI users and elderly NH listeners. Thus, elderly CI users showed enhanced audio-visual binding, which may be a consequence of compensatory strategies developed due to temporary deafness and/or degraded sensory input after implantation. These results indicate that the combination of aging, sensory deprivation and CI facilitates the coupling between the auditory and the visual modality. We suggest that this enhancement in multisensory interactions could be used to optimize auditory rehabilitation, especially in elderly CI users, by the application of strongly audio-visually based rehabilitation strategies after implant switch-on. Copyright © 2015 Elsevier B.V. All rights reserved.

  5. Listeners' expectation of room acoustical parameters based on visual cues

    Science.gov (United States)

    Valente, Daniel L.

    Despite many studies investigating auditory spatial impressions in rooms, few have addressed the impact of simultaneous visual cues on localization and the perception of spaciousness. The current research presents an immersive audio-visual study, in which participants are instructed to make spatial congruency and quantity judgments in dynamic cross-modal environments. The results of these psychophysical tests suggest the importance of congruent audio-visual presentation to the legibility of an auditory scene. Several studies have looked into audio-visual interaction in room perception in recent years, but these studies rely on static images, speech signals, or photographs alone to represent the visual scene. Building on these studies, the aim is to propose a testing method that uses monochromatic compositing (the blue-screen technique) to position a studio recording of a musical performance in a number of virtual acoustical environments and ask subjects to assess these environments. In the first experiment of the study, video footage was taken from five rooms varying in physical size from a small studio to a small performance hall. Participants were asked to perceptually align two distinct acoustical parameters---early-to-late reverberant energy ratio and reverberation time---of two solo musical performances in five contrasting visual environments according to their expectations of how the room should sound given its visual appearance. In the second experiment in the study, video footage shot from four different listening positions within a general-purpose space was coupled with sounds derived from measured binaural impulse responses (IRs). The relationship between the presented image, sound, and virtual receiver position was examined. It was found that many visual cues altered how the acoustic environment was perceived, including the visual attributes of the space in which the performance was located as well as those of the performer.

  6. Impact of audio-visual storytelling in simulation learning experiences of undergraduate nursing students.

    Science.gov (United States)

    Johnston, Sandra; Parker, Christina N; Fox, Amanda

    2017-09-01

    Use of high fidelity simulation has become increasingly popular in nursing education, to the extent that it is now an integral component of most nursing programs. Anecdotal evidence suggests that students have difficulty engaging with simulation manikins due to their unrealistic appearance. Introduction of the manikin as a 'real patient' with the use of an audio-visual narrative may engage students in the simulated learning experience and impact on their learning. A paucity of literature currently exists on the use of audio-visual narratives to enhance simulated learning experiences. This study aimed to determine if viewing an audio-visual narrative during a simulation pre-brief altered undergraduate nursing student perceptions of the learning experience. A quasi-experimental post-test design was utilised with a convenience sample of final year baccalaureate nursing students at a large metropolitan university. Participants completed a modified version of the Student Satisfaction with Simulation Experiences survey. This 12-item questionnaire contained questions relating to the ability to transfer skills learned in simulation to the real clinical world, the realism of the simulation and the overall value of the learning experience. Descriptive statistics were used to summarise demographic information. Two-tailed, independent group t-tests were used to determine statistical differences within the categories. Findings indicated that students reported high levels of value, realism and transferability in relation to the viewing of an audio-visual narrative. A statistically significant result (t=2.38) was found in relation to the transferability of skills learned in simulation to clinical practice. The subgroups of age and gender, although not significant, indicated some interesting results. High satisfaction with simulation was indicated by all students in relation to value and realism. The significant finding in relation to transferability of knowledge is vital to quality educational outcomes.

  7. Unimodal Learning Enhances Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2017-01-01

    Crossmodal sensory integration is a fundamental feature of the brain that aids in forming a coherent and unified representation of observed events in the world. Spatiotemporally correlated sensory stimuli brought about by rich sensorimotor experiences drive the development of crossmodal integration... a non-holonomic robotic agent towards a moving audio-visual target. Simulation results demonstrate that unimodal learning enhances crossmodal learning and improves both the overall accuracy and precision of multisensory orientation response.

  8. Unimodal Learning Enhances Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2018-01-01

    Crossmodal sensory integration is a fundamental feature of the brain that aids in forming a coherent and unified representation of observed events in the world. Spatiotemporally correlated sensory stimuli brought about by rich sensorimotor experiences drive the development of crossmodal integration... a non-holonomic robotic agent towards a moving audio-visual target. Simulation results demonstrate that unimodal learning enhances crossmodal learning and improves both the overall accuracy and precision of multisensory orientation response.

  9. Independent Interactive Inquiry-Based Learning Modules Using Audio-Visual Instruction In Statistics

    OpenAIRE

    McDaniel, Scott N.; Green, Lisa

    2012-01-01

    Simulations can make complex ideas easier for students to visualize and understand. It has been shown that guidance in the use of these simulations enhances students’ learning. This paper describes the implementation and evaluation of the Independent Interactive Inquiry-based (I3) Learning Modules, which use existing open-source Java applets, combined with audio-visual instruction. Students are guided to discover and visualize important concepts in post-calculus and algebra-based courses in p...

  10. Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

    Science.gov (United States)

    Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

    2015-05-01

    The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although the brain network is supposed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering, such as single linkage distance, to analyze the connected component of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white-noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationship, similar connected components were observed in bimodal and unimodal speech conditions during filtration. However, during speech perception by congruent audiovisual stimuli, the tighter couplings of left anterior temporal gyrus-anterior insula component and right premotor-visual components were observed than auditory or visual speech cue conditions, respectively. Interestingly, visual speech is perceived under white noise by tight negative coupling in the left inferior frontal region-right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus, right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.
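
    The filtration described above, tracking connected components of the functional network across all thresholds, corresponds to 0-dimensional persistent homology and can be read directly off single-linkage merge heights. The sketch below runs on a random placeholder correlation matrix rather than fMRI data, with scipy as the assumed tool.

```python
# Sketch: 0-dimensional persistent homology of a functional network via
# single-linkage clustering on a correlation-derived distance matrix.
# The "region" time series are random placeholders, not fMRI data.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
x = rng.standard_normal((100, 8))            # 100 timepoints x 8 regions (assumed)
corr = np.corrcoef(x, rowvar=False)
dist = 1.0 - corr                            # stronger coupling = shorter distance
np.fill_diagonal(dist, 0.0)

# Single-linkage merge heights are exactly the thresholds at which connected
# components of the thresholded network fuse (the filtration's merge times).
merges = linkage(squareform(dist, checks=False), method="single")
print(merges[:, 2])                          # component merge thresholds
```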

  11. Feature Fusion Based Audio-Visual Speaker Identification Using Hidden Markov Model under Different Lighting Variations

    Directory of Open Access Journals (Sweden)

    Md. Rabiul Islam

    2014-01-01

    Full Text Available The aim of the paper is to propose a feature-fusion based Audio-Visual Speaker Identification (AVSI) system for varied illumination environments. Among the different fusion strategies, feature-level fusion has been used for the proposed AVSI system, where a Hidden Markov Model (HMM) is used for learning and classification. Since the feature set contains richer information about the raw biometric data than any other level, integration at the feature level is expected to provide better authentication results. In this paper, both Mel Frequency Cepstral Coefficients (MFCCs) and Linear Prediction Cepstral Coefficients (LPCCs) are combined to form the audio feature vectors, and Active Shape Model (ASM) based appearance and shape facial features are concatenated to form the visual feature vectors. These combined audio and visual features are used for the feature fusion. To reduce the dimension of the audio and visual feature vectors, the Principal Component Analysis (PCA) method is used. The VALID audio-visual database is used to measure the performance of the proposed system, where four different illumination levels of lighting conditions are considered. Experimental results focus on the significance of the proposed audio-visual speaker identification system with various combinations of audio and visual features.
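
    Feature-level fusion as described, concatenating per-frame audio (MFCC + LPCC) and visual (ASM) vectors and reducing their dimensionality with PCA before HMM training, might be sketched as below. Array shapes are assumptions and the feature arrays are random stand-ins; the actual extractors and the HMM training stage are not shown.

```python
# Sketch of feature-level audio-visual fusion with PCA reduction. The MFCC,
# LPCC and ASM arrays are random stand-ins for real per-frame features; the
# reduced vectors would then train a per-speaker HMM (e.g., via hmmlearn).
import numpy as np
from sklearn.decomposition import PCA

n_frames = 200                               # assumed aligned frame count
mfcc = np.random.randn(n_frames, 13)         # audio: MFCC features
lpcc = np.random.randn(n_frames, 13)         # audio: LPCC features
asm = np.random.randn(n_frames, 40)          # visual: ASM shape/appearance features

fused = np.hstack([mfcc, lpcc, asm])         # feature-level fusion: one vector per frame
reduced = PCA(n_components=20).fit_transform(fused)
print(fused.shape, "->", reduced.shape)      # (200, 66) -> (200, 20)
```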

  12. Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals: Effects of Adding Visual Cues to Auditory Speech Stimuli.

    Science.gov (United States)

    Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

    2016-06-17

    The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. © The Author(s) 2016.

  13. Investigating the impact of audio instruction and audio-visual biofeedback for lung cancer radiation therapy

    Science.gov (United States)

    George, Rohini

    Lung cancer accounts for 13% of all cancers in the United States and is the leading cause of cancer deaths among both men and women. The five-year survival rate for lung cancer patients is approximately 15% (ACS facts & figures). Respiratory motion decreases the accuracy of thoracic radiotherapy during imaging and delivery. To account for respiration, margins are generally added during radiation treatment planning, which may cause a substantial dose delivery to normal tissues and increase normal tissue toxicity. To alleviate the above-mentioned effects of respiratory motion, several motion management techniques are available which can reduce the doses to normal tissues, thereby reducing treatment toxicity and allowing dose escalation to the tumor. This may increase the survival probability of patients who have lung cancer and are receiving radiation therapy. However, the accuracy of these motion management techniques is inhibited by respiration irregularity. The rationale of this thesis was to study the improvement in regularity of respiratory motion achieved by breathing coaching of lung cancer patients using audio instructions and audio-visual biofeedback. A total of 331 patient respiratory motion traces, each four minutes in length, were collected from 24 lung cancer patients enrolled in an IRB-approved breathing-training protocol. It was determined that audio-visual biofeedback significantly improved the regularity of respiratory motion compared to free breathing and audio instruction, thus improving the accuracy of respiratory gated radiotherapy. It was also observed that duty cycles below 30% showed insignificant reduction in residual motion, while above 50% there was a sharp increase in residual motion. The reproducibility of exhale-based gating was higher than that of inhale-based gating. Modeling the respiratory cycles, it was found that cosine and cosine-fourth-power (cos^4) models had the best correlation with individual respiratory cycles. The overall respiratory motion probability distribution
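
    The cosine-family models mentioned follow the commonly used breathing-cycle form z(t) = z0 - b cos^(2n)(pi t / tau - phi), where n = 1 gives the cosine model and n = 2 the cos^4 model. A hedged curve-fitting sketch on a synthetic trace (all parameter values are placeholders, not the thesis data):

```python
# Sketch: fitting cos^(2n) respiratory-cycle models (n=1: cosine; n=2: cos^4)
# to a displacement trace with scipy. The trace below is synthetic.
import numpy as np
from scipy.optimize import curve_fit

def breathing_model(t, z0, b, tau, phi, n):
    return z0 - b * np.cos(np.pi * t / tau - phi) ** (2 * n)

t = np.linspace(0, 20, 600)                              # 20 s trace (assumed)
trace = breathing_model(t, 1.0, 0.8, 4.0, 0.3, 2) + 0.02 * np.random.randn(t.size)

for n in (1, 2):                                         # compare cos vs. cos^4
    popt, _ = curve_fit(lambda t, z0, b, tau, phi: breathing_model(t, z0, b, tau, phi, n),
                        t, trace, p0=[1.0, 0.8, 4.0, 0.0])
    r = np.corrcoef(trace, breathing_model(t, *popt, n))[0, 1]
    print(f"cos^{2*n}: correlation with trace = {r:.3f}")
```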

  14. Semantic congruency but not temporal synchrony enhances long-term memory performance for audio-visual scenes.

    Science.gov (United States)

    Meyerhoff, Hauke S; Huff, Markus

    2016-04-01

    Human long-term memory for visual objects and scenes is tremendous. Here, we test how auditory information contributes to long-term memory performance for realistic scenes. In a total of six experiments, we manipulated the presentation modality (auditory, visual, audio-visual) as well as semantic congruency and temporal synchrony between auditory and visual information of brief filmic clips. Our results show that audio-visual clips generally elicit more accurate memory performance than unimodal clips. This advantage even increases with congruent visual and auditory information. However, violations of audio-visual synchrony hardly have any influence on memory performance. Memory performance remained intact even with a sequential presentation of auditory and visual information, but finally declined when the matching tracks of one scene were presented separately with intervening tracks during learning. With respect to memory performance, our results therefore show that audio-visual integration is sensitive to semantic congruency but remarkably robust against asymmetries between different modalities.

  15. Effects of noise and audiovisual cues on speech processing in adults with and without ADHD.

    Science.gov (United States)

    Michalek, Anne M P; Watson, Silvana M; Ash, Ivan; Ringleb, Stacie; Raymer, Anastasia

    2014-03-01

    This study examined the interplay among internal (e.g. attention, working memory abilities) and external (e.g. background noise, visual information) factors in individuals with and without ADHD. A 2 × 2 × 6 mixed design with correlational analyses was used to compare participant results on a standardized listening-in-noise sentence repetition task (QuickSIN; Killion et al., 2004), presented in an auditory and an audiovisual condition as the signal-to-noise ratio (SNR) varied from 25 to 0 dB, and to determine individual differences in working memory capacity and short-term recall. Participants were thirty-eight young adults without ADHD and twenty-five young adults with ADHD. Diagnosis, modality, and signal-to-noise ratio all affected the ability to process speech in noise. The interaction between the diagnosis of ADHD, the presence of visual cues, and the level of noise had an effect on a person's ability to process speech in noise. Conclusion: Young adults with ADHD benefited less from visual information during noise than young adults without ADHD, an effect influenced by working memory abilities.

  16. A psychophysical imaging method evidencing auditory cue extraction during speech perception: a group analysis of auditory classification images.

    Science.gov (United States)

    Varnet, Léo; Knoblauch, Kenneth; Serniclaes, Willy; Meunier, Fanny; Hoen, Michel

    2015-01-01

    Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remains undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique that allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses on the listening strategies of participants at the group level. We exemplify this approach by identifying the acoustic cues involved in da/ga categorization with two phonetic contexts, Al- or Ar-. The application of Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme.

  17. A psychophysical imaging method evidencing auditory cue extraction during speech perception: a group analysis of auditory classification images.

    Directory of Open Access Journals (Sweden)

    Léo Varnet

    Full Text Available Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remain undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique that allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses on the listening strategies of participants at the group level. We exemplify this approach by identifying the acoustic cues involved in da/ga categorization with two phonetic contexts, Al- or Ar-. The application of Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme.

  18. Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech

    Directory of Open Access Journals (Sweden)

    Matthew ePoon

    2015-11-01

    Full Text Available Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound happier than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here we describe a novel, score-based exploration of the use of pitch height and timing in a set of balanced major and minor key compositions. Our corpus contained all 24 Preludes and 24 Fugues from Bach's Well-Tempered Clavier (book 1), as well as all 24 of Chopin's Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma (A, B, C, etc.). Consistent with predictions derived from speech, we found major-key (nominally happy) pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally sad) pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post-hoc analyses illustrate interesting trade-offs, with sets featuring greater emphasis on timing distinctions between modalities exhibiting the least pitch distinction, and vice-versa.

  19. Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech.

    Science.gov (United States)

    Poon, Matthew; Schutz, Michael

    2015-01-01

    Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound "happier" than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here, we describe a novel, score-based exploration of the use of pitch height and timing in a set of "balanced" major and minor key compositions. Our analysis included all 24 Preludes and 24 Fugues from Bach's Well-Tempered Clavier (book 1), as well as all 24 of Chopin's Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma ("A," "B," "C," etc.). Consistent with predictions derived from speech, we found major-key (nominally "happy") pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally "sad") pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post hoc analyses illustrate interesting trade-offs, with sets featuring greater emphasis on timing distinctions between modalities exhibiting the least pitch distinction, and vice-versa. We discuss these findings in the broader context of speech-music research, as well as recent scholarship exploring the historical evolution of cue use in Western music.
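
    The corpus comparison reduces to two per-piece summary statistics, mean pitch height and note rate, averaged within each modality. The sketch below shows that computation on a few hypothetical per-piece summaries; the tuples are invented for illustration, not values taken from the Bach or Chopin scores.

        import numpy as np

        # Hypothetical per-piece summaries: (mean MIDI pitch, attacks/second, mode).
        pieces = [
            (67.0, 5.2, "major"), (64.8, 3.9, "minor"),
            (66.5, 4.8, "major"), (65.1, 4.1, "minor"),
            # ... one entry per prelude/fugue in the corpus
        ]

        def summarize(mode):
            rows = [(pitch, rate) for pitch, rate, m in pieces if m == mode]
            return np.mean(rows, axis=0)  # (mean pitch height, mean attack rate)

        maj_pitch, maj_rate = summarize("major")
        min_pitch, min_rate = summarize("minor")
        print(f"pitch-height gap: {maj_pitch - min_pitch:.1f} semitones")
        print(f"timing gap: major {100 * (maj_rate / min_rate - 1):.0f}% faster")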

  20. Use of audio-visual methods in radiology and physics courses

    Energy Technology Data Exchange (ETDEWEB)

    Holmberg, P

    1987-03-15

    Today's medicine utilizes sophisticated equipment for radiological, biochemical and microbiological investigation procedures and analyses. Hence it is necessary that physicians have adequate scientific and technical knowledge of the apparatus they are using so that the equipment can be used in the most effective way. Partly this knowledge is obtained from science-orientated courses in the preclinical stage of the study program for medical students. To increase the motivation to study science courses (medical physics), audio-visual methods are used to describe diagnostic and therapeutic procedures in the clinical routines.

  1. The use of audio-visual methods in radiology and physics courses

    International Nuclear Information System (INIS)

    Holmberg, P.

    1987-01-01

    Today's medicine utilizes sophisticated equipment for radiological, biochemical and microbiological investigation procedures and analyses. Hence it is necessary that physicians have adequate scientific and technical knowledge of the apparatus they are using so that the equipment can be used in the most effective way. Partly this knowledge is obtained from science-orientated courses in the preclinical stage of the study program for medical students. To increase the motivation to study science courses (medical physics), audio-visual methods are used to describe diagnostic and therapeutic procedures in the clinical routines. (orig.)

  2. PHYSIOLOGICAL MONITORING OF ACS OPERATORS IN AUDIO-VISUAL SIMULATION OF AN EMERGENCY

    Directory of Open Access Journals (Sweden)

    S. S. Aleksanin

    2010-01-01

    Full Text Available In the context of ship-simulator automated control systems, we investigated the information content of physiological monitoring of cardiac rhythm for assessing the reliability and noise immunity of operators of various specializations during audio-visual simulation of an emergency. In parallel, we studied the effectiveness of protection against the adverse effects of electromagnetic fields. Monitoring cardiac rhythm during a virtual crash makes it possible to differentiate, by specialization, the degree of strain on the regulatory systems of the operators' body functions, and to note the positive effect of using means of protection against exposure to electromagnetic fields.

  3. An interactive audio-visual installation using ubiquitous hardware and web-based software deployment

    Directory of Open Access Journals (Sweden)

    Tiago Fernandes Tavares

    2015-05-01

    Full Text Available This paper describes an interactive audio-visual musical installation, namely MOTUS, that aims at being deployed using low-cost hardware and software. This was achieved by writing the software as a web application and using only hardware pieces that are built into most modern personal computers. This scenario implies specific technical restrictions, which lead to solutions combining both technical and artistic aspects of the installation. The resulting system is versatile and can be freely used from any computer with Internet access. Spontaneous feedback from the audience has shown that the provided experience is interesting and engaging, regardless of the use of minimal hardware.

  4. Audio-Visual Feedback for Self-monitoring Posture in Ballet Training

    DEFF Research Database (Denmark)

    Knudsen, Esben Winther; Hølledig, Malte Lindholm; Bach-Nielsen, Sebastian Siem

    2017-01-01

    An application for ballet training is presented that monitors the posture position (straightness of the spine and rotation of the pelvis) deviation from the ideal position in real-time. The human skeletal data is acquired through a Microsoft Kinect v2. The movement of the student is mirrored...-coded. In an experiment with 9-12 year-old dance students from a ballet school, comparing the audio-visual feedback modality with no feedback leads to an increase in posture accuracy (p

  5. Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Georgios Mantokoudis

    Full Text Available OBJECTIVE: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. METHODS: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. RESULTS: Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). CONCLUSION: Webcameras have the potential to improve telecommunication of hearing-impaired individuals.
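
    Two of the manipulated factors, frame rate and picture/sound delay, are straightforward to simulate offline. The sketch below shows one plausible way to do so on raw frame and audio arrays; it illustrates the kind of degradation tested, and is not the authors' stimulus-generation code.

        import numpy as np

        def reduce_frame_rate(frames, native_fps, target_fps):
            # Simulate a lower frame rate by holding each kept frame
            # (frame repetition), leaving the clip duration unchanged.
            step = native_fps / target_fps
            idx = (np.floor(np.arange(len(frames)) / step) * step).astype(int)
            return frames[idx]

        def delay_audio(audio, delay_ms, sample_rate):
            # Simulate picture/sound asynchrony by prepending silence.
            pad = np.zeros(int(sample_rate * delay_ms / 1000), dtype=audio.dtype)
            return np.concatenate([pad, audio])

        frames = np.zeros((90, 120, 160, 3), dtype=np.uint8)  # placeholder 3-s clip
        audio = np.zeros(48000, dtype=np.float32)             # placeholder 3-s track
        degraded = reduce_frame_rate(frames, native_fps=30, target_fps=7)
        delayed = delay_audio(audio, delay_ms=100, sample_rate=16000)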

  6. Finding the Correspondence of Audio-Visual Events by Object Manipulation

    Science.gov (United States)

    Nishibori, Kento; Takeuchi, Yoshinori; Matsumoto, Tetsuya; Kudo, Hiroaki; Ohnishi, Noboru

    A human being understands the objects in the environment by integrating information obtained by the senses of sight, hearing and touch. In this integration, active manipulation of objects plays an important role. We propose a method for finding the correspondence of audio-visual events by manipulating an object. The method uses the general grouping rules in Gestalt psychology, i.e. “simultaneity” and “similarity” among the motion command, sound onsets and the motion of the object in images. In experiments, we used a microphone, a camera, and a robot which has a hand manipulator. The robot grasps an object like a bell and shakes it, or grasps an object like a stick and beats a drum, in periodic or non-periodic motion. The object then emits periodic/non-periodic events. To create a more realistic scenario, we put another event source (a metronome) in the environment. As a result, we had a success rate of 73.8 percent in finding the correspondence between audio-visual events (afferent signal) relating to robot motion (efferent signal).
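
    The correspondence step can be reduced to a "simultaneity" check: does each sound onset co-occur with an onset of object motion within a small tolerance? The sketch below is a toy version of that cue; the onset times and tolerance are invented, and the actual method also exploits the motor command and a "similarity" grouping rule.

        import numpy as np

        def onset_correspondence(sound_onsets, motion_onsets, tol=0.05):
            # Fraction of sound onsets co-occurring with an object-motion onset
            # within `tol` seconds: a toy form of the simultaneity rule.
            motion = np.asarray(motion_onsets)
            hits = sum(np.any(np.abs(motion - t) <= tol) for t in sound_onsets)
            return hits / len(sound_onsets)

        bell = [0.52, 1.01, 1.49, 2.03]         # sound onsets of the shaken object
        metronome = [0.40, 0.90, 1.40, 1.90]    # competing event source in the scene
        hand_motion = [0.50, 1.00, 1.50, 2.00]  # motion onsets extracted from images

        print(onset_correspondence(bell, hand_motion))       # high -> same event
        print(onset_correspondence(metronome, hand_motion))  # low  -> unrelated source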

  7. Spectacular Attractions: Museums, Audio-Visuals and the Ghosts of Memory

    Directory of Open Access Journals (Sweden)

    Mandelli Elisa

    2015-12-01

    Full Text Available In recent decades, moving images have become a common feature not only in art museums, but also in a wide range of institutions devoted to the conservation and transmission of memory. This paper focuses on the role of audio-visuals in the exhibition design of history and memory museums, arguing that they are privileged means to achieve the spectacular effects and the visitors’ emotional and “experiential” engagement that constitute the main objective of contemporary museums. I will discuss this topic through the concept of “cinematic attraction,” claiming that when embedded in displays, films and moving images often produce spectacular mises en scène with immersive effects, creating wonder and astonishment, and involving visitors on an emotional, visceral and physical level. Moreover, I will consider the diffusion of audio-visual witnesses of real or imaginary historical characters, presented in Phantasmagoria-like displays that simulate ghostly and uncanny apparitions, creating an ambiguous and often problematic coexistence of truth and illusion, subjectivity and objectivity, facts and imagination.

  8. The Improvement of Students’ Leadership Ethic in Studying History by Using Baratayuda Audio Visual Media

    Directory of Open Access Journals (Sweden)

    Wendhy Rachmadhany

    2018-04-01

    Full Text Available The purpose of this research is to determine the improvement of students’ leadership ethic in studying History after the implementation of Baratayuda audio-visual media. The population of this research is the XI-Social Science-1 class of SMAN 1 Pare, Kediri Regency, in academic year 2016/2017, consisting of 39 students. This Classroom Action Research (CAR) is arranged as a pre-test, Cycle 1 and Cycle 2, with each cycle consisting of several steps: planning, implementation, observation, and reflection. Data were collected using a leadership-ethic questionnaire, interviews, and documentation. The data were analyzed descriptively by comparing the improvement from one cycle to another. The results show an improvement in leadership ethic in studying History after the implementation of Baratayuda audio-visual media: the pre-test passing rate was about 17.95%, rising to 46.1% in Cycle 1 and to a significant 71.83% in Cycle 2.

  9. The presentation of expert testimony via live audio-visual communication.

    Science.gov (United States)

    Miller, R D

    1991-01-01

    As part of a national effort to improve efficiency in court procedures, the American Bar Association has recommended, on the basis of a number of pilot studies, increased use of current audio-visual technology, such as telephone and live video communication, to eliminate delays caused by unavailability of participants in both civil and criminal procedures. Although these recommendations were made to facilitate court proceedings, and for the convenience of attorneys and judges, they also have the potential to save significant time for clinical expert witnesses as well. The author reviews the studies of telephone testimony that were done by the American Bar Association and other legal research groups, as well as the experience in one state forensic evaluation and treatment center. He also reviews the case law on the issue of remote testimony. He then presents data from a national survey of state attorneys general concerning the admissibility of testimony via audio-visual means, including video depositions. Finally, he concludes that the option to testify by telephone provides a significant savings in precious clinical time for forensic clinicians in public facilities, and urges that such clinicians work actively to convince courts and/or legislatures in states that do not permit such testimony (currently the majority) to consider accepting it, to improve the effective use of scarce clinical resources in public facilities.

  10. Designing Promotion Strategy of Malang Raya’s Tourism Destination Branding through Audio Visual Media

    Directory of Open Access Journals (Sweden)

    Chanira Nuansa

    2014-04-01

    Full Text Available This study examines the suitability of the destination branding concept for existing models of Malang tourism promotion. This qualitative research draws directly on existing promotional materials for Malang, namely information portal sites, blogs, social networking, and video via the Internet. The study used SWOT analysis to find strengths, weaknesses, opportunities, and threats in the existing models of tourism promotion, and the data were analyzed against the indicators of the destination branding concept. The results of the analysis were used as a basis for designing solutions for Malang tourism promotion through a new integrated tourism advertising model. Through the analysis we found that video is the medium best suited to promoting Malang tourism in the form of advertisements. Video conveys facts more fully through its audio-visual form, making it easier for viewers to form associations with the destination. Moreover, well-conceptualized video advertisements for Malang tourism are still rare. This is an opportunity, because the audio-visual advertising model produced by this study is expected to serve as an example for concerned parties when conceptualizing future Malang tourism advertising. Keywords: advertising, SWOT analysis, Malang City, tourism promotion

  11. Open-Loop Audio-Visual Stimulation (AVS): A Useful Tool for Management of Insomnia?

    Science.gov (United States)

    Tang, Hsin-Yi Jean; Riegel, Barbara; McCurry, Susan M; Vitiello, Michael V

    2016-03-01

    Audio Visual Stimulation (AVS), a form of neurofeedback, is a non-pharmacological intervention that has been used for both performance enhancement and symptom management. We review the history of AVS, its two sub-types (closed- and open-loop), and discuss its clinical implications. We also describe a promising new application of AVS to improve sleep and potentially decrease pain. AVS research can be traced back to the late 1800s. AVS's efficacy has been demonstrated for both performance enhancement and symptom management. Although AVS is commonly used in clinical settings, there is limited literature evaluating clinical outcomes and mechanisms of action. One of the challenges to AVS research is the lack of standardized terms, which makes systematic review and literature consolidation difficult. Future studies using AVS as an intervention should: (1) use operational definitions that are consistent with the existing literature, such as AVS, Audio-visual Entrainment, or Light and Sound Stimulation; (2) provide a clear rationale for the chosen training frequency and modality; (3) use a randomized controlled design; and (4) follow the Consolidated Standards of Reporting Trials and/or related guidelines when disseminating results.

  12. Effects of audio-visual aids on foreign language test anxiety, reading and listening comprehension, and retention in EFL learners.

    Science.gov (United States)

    Lee, Shu-Ping; Lee, Shin-Da; Liao, Yuan-Lin; Wang, An-Chi

    2015-04-01

    This study examined the effects of audio-visual aids on anxiety, comprehension test scores, and retention in reading and listening to short stories in English as a Foreign Language (EFL) classrooms. Reading and listening tests, general and test anxiety, and retention were measured in English-major college students in an experimental group with audio-visual aids (n=83) and a control group without audio-visual aids (n=94) with similar general English proficiency. Lower reading test anxiety, unchanged reading comprehension scores, and better reading short-term and long-term retention after four weeks were evident in the audiovisual group relative to the control group. In addition, lower listening test anxiety, higher listening comprehension scores, and unchanged short-term and long-term retention were found in the audiovisual group relative to the control group after the intervention. Audio-visual aids may help to reduce EFL learners' listening test anxiety and enhance their listening comprehension scores without facilitating retention of such materials. Although audio-visual aids did not increase reading comprehension scores, they helped reduce EFL learners' reading test anxiety and facilitated retention of reading materials.

  13. Concurrent audio-visual feedback for supporting drivers at intersections: A study using two linked driving simulators.

    Science.gov (United States)

    Houtenbos, M; de Winter, J C F; Hale, A R; Wieringa, P A; Hagenzieker, M P

    2017-04-01

    A large portion of road traffic crashes occur at intersections because drivers lack the necessary visual information. This research examined the effects of an audio-visual display that provides real-time sonification and visualization of the speed and direction of another car approaching the crossroads on an intersecting road. The location of red blinking lights (left vs. right on the speedometer) and the lateral input direction of beeps (left vs. right ear in headphones) corresponded to the direction from where the other car approached, and the blink and beep rates were a function of the approaching car's speed. Two driving simulators were linked so that the participant and the experimenter drove in the same virtual world. Participants (N = 25) completed four sessions (two with the audio-visual display on, two with the audio-visual display off), each session consisting of 22 intersections at which the experimenter approached from the left or right and either maintained speed or slowed down. Compared to driving with the display off, the audio-visual display resulted in enhanced traffic efficiency (i.e., greater mean speed, less coasting) while not compromising safety (i.e., the time gap between the two vehicles was equivalent). A post-experiment questionnaire showed that the beeps were regarded as more useful than the lights. It is argued that the audio-visual display is a promising means of supporting drivers until fully automated driving is technically feasible. Copyright © 2016. Published by Elsevier Ltd.
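
    The display logic amounts to a small mapping from the approaching car's state to lateralized light/beep settings whose rate grows with speed. The sketch below is a hypothetical version of that mapping; the rate function and its constants are invented, as the paper's exact parameters are not reproduced here.

        def display_parameters(approach_speed_kmh, approach_side):
            # Hypothetical mapping: the faster the approaching car, the faster
            # the blink/beep rate; the side sets light position and beep ear.
            rate_hz = 1.0 + 0.05 * approach_speed_kmh
            return {
                "light_position": approach_side,   # left/right on the speedometer
                "beep_ear": approach_side,         # left/right ear in headphones
                "blink_and_beep_rate_hz": rate_hz,
            }

        print(display_parameters(50, "left"))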

  14. Attitude of medical students towards the use of audio visual aids during didactic lectures in pharmacology in a medical college of central India

    OpenAIRE

    Mehul Agrawal; Rajanish Kumar Sankdia

    2016-01-01

    Background: Students favour teaching methods employing audio visual aids over didactic lectures not using these aids. However, the optimum use of audio visual aids is essential for deriving their benefits. During a lecture, both the visual and auditory senses are used to absorb information. Different methods of lecture are – chalk and board, PowerPoint presentations (PPT) and a mix of aids. This study was done to know the students' preference regarding the various audio visual aids, ...

  15. Sonority's Effect as a Surface Cue on Lexical Speech Perception of Children With Cochlear Implants.

    Science.gov (United States)

    Hamza, Yasmeen; Okalidou, Areti; Kyriafinis, George; van Wieringen, Astrid

    2018-03-06

    Sonority is the relative perceptual prominence/loudness of speech sounds of the same length, stress, and pitch. Children with cochlear implants (CIs), with restored audibility and relatively intact temporal processing, are expected to benefit from the perceptual prominence cues of highly sonorous sounds. Sonority also influences lexical access through the sonority-sequencing principle (SSP), a grammatical phonotactic rule, which facilitates the recognition and segmentation of syllables within speech. The more nonsonorous the onset of a syllable is, the larger is the degree of sonority rise to the nucleus, and the more optimal the SSP. Children with CIs may experience hindered or delayed development of the language-learning rule SSP, as a result of their deprived/degraded auditory experience. The purpose of the study was to explore sonority's role in speech perception and lexical access of prelingually deafened children with CIs. A case-control study with 15 children with CIs, 25 normal-hearing children (NHC), and 50 normal-hearing adults was conducted, using a lexical identification task of novel, nonreal CV-CV words taught via fast mapping. The CV-CV words were constructed according to four sonority conditions, entailing syllables with sonorous onsets/less optimal SSP (SS) and nonsonorous onsets/optimal SSP (NS) in all combinations, that is, SS-SS, SS-NS, NS-SS, and NS-NS. Outcome measures were accuracy and reaction times (RTs). A subgroup analysis of 12 children with CIs pair matched to 12 NHC on hearing age aimed to study the effect of oral-language exposure period on the sonority-related performance. The children groups showed similar accuracy performance, overall and across all the sonority conditions. However, within-group comparisons showed that the children with CIs scored more accurately on the SS-SS condition relative to the NS-NS and NS-SS conditions, while the NHC performed equally well across all conditions. Additionally, adult-comparable accuracy

  16. DESIGN OF AUDIO-VISUAL-BASED LEARNING MEDIA FOR THE TYPOGRAPHY COURSE IN THE VISUAL COMMUNICATION DESIGN STUDY PROGRAM OF UNIVERSITAS DIAN NUSWANTORO

    Directory of Open Access Journals (Sweden)

    Puri Sulistiyawati

    2017-02-01

    Full Text Available Abstract: Typography is one of the courses in visual communication design that emphasizes the visual aspect. However, observation showed that the learning media used so far have been less than effective because of the limited use of information technology, so students have not been able to fully understand the course material delivered by lecturers. Current developments in information technology have had many positive impacts on the advancement of education, including support for media in the learning process. The purpose of this research is to design learning media for the typography course by utilizing information technology, namely audio-visual media. The method used in this research is Research and Development with the ADDIE model (Analysis, Design, Development, Implementation, Evaluation). With the creation of this audio-visual learning media, the learning process of the Typography course is expected to be more effective and the course material easier for students to understand. Keywords: audio visual, learning media, typography. Abstract: Typography is one of the subjects in the field of visual communication design that prioritizes the visual aspect. However, based on observation, the media used so far have been less effective because of the limited use of information technology, so students cannot fully understand the course material explained by lecturers. Today, the development of information technology has a positive impact on the advancement of education and can be used to support media in the learning process. The purpose of this research is to design learning media for the course of typography by utilizing information technology, called audio-visual media. The method used in this research is Research and Development with the ADDIE model (Analysis, Design, Development, Implementation, Evaluation). With the creation of audio-visual learning media is expected

  17. Literary Genres in Social Life: A Narrative, Audio-visual and Poetic Approach

    Directory of Open Access Journals (Sweden)

    Luis Felipe González Gutiérrez

    2008-05-01

    Full Text Available The proposal, "Literary Genres in Social Life: a Narrative, Audio-visual and Poetic Approach", aims to present to the academic psychology community and related social science disciplines the main contributions of literary genre theory, through a social constructionist understanding of narrations and daily stories, and by means of an interactive construction of narrative collage. This work, supported by research financed by the University Santo Tomás in Bogotá, Colombia ("Understanding of structuralist literary theories in the development of the narrative 'I' within the social constructionist approach"), proposes alternative spaces for the presentation of its investigative results through the expression of metaphors, visual narrative sequences and interactive artistic forms, which invite the spectator to share in and understand important concepts in the consolidation of social forms of construction of the quotidian. URN: urn:nbn:de:0114-fqs0802373

  18. Rhythmic synchronization tapping to an audio-visual metronome in budgerigars.

    Science.gov (United States)

    Hasegawa, Ai; Okanoya, Kazuo; Hasegawa, Toshikazu; Seki, Yoshimasa

    2011-01-01

    In all ages and countries, music and dance have constituted a central part in human culture and communication. Recently, vocal-learning animals such as parrots and elephants have been found to share rhythmic ability with humans. Thus, we investigated the rhythmic synchronization of budgerigars, a vocal-mimicking parrot species, under controlled conditions and a systematically designed experimental paradigm as a first step in understanding the evolution of musical entrainment. We trained eight budgerigars to perform isochronous tapping tasks in which they pecked a key to the rhythm of audio-visual metronome-like stimuli. The budgerigars showed evidence of entrainment to external stimuli over a wide range of tempos. They seemed to be inherently inclined to tap at fast tempos, which have a similar time scale to the rhythm of budgerigars' natural vocalizations. We suggest that vocal learning might have contributed to their performance, which resembled that of humans.
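
    Entrainment of this kind is often quantified by the phase consistency of taps relative to the stimulus period. The sketch below computes vector strength, a standard synchronization measure; it is offered as an illustration, not necessarily the analysis used in this study, and the tap times are invented.

        import numpy as np

        def vector_strength(tap_times, period):
            # Resultant length of tap phases relative to the stimulus period:
            # 1 = perfectly phase-locked tapping, 0 = no phase consistency.
            phases = 2 * np.pi * (np.asarray(tap_times) % period) / period
            return float(np.abs(np.mean(np.exp(1j * phases))))

        taps = [0.51, 1.02, 1.48, 2.01, 2.53]     # invented tap times (s)
        print(vector_strength(taps, period=0.5))  # close to 1 -> entrained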

  19. Inner Sound: Altered States of Consciousness in Electronic Music and Audio-Visual Media

    DEFF Research Database (Denmark)

    Weinel, Jonathan

    Over the last century, developments in electronic music and art have enabled new possibilities for creating audio and audio-visual artworks. With this new potential has come the possibility for representing subjective internal conscious states, such as the experience of hallucinations, using... the creative influence of ASCs, from Amazonian chicha festivals to the synaesthetic assaults of neon raves; and from an immersive outdoor electroacoustic performance on an Athenian hilltop to a mushroom trip on a tropical island in virtual reality. Beginning with a discussion of consciousness, the book... explores how our subjective realities may change during states of dream, psychedelic experience, meditation, and trance. Taking a broad view across a wide range of genres, Inner Sound draws connections between shamanic art and music, and the modern technoshamanism of psychedelic rock, electronic dance

  20. Insects and the Kafkaesque: Insectuous Re-Writings in Visual and Audio-Visual Media

    Directory of Open Access Journals (Sweden)

    Damianos Grammatikopoulos

    2017-09-01

    Full Text Available In this article, I examine techniques at work in visual and audio-visual media that deal with the creative imitation of central Kafkan themes, particularly those related to hybrid insects and bodily deformity. In addition, the opening section of my study offers a detailed and thorough discussion of the concept of the “Kafkaesque”, and an attempt will be made to circumscribe its signifying limits. The main objective of the study is to explore the relationship between Kafka’s texts and the works of contemporary cartoonists, illustrators (Charles Burns, and filmmakers (David Cronenberg and identify themes and motifs that they have in common. My approach is informed by transtextual practices and source studies, and I draw systematically on Gerard Genette’s Palimpsests and Harold Bloom’s The Anxiety of Influence.

  1. Neuromorphic Audio-Visual Sensor Fusion on a Sound-Localising Robot

    Directory of Open Access Journals (Sweden)

    Vincent Yue-Sek Chan

    2012-02-01

    Full Text Available This paper presents the first robotic system featuring audio-visual sensor fusion with neuromorphic sensors. We combine a pair of silicon cochleae and a silicon retina on a robotic platform to allow the robot to learn sound localisation through self-motion and visual feedback, using an adaptive ITD-based sound localisation algorithm. After training, the robot can localise sound sources (white or pink noise) in a reverberant environment with an RMS error of 4 to 5 degrees in azimuth. In the second part of the paper, we investigate the source binding problem. An experiment is conducted to test the effectiveness of matching an audio event with a corresponding visual event based on their onset time. The results show that this technique can be quite effective, despite its simplicity.
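
    The localisation cue being adapted here is the interaural time difference (ITD). A minimal, conventional way to estimate an ITD from binaural signals is to search for the cross-correlation peak within a physiologically plausible lag window, as sketched below; this stands in for, and is far simpler than, the adaptive neuromorphic algorithm used on the robot.

        import numpy as np

        def estimate_itd(left, right, fs, max_itd_s=0.0007):
            # Find the lag (within +/- max_itd_s) maximizing the cross-correlation
            # of the two ear signals; the sign of the lag indicates the side.
            max_lag = int(max_itd_s * fs)
            best_lag, best_val = 0, -np.inf
            for lag in range(-max_lag, max_lag + 1):
                if lag >= 0:
                    val = np.dot(left[lag:], right[:len(right) - lag])
                else:
                    val = np.dot(left[:lag], right[-lag:])
                if val > best_val:
                    best_lag, best_val = lag, val
            return best_lag / fs  # ITD in seconds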

  2. Integration of Audio Visual Multimedia for Special Education Pre-Service Teachers' Self Reflections in Developing Teaching Competencies

    Science.gov (United States)

    Sediyani, Tri; Yufiarti; Hadi, Eko

    2017-01-01

    This study aims to develop a learning model integrating multimedia with audio-visual self-reflection for learners. This multimedia was developed as a tool for prospective teachers, as learners in the education of children with special needs, to reflect on their teaching competencies before entering the world of education. Research methods to…

  3. The Dynamics and Neural Correlates of Audio-Visual Integration Capacity as Determined by Temporal Unpredictability, Proactive Interference, and SOA.

    Directory of Open Access Journals (Sweden)

    Jonathan M P Wilbiks

    Full Text Available Over 5 experiments, we challenge the idea that the capacity of audio-visual integration need be fixed at 1 item. We observe that the conditions under which audio-visual integration is most likely to exceed 1 occur when stimulus change operates at a slow rather than fast rate of presentation and when the task is of intermediate difficulty, such as when low levels of proactive interference (3 rather than 8 interfering visual presentations) are combined with the temporal unpredictability of the critical frame (Experiment 2), or high levels of proactive interference are combined with the temporal predictability of the critical frame (Experiment 4). Neural data suggest that capacity might also be determined by the quality of perceptual information entering working memory. Experiment 5 supported the proposition that audio-visual integration was at play during the previous experiments. The data are consistent with the dynamic nature usually associated with cross-modal binding, and while audio-visual integration capacity likely cannot exceed uni-modal capacity estimates, performance may be better than being able to associate only one visual stimulus with one auditory stimulus.

  4. The Dynamics and Neural Correlates of Audio-Visual Integration Capacity as Determined by Temporal Unpredictability, Proactive Interference, and SOA.

    Science.gov (United States)

    Wilbiks, Jonathan M P; Dyson, Benjamin J

    2016-01-01

    Over 5 experiments, we challenge the idea that the capacity of audio-visual integration need be fixed at 1 item. We observe that the conditions under which audio-visual integration is most likely to exceed 1 occur when stimulus change operates at a slow rather than fast rate of presentation and when the task is of intermediate difficulty such as when low levels of proactive interference (3 rather than 8 interfering visual presentations) are combined with the temporal unpredictability of the critical frame (Experiment 2), or, high levels of proactive interference are combined with the temporal predictability of the critical frame (Experiment 4). Neural data suggest that capacity might also be determined by the quality of perceptual information entering working memory. Experiment 5 supported the proposition that audio-visual integration was at play during the previous experiments. The data are consistent with the dynamic nature usually associated with cross-modal binding, and while audio-visual integration capacity likely cannot exceed uni-modal capacity estimates, performance may be better than being able to associate only one visual stimulus with one auditory stimulus.

  5. Concurrent Unimodal Learning Enhances Multisensory Responses of Bi-Directional Crossmodal Learning in Robotic Audio-Visual Tracking

    DEFF Research Database (Denmark)

    Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

    2018-01-01

    modalities to independently update modality-specific neural weights on a moment-by-moment basis, in response to dynamic changes in noisy sensory stimuli. The circuit is embodied as a non-holonomic robotic agent that must orient itself towards a moving audio-visual target. The circuit continuously learns the best...

  6. Exploring determinants of early user acceptance for an audio-visual heritage archive service using the vignette method

    NARCIS (Netherlands)

    Ongena, G.; van de Wijngaert, Lidwien; Huizer, E.

    2013-01-01

    The purpose of this study is to investigate factors which explain the behavioural intention to use a new audio-visual cultural heritage archive service. An online survey in combination with a factorial survey is utilised to investigate the predictive strength of technological, individual

  7. Concurrent audio-visual feedback for supporting drivers at intersections : a study using two linked driving simulators.

    NARCIS (Netherlands)

    Houtenbos, M.; de Winter, J.C.F.; Hale, A.R.; Wieringa, P.A.; Hagenzieker, M.P.

    2016-01-01

    A large portion of road traffic crashes occur at intersections because drivers lack the necessary visual information. This research examined the effects of an audio-visual display that provides real-time sonification and visualization of the speed and direction of another car approaching the crossroads on an intersecting road.

  8. Discrimination and streaming of speech sounds based on differences in interaural and spectral cues.

    Science.gov (United States)

    David, Marion; Lavandier, Mathieu; Grimault, Nicolas; Oxenham, Andrew J

    2017-09-01

    Differences in spatial cues, including interaural time differences (ITDs), interaural level differences (ILDs) and spectral cues, can lead to stream segregation of alternating noise bursts. It is unknown how effective such cues are for streaming sounds with realistic spectro-temporal variations. In particular, it is not known whether the high-frequency spectral cues associated with elevation remain sufficiently robust under such conditions. To answer these questions, sequences of consonant-vowel tokens were generated and filtered by non-individualized head-related transfer functions to simulate the cues associated with different positions in the horizontal and median planes. A discrimination task showed that listeners could discriminate changes in interaural cues both when the stimulus remained constant and when it varied between presentations. However, discrimination of changes in spectral cues was much poorer in the presence of stimulus variability. A streaming task, based on the detection of repeated syllables in the presence of interfering syllables, revealed that listeners can use both interaural and spectral cues to segregate alternating syllable sequences, despite the large spectro-temporal differences between stimuli. However, only the full complement of spatial cues (ILDs, ITDs, and spectral cues) resulted in obligatory streaming in a task that encouraged listeners to integrate the tokens into a single stream.
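
    The spatial simulation described here, filtering a mono token through position-specific head-related transfer functions, reduces to a pair of convolutions. Below is a minimal sketch with placeholder impulse responses, not the non-individualized HRTF set used in the study.

        import numpy as np
        from scipy.signal import fftconvolve

        def spatialize(mono, hrir_left, hrir_right):
            # Render a mono token at a virtual position by convolving it with
            # the left- and right-ear head-related impulse responses (HRIRs).
            return np.stack([fftconvolve(mono, hrir_left),
                             fftconvolve(mono, hrir_right)], axis=-1)

        token = np.random.default_rng(3).standard_normal(8000)  # placeholder CV token
        hrir_l = np.zeros(256)
        hrir_l[0] = 1.0    # placeholder left HRIR (direct path)
        hrir_r = np.zeros(256)
        hrir_r[24] = 0.8   # placeholder right HRIR (delayed, attenuated)
        binaural = spatialize(token, hrir_l, hrir_r)  # shape (N, 2) stereo signal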

  9. What Information Is Necessary for Speech Categorization? Harnessing Variability in the Speech Signal by Integrating Cues Computed Relative to Expectations

    Science.gov (United States)

    McMurray, Bob; Jongman, Allard

    2011-01-01

    Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are the informational assumptions of a model, the type of information subserving this mapping. This is crucial in speech perception where the signal is variable and context dependent. This study assessed the…

  10. Online dissection audio-visual resources for human anatomy: Undergraduate medical students' usage and learning outcomes.

    Science.gov (United States)

    Choi-Lundberg, Derek L; Cuellar, William A; Williams, Anne-Marie M

    2016-11-01

    In an attempt to improve undergraduate medical student preparation for and learning from dissection sessions, dissection audio-visual resources (DAVR) were developed. Data from e-learning management systems indicated DAVR were accessed by 28% ± 10 (mean ± SD for nine DAVR across three years) of students prior to the corresponding dissection sessions, representing at most 58% ± 20 of assigned dissectors. Approximately 50% of students accessed all available DAVR by the end of semester, while 10% accessed none. Ninety percent of survey respondents (response rate 58%) generally agreed that DAVR improved their preparation for and learning from dissection when used. Of several learning resources, only DAVR usage had a significant positive correlation (P = 0.002) with feeling prepared for dissection. Results on cadaveric anatomy practical examination questions in year 2 (Y2) and year 3 (Y3) cohorts were 3.9% (P learning outcomes of more students. Anat Sci Educ 9: 545-554. © 2016 American Association of Anatomists.

  11. Semantics and the multisensory brain: how meaning modulates processes of audio-visual integration.

    Science.gov (United States)

    Doehrmann, Oliver; Naumer, Marcus J

    2008-11-25

    By using meaningful stimuli, multisensory research has recently started to investigate the impact of stimulus content on crossmodal integration. Variations in this respect have often been termed as "semantic". In this paper we will review work related to the question of which tasks show an influence of semantic factors, and which cortical networks are most likely to mediate these effects. More specifically, the focus of this paper will be on processing of object stimuli presented in the auditory and visual sensory modalities. Furthermore, we will investigate which cortical regions are particularly responsive to experimental variations of content by comparing semantically matching ("congruent") and mismatching ("incongruent") experimental conditions. In this context, recent neuroimaging studies point toward a possible functional differentiation of temporal and frontal cortical regions, with the former being more responsive to semantically congruent and the latter to semantically incongruent audio-visual (AV) stimulation. To account for these differential effects, we will suggest in the final section of this paper a possible synthesis of these data on semantic modulation of AV integration with findings from neuroimaging studies and theoretical accounts of semantic memory.

  12. “A Real China” on User-Generated Videos? Audio-Visual Narratives of Confucianism

    Directory of Open Access Journals (Sweden)

    Jianxiu Hao

    2014-03-01

    Full Text Available Beneath the “Chinese success story”, social stratification, class polarization, and cultural displacement have accelerated. The Chinese Communist Party has not found a coherent solution to the challenges of reconciling social interests, since Communism has more and more become mere “lip service”. However, it has been claimed that Confucian values can provide resources to dissolve the downsides of modernization in contemporary Chinese society. This study investigates the revival of Confucianism, as a source for criticism and construction in Chinese socio-culture, as portrayed in user-generated videos which are produced/consumed by the largest Internet-using population in the world, under the Chinese authoritarian regime and its control over communication. By means of a thematic audio-visual narrative analysis, this study investigated 20 hours of Youku Paike videos published between 2007 and 2013. It detected that: (1) about one third of the user-generated videos can be interpreted as Confucian thematic narratives, and there is a slightly increasing trend of portraying Confucian values; (2) Confucianism can become a source for the formation of a new online socio-culture, in the circumstances of China’s modernization and cyberization, advocating social actors’ cultivation and humanity’s flourishing.

  13. Bayesian networks and information theory for audio-visual perception modeling.

    Science.gov (United States)

    Besson, Patricia; Richiardi, Jonas; Bourdin, Christophe; Bringoux, Lionel; Mestre, Daniel R; Vercher, Jean-Louis

    2010-09-01

    Thanks to their different senses, human observers acquire multiple information coming from their environment. Complex cross-modal interactions occur during this perceptual process. This article proposes a framework to analyze and model these interactions through a rigorous and systematic data-driven process. This requires considering the general relationships between the physical events or factors involved in the process, not only in quantitative terms, but also in terms of the influence of one factor on another. We use tools from information theory and probabilistic reasoning to derive relationships between the random variables of interest, where the central notion is that of conditional independence. Using mutual information analysis to guide the model elicitation process, a probabilistic causal model encoded as a Bayesian network is obtained. We exemplify the method by using data collected in an audio-visual localization task for human subjects, and we show that it yields a well-motivated model with good predictive ability. The model elicitation process offers new prospects for the investigation of the cognitive mechanisms of multisensory perception.
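
    The central quantity in this data-driven elicitation process is the mutual information between observed variables. A compact sketch for discrete (or discretized) data follows; in practice the analysis would extend to conditional mutual information to test the conditional independencies that a Bayesian network encodes.

        import numpy as np

        def mutual_information(x, y):
            # MI in bits between two discrete variables, via the joint histogram.
            joint, _, _ = np.histogram2d(x, y, bins=(np.unique(x).size,
                                                     np.unique(y).size))
            pxy = joint / joint.sum()
            px = pxy.sum(axis=1, keepdims=True)
            py = pxy.sum(axis=0, keepdims=True)
            nz = pxy > 0
            return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

        rng = np.random.default_rng(4)
        stimulus = rng.integers(0, 3, 1000)                   # discretized factor
        response = (stimulus + rng.integers(0, 2, 1000)) % 3  # dependent variable
        print(mutual_information(stimulus, response))         # > 0: dependence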

  14. Visual-Auditory Integration during Speech Imitation in Autism

    Science.gov (United States)

    Williams, Justin H. G.; Massaro, Dominic W.; Peel, Natalie J.; Bosseler, Alexis; Suddendorf, Thomas

    2004-01-01

    Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional "mirror neuron" systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a "virtual" head (Baldi), delivered speech stimuli for…

  15. The effectiveness of mnemonic audio-visual aids in teaching content words to EFL students at a Turkish university

    OpenAIRE

    Kılınç, A Reha

    1996-01-01

    Ankara : Institute of Economics and Social Sciences, Bilkent University, 1996. Thesis (Master's) -- Bilkent University, 1996. Includes bibliographical references (leaves 63-67). This experimental study investigated the effects of mnemonic audio-visual aids on recognition and recall of vocabulary items, in comparison to a control group using a dictionary. The study was conducted at the Middle East Technical University Department of Basic English. The participants were 64 beginner and u...

  16. Interactional convergence in conversational storytelling: when reported speech is a cue of alignment and/or affiliation

    Directory of Open Access Journals (Sweden)

    Mathilde eGuardiola

    2013-10-01

    Full Text Available This paper investigates how and when interactional convergence is established by participants in conversation. We analyze sequences of storytelling using an original method which combines Conversation Analysis and a corpus-based approach. In storytelling, the participant in the position of listener is expected to produce either generic or specific responses adapted to the storyteller’s narrative. The listener’s behavior produced within the current activity is a cue of his or her interactional alignment. We show here that the listener can produce a specific type of (aligned) response which we term a reported speech utterance in echo. The participant who is not telling the story is nonetheless able to animate the characters, while reversing the usual asymmetric roles of storyteller and listener. The use of this device is a way for the listener to display a stance toward the events told by the storyteller. If the listener’s stance is congruent with that of the storyteller, this reveals a high degree of affiliation between the participants. We present seventeen extracts from a collection of 94 instances of echo reported speech which we examined using the concepts of alignment and affiliation in order to show how different kinds of convergent sequences are constructed. We demonstrate that this phenomenon is mainly used by the listener in order to align and affiliate with the storyteller by means of reformulative or overbidding Echo Reported Speech. We also show that in affiliative sequences, reported speech can be used by the listener in a humorous way in order to temporarily disalign. This disalignment constitutes a potential starting point for an oblique sequence, which, if accepted and continued by the storyteller, gives rise to a highly convergent sequence.

  17. MRI-compatible audio/visual system: impact on pediatric sedation

    International Nuclear Information System (INIS)

    Harned, R.K. II; Strain, J.D.

    2001-01-01

    Background. While sedation is necessary for much pediatric imaging, there are new alternatives that may help patients hold still without medication. Objective. We examined the effect of an audio/visual system consisting of video goggles and earphones on the need for sedation during magnetic resonance imaging (MRI). Materials and methods. All MRI examinations from May 1999 to October 1999 performed after installation of the MRVision 2000 (Resonance Technology, Inc.) were compared to the same 6-month period in 1998. Imaging and sedation protocols remained constant. Data collected included: patient age, type of examination, use of intravenous contrast enhancement, and need for sedation. The average supply charge and nursing cost per sedated patient were calculated. Results. The 955 patients from 1998 and 1,112 patients from 1999 were similar in demographics and examination distribution. There was an overall reduction in the percent of patients requiring sedation in the group using the video goggle system from 49 to 40 % (P < 0.001). There was no significant change for 0-2 years (P = 0.805), but there was a reduction from 53 to 40 % for age 3-10 years (P < 0.001) and 16 to 8 % for those older than 10 years (P < 0.001). There was a 17 % decrease in MRI room time for those patients whose examinations could be performed without sedation. Sedation costs per patient were $80 for nursing and $29 for supplies. Conclusion. The use of this video system reduced the number of children requiring sedation for MRI examination by 18 %. In addition to reducing patient risk, this can potentially reduce cost. (orig.)

  18. Subcortical encoding of speech cues in children with attention deficit hyperactivity disorder.

    Science.gov (United States)

    Jafari, Zahra; Malayeri, Saeed; Rostami, Reza

    2015-02-01

    There is little information about processing of nonspeech and speech stimuli at the subcortical level in individuals with attention deficit hyperactivity disorder (ADHD). The auditory brainstem response (ABR) provides information about the function of the auditory brainstem pathways. We aim to investigate subcortical function in neural encoding of click and speech stimuli in children with ADHD. The subjects include 50 children with ADHD and 34 typically developing (TD) children between the ages of 8 and 12 years. Click ABR (cABR) and speech ABR (sABR) with a 40-ms synthetic /da/ syllable stimulus were recorded. Latencies of cABR in waves III and V and duration of V-Vn (P⩽0.027), and latencies of sABR in waves A, D, E, F and O and duration of V-A (P⩽0.034), were significantly longer in children with ADHD than in TD children. There were no apparent differences in components of the sustained frequency-following response (FFR). We conclude that children with ADHD have deficits in temporal neural encoding of both nonspeech and speech stimuli. There is a common dysfunction in the processing of click and speech stimuli at the brainstem level in children with suspected ADHD. Copyright © 2015. Published by Elsevier Ireland Ltd.

  19. Working Memory and Speech Recognition in Noise under Ecologically Relevant Listening Conditions: Effects of Visual Cues and Noise Type among Adults with Hearing Loss

    Science.gov (United States)

    Miller, Christi W.; Stewart, Erin K.; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A.; Tremblay, Kelly

    2017-01-01

    Purpose: This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Method: Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2…

  20. Fusion of audio and visual cues for laughter detection

    NARCIS (Netherlands)

    Petridis, Stavros; Pantic, Maja

    Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audio-visual approach to distinguishing laughter from speech and we show that integrating the information from audio and video channels leads to improved performance over single-modal

  1. Interactional convergence in conversational storytelling: when reported speech is a cue of alignment and/or affiliation.

    Science.gov (United States)

    Guardiola, Mathilde; Bertrand, Roxane

    2013-01-01

    This paper investigates how and when interactional convergence is established by participants in conversation. We analyze sequences of storytelling using an original method that combines Conversation Analysis and a corpus-based approach. In storytelling, the participant in the position of "listener" is expected to produce either generic or specific responses adapted to the storyteller's narrative. The listener's behavior produced within the current activity is a cue of his/her interactional alignment. We show here that the listener can produce a specific type of (aligned) response, which we term a reported speech utterance in echo. The participant who is not telling the story is nonetheless able to animate the characters, while reversing the usual asymmetric roles of storyteller and listener. The use of this device is a way for the listener to display his/her stance toward the events told by the storyteller. If the listener's stance is congruent with that of the storyteller, this reveals a high degree of affiliation between the participants. We present seventeen excerpts from a collection of 94 instances of Echo Reported Speech (ERS) which we examined using the concepts of alignment and affiliation in order to show how different kinds of convergent sequences are constructed. We demonstrate that this phenomenon is mainly used by the listener to align and affiliate with the storyteller by means of reformulative, enumerative, or overbidding ERS. We also show that in affiliative sequences, reported speech can be used by the listener in a humorous way in order to temporarily disalign. This disalignment constitutes a potential starting point for an oblique sequence, which, if accepted and continued by the storyteller, gives rise to a highly convergent sequence.

  2. Internet Video Telephony Allows Speech Reading by Deaf Individuals and Improves Speech Perception by Cochlear Implant Users

    Science.gov (United States)

    Mantokoudis, Georgios; Dähler, Claudia; Dubach, Patrick; Kompis, Martin; Caversaccio, Marco D.; Senn, Pascal

    2013-01-01

    Objective To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. Methods Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280×720, 640×480, 320×240, 160×120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0–500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. Results Higher frame rate (>7 fps), higher camera resolution (>640×480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). Conclusion Webcameras have the potential to improve telecommunication of hearing-impaired individuals. PMID:23359119

  3. Audiovisual Speech Synchrony Measure: Application to Biometrics

    Directory of Open Access Journals (Sweden)

    Gérard Chollet

    2007-01-01

    Full Text Available Speech is a means of communication which is intrinsically bimodal: the audio signal originates from the dynamics of the articulators. This paper reviews recent work in the field of audiovisual speech, and more specifically techniques developed to measure the level of correspondence between audio and visual speech. It gives an overview of the most common audio and visual speech front-end processing, of transformations performed on audio, visual, or joint audiovisual feature spaces, and of the actual measures of correspondence between audio and visual speech. Finally, the use of a synchrony measure for biometric identity verification based on talking faces is evaluated on the BANCA database.
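
    As a rough illustration of what such a correspondence measure can look like, the sketch below scores synchrony as the peak lagged correlation between an audio energy contour and a mouth-opening contour. The feature choice and the simple correlation criterion are illustrative assumptions only, not the specific techniques surveyed in the paper.

        import numpy as np

        def av_synchrony(audio_energy, mouth_opening, max_lag=10):
            """Toy audio-visual synchrony score: maximum cross-correlation
            between an audio energy contour and a mouth-opening contour,
            both sampled at the video frame rate, over small lags."""
            a = (audio_energy - audio_energy.mean()) / audio_energy.std()
            v = (mouth_opening - mouth_opening.mean()) / mouth_opening.std()
            n = len(a)
            best = -1.0
            for lag in range(-max_lag, max_lag + 1):
                if lag >= 0:
                    r = np.corrcoef(a[lag:], v[:n - lag])[0, 1]
                else:
                    r = np.corrcoef(a[:n + lag], v[-lag:])[0, 1]
                best = max(best, r)
            return best

    A higher score suggests that the audio track and the visible articulation belong to the same talker, which is the intuition behind using synchrony for liveness checks in talking-face biometrics.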

  4. Pitch contour impairment in congenital amusia: New insights from the Self-paced Audio-visual Contour Task (SACT).

    Directory of Open Access Journals (Sweden)

    Xuejing Lu

    Full Text Available Individuals with congenital amusia usually exhibit impairments in melodic contour processing when asked to compare pairs of melodies that may or may not be identical to one another. However, it is unclear whether the impairment observed in contour processing is caused by an impairment of pitch discrimination, or is a consequence of poor pitch memory. To help resolve this ambiguity, we designed a novel Self-paced Audio-visual Contour Task (SACT) that evaluates sensitivity to contour while placing minimal burden on memory. In this task, participants control the pace of an auditory contour that is simultaneously accompanied by a visual contour, and they are asked to judge whether the two contours are congruent or incongruent. In Experiment 1, melodic contours varying in pitch were presented with a series of dots that varied in spatial height. Amusics exhibited reduced sensitivity to audio-visual congruency in comparison to control participants. To exclude the possibility that the impairment arises from a general deficit in cross-modal mapping, Experiment 2 examined sensitivity to cross-modal mapping for two other auditory dimensions: timbral brightness and loudness. Amusics and controls were significantly more sensitive to large than small contour changes, and to changes in loudness than changes in timbre. However, there were no group differences in cross-modal mapping, suggesting that individuals with congenital amusia can comprehend spatial representations of acoustic information. Taken together, the findings indicate that pitch contour processing in congenital amusia remains impaired even when pitch memory is relatively unburdened.

  5. Pitch contour impairment in congenital amusia: New insights from the Self-paced Audio-visual Contour Task (SACT).

    Science.gov (United States)

    Lu, Xuejing; Sun, Yanan; Ho, Hao Tam; Thompson, William Forde

    2017-01-01

    Individuals with congenital amusia usually exhibit impairments in melodic contour processing when asked to compare pairs of melodies that may or may not be identical to one another. However, it is unclear whether the impairment observed in contour processing is caused by an impairment of pitch discrimination, or is a consequence of poor pitch memory. To help resolve this ambiguity, we designed a novel Self-paced Audio-visual Contour Task (SACT) that evaluates sensitivity to contour while placing minimal burden on memory. In this task, participants control the pace of an auditory contour that is simultaneously accompanied by a visual contour, and they are asked to judge whether the two contours are congruent or incongruent. In Experiment 1, melodic contours varying in pitch were presented with a series of dots that varied in spatial height. Amusics exhibited reduced sensitivity to audio-visual congruency in comparison to control participants. To exclude the possibility that the impairment arises from a general deficit in cross-modal mapping, Experiment 2 examined sensitivity to cross-modal mapping for two other auditory dimensions: timbral brightness and loudness. Amusics and controls were significantly more sensitive to large than small contour changes, and to changes in loudness than changes in timbre. However, there were no group differences in cross-modal mapping, suggesting that individuals with congenital amusia can comprehend spatial representations of acoustic information. Taken together, the findings indicate that pitch contour processing in congenital amusia remains impaired even when pitch memory is relatively unburdened.

  6. Audio-Visual and Autogenic Relaxation Alter Amplitude of Alpha EEG Band, Causing Improvements in Mental Work Performance in Athletes.

    Science.gov (United States)

    Mikicin, Mirosław; Kowalczyk, Marek

    2015-09-01

    The aim of the present study was to investigate the effect of regular audio-visual relaxation combined with Schultz's autogenic training on: (1) the results of behavioral tests that evaluate work performance during burdensome cognitive tasks (Kraepelin test), and (2) changes in the classical EEG alpha frequency band (7-12 Hz) at rest, analyzed by neocortical region (frontal, temporal, occipital, parietal) and hemisphere (left, right). Both the experimental group (EG) and the age- and skill-matched control group (CG) consisted of eighteen athletes (ten males and eight females). After 7 months of training, the EG demonstrated changes in the amplitude of mean electrical activity in the EEG alpha band at rest and a significant improvement in almost all components of the Kraepelin test. The same variables in the CG were unchanged after the corresponding period without the intervention. In summary, combining audio-visual relaxation with autogenic training significantly improves athletes' ability to sustain prolonged mental effort, and these changes are accompanied by greater alpha-band amplitude in the relaxed state. The results suggest the usefulness of relaxation techniques during performance of mentally difficult sports tasks (sports based on speed and stamina, sports games, combat sports) and during athletes' rest.

  7. AN EXPERIMENTAL EVALUATION OF AUDIO-VISUAL METHODS--CHANGING ATTITUDES TOWARD EDUCATION.

    Science.gov (United States)

    Lowell, Edgar L.; and others

    Audiovisual programs for parents of deaf children were developed and evaluated. Eighteen sound films and accompanying records presented information on hearing, lipreading and speech, and attempted to change parental attitudes toward children and spouses. Two versions of the films and records were narrated by (1) "stars" who were…

  8. An audio-visual dataset of human-human interactions in stressful situations

    NARCIS (Netherlands)

    Lefter, I.; Burghouts, G.J.; Rothkrantz, L.J.M.

    2014-01-01

    Stressful situations are likely to occur at human-operated service desks, as well as at human-computer interfaces used in the public domain. Automatic surveillance can help by notifying when extra assistance is needed. Human communication is inherently multimodal, e.g. speech, gestures, facial expressions.

  9. "Frequent frames" in German child-directed speech: a limited cue to grammatical categories.

    Science.gov (United States)

    Stumper, Barbara; Bannard, Colin; Lieven, Elena; Tomasello, Michael

    2011-08-01

    Mintz (2003) found that in English child-directed speech, frequently occurring frames formed by linking the preceding (A) and succeeding (B) word (A_x_B) could accurately predict the syntactic category of the intervening word (x). This has been successfully extended to French (Chemla, Mintz, Bernal, & Christophe, 2009). In this paper, we show that, as for Dutch (Erkelens, 2009), frequent frames in German do not enable such accurate lexical categorization. This can be explained by the characteristics of German including a less restricted word order compared to English or French and the frequent use of some forms as both determiner and pronoun in colloquial German. Finally, we explore the relationship between the accuracy of frames and their potential utility and find that even some of those frames showing high token-based accuracy are of limited value because they are in fact set phrases with little or no variability in the slot position. Copyright © 2011 Cognitive Science Society, Inc.
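
    For readers unfamiliar with the frame construct, the sketch below extracts A_x_B frames from a toy corpus of utterances. The function name and parameters are hypothetical, and the snippet deliberately ignores details of Mintz (2003) such as utterance boundaries and frequency thresholds.

        from collections import Counter, defaultdict

        def frequent_frames(utterances, top_n=45):
            """For every word trigram (A, x, B), treat (A, B) as a frame and
            x as its slot filler; return the most frequent frames together
            with the words observed in their slots."""
            frames = Counter()
            slots = defaultdict(Counter)
            for utt in utterances:
                words = utt.lower().split()
                for a, x, b in zip(words, words[1:], words[2:]):
                    frames[(a, b)] += 1
                    slots[(a, b)][x] += 1
            return [(f, slots[f]) for f, _ in frames.most_common(top_n)]

    Accuracy is then scored by checking how often two words drawn from the same slot share a gold-standard part-of-speech tag; the paper's point is that this score is much lower for German frames than for English or French ones.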

  10. Visually induced gains in pitch discrimination: Linking audio-visual processing with auditory abilities.

    Science.gov (United States)

    Møller, Cecilie; Højlund, Andreas; Bærentsen, Klaus B; Hansen, Niels Chr; Skewes, Joshua C; Vuust, Peter

    2018-05-01

    Perception is fundamentally a multisensory experience. The principle of inverse effectiveness (PoIE) states how the multisensory gain is maximal when responses to the unisensory constituents of the stimuli are weak. It is one of the basic principles underlying multisensory processing of spatiotemporally corresponding crossmodal stimuli that are well established at behavioral as well as neural levels. It is not yet clear, however, how modality-specific stimulus features influence discrimination of subtle changes in a crossmodally corresponding feature belonging to another modality. Here, we tested the hypothesis that reliance on visual cues to pitch discrimination follow the PoIE at the interindividual level (i.e., varies with varying levels of auditory-only pitch discrimination abilities). Using an oddball pitch discrimination task, we measured the effect of varying visually perceived vertical position in participants exhibiting a wide range of pitch discrimination abilities (i.e., musicians and nonmusicians). Visual cues significantly enhanced pitch discrimination as measured by the sensitivity index d', and more so in the crossmodally congruent than incongruent condition. The magnitude of gain caused by compatible visual cues was associated with individual pitch discrimination thresholds, as predicted by the PoIE. This was not the case for the magnitude of the congruence effect, which was unrelated to individual pitch discrimination thresholds, indicating that the pitch-height association is robust to variations in auditory skills. Our findings shed light on individual differences in multisensory processing by suggesting that relevant multisensory information that crucially aids some perceivers' performance may be of less importance to others, depending on their unisensory abilities.
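
    The sensitivity index d' used above comes from signal detection theory and is computed from hit and false-alarm rates via the inverse of the standard normal CDF. A minimal sketch, with the usual caveat that rates of exactly 0 or 1 must be nudged (e.g., by 1/(2N)) before the transform:

        from scipy.stats import norm

        def d_prime(hit_rate, false_alarm_rate):
            """Sensitivity index: d' = z(hit rate) - z(false-alarm rate)."""
            return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

        # Example: 85% hits and 20% false alarms give d' of about 1.88.
        print(d_prime(0.85, 0.20))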

  11. Influences of selective adaptation on perception of audiovisual speech

    Science.gov (United States)

    Dias, James W.; Cook, Theresa C.; Rosenblum, Lawrence D.

    2016-01-01

    Research suggests that selective adaptation in speech is a low-level process dependent on sensory-specific information shared between the adaptor and test-stimuli. However, previous research has only examined how adaptors shift perception of unimodal test stimuli, either auditory or visual. In the current series of experiments, we investigated whether adaptation to cross-sensory phonetic information can influence perception of integrated audio-visual phonetic information. We examined how selective adaptation to audio and visual adaptors shift perception of speech along an audiovisual test continuum. This test-continuum consisted of nine audio-/ba/-visual-/va/ stimuli, ranging in visual clarity of the mouth. When the mouth was clearly visible, perceivers “heard” the audio-visual stimulus as an integrated “va” percept 93.7% of the time (e.g., McGurk & MacDonald, 1976). As visibility of the mouth became less clear across the nine-item continuum, the audio-visual “va” percept weakened, resulting in a continuum ranging in audio-visual percepts from /va/ to /ba/. Perception of the test-stimuli was tested before and after adaptation. Changes in audiovisual speech perception were observed following adaptation to visual-/va/ and audiovisual-/va/, but not following adaptation to auditory-/va/, auditory-/ba/, or visual-/ba/. Adaptation modulates perception of integrated audio-visual speech by modulating the processing of sensory-specific information. The results suggest that auditory and visual speech information are not completely integrated at the level of selective adaptation. PMID:27041781

  12. News video story segmentation method using fusion of audio-visual features

    Science.gov (United States)

    Wen, Jun; Wu, Ling-da; Zeng, Pu; Luan, Xi-dao; Xie, Yu-xiang

    2007-11-01

    News story segmentation is an important aspect of news video analysis. This paper presents a method for news video story segmentation. Unlike prior works, which are based on transforms of visual features, the proposed technique uses audio features as a baseline and fuses visual features with them to refine the results. First, it selects silence clips as audio candidate points, and selects shot boundaries and anchor shots as two kinds of visual candidate points. Then, using the audio candidates as cues, it develops fusion methods that effectively use the diverse types of visual candidates to refine the audio candidates into story boundaries. Experimental results show that this method has high efficiency and adaptability to different kinds of news video.
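
    A minimal sketch of this kind of audio-first fusion, under the assumption that both streams have already been reduced to candidate time points; the tolerance value and the names are illustrative only, not the paper's algorithm:

        def refine_story_boundaries(silence_times, shot_times, tol=1.0):
            """Keep an audio candidate (a silence clip, in seconds) as a
            story boundary only if some visual candidate (shot boundary or
            anchor shot) occurs within tol seconds of it."""
            return [t for t in silence_times
                    if any(abs(t - s) <= tol for s in shot_times)]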

  13. Knitting Relational Documentary Networks: The Database Meta-Documentary Filming Revolution as a paradigm of bringing interactive audio-visual archives alive

    NARCIS (Netherlands)

    Wiehl, Anna

    2016-01-01

    One phenomenon in the emerging field of digital documentary is experimentation with rhizomatic interfaces and database logics to bring audio-visual archives 'alive'. A paradigm hereof is Filming Revolution (2015), an interactive platform which gathers and interlinks films of the uprisings in…

  14. Audiovisual Speech Perception in Infancy: The Influence of Vowel Identity and Infants' Productive Abilities on Sensitivity to (Mis)Matches between Auditory and Visual Speech Cues

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias

    2016-01-01

    Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…

  15. A scheme for racquet sports video analysis with the combination of audio-visual information

    Science.gov (United States)

    Xing, Liyuan; Ye, Qixiang; Zhang, Weigang; Huang, Qingming; Yu, Hua

    2005-07-01

    As a very important category of sports video, racquet sports video, e.g. table tennis, tennis and badminton, has received little attention in past years. Considering the characteristics of this kind of sports video, we propose a new scheme for structure indexing and highlight generation based on the combination of audio and visual information. First, a supervised classification method is employed to detect important audio symbols including impacts (ball hits), audience cheers, commentator speech, etc., while an unsupervised algorithm is proposed to group video shots into various clusters. Then, by taking advantage of the temporal relationship between audio and visual signals, we can label the scene clusters with semantic labels including rally scenes and break scenes. Third, a refinement procedure is developed to reduce false rally scenes by further audio analysis. Finally, an excitement model is proposed to rank the detected rally scenes, from which many exciting video clips such as game (match) points can be correctly retrieved. Experiments on two types of representative racquet sports video, table tennis video and tennis video, demonstrate encouraging results.

  16. Brain responses and looking behaviour during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life.

    Directory of Open Access Journals (Sweden)

    Elena V Kushnerenko

    2013-07-01

    Full Text Available The use of visual cues during the processing of audiovisual speech is known to be less efficient in children and adults with language difficulties, and such difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6-9 months to 14-16 months of age. We used eye-tracking to examine whether individual differences in visual attention during audiovisual processing of speech in 6- to 9-month-old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6-9 month old infants also participated in an event-related potential (ERP) audiovisual task within the same experimental session. Language development was then followed up at the age of 14-16 months, using two measures of language development, the Preschool Language Scale (PLS) and the Oxford Communicative Development Inventory (CDI). The results show that those infants who were less efficient in auditory speech processing at the age of 6-9 months had lower receptive language scores at 14-16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audio-visually incongruent stimuli at 6-9 months were both significantly associated with language development at 14-16 months. These findings add to the understanding of individual differences in neural signatures of audiovisual processing and associated looking behaviour in infants.

  17. Effects of a brisk walk on blood pressure responses to the Stroop, a speech task and a smoking cue among temporarily abstinent smokers.

    Science.gov (United States)

    Taylor, Adrian; Katomeri, Magdalena

    2006-01-01

    A review and meta-analysis by Hamer et al. (2006) showed that a single session of exercise can attenuate post-exercise blood pressure (BP) responses to stress, but no studies examined the effects among smokers or with brisk walking. Healthy volunteers (n=60), averaging 28 years of age and smoking 15 cigarettes daily, abstained from smoking for 2 h before being randomly assigned to a 15-min brisk semi-self-paced walk or passive control condition. Subject characteristics, typical smoking cue-elicited cravings and BP were assessed at baseline. After each condition, BP was assessed before and after three psycho-social stressors were carried out: (1) computerised Stroop word-colour interference task, (2) speech task and (3) only handling a lit cigarette. A two-way mixed ANCOVA (controlling for baseline) revealed a significant overall interaction effect for time by condition for both systolic blood pressure (SBP) and diastolic blood pressure (DBP). Univariate ANCOVAs (to compare between-groups post-stressor BP, controlling for pre-stressor BP) revealed that exercise attenuated systolic BP and diastolic BP responses to the Stroop and speech tasks and SBP to the lit cigarette equivalent to an attenuated SBP and DBP of up to 3.8 mmHg. Post-exercise attenuation effects were moderated by resting blood pressure and self-reported smoking cue-elicited craving. Effects were strongest among those with higher blood pressure and smokers who reported typically stronger cravings when faced with smoking cues. Blood pressure responses to the lit cigarette were not associated with responses to the Stroop and speech task. A self-paced 15-min walk can reduce smokers' SBP and DBP responses to stress, of a magnitude similar on average to non-smokers.

  18. Susceptibility to a multisensory speech illusion in older persons is driven by perceptual processes

    Directory of Open Access Journals (Sweden)

    Annalisa eSetti

    2013-09-01

    Full Text Available Recent studies suggest that multisensory integration is enhanced in older adults, but it is not known whether this enhancement is solely driven by perceptual processes or is also affected by cognitive processes. Using the ‘McGurk illusion’, in Experiment 1 we found that audio-visual integration of incongruent audio-visual words was higher in older adults than in younger adults, although the recognition of either audio- or visual-only presented words was the same across groups. In Experiment 2 we tested recall of sentences within which an incongruent audio-visual speech word was embedded. The overall semantic meaning of the sentence was compatible with either one of the unisensory components of the target word and/or with the illusory percept. Older participants recalled more illusory audio-visual words in sentences than younger adults; however, there was no differential effect of word compatibility on recall for the two groups. Our findings suggest that the relatively high susceptibility to the audio-visual speech illusion in older participants is due more to perceptual than to cognitive processing.

  19. Relationship between age at menarche and exposure to sexual content in audio-visual media and other factors in Islamic junior high school girls

    Directory of Open Access Journals (Sweden)

    Tity Wulandari

    2018-01-01

    Full Text Available Background In recent decades, girls have experienced menarche at earlier ages, which may have negative effects on health. Exposure to audio-visual media and other factors may influence the age at menarche, although past studies have produced inconsistent results. Objective To assess relationships between the age at menarche and audio-visual media exposure, socio-economic status, nutritional status, physical activity, and psychosocial dysfunction in adolescent girls. Methods This cross-sectional study was conducted from August to October 2015 among students from two integrated Islamic junior high schools in Medan, North Sumatera. There were 216 students who met the inclusion criteria: aged 10-16 years and having experienced menarche. They were asked to fill out previously validated questionnaires regarding their history of exposure to audio-visual media, physical activity, and psychosocial dysfunction. The data were analyzed by Chi-square and Fisher’s exact tests in order to assess relationships between audio-visual media exposure and other potential factors and the age at menarche. Results Of 261 female students at the two schools, 216 had undergone menarche, with a mean age at menarche of 11.6 (SD 1.13) years. There was no significant relationship between age at menarche and audio-visual media exposure (P=0.68). Also, there were no significant relationships between factors such as socio-economic and psychosocial status and age at menarche (P=0.64 and P=0.28, respectively). However, there were significant relationships between earlier age at menarche and overweight/obese nutritional status (P=0.02) as well as low physical activity (P=0.01). Multivariate logistic regression analysis showed that low physical activity had the strongest influence on early menarche (RP=2.40; 95%CI 0.92 to 6.24). Conclusion Age at menarche is not significantly associated with exposure to sexual content in audio-visual media, but it is significantly associated with nutritional status and physical activity.

  20. Claroscura Representation: An Audio-visual and Theoretical Exploration of the Representation of the Past Through Documentary Filmmaking

    Directory of Open Access Journals (Sweden)

    Gerrit Stollbrock Trujillo

    2017-09-01

    Full Text Available At the nexus between audio-visual production and theoretical research, this article is based on the experience of producing a documentary on the history of a cement plant in Colombia: La Siberia. The tensions between the narratives constructed in the documentary and the immensity of the discarded archives from the plant drive a theoretical quest to respond to its own iconoclasm and to the post-structuralist critique of history. This brought us to the formulation of the concept of claroscura representation, defined as representation that is transparent about its own limitations. I put this concept to the test through the medium of documentary film, talking specifically about the making of La Siberia, and suggest its relevance to other projects that attempt to represent the past or history through film. I suggest that this theory drives us towards the formulation of a new artistic project. The research process, and the dialogue between theory and practice, is interpreted using the model of abduction proposed by Charles Sanders Peirce.

  1. Perception of co-speech gestures in aphasic patients: a visual exploration study during the observation of dyadic conversations.

    Science.gov (United States)

    Preisig, Basil C; Eggenberger, Noëmi; Zito, Giuseppe; Vanbellingen, Tim; Schumacher, Rahel; Hopfner, Simone; Nyffeler, Thomas; Gutbrod, Klemens; Annoni, Jean-Marie; Bohlhalter, Stephan; Müri, René M

    2015-03-01

    Co-speech gestures are part of nonverbal communication during conversations. They either support the verbal message or provide the interlocutor with additional information. Furthermore, as nonverbal cues they prompt the cooperative process of turn taking. In the present study, we investigated the influence of co-speech gestures on the perception of dyadic dialogue in aphasic patients. In particular, we analysed the impact of co-speech gestures on gaze direction (towards speaker or listener) and fixation of body parts. We hypothesized that aphasic patients, who are restricted in verbal comprehension, adapt their visual exploration strategies. Sixteen aphasic patients and 23 healthy control subjects participated in the study. Visual exploration behaviour was measured by means of a contact-free infrared eye-tracker while subjects were watching videos depicting spontaneous dialogues between two individuals. Cumulative fixation duration and mean fixation duration were calculated for the factors co-speech gesture (present and absent), gaze direction (to the speaker or to the listener), and region of interest (ROI), including hands, face, and body. Both aphasic patients and healthy controls mainly fixated the speaker's face. We found a significant co-speech gesture × ROI interaction, indicating that the presence of a co-speech gesture encouraged subjects to look at the speaker. Further, there was a significant gaze direction × ROI × group interaction revealing that aphasic patients showed reduced cumulative fixation duration on the speaker's face compared to healthy controls. Co-speech gestures guide the observer's attention towards the speaker, the source of semantic input. It is discussed whether an underlying semantic processing deficit or a deficit in integrating audio-visual information may cause aphasic patients to explore the speaker's face less. Copyright © 2014 Elsevier Ltd. All rights reserved.
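
    The two gaze measures reported here reduce to simple aggregation over (ROI, duration) pairs; a minimal sketch with a hypothetical input format, not the eye-tracker vendor's API:

        from collections import defaultdict

        def fixation_stats(fixations):
            """Cumulative and mean fixation duration per region of interest,
            from a list of (roi, duration_ms) tuples."""
            totals = defaultdict(float)
            counts = defaultdict(int)
            for roi, duration in fixations:
                totals[roi] += duration
                counts[roi] += 1
            return {r: (totals[r], totals[r] / counts[r]) for r in totals}

        # Example: fixation_stats([("face", 320), ("hands", 180), ("face", 250)])
        # returns {"face": (570.0, 285.0), "hands": (180.0, 180.0)}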

  2. Functional Imaging of Audio-Visual Selective Attention in Monkeys and Humans: How do Lapses in Monkey Performance Affect Cross-Species Correspondences?

    Science.gov (United States)

    Rinne, Teemu; Muers, Ross S; Salo, Emma; Slater, Heather; Petkov, Christopher I

    2017-06-01

    The cross-species correspondences and differences in how attention modulates brain responses in humans and animal models are poorly understood. We trained 2 monkeys to perform an audio-visual selective attention task during functional magnetic resonance imaging (fMRI), rewarding them to attend to stimuli in one modality while ignoring those in the other. Monkey fMRI identified regions strongly modulated by auditory or visual attention. Surprisingly, auditory attention-related modulations were much more restricted in monkeys than humans performing the same tasks during fMRI. Further analyses ruled out trivial explanations, suggesting that labile selective-attention performance was associated with inhomogeneous modulations in wide cortical regions in the monkeys. The findings provide initial insights into how audio-visual selective attention modulates the primate brain, identify sources for "lost" attention effects in monkeys, and carry implications for modeling the neurobiology of human cognition with nonhuman animals. © The Author 2017. Published by Oxford University Press.

  3. Television and the Internet: The Role Digital Technologies Play in Adolescents’ Audio-Visual Media Consumption. Young Television Audiences in Catalonia (Spain)

    Directory of Open Access Journals (Sweden)

    Meritxell Roca

    2014-03-01

    Full Text Available The aim of the reported study was to investigate adolescents' TV consumption habits and perceptions. Although there appears to be no general consensus on how the Internet affects TV consumption by teenagers, and data vary depending on the country, according to our study Spanish adolescents perceive television as a habit "of the past" and find the computer a device better suited to their recreational and audio-visual consumption needs. The data obtained from eight focus groups of teenagers aged between 12 and 18, and from an online survey sent to their parents, show that watching TV is an activity usually linked to the home's communal spaces. In contrast, online audio-visual consumption (understood as a wider term, not limited to just TV shows) is perceived by adolescents as a more convenient activity, as it adapts to their own schedules and needs.

  4. Pengaruh Model Pembelajaran Kooperatif Tipe Stad Berbantuan Media Audio Visual Terhadap Hasil Belajar IPA Siswa Kelas III SD Negeri 42 Pekanbaru

    OpenAIRE

    Oktarianda, Ranty; Alpusari, Mahmud; Noviana, Eddy

    2017-01-01

    This research was motivated by teachers who still use old teaching methods and by students' difficulty in understanding abstract science material, which results in low science scores. Implementation of the STAD cooperative learning method using audio-visual media is expected to improve science achievement. This research uses a quasi-experimental method with a nonequivalent control group design. The purpose of this study is to determine th...

  5. IST BENOGO (IST – 2001-39184) Deliverable I-AAU-05-01: Role of sound in VR and Audio Visual Preferences

    DEFF Research Database (Denmark)

    Nordahl, Rolf

    This Periodic Progress Report (PPR) documents the studies carried out at Aalborg University in December 2004 concerning the role of sound in VR, audio-visual correlations, and attention triggering. The report contains a description and evaluation of the experiments run, together with the analysis of the data captured by the head tracker, which provides valuable insights into the role of sound events in VR.

  6. PENGGUNAAN MEDIA AUDIO VISUAL UNTUK MENINGKATKAN HASIL BELAJAR MATERI MERODA PADA SENAM LANTAI KELAS VIII SMP NEGERI 13 SEMARANG TAHUN 2013/2014

    Directory of Open Access Journals (Sweden)

    Sigit Budi Prastyyo

    2015-01-01

    Full Text Available The purpose of this study was to determine whether audio-visual media aids improve learning outcomes for the cartwheel (meroda) in floor gymnastics among eighth-grade students of SMP Negeri 13 Semarang. This classroom action research (CAR) was conducted in two cycles of action. Data were collected through documentation, observation, and testing, and analyzed descriptively from student learning outcomes after each action. The results show that the use of audio-visual media in learning the cartwheel in floor gymnastics can improve the learning outcomes of the eighth grade at SMP Negeri 13 Semarang in 2013/2014. This is evidenced by rising learning outcomes in each cycle: the average test score reached 70.51 in the first cycle and 84.72 in the second cycle, while classical completeness rose from 54.84% in the first cycle to 90.32% in the second. From these results it can be concluded that learning the cartwheel in floor gymnastics with audio-visual media can improve the learning outcomes of students of SMP Negeri 13 Semarang.

  7. N1 enhancement in synesthesia during visual and audio-visual perception in semantic cross-modal conflict situations: an ERP study

    Directory of Open Access Journals (Sweden)

    Christopher eSinke

    2014-01-01

    Full Text Available Synesthesia entails a special kind of sensory perception, where stimulation in one sensory modality leads to an internally generated perceptual experience of another, not stimulated sensory modality. This phenomenon can be viewed as an abnormal multisensory integration process, as here the synesthetic percept is aberrantly fused with the stimulated modality. Indeed, recent synesthesia research has focused on multimodal processing even outside of the specific synesthesia-inducing context and has revealed changed multimodal integration, thus suggesting perceptual alterations at a global level. Here, we focused on audio-visual processing in synesthesia using a semantic classification task in combination with visually or audio-visually presented animate and inanimate objects in audio-visually congruent and incongruent conditions. Fourteen subjects with auditory-visual and/or grapheme-color synesthesia and 14 control subjects participated in the experiment. During presentation of the stimuli, event-related potentials were recorded from 32 electrodes. The analysis of reaction times and error rates revealed no group differences, with best performance for audio-visually congruent stimulation, indicating the well-known multimodal facilitation effect. We found an enhanced amplitude of the N1 component over occipital electrode sites for synesthetes compared to controls. The differences occurred irrespective of the experimental condition and therefore suggest a global influence on early sensory processing in synesthetes.

  8. UNDERSTANDING PROSE THROUGH TASK ORIENTED AUDIO-VISUAL ACTIVITY: AN AMERICAN MODERN PROSE COURSE AT THE FACULTY OF LETTERS, PETRA CHRISTIAN UNIVERSITY

    Directory of Open Access Journals (Sweden)

    Sarah Prasasti

    2001-01-01

    Full Text Available The method presented here provides the basis for a course in American prose for EFL students. Understanding and appreciating American prose is a difficult task for the students because they come into contact with works that are full of cultural baggage and far apart from their own world. Audio-visual aids are one alternative for sensitizing the students to the topic and the cultural background. Instead of providing ready-made audio-visual aids, teachers can involve students actively in a more task-oriented audio-visual project. Here, the teachers encourage their students to create their own audio-visual aids using colors, pictures, sound and gestures as a point of initiation for further discussion. The students can use color, which has become a strong element of fiction, to help them call up a forceful visual representation. Pictures can also stimulate the students to build their mental image. Sound and silence, which are part of the fabric of literature, may also help them to increase the emotional impact.

  9. Non-fluent speech following stroke is caused by impaired efference copy.

    Science.gov (United States)

    Feenaughty, Lynda; Basilakos, Alexandra; Bonilha, Leonardo; den Ouden, Dirk-Bart; Rorden, Chris; Stark, Brielle; Fridriksson, Julius

    2017-09-01

    Efference copy is a cognitive mechanism argued to be critical for initiating and monitoring speech; however, the extent to which breakdown of efference copy mechanisms impacts speech production is unclear. This study examined the best mechanistic predictors of non-fluent speech among 88 stroke survivors. Objective speech fluency measures were subjected to a principal component analysis (PCA). The primary PCA factor was then entered into a multiple stepwise linear regression analysis as the dependent variable, with a set of independent mechanistic variables. Participants' ability to mimic audio-visual speech ("speech entrainment response") was the best independent predictor of non-fluent speech. We suggest that this "speech entrainment" factor reflects the integrity of internal monitoring (i.e., efference copy) of speech production, which affects speech initiation and maintenance. The results support models of normal speech production and suggest that therapy focused on speech initiation and maintenance may improve speech fluency for individuals with chronic non-fluent aphasia post stroke.

  10. Penerapan Model Pembelajaran Treffinger dengan Bantuan Media Audio Visual Untuk Meningkatkan Aktivitas dan Hasil Belajar IPA Terpadu pada Siswa Kelas VII SMP Frater Makassar

    Directory of Open Access Journals (Sweden)

    Nur Indah Sari

    2016-08-01

    Full Text Available This classroom action research aimed to improve student activity and learning outcomes in Integrated Science (IPA Terpadu) through the application of the Treffinger learning model assisted by audio-visual media, using ecosystem material, with grade VII students of SMP FRATER Makassar. Data were collected by observing student learning activity and through an evaluation at the end of each cycle, and were analyzed using descriptive statistics together with frequency and percentage tables. Learning outcomes improved from 14 students passing (37.83%) in cycle I to 32 students passing (86.48%) in cycle II, and student learning activity also increased, with engagement in Integrated Science lessons rising from 50.15% in cycle I to 80.05% in cycle II. These results indicate that applying the Treffinger learning model with audio-visual media can improve Integrated Science learning outcomes on ecosystem material for class VII A students of SMP FRATER Makassar. Keywords: Treffinger learning model, learning outcomes, integrated science.

  11. Audio Visual Center

    Data.gov (United States)

    Federal Laboratory Consortium — The Audiovisual Services Center provides still photographic documentation with laboratory support, video documentation, video editing, video duplication, photo/video...

  12. UPAYA MENINGKATKAN AKTIVITAS DAN HASIL BELAJAR MATERI APRESIASI TERHADAP KEUNIKAN SENI MUSIK DAERAH SETEMPAT DENGAN MENGGUNAKAN MEDIA AUDIO VISUAL PADA SISWA KELAS VII A SMP NEGERI 3 RANDUDONGKAL

    Directory of Open Access Journals (Sweden)

    Rina Muktinurasih

    2014-02-01

    Full Text Available Folk music is marked by simplicity and regional character. Improving appreciation of works of art, especially folk music, was carried out by having students identify a variety of folk songs according to their personal views. Over the years, most students have only been able to enjoy music; an interest must be developed first so that students can also express it. Music learning needs a lot of practice, yet teachers often dominate the classroom time allocation, leaving students inadequate time to practice. The problems addressed in this study are: (1) whether the use of audio-visual media can improve students' learning activity in folk music appreciation, and (2) whether the use of audio-visual media can improve students' learning outcomes in folk music appreciation material. The method used in this study was classroom action research with two cycles, each cycle consisting of four phases: (1) planning, (2) implementation, (3) observation/evaluation, and (4) reflection. The results show that there were improvements both in students' learning activities and in their outcomes from the use of audio-visual learning media in folk music appreciation material. In the pre-cycle only 16 of 34 students passed (47.07%); in the first cycle 20 of 34 students passed (74.24%); and in the second cycle 28 of 34 students passed (82.35%). It can therefore be concluded that by the end of the second cycle the indicator of overall success had reached the required level.

  13. Multisensory and Modality Specific Processing of Visual Speech in Different Regions of the Premotor Cortex

    Directory of Open Access Journals (Sweden)

    Daniel eCallan

    2014-05-01

    Full Text Available Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex has been shown to be active during both observation and execution of action ('Mirror System' properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker's articulating face and heard her voice), visual only (only saw the speaker's articulating face), and audio only (only heard the speaker's voice) conditions with varying audio signal-to-noise ratios, in order to determine the regions of the premotor cortex involved with multisensory and modality-specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual-only stimuli, to control for task difficulty and differences in intelligibility. The results of the fMRI analysis for visual-only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and premotor cortex. The left ventral inferior premotor cortex showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex did not show this multisensory enhancement effect, but there was greater activity for the visual-only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas more superior and dorsal regions of the premotor cortex are involved with mapping unimodal (in this case visual) sensory features of the speech signal with…

  14. EFEKTIVITAS MODEL PROBLEM BASED LEARNING BERBANTUAN MEDIA AUDIO VISUAL DITINJAU DARI HASIL BELAJAR IPA SISWA KELAS 5 SDN 1 GADU SAMBONG - BLORA SEMESTER 2 TAHUN 2014/2015

    Directory of Open Access Journals (Sweden)

    Andhini Virgiana

    2016-05-01

    Full Text Available The aim of this study was to determine the difference in learning outcomes between a problem-based learning model assisted by audio-visual media and a think-pair-share model assisted by visual media in grade 5 science learning at SDN 1 Gadu, Sambong, Blora regency, in semester 2 of the 2014/2015 school year. This was a quasi-experimental study with a nonequivalent control group design. The subjects were the grade 5 students of SDN 1 Gadu and the grade 5 students of SDN 2 Gagakan. Data were collected through tests and observation, and analyzed using descriptive statistics, parametric statistics, and an independent-samples t-test at the 5% significance level (α = 0.05). Based on the results and discussion, it can be concluded that there is a difference in effectiveness between the problem-based learning model assisted by audio-visual media and the think-pair-share model assisted by visual media with respect to grade 5 science learning outcomes at SDN 1 Gadu, Sambong district, Blora regency, in semester 2 of 2014/2015. This is shown by a t-test value of 3.603 > 1.999 with a significance of 0.001, and class means of 87.0588 versus 80.2000.

  15. Effects of congruent and incongruent visual cues on speech perception and brain activity in cochlear implant users.

    Science.gov (United States)

    Song, Jae-Jin; Lee, Hyo-Jeong; Kang, Hyejin; Lee, Dong Soo; Chang, Sun O; Oh, Seung Ha

    2015-03-01

    While deafness-induced plasticity has been investigated in the visual and auditory domains, not much is known about language processing in audiovisual multimodal environments for patients with hearing restored via cochlear implant (CI) devices. Here, we examined the effect of agreeing or conflicting visual inputs on auditory processing in deaf patients equipped with degraded artificial hearing. Ten post-lingually deafened CI users with good performance, along with matched control subjects, underwent H2(15)O positron emission tomography scans while carrying out a behavioral task requiring the extraction of speech information from unimodal auditory stimuli, bimodal audiovisual congruent stimuli, and incongruent stimuli. Regardless of congruency, the control subjects demonstrated activation of the auditory and visual sensory cortices, as well as the superior temporal sulcus, the classical multisensory integration area, indicating a bottom-up multisensory processing strategy. Compared to CI users, the control subjects exhibited activation of the right ventral premotor-supramarginal pathway. In contrast, CI users activated primarily the visual cortices more in the congruent audiovisual condition than in the null condition. In addition, compared to controls, CI users displayed an activation focus in the right amygdala for congruent audiovisual stimuli. The most notable difference between the two groups was an activation focus in the left inferior frontal gyrus in CI users confronted with incongruent audiovisual stimuli, suggesting top-down cognitive modulation for audiovisual conflict. Correlation analysis revealed that good speech performance was positively correlated with right amygdala activity for the congruent condition, but negatively correlated with bilateral visual cortices regardless of congruency. Taken together, these results suggest that for multimodal inputs, cochlear implant users are more vision-reliant when processing congruent stimuli and are disturbed by audiovisual conflict…

  16. Audio-visual synchronization in reading while listening to texts: Effects on visual behavior and verbal learning

    OpenAIRE

    Gerbier , Emilie; Bailly , Gérard; Bosse , Marie-Line

    2018-01-01

    Reading while listening to texts (RWL) is a promising way to improve the learning benefits provided by a reading experience. In an exploratory study, we investigated the effect of synchronizing the highlighting of words (visual) with their auditory (speech) counterpart during a RWL task. Forty French children from 3rd to 5th grade read short stories in their native language while hearing the story spoken by a narrator. In the non-synchronized (S-) condition the text wa...

  17. What's in the input? Frequent frames in child-directed speech offer distributional cues to grammatical categories in Spanish and English.

    Science.gov (United States)

    Weisleder, Adriana; Waxman, Sandra R

    2010-11-01

    Recent analyses have revealed that child-directed speech contains distributional regularities that could, in principle, support young children's discovery of distinct grammatical categories (noun, verb, adjective). In particular, a distributional unit known as the frequent frame appears to be especially informative (Mintz, 2003). However, analyses have focused almost exclusively on the distributional information available in English. Because languages differ considerably in how grammatical forms are marked within utterances, the scarcity of cross-linguistic evidence represents an unfortunate gap. We therefore advance the developmental evidence by analyzing the distributional information available in frequent frames across two languages (Spanish and English), across sentence positions (phrase medial and phrase final), and across grammatical forms (noun, verb, adjective). We selected six parent-child corpora from the CHILDES database (three English; three Spanish), and analyzed the input when children were aged 2;6 or younger. In each language, frequent frames did indeed offer systematic cues to grammatical category assignment. We also identify differences in the accuracy of these frames across languages, sentence positions and grammatical classes.

  18. Speech entrainment enables patients with Broca’s aphasia to produce fluent speech

    Science.gov (United States)

    Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

    2012-01-01

    A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production

  19. Cardiac and pulmonary dose reduction for tangentially irradiated breast cancer, utilizing deep inspiration breath-hold with audio-visual guidance, without compromising target coverage

    International Nuclear Information System (INIS)

    Vikstroem, Johan; Hjelstuen, Mari H.B.; Mjaaland, Ingvil; Dybvik, Kjell Ivar

    2011-01-01

    Background and purpose. Cardiac disease and pulmonary complications are documented risk factors in tangential breast irradiation. Respiratory gating radiotherapy provides a possibility to substantially reduce cardiopulmonary doses. This CT planning study quantifies the reduction of radiation doses to the heart and lung, using deep inspiration breath-hold (DIBH). Patients and methods. Seventeen patients with early breast cancer, referred for adjuvant radiotherapy, were included. For each patient two CT scans were acquired; the first during free breathing (FB) and the second during DIBH. The scans were monitored by the Varian RPM respiratory gating system. Audio coaching and visual feedback (audio-visual guidance) were used. The treatment planning of the two CT studies was performed with conformal tangential fields, focusing on good coverage (V95>98%) of the planning target volume (PTV). Dose-volume histograms were calculated and compared. Doses to the heart, left anterior descending (LAD) coronary artery, ipsilateral lung and the contralateral breast were assessed. Results. Compared to FB, the DIBH-plans obtained lower cardiac and pulmonary doses, with equal coverage of PTV. The average mean heart dose was reduced from 3.7 to 1.7 Gy and the number of patients with >5% heart volume receiving 25 Gy or more was reduced from four to one of the 17 patients. With DIBH the heart was completely out of the beam portals for ten patients, with FB this could not be achieved for any of the 17 patients. The average mean dose to the LAD coronary artery was reduced from 18.1 to 6.4 Gy. The average ipsilateral lung volume receiving more than 20 Gy was reduced from 12.2 to 10.0%. Conclusion. Respiratory gating with DIBH, utilizing audio-visual guidance, reduces cardiac and pulmonary doses for tangentially treated left sided breast cancer patients without compromising the target coverage
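
    For reference, the cumulative dose-volume histograms compared in such studies can be computed directly from a dose grid and a binary structure mask. This is a generic sketch, not the implementation of any particular treatment planning system:

        import numpy as np

        def cumulative_dvh(dose, mask, bins=100):
            """Return dose levels (Gy) and the fraction of the structure
            volume receiving at least each dose level (e.g., V20)."""
            d = dose[mask]
            levels = np.linspace(0.0, d.max(), bins)
            volume_fraction = np.array([(d >= lvl).mean() for lvl in levels])
            return levels, volume_fraction

    The V20 quoted above is the volume fraction at the 20 Gy level, and a mean heart or lung dose is simply dose[mask].mean().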

  20. Audio-Visual Biofeedback Does Not Improve the Reliability of Target Delineation Using Maximum Intensity Projection in 4-Dimensional Computed Tomography Radiation Therapy Planning

    International Nuclear Information System (INIS)

    Lu, Wei; Neuner, Geoffrey A.; George, Rohini; Wang, Zhendong; Sasor, Sarah; Huang, Xuan; Regine, William F.; Feigenberg, Steven J.; D'Souza, Warren D.

    2014-01-01

    Purpose: To investigate whether coaching patients' breathing would improve the match between ITV_MIP (internal target volume generated by contouring in the maximum intensity projection scan) and ITV_10 (generated by combining the gross tumor volumes contoured in 10 phases of a 4-dimensional CT [4DCT] scan). Methods and Materials: Eight patients with a thoracic tumor and 5 patients with an abdominal tumor were included in an institutional review board-approved prospective study. Patients underwent 3 4DCT scans with: (1) free breathing (FB); (2) coaching using audio-visual (AV) biofeedback via the Real-Time Position Management system; and (3) coaching via a spirometer system (Active Breathing Coordinator or ABC). One physician contoured all scans to generate the ITV_10 and ITV_MIP. The match between ITV_MIP and ITV_10 was quantitatively assessed with volume ratio, centroid distance, root mean squared distance, and overlap/Dice coefficient. We investigated whether coaching (AV or ABC) or uniform expansions (1, 2, 3, or 5 mm) of ITV_MIP improved the match. Results: Although both AV and ABC coaching techniques improved frequency reproducibility and ABC improved displacement regularity, neither improved the match between ITV_MIP and ITV_10 over FB. On average, ITV_MIP underestimated ITV_10 by 19%, 19%, and 21%, with centroid distances of 1.9, 2.3, and 1.7 mm and Dice coefficients of 0.87, 0.86, and 0.88 for FB, AV, and ABC, respectively. Separate analyses indicated a better match for lung cancers or tumors not adjacent to high-intensity tissues. Uniform expansions of ITV_MIP did not correct for the mismatch between ITV_MIP and ITV_10. Conclusions: In this pilot study, audio-visual biofeedback did not improve the match between ITV_MIP and ITV_10. In general, ITV_MIP should be limited to lung cancers, and modification of ITV_MIP in each phase of the 4DCT data set is recommended.
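
    The volume ratio and Dice coefficient used to quantify the match are straightforward to compute from binary voxel masks; a generic sketch (the centroid and root-mean-squared distances would additionally need the voxel coordinates and spacing):

        import numpy as np

        def match_metrics(itv_mip, itv_10):
            """Volume ratio and Dice coefficient for two boolean masks."""
            v_mip, v_10 = itv_mip.sum(), itv_10.sum()
            intersection = np.logical_and(itv_mip, itv_10).sum()
            return v_mip / v_10, 2.0 * intersection / (v_mip + v_10)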

  1. Cardiac and pulmonary dose reduction for tangentially irradiated breast cancer, utilizing deep inspiration breath-hold with audio-visual guidance, without compromising target coverage

    Energy Technology Data Exchange (ETDEWEB)

    Vikstroem, Johan; Hjelstuen, Mari H.B.; Mjaaland, Ingvil; Dybvik, Kjell Ivar (Dept. of Radiotherapy, Stavanger Univ. Hospital, Stavanger (Norway)), e-mail: vijo@sus.no

    2011-01-15

    Background and purpose. Cardiac disease and pulmonary complications are documented risk factors in tangential breast irradiation. Respiratory gating radiotherapy provides a possibility to substantially reduce cardiopulmonary doses. This CT planning study quantifies the reduction of radiation doses to the heart and lung, using deep inspiration breath-hold (DIBH). Patients and methods. Seventeen patients with early breast cancer, referred for adjuvant radiotherapy, were included. For each patient two CT scans were acquired; the first during free breathing (FB) and the second during DIBH. The scans were monitored by the Varian RPM respiratory gating system. Audio coaching and visual feedback (audio-visual guidance) were used. The treatment planning of the two CT studies was performed with conformal tangential fields, focusing on good coverage (V95>98%) of the planning target volume (PTV). Dose-volume histograms were calculated and compared. Doses to the heart, left anterior descending (LAD) coronary artery, ipsilateral lung and the contralateral breast were assessed. Results. Compared to FB, the DIBH-plans obtained lower cardiac and pulmonary doses, with equal coverage of PTV. The average mean heart dose was reduced from 3.7 to 1.7 Gy and the number of patients with >5% heart volume receiving 25 Gy or more was reduced from four to one of the 17 patients. With DIBH the heart was completely out of the beam portals for ten patients, with FB this could not be achieved for any of the 17 patients. The average mean dose to the LAD coronary artery was reduced from 18.1 to 6.4 Gy. The average ipsilateral lung volume receiving more than 20 Gy was reduced from 12.2 to 10.0%. Conclusion. Respiratory gating with DIBH, utilizing audio-visual guidance, reduces cardiac and pulmonary doses for tangentially treated left sided breast cancer patients without compromising the target coverage

  2. Self-organizing maps for measuring similarity of audiovisual speech percepts

    DEFF Research Database (Denmark)

    Bothe, Hans-Heinrich

    The goal of this work is to find a way to measure similarity of audiovisual speech percepts. Phoneme-related self-organizing maps (SOM) with a rectangular basis are trained with data material from a (labeled) video film. For the training, a combination of auditory speech features and corresponding....... Dependent on the training data, these other units may also be contextually immediate neighboring units. The poster demonstrates the idea with text material spoken by one individual subject using a set of simple audio-visual features. The data material for the training process consists of 44 labeled...... sentences in German with a balanced phoneme repertoire. As a result it can be stated that (i) the SOM can be trained to map auditory and visual features in a topology-preserving way and (ii) they show strain due to the influence of other audio-visual units. The SOM can be used to measure similarity amongst...
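
    The core of the SOM approach described above can be written down compactly: for each input vector, find the best-matching unit on the grid and pull it and its neighbours toward the input, shrinking the neighbourhood and learning rate over time. A minimal illustrative sketch, with random vectors standing in for the study's combined audio-visual features (feature extraction and phoneme labelling are omitted):

      import numpy as np

      def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0):
          """Train a rectangular self-organizing map on feature vectors."""
          rng = np.random.default_rng(0)
          h, w = grid
          weights = rng.normal(size=(h, w, data.shape[1]))
          coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                        indexing="ij"), axis=-1)
          n_steps = epochs * len(data)
          step = 0
          for _ in range(epochs):
              for x in rng.permutation(data):
                  # Best-matching unit: node whose weight vector is closest to x
                  dists = np.linalg.norm(weights - x, axis=-1)
                  bmu = np.unravel_index(dists.argmin(), dists.shape)
                  # Decay learning rate and neighbourhood radius over time
                  frac = step / n_steps
                  lr = lr0 * (1 - frac)
                  sigma = sigma0 * (1 - frac) + 1e-3
                  # Gaussian neighbourhood pulls nearby units toward the input;
                  # this is what makes the mapping topology-preserving
                  grid_d2 = ((coords - np.asarray(bmu)) ** 2).sum(axis=-1)
                  nb = np.exp(-grid_d2 / (2 * sigma ** 2))[..., None]
                  weights += lr * nb * (x - weights)
                  step += 1
          return weights

      features = np.random.default_rng(1).normal(size=(500, 16))  # stand-in AV features
      som = train_som(features)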

  3. Eliciting extra prominence in read-speech tasks: The effects of different text-highlighting methods on acoustic cues to perceived prominence

    DEFF Research Database (Denmark)

    Berger, Stephanie; Niebuhr, Oliver; Fischer, Kerstin

    2018-01-01

    The research initiative Innovating Speech EliCitation Techniques (INSPECT) aims to describe and quantify how recording methods, situations and materials influence speech production in lab-speech experiments. On this basis, INSPECT aims to develop methods that reliably stimulate specific patterns...... and styles of speech, like expressive or conversational speech or different types of emphatic accents. The present study investigates if and how different text highlighting methods (yellow background, bold, capital letters, italics, and underlining) make speakers reinforce the level of perceived prominence...

  4. Talker Variability in Audiovisual Speech Perception

    Directory of Open Access Journals (Sweden)

    Shannon eHeald

    2014-07-01

    Full Text Available A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories, and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker-variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening conditions (e.g., noise or distortion) that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target-word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition than in the audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred.

  5. Audio-Visual Biofeedback Does Not Improve the Reliability of Target Delineation Using Maximum Intensity Projection in 4-Dimensional Computed Tomography Radiation Therapy Planning

    Energy Technology Data Exchange (ETDEWEB)

    Lu, Wei, E-mail: wlu@umm.edu [Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, Maryland (United States); Neuner, Geoffrey A.; George, Rohini; Wang, Zhendong; Sasor, Sarah [Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, Maryland (United States); Huang, Xuan [Research and Development, Care Management Department, Johns Hopkins HealthCare LLC, Glen Burnie, Maryland (United States); Regine, William F.; Feigenberg, Steven J.; D'Souza, Warren D. [Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, Maryland (United States)

    2014-01-01

    Purpose: To investigate whether coaching patients' breathing would improve the match between ITV_MIP (internal target volume generated by contouring in the maximum intensity projection scan) and ITV_10 (generated by combining the gross tumor volumes contoured in 10 phases of a 4-dimensional CT [4DCT] scan). Methods and Materials: Eight patients with a thoracic tumor and 5 patients with an abdominal tumor were included in an institutional review board-approved prospective study. Patients underwent 3 4DCT scans with: (1) free breathing (FB); (2) coaching using audio-visual (AV) biofeedback via the Real-Time Position Management system; and (3) coaching via a spirometer system (Active Breathing Coordinator or ABC). One physician contoured all scans to generate the ITV_10 and ITV_MIP. The match between ITV_MIP and ITV_10 was quantitatively assessed with volume ratio, centroid distance, root mean squared distance, and overlap/Dice coefficient. We investigated whether coaching (AV or ABC) or uniform expansions (1, 2, 3, or 5 mm) of ITV_MIP improved the match. Results: Although both AV and ABC coaching techniques improved frequency reproducibility and ABC improved displacement regularity, neither improved the match between ITV_MIP and ITV_10 over FB. On average, ITV_MIP underestimated ITV_10 by 19%, 19%, and 21%, with centroid distance of 1.9, 2.3, and 1.7 mm and Dice coefficient of 0.87, 0.86, and 0.88 for FB, AV, and ABC, respectively. Separate analyses indicated a better match for lung cancers or tumors not adjacent to high-intensity tissues. Uniform expansions of ITV_MIP did not correct for the mismatch between ITV_MIP and ITV_10. Conclusions: In this pilot study, audio-visual biofeedback did not improve the match between ITV_MIP and ITV_10. In general, ITV_MIP should be limited to lung cancers, and modification of ITV_MIP in each phase of the 4DCT data set is recommended.
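
    The match metrics used above are straightforward to compute from binary masks of the two volumes. An illustrative sketch with two synthetic, slightly offset spheres standing in for ITV_MIP and ITV_10 (not patient data):

      import numpy as np

      def match_metrics(mask_a, mask_b, voxel_mm=(1.0, 1.0, 1.0)):
          """Volume ratio, centroid distance (mm) and Dice coefficient of two masks."""
          va, vb = mask_a.sum(), mask_b.sum()
          volume_ratio = va / vb
          spacing = np.asarray(voxel_mm)
          ca = np.argwhere(mask_a).mean(axis=0) * spacing
          cb = np.argwhere(mask_b).mean(axis=0) * spacing
          centroid_dist = np.linalg.norm(ca - cb)
          dice = 2.0 * np.logical_and(mask_a, mask_b).sum() / (va + vb)
          return volume_ratio, centroid_dist, dice

      # Synthetic spherical masks on a 64^3 grid
      z, y, x = np.ogrid[:64, :64, :64]
      itv_mip = (z - 32) ** 2 + (y - 32) ** 2 + (x - 32) ** 2 <= 10 ** 2
      itv_10 = (z - 33) ** 2 + (y - 32) ** 2 + (x - 32) ** 2 <= 11 ** 2
      print(match_metrics(itv_mip, itv_10))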

  6. Recall and decay of consent information among parents of infants participating in a randomized controlled clinical trial using an audio-visual tool in The Gambia.

    Science.gov (United States)

    Mboizi, Robert B; Afolabi, Muhammed O; Okoye, Michael; Kampmann, Beate; Roca, Anna; Idoko, Olubukola T

    2017-09-02

    Communicating essential research information to low-literacy research participants in Africa is highly challenging, since this population is vulnerable to poor comprehension of consent information. Several supportive materials have been developed to aid participant comprehension in these settings. Within the framework of a pneumococcal vaccine trial in The Gambia, we evaluated the recall and decay of consent information during the trial, which used an audio-visual tool called the 'Speaking Book' to foster comprehension among parents of participating infants. The Speaking Book was developed in the 2 most widely spoken local languages. Four hundred and nine parents of trial infants gave consent to participate in this nested study and were included in the baseline assessment of their knowledge about trial participation. An additional assessment was conducted approximately 90 days later, following completion of the clinical trial protocol. All parents received a Speaking Book at the start of the trial. Trial knowledge was already high at the baseline assessment, with no differences related to socio-economic status or education. Knowledge of key trial information was retained at the completion of the study follow-up. The Speaking Book (SB) was well received by the study participants. We hypothesize that the SB may have contributed to the retention of information over the trial follow-up. Further studies evaluating the impact of this innovative tool are thus warranted.

  7. The challenge of reducing scientific complexity for different target groups (without losing the essence) - experiences from interdisciplinary audio-visual media production

    Science.gov (United States)

    Hezel, Bernd; Broschkowski, Ephraim; Kropp, Jürgen

    2013-04-01

    The Climate Media Factory originates from an interdisciplinary media lab run by the Film and Television University "Konrad Wolf" Potsdam-Babelsberg (HFF) and the Potsdam Institute for Climate Impact Research (PIK). Climate scientists, authors, producers and media scholars work together to develop media products on climate change and sustainability. We strive towards communicating scientific content via different media platforms, reconciling the communication needs of scientists and the audience's need to understand the complexity of topics that are relevant in their everyday life. By presenting four audio-visual examples that have been designed for very different target groups, we show (i) the interdisciplinary challenges during the production process and the lessons learnt, and (ii) possibilities to reach the required degree of simplification without dumbing down the content. "We know enough about climate change" is a short animated film that was produced for the German Agency for International Cooperation (GIZ) for training programs and conferences on adaptation in target countries including Indonesia, Tunisia and Mexico. "Earthbook" is a short animation produced for "The Year of Science" to raise awareness of the topic of sustainability among digital natives. "What is Climate Engineering?", produced for the Institute for Advanced Sustainability Studies (IASS), is meant for an informed and interested public. "Wimmelwelt Energie!" is a prototype of an iPad application for children aged 4-6 years to help them learn about different forms of energy and related greenhouse gas emissions.

  8. Documentary management of sports audio-visual information in generalist television channels

    Directory of Open Access Journals (Sweden)

    Jorge Caldera Serrano

    2005-01-01

    Full Text Available This article analyzes the management of sports audio-visual information within the documentary information systems of national, regional and local television channels. It follows the documentary chain through which sports audio-visual information passes, analyzing each of its stages and offering a set of recommendations and standards for preparing the sports audio-visual record. Sports audio-visual documentation does not differ greatly from other types of television documents in how it is analyzed, so its management and dissemination are examined in greater depth, showing the informational flow within the system.

  9. Learning the lay-up shoot using basic lay-up shoot audio-visual media to improve lay-up shoot learning outcomes in class VIIIA students of SMP Kanisius Pati, 2013/2014

    Directory of Open Access Journals (Sweden)

    Frendy Nurochwan Febryanto

    2015-01-01

    Full Text Available The purpose of this study was to determine whether learning the lay-up shoot using basic lay-up shoot audio-visual media can improve lay-up shoot learning outcomes in class VIIIA students of SMP Kanisius Pati in the 2013/2014 academic year. This study uses Classroom Action Research (CAR). Data were collected through observation and assessment of basketball lay-up shoot learning outcomes. The data analysis technique used in this research is descriptive. At the end of the first cycle, teacher activity in teaching basic lay-up shoot techniques using audio-visual media reached 76.19%, whereas student activity during the lay-up shoot learning process using audio-visual media reached 78.57%. At the end of the second cycle, teacher activity in teaching basic lay-up shoot techniques using audio-visual media reached 85.71%, whereas student activity during the lay-up shoot learning process using audio-visual media reached 92.86%. Based on the results of the study it can be concluded that learning the lay-up shoot using basic lay-up shoot audio-visual media can improve student learning outcomes in class VIIIA of SMP Kanisius Pati in the 2013/2014 academic year.

  10. Effectiveness of respiratory-gated radiotherapy with audio-visual biofeedback for synchrotron-based scanned heavy-ion beam delivery

    Science.gov (United States)

    He, Pengbo; Li, Qiang; Zhao, Ting; Liu, Xinguo; Dai, Zhongying; Ma, Yuanyuan

    2016-12-01

    A synchrotron-based heavy-ion accelerator operates in pulse mode at a low repetition rate that is comparable to a patient's breathing rate. To overcome inefficiencies and interplay effects between the residual motion of the target and the scanned heavy-ion beam delivery process in conventional free breathing (FB)-based gating therapy, a novel respiratory guidance method was developed to help patients synchronize their breathing patterns with the synchrotron excitation patterns by performing short breath holds with the aid of a personalized audio-visual biofeedback (BFB) system. The purpose of this study was to evaluate the treatment precision, efficiency and reproducibility of the respiratory guidance method in scanned heavy-ion beam delivery mode. Using 96 breathing traces from eight healthy volunteers who were asked to breathe freely and guided to perform short breath holds with the aid of BFB, a series of dedicated four-dimensional dose calculations (4DDC) were performed on a geometric model developed assuming a linear relationship between external surrogate and internal tumor motions. The outcome of the 4DDCs was quantified in terms of treatment time, dose-volume histograms (DVH) and a dose homogeneity index. Our results show that with the respiratory guidance method the treatment efficiency increased by a factor of 2.23-3.94 compared with FB gating, depending on the duty cycle settings. The magnitude of dose inhomogeneity for the respiratory guidance method was 7.5 times less than that of non-gated irradiation, and good reproducibility of breathing guidance among different fractions was achieved. Thus, our study indicates that the respiratory guidance method not only improved the overall treatment efficiency of respiratory-gated scanned heavy-ion beam delivery, but also had the advantages of lower dose uncertainty and better reproducibility among fractions.
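
    The efficiency argument above comes down to how often the gating window (target amplitude low) overlaps the synchrotron's flat-top phase, during which the beam can actually be delivered. A toy illustrative sketch with a synthetic surrogate signal and made-up timing constants, not the study's model:

      import numpy as np

      fs = 25.0                                   # surrogate sampling rate (Hz)
      t = np.arange(0.0, 60.0, 1.0 / fs)          # one minute of monitoring
      flat_top = (t % 4.0) < 2.0                  # assume 2 s flat-top per 4 s cycle

      free = 0.5 * (1.0 + np.sin(2.0 * np.pi * 0.25 * t))  # ~15 breaths/min
      guided = np.where(flat_top, 0.05, free)     # breath-holds synced to flat-top

      def duty_cycle(signal, window=0.2):
          """Fraction of time the beam is on: gate open during the flat-top."""
          return np.mean((signal < window) & flat_top)

      for name, sig in (("free breathing", free), ("guided breath-hold", guided)):
          print(f"{name}: duty cycle {100 * duty_cycle(sig):.1f}%")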

  11. Audio-visual perception of 3D cinematography: an fMRI study using condition-based and computation-based analyses.

    Directory of Open Access Journals (Sweden)

    Akitoshi Ogawa

    Full Text Available The use of naturalistic stimuli to probe sensory functions in the human brain is gaining increasing interest. Previous imaging studies examined brain activity associated with the processing of cinematographic material using both standard "condition-based" designs, as well as "computational" methods based on the extraction of time-varying features of the stimuli (e.g. motion). Here, we exploited both approaches to investigate the neural correlates of complex visual and auditory spatial signals in cinematography. In the first experiment, the participants watched a piece of a commercial movie presented in four blocked conditions: 3D vision with surround sounds (3D-Surround), 3D with monaural sound (3D-Mono), 2D-Surround, and 2D-Mono. In the second experiment, they watched two different segments of the movie both presented continuously in 3D-Surround. The blocked presentation served for standard condition-based analyses, while all datasets were submitted to computation-based analyses. The latter assessed where activity co-varied with visual disparity signals and the complexity of auditory multi-source signals. The blocked analyses associated 3D viewing with the activation of the dorsal and lateral occipital cortex and superior parietal lobule, while the surround sounds activated the superior and middle temporal gyri (S/MTG). The computation-based analyses revealed the effects of absolute disparity in dorsal occipital and posterior parietal cortices and of disparity gradients in the posterior middle temporal gyrus plus the inferior frontal gyrus. The complexity of the surround sounds was associated with activity in specific sub-regions of S/MTG, even after accounting for changes of sound intensity. These results demonstrate that the processing of naturalistic audio-visual signals entails an extensive set of visual and auditory areas, and that computation-based analyses can track the contribution of complex spatial aspects characterizing such life-like stimuli.

  12. Audio-visual regulation: arguments for and against

    Directory of Open Access Journals (Sweden)

    Jordi Sopena Palomar

    2008-03-01

    Full Text Available This article analyzes the effectiveness of audio-visual regulation and weighs the various arguments for and against the existence of regulatory councils at the state level. The debate over the need for such a body in Spain is still active. Most European Union countries have established competent authorities in this field, such as OFCOM in the United Kingdom and the CSA in France. In Spain, audio-visual regulation is limited to bodies at the regional level, such as the Consejo Audiovisual de Navarra, the Consejo Audiovisual de Andalucía and the Consell de l'Audiovisual de Catalunya (CAC), whose model is also examined in this article.

  13. A randomized controlled pilot study feasibility of a tablet-based guided audio-visual relaxation intervention for reducing stress and pain in adults with sickle cell disease.

    Science.gov (United States)

    Ezenwa, Miriam O; Yao, Yingwei; Engeland, Christopher G; Molokie, Robert E; Wang, Zaijie Jim; Suarez, Marie L; Wilkie, Diana J

    2016-06-01

    To test the feasibility of a guided audio-visual relaxation intervention protocol for reducing stress and pain in adults with sickle cell disease. Sickle cell pain is inadequately controlled with opioids, necessitating further interventions such as guided relaxation to reduce stress and pain. Attention-control, randomized clinical feasibility pilot study with repeated measures. Patients, randomized to guided relaxation or control groups and recruited during clinical visits between 2013 and 2014, completed stress and pain measures via a Galaxy Internet-enabled Android tablet at the baseline visit (pre/post intervention), at the 2-week posttest visit, and daily at home between the two visits. Experimental-group patients were asked to use a guided relaxation intervention at the baseline visit and at least once daily for 2 weeks. Control-group patients engaged in a recorded sickle cell discussion at the baseline visit. Data were analysed using linear regression with bootstrapping. At baseline, 27 of 28 consented patients completed the study protocol. Group comparison showed that guided relaxation significantly reduced current stress and pain. At the 2-week posttest, 24 of 27 patients completed the study, all of whom reported liking it. Patients completed tablet-based measures on 71% of study days (69% in the control group, 72% in the experimental group). At the 2-week posttest, the experimental group had significantly lower composite pain index scores, but the two groups did not differ significantly on stress intensity. This study protocol appears feasible. The tablet-based guided relaxation intervention shows promise for reducing sickle cell pain and warrants a larger efficacy trial. The ClinicalTrials.gov Identifier is NCT02501447.

  14. Audio-visual perception of 3D cinematography: an fMRI study using condition-based and computation-based analyses.

    Science.gov (United States)

    Ogawa, Akitoshi; Bordier, Cecile; Macaluso, Emiliano

    2013-01-01

    The use of naturalistic stimuli to probe sensory functions in the human brain is gaining increasing interest. Previous imaging studies examined brain activity associated with the processing of cinematographic material using both standard "condition-based" designs, as well as "computational" methods based on the extraction of time-varying features of the stimuli (e.g. motion). Here, we exploited both approaches to investigate the neural correlates of complex visual and auditory spatial signals in cinematography. In the first experiment, the participants watched a piece of a commercial movie presented in four blocked conditions: 3D vision with surround sounds (3D-Surround), 3D with monaural sound (3D-Mono), 2D-Surround, and 2D-Mono. In the second experiment, they watched two different segments of the movie both presented continuously in 3D-Surround. The blocked presentation served for standard condition-based analyses, while all datasets were submitted to computation-based analyses. The latter assessed where activity co-varied with visual disparity signals and the complexity of auditory multi-source signals. The blocked analyses associated 3D viewing with the activation of the dorsal and lateral occipital cortex and superior parietal lobule, while the surround sounds activated the superior and middle temporal gyri (S/MTG). The computation-based analyses revealed the effects of absolute disparity in dorsal occipital and posterior parietal cortices and of disparity gradients in the posterior middle temporal gyrus plus the inferior frontal gyrus. The complexity of the surround sounds was associated with activity in specific sub-regions of S/MTG, even after accounting for changes of sound intensity. These results demonstrate that the processing of naturalistic audio-visual signals entails an extensive set of visual and auditory areas, and that computation-based analyses can track the contribution of complex spatial aspects characterizing such life-like stimuli.

  15. Is Birdsong More Like Speech or Music?

    Science.gov (United States)

    Shannon, Robert V

    2016-04-01

    Music and speech share many acoustic cues, but not all are equally important. For example, harmonic pitch is essential for music but not for speech. When birds communicate, is their song more like speech or music? A new study contrasting pitch and spectral patterns shows that birds perceive their song more like humans perceive speech.

  16. BILINGUAL MULTIMODAL SYSTEM FOR TEXT-TO-AUDIOVISUAL SPEECH AND SIGN LANGUAGE SYNTHESIS

    Directory of Open Access Journals (Sweden)

    A. A. Karpov

    2014-09-01

    Full Text Available We present a conceptual model, architecture and software of a multimodal system for audio-visual speech and sign language synthesis from input text. The main components of the developed multimodal synthesis system (signing avatar) are: an automatic text processor for input text analysis; a simulated 3D model of a human head; a computer text-to-speech synthesizer; a system for audio-visual speech synthesis; a simulated 3D model of human hands and upper body; and a multimodal user interface integrating all the components for generation of audio, visual and signed speech. The proposed system performs automatic translation of input textual information into speech (audio information) and gestures (video information), fuses the information, and outputs it in the form of multimedia information. A user can input any grammatically correct text in Russian or Czech to the system; it is analyzed by the text processor to detect sentences, words and characters. This textual information is then converted into symbols of the sign language notation. We apply the international Hamburg Notation System (HamNoSys), which describes the main differential features of each manual sign: hand shape, hand orientation, place and type of movement. On their basis the 3D signing avatar displays the elements of the sign language. The virtual 3D model of the human head and upper body has been created using the VRML virtual reality modeling language, and it is controlled by software based on the OpenGL graphics library. The developed multimodal synthesis system is universal, since it is oriented toward both regular users and disabled people (in particular, the hard of hearing and visually impaired), and it serves for multimedia output (by audio and visual modalities) of input textual information.
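
    At its core, the pipeline described above maps tokenized text to sign-notation entries that drive the avatar, in parallel with text-to-speech. A heavily simplified sketch of that idea; the lexicon entries and notation strings below are invented placeholders, not the system's actual HamNoSys output or lexicon:

      # Hypothetical sketch of the text-to-sign lookup stage described above.
      SIGN_LEXICON = {
          "hello": "hamfinger2,hamextfingeru,hammovecircle",  # placeholder entry
          "world": "hamflathand,hampalmu,hammoveo",           # placeholder entry
      }

      def text_to_signs(text):
          """Tokenize input text and look up a HamNoSys-style string per word."""
          signs = []
          for word in text.lower().split():
              notation = SIGN_LEXICON.get(word)
              if notation is None:
                  # Unknown words would fall back to fingerspelling
                  notation = "<fingerspell:" + word + ">"
              signs.append((word, notation))
          return signs

      # Each (word, notation) pair would drive the avatar's hand animation,
      # while a TTS engine renders the same words as audio-visual speech.
      print(text_to_signs("Hello world"))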

  17. Robust speech dereverberation with a neural network-based post-filter that exploits multi-conditional training of binaural cues

    DEFF Research Database (Denmark)

    May, Tobias

    2018-01-01

    ...time-frequency (T-F) units. A multi-conditional training (MCT) procedure was used to simulate the uncertainties of short-term binaural cues in response to room reverberation by mixing the direct part of head-related impulse responses (HRIRs) with diffuse noise. Despite being trained with only anechoic HRIRs...

  18. Respiratory motion management using audio-visual biofeedback for respiratory-gated radiotherapy of synchrotron-based pulsed heavy-ion beam delivery

    International Nuclear Information System (INIS)

    He, Pengbo; Ma, Yuanyuan; Huang, Qiyan; Yan, Yuanlin; Li, Qiang; Liu, Xinguo; Dai, Zhongying; Zhao, Ting; Fu, Tingyan; Shen, Guosheng

    2014-01-01

    Purpose: To efficiently deliver respiratory-gated radiation during synchrotron-based pulsed heavy-ion radiotherapy, a novel respiratory guidance method combining a personalized audio-visual biofeedback (BFB) system, breath hold (BH), and synchrotron-based gating was designed to help patients synchronize their respiratory patterns with synchrotron pulses and to overcome typical limitations such as low efficiency, residual motion, and discomfort. Methods: In-house software was developed to acquire body surface marker positions and display BFB, gating signals, and real-time beam profiles on a LED screen. Patients were prompted to perform short BHs or short deep breath holds (SDBH) with the aid of BFB following a personalized standard BH/SDBH (stBH/stSDBH) guiding curve or their own representative BH/SDBH (reBH/reSDBH) guiding curve. A practical simulation was performed for a group of 15 volunteers to evaluate the feasibility and effectiveness of this method. Effective dose rates (EDRs), mean absolute errors between the guiding curves and the measured curves, and mean absolute deviations of the measured curves were obtained within 10%–50% duty cycles (DCs) that were synchronized with the synchrotron’s flat-top phase. Results: All maneuvers for an individual volunteer took approximately half an hour, and no one experienced discomfort during the maneuvers. Using the respiratory guidance methods, the magnitude of residual motion was almost ten times less than during nongated irradiation, and increases in the average effective dose rate by factors of 2.39–4.65, 2.39–4.59, 1.73–3.50, and 1.73–3.55 for the stBH, reBH, stSDBH, and reSDBH guiding maneuvers, respectively, were observed in contrast with conventional free breathing-based gated irradiation, depending on the respiratory-gated duty cycle settings. Conclusions: The proposed respiratory guidance method with personalized BFB was confirmed to be feasible in a group of volunteers. Increased effective dose

  19. Respiratory motion management using audio-visual biofeedback for respiratory-gated radiotherapy of synchrotron-based pulsed heavy-ion beam delivery

    Energy Technology Data Exchange (ETDEWEB)

    He, Pengbo; Ma, Yuanyuan; Huang, Qiyan; Yan, Yuanlin [Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou 730000 (China); Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, Lanzhou 730000 (China); School of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049 (China); Li, Qiang, E-mail: liqiang@impcas.ac.cn; Liu, Xinguo; Dai, Zhongying; Zhao, Ting; Fu, Tingyan; Shen, Guosheng [Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou 730000 (China); Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, Lanzhou 730000 (China)

    2014-11-01

    Purpose: To efficiently deliver respiratory-gated radiation during synchrotron-based pulsed heavy-ion radiotherapy, a novel respiratory guidance method combining a personalized audio-visual biofeedback (BFB) system, breath hold (BH), and synchrotron-based gating was designed to help patients synchronize their respiratory patterns with synchrotron pulses and to overcome typical limitations such as low efficiency, residual motion, and discomfort. Methods: In-house software was developed to acquire body surface marker positions and display BFB, gating signals, and real-time beam profiles on a LED screen. Patients were prompted to perform short BHs or short deep breath holds (SDBH) with the aid of BFB following a personalized standard BH/SDBH (stBH/stSDBH) guiding curve or their own representative BH/SDBH (reBH/reSDBH) guiding curve. A practical simulation was performed for a group of 15 volunteers to evaluate the feasibility and effectiveness of this method. Effective dose rates (EDRs), mean absolute errors between the guiding curves and the measured curves, and mean absolute deviations of the measured curves were obtained within 10%–50% duty cycles (DCs) that were synchronized with the synchrotron’s flat-top phase. Results: All maneuvers for an individual volunteer took approximately half an hour, and no one experienced discomfort during the maneuvers. Using the respiratory guidance methods, the magnitude of residual motion was almost ten times less than during nongated irradiation, and increases in the average effective dose rate by factors of 2.39–4.65, 2.39–4.59, 1.73–3.50, and 1.73–3.55 for the stBH, reBH, stSDBH, and reSDBH guiding maneuvers, respectively, were observed in contrast with conventional free breathing-based gated irradiation, depending on the respiratory-gated duty cycle settings. Conclusions: The proposed respiratory guidance method with personalized BFB was confirmed to be feasible in a group of volunteers. Increased effective dose

  20. The Galker test of speech reception in noise

    DEFF Research Database (Denmark)

    Lauritsen, Maj-Britt Glenn; Söderström, Margareta; Kreiner, Svend

    2016-01-01

    PURPOSE: We tested "the Galker test", a speech reception in noise test developed for primary care for Danish preschool children, to explore whether the children's ability to hear and understand speech was associated with gender, age, middle ear status, and the level of background noise. METHODS: The Galker test is a 35-item audio-visual, computerized word discrimination test in background noise. Included were 370 normally developed children attending day care centers. The children were examined with the Galker test, tympanometry, audiometry, and the Reynell test of verbal comprehension. Parents...... and daycare teachers completed questionnaires on the children's ability to hear and understand speech. As most of the variables were not assessed using interval scales, non-parametric statistics (Goodman-Kruskal's gamma) were used for analyzing associations with the Galker test score. For comparisons...

  1. Using ILD or ITD Cues for Sound Source Localization and Speech Understanding in a Complex Listening Environment by Listeners with Bilateral and with Hearing-Preservation Cochlear Implants

    Science.gov (United States)

    Loiselle, Louise H.; Dorman, Michael F.; Yost, William A.; Cook, Sarah J.; Gifford, Rene H.

    2016-01-01

    Purpose: To assess the role of interaural time differences and interaural level differences in (a) sound-source localization, and (b) speech understanding in a cocktail party listening environment for listeners with bilateral cochlear implants (CIs) and for listeners with hearing-preservation CIs. Methods: Eleven bilateral listeners with MED-EL…

  2. Speech misperception: speaking and seeing interfere differently with hearing.

    Directory of Open Access Journals (Sweden)

    Takemi Mochida

    Full Text Available Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] was degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables, but was not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner, while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.

  3. Strategies for Characterizing the Sensory Environment: Objective and Subjective Evaluation Methods using the VisiSonic Real Space 64/5 Audio-Visual Panoramic Camera

    Science.gov (United States)

    2017-11-01

    This link between objects, action, and perception engages higher-level cognitive/semantic networks that have been difficult to objectively ... sound (excluding music and speech) that they heard, the actions and objects involved, and their location at the time of the entry. Once completed ...

  4. Parametric Representation of the Speaker's Lips for Multimodal Sign Language and Speech Recognition

    Science.gov (United States)

    Ryumin, D.; Karpov, A. A.

    2017-05-01

    In this article, we propose a new method for parametric representation of the human lip region. The functional diagram of the method is described, and implementation details with an explanation of its key stages and features are given. The results of automatic detection of the regions of interest are illustrated. The processing speed of the method on several computers with different performance levels is reported. This universal method allows applying the parametric representation of the speaker's lips to the tasks of biometrics, computer vision, machine learning, and automatic recognition of faces, elements of sign languages, and audio-visual speech, including lip-reading.

  5. Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus

    Directory of Open Access Journals (Sweden)

    Patterson Eric K

    2002-01-01

    Full Text Available Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties due to background noise and multiple speakers in an application environment are significantly reduced by the additional information provided by visual features. This paper presents information on a new audio-visual database, a feature study on moving speakers, and baseline results for the whole speaker group. Although a few databases have been collected in this area, none has emerged as a standard for comparison. Also, efforts to date have often been limited, focusing on cropped video or stationary speakers. This paper seeks to introduce a challenging audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The Clemson University Audio-Visual Experiments (CUAVE) database is a speaker-independent corpus of both connected and continuous digit strings totaling over 7000 utterances. It contains a wide variety of speakers and is designed to meet several goals discussed in this paper. One of these goals is to allow testing of adverse conditions such as moving talkers and speaker pairs. A feature study of connected digit strings is also discussed. It compares stationary and moving talkers in a speaker-independent grouping. An image-processing-based contour technique, an image transform method, and a deformable template scheme are used in this comparison to obtain visual features. This paper also presents methods and results in an attempt to make these techniques more robust to speaker movement. Finally, initial baseline speaker-independent results are included using all speakers, and conclusions as well as suggested areas of research are...

  6. Encoding Specificity and Nonverbal Cue Context: An Expansion of Episodic Memory Research.

    Science.gov (United States)

    Woodall, W. Gill; Folger, Joseph P.

    1981-01-01

    Reports two studies demonstrating the ability of nonverbal contextual cues to act as retrieval mechanisms for co-occurring language. Suggests that visual contextual cues, such as speech primacy and motor primacy gestures, can access linguistic target information. Motor primacy cues are shown to act as stronger retrieval cues. (JMF)

  7. Estimating the relative weights of visual and auditory tau versus heuristic-based cues for time-to-contact judgments in realistic, familiar scenes by older and younger adults.

    Science.gov (United States)

    Keshavarz, Behrang; Campos, Jennifer L; DeLucia, Patricia R; Oberfeld, Daniel

    2017-04-01

    Estimating time to contact (TTC) involves multiple sensory systems, including vision and audition. Previous findings suggested that the ratio of an object's instantaneous optical size/sound intensity to its instantaneous rate of change in optical size/sound intensity (τ) drives TTC judgments. Other evidence has shown that heuristic-based cues are used, including final optical size or final sound pressure level. Most previous studies have used decontextualized and unfamiliar stimuli (e.g., geometric shapes on a blank background). Here we used a traffic scene with an approaching vehicle to evaluate the weights of visual and auditory TTC cues under more realistic conditions. Younger (18-39 years) and older (65+ years) participants made TTC estimates in three sensory conditions: visual-only, auditory-only, and audio-visual. Stimuli were presented within an immersive virtual-reality environment, and cue weights were calculated for both visual cues (e.g., visual τ, final optical size) and auditory cues (e.g., auditory τ, final sound pressure level). The results demonstrated the use of visual τ as well as heuristic cues in the visual-only condition. TTC estimates in the auditory-only condition, however, were primarily based on an auditory heuristic cue (final sound pressure level), rather than on auditory τ. In the audio-visual condition, the visual cues dominated overall, with the highest weight being assigned to visual τ by younger adults, and a more equal weighting of visual τ and heuristic cues in older adults. Overall, better characterizing the effects of combined sensory inputs, stimulus characteristics, and age on the cues used to estimate TTC will provide important insights into how these factors may affect everyday behavior.
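
    The τ variable described above is simply a quantity divided by its rate of change; for vision, the optical angle θ over dθ/dt approximates time to contact for a constant-velocity approach. An illustrative sketch with invented approach geometry (a 2 m wide object closing from 50 m at 10 m/s), not the study's stimuli:

      import numpy as np

      def tau(signal, dt):
          """tau(t) = signal / (d signal / dt): time to contact if the rate holds."""
          rate = np.gradient(signal, dt)
          return signal / rate

      dt = 0.02
      t = np.arange(0.0, 4.0, dt)
      distance = 50.0 - 10.0 * t                 # metres to the object
      theta = 2 * np.arctan(1.0 / distance)      # optical angle of a 2 m wide object
      tau_visual = tau(theta, dt)
      print(f"visual tau at t=0: {tau_visual[0]:.2f} s (analytic TTC is 5.00 s)")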

  8. Congruent and Incongruent Cues in Highly Familiar Audiovisual Action Sequences: An ERP Study

    Directory of Open Access Journals (Sweden)

    SM Wuerger

    2012-07-01

    Full Text Available In a previous fMRI study we found significant differences in BOLD responses for congruent and incongruent semantic audio-visual action sequences (whole-body actions and speech actions) in bilateral pSTS, left SMA, left IFG, and IPL (Meyer, Greenlee, & Wuerger, JOCN, 2011). Here, we present results from a 128-channel ERP study that examined the time-course of these interactions using a one-back task. ERPs in response to congruent and incongruent audio-visual actions were compared to identify regions and latencies of differences. Responses to congruent and incongruent stimuli differed between 240–280 ms, 340–420 ms, and 460–660 ms after stimulus onset. A dipole analysis revealed that the difference around 250 ms can be partly explained by a modulation of sources in the vicinity of the superior temporal area, while the responses after 400 ms are consistent with sources in inferior frontal areas. Our results are in line with a model that postulates early recognition of congruent audiovisual actions in the pSTS, perhaps as a sensory memory buffer, and a later role of the IFG, perhaps in a generative capacity, in reconciling incongruent signals.

  9. Word segmentation with universal prosodic cues.

    Science.gov (United States)

    Endress, Ansgar D; Hauser, Marc D

    2010-09-01

    When listening to speech from one's native language, words seem to be well separated from one another, like beads on a string. When listening to a foreign language, in contrast, words seem almost impossible to extract, as if there was only one bead on the same string. This contrast reveals that there are language-specific cues to segmentation. The puzzle, however, is that infants must be endowed with a language-independent mechanism for segmentation, as they ultimately solve the segmentation problem for any native language. Here, we approach the acquisition problem by asking whether there are language-independent cues to segmentation that might be available to even adult learners who have already acquired a native language. We show that adult learners recognize words in connected speech when only prosodic cues to word-boundaries are given from languages unfamiliar to the participants. In both artificial and natural speech, adult English speakers, with no prior exposure to the test languages, readily recognized words in natural languages with critically different prosodic patterns, including French, Turkish and Hungarian. We suggest that, even though languages differ in their sound structures, they carry universal prosodic characteristics. Further, these language-invariant prosodic cues provide a universally accessible mechanism for finding words in connected speech. These cues may enable infants to start acquiring words in any language even before they are fine-tuned to the sound structure of their native language.

  10. Promoting smoke-free homes: a novel behavioral intervention using real-time audio-visual feedback on airborne particle levels.

    Directory of Open Access Journals (Sweden)

    Neil E Klepeis

    Full Text Available Interventions are needed to protect the health of children who live with smokers. We pilot-tested a real-time intervention for promoting behavior change in homes that reduces secondhand tobacco smoke (SHS) levels. The intervention uses a monitor and feedback system to provide immediate auditory and visual signals triggered at defined thresholds of fine particle concentration. Dynamic graphs of real-time particle levels are also shown on a computer screen. We experimentally evaluated the system, field-tested it in homes with smokers, and conducted focus groups to obtain general opinions. Laboratory tests of the monitor demonstrated SHS sensitivity, stability, precision equivalent to at least 1 µg/m³, and low noise. A linear relationship (R² = 0.98) was observed between the monitor and average SHS mass concentrations up to 150 µg/m³. Focus groups and interviews with intervention participants showed in-home use to be acceptable and feasible. The intervention was evaluated in 3 homes with combined baseline and intervention periods lasting 9 to 15 full days. Two families modified their behavior by opening windows or doors, smoking outdoors, or smoking less. We observed evidence of lower SHS levels in these homes. The remaining household voiced reluctance to changing their smoking activity and did not exhibit lower SHS levels in main smoking areas or clear behavior change; however, family members expressed receptivity to smoking outdoors. This study established the feasibility of the real-time intervention, laying the groundwork for controlled trials with larger sample sizes. Visual and auditory cues may prompt family members to take immediate action to reduce SHS levels. Dynamic graphs of SHS levels may help families make decisions about specific mitigation approaches.
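
    The intervention logic amounts to mapping each real-time particle reading onto an escalating audio-visual cue when it crosses predefined thresholds. A minimal illustrative sketch; the threshold values and readings below are placeholders, not the monitor's actual configuration:

      THRESHOLDS_UG_M3 = (15.0, 35.0)   # placeholder alert levels, not the study's

      def feedback(level_ug_m3):
          """Map a particle reading to an escalating audio-visual cue."""
          crossed = sum(level_ug_m3 >= t for t in THRESHOLDS_UG_M3)
          if crossed == 0:
              return f"{level_ug_m3:5.1f} µg/m³  OK"
          # e.g., one extra light/tone per threshold crossed
          return f"{level_ug_m3:5.1f} µg/m³  ALERT level {crossed}"

      # Synthetic readings standing in for the real-time monitor stream
      for reading in (3.0, 12.0, 22.0, 48.0, 9.0):
          print(feedback(reading))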

  11. Automatic discrimination between laughter and speech

    NARCIS (Netherlands)

    Truong, K.; Leeuwen, D. van

    2007-01-01

    Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues that can consist of laughter, a trembling voice, coughs, changes in the intonation contour etc., information about the speaker’s state and emotion can be revealed. This paper describes the

  12. Speech Problems

    Science.gov (United States)

    ... a person's ability to speak clearly. Some Common Speech and Language Disorders: Stuttering is a problem that ...

  13. Attentional reorienting triggers spatial asymmetries in a search task with cross-modal spatial cueing.

    Directory of Open Access Journals (Sweden)

    Rebecca E Paladini

    Full Text Available Cross-modal spatial cueing can affect performance in a visual search task. For example, search performance improves if a visual target and an auditory cue originate from the same spatial location, and it deteriorates if they originate from different locations. Moreover, it has recently been postulated that multisensory settings, i.e., experimental settings in which critical stimuli are concurrently presented in different sensory modalities (e.g., visual and auditory), may trigger asymmetries in visuospatial attention. Thereby, a facilitation has been observed for visual stimuli presented in the right compared to the left visual space. However, it remains unclear whether auditory cueing of attention differentially affects search performance in the left and the right hemifields in audio-visual search tasks. The present study investigated whether spatial asymmetries would occur in a search task with cross-modal spatial cueing. Participants completed a visual search task that contained no auditory cues (i.e., a unimodal visual condition), or spatially congruent, spatially incongruent, or spatially non-informative auditory cues. To further assess participants' accuracy in localising the auditory cues, a unimodal auditory spatial localisation task was also administered. The results demonstrated no left/right asymmetries in the unimodal visual search condition. Both an additional incongruent, as well as a spatially non-informative, auditory cue resulted in lateral asymmetries. Thereby, search times were increased for targets presented in the left compared to the right hemifield. No such spatial asymmetry was observed in the congruent condition. However, participants' performance in the congruent condition was modulated by their tone localisation accuracy. The findings of the present study demonstrate that spatial asymmetries in multisensory processing depend on the validity of the cross-modal cues, and occur under specific attentional conditions, i.e., when...

  14. Turn-taking cue delays in human-robot communication

    NARCIS (Netherlands)

    Cuijpers, R. H.; Van Den Goor, V. J.P.

    2017-01-01

    Fluent communication between a human and a robot relies on the use of effective turn-taking cues. In human speech, staying silent after a sequence of utterances is usually accompanied by an explicit turn-yielding cue to signal the end of a turn. Here we study the effect of the timing of four...

  15. Temporal visual cues aid speech recognition

    DEFF Research Database (Denmark)

    Zhou, Xiang; Ross, Lars; Lehn-Schiøler, Tue

    2006-01-01

    BACKGROUND: It is well known that under noisy conditions, viewing a speaker's articulatory movement aids the recognition of spoken words. Conventionally it is thought that the visual input disambiguates otherwise confusing auditory input. HYPOTHESIS: In contrast, we hypothesize that it is the temporal synchronicity of the visual input that aids parsing of the auditory stream. More specifically, we expected that purely temporal information, which does not convey information such as place of articulation, may facilitate word recognition. METHODS: To test this prediction we used temporal features of audio to generate an artificial talking-face video and measured word recognition performance on simple monosyllabic words. RESULTS: When presenting words together with the artificial video we find that word recognition is improved over purely auditory presentation. The effect is significant (p...

  16. Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues

    Directory of Open Access Journals (Sweden)

    W. H. Adams

    2003-02-01

    Full Text Available We present a learning-based approach to the semantic indexing of multimedia content using cues derived from audio, visual, and text features. We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of the concepts in the lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, namely, audio, video, and text. Concept representations are modeled using Gaussian mixture models (GMM), hidden Markov models (HMM), and support vector machines (SVM). Models such as Bayesian networks and SVMs are used in a late-fusion approach to model concepts that are not explicitly modeled in terms of features. Our experiments indicate promise in the proposed classification and fusion methodologies: our proposed fusion scheme achieves more than 10% relative improvement over the best unimodal concept detector.
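
    A minimal sketch of the late-fusion idea described above: per-modality detectors each emit a concept score, and a second-stage classifier combines them. Random scores stand in for real GMM/HMM/SVM detector outputs; none of this is the paper's actual data or model configuration:

      import numpy as np
      from sklearn.svm import SVC

      rng = np.random.default_rng(0)
      n = 400
      labels = rng.integers(0, 2, n)               # concept present / absent
      # Stand-ins for unimodal detector scores (audio, visual, text)
      scores = np.column_stack([
          labels + rng.normal(scale=1.5, size=n),  # "audio" score
          labels + rng.normal(scale=1.2, size=n),  # "visual" score
          labels + rng.normal(scale=2.0, size=n),  # "text" score
      ])
      fusion = SVC().fit(scores[:300], labels[:300])   # late-fusion combiner
      print("fused accuracy:", fusion.score(scores[300:], labels[300:]))
      print("best unimodal:", max(
          np.mean((scores[300:, i] > 0.5) == labels[300:]) for i in range(3)))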

  17. Real-time speech-driven animation of expressive talking faces

    Science.gov (United States)

    Liu, Jia; You, Mingyu; Chen, Chun; Song, Mingli

    2011-05-01

    In this paper, we present a real-time facial animation system in which speech drives mouth movements and facial expressions synchronously. Considering five basic emotions, a hierarchical structure with an upper layer of emotion classification is established. Based on the recognized emotion label, the lower-layer classification at the sub-phonemic level models the relationship between the acoustic features of frames and the audio labels in phonemes. Using certain constraints, the predicted emotion labels of speech are adjusted to obtain facial expression labels, which are combined with the sub-phonemic labels. The combinations are mapped into facial action units (FAUs), and audio-visual synchronized animation with mouth movements and facial expressions is generated by morphing between FAUs. The experimental results demonstrate that the two-layer structure succeeds in both emotion and sub-phonemic classifications, and the synthesized facial sequences reach a convincing quality.

  18. A speech reception in noise test for preschool children (the Galker-test)

    DEFF Research Database (Denmark)

    Lauritsen, Maj-Britt Glenn; Kreiner, Svend; Söderström, Margareta

    2015-01-01

    Purpose: This study evaluates the initial validity and reliability of the "Galker test of speech reception in noise", developed for Danish preschool children suspected of having problems with hearing or understanding speech, against strict psychometric standards, and assesses its acceptance by the children. Methods: The Galker test is an audio-visual, computerised, word discrimination test in background noise, originally comprising 50 word pairs. Three hundred and eighty-eight children attending ordinary day care centres and aged 3–5 years were included. With multiple regression and the Rasch item response model, it was examined whether the total score of the Galker test validly reflected item responses across subgroups defined by sex, age, bilingualism, tympanometry, audiometry and verbal comprehension. Results: A total of 370 children (95%) accepted testing and 339 (87%) completed all 50 items...

  19. Perception and the temporal properties of speech

    Science.gov (United States)

    Gordon, Peter C.

    1991-11-01

    Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal to noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.

  20. Visual face-movement sensitive cortex is relevant for auditory-only speech recognition.

    Science.gov (United States)

    Riedel, Philipp; Ragert, Patrick; Schelinski, Stefanie; Kiebel, Stefan J; von Kriegstein, Katharina

    2015-07-01

    It is commonly assumed that the recruitment of visual areas during audition is not relevant for performing auditory tasks ('auditory-only view'). According to an alternative view, however, the recruitment of visual cortices is thought to optimize auditory-only task performance ('auditory-visual view'). This alternative view is based on functional magnetic resonance imaging (fMRI) studies. These studies have shown, for example, that even if there is only auditory input available, face-movement sensitive areas within the posterior superior temporal sulcus (pSTS) are involved in understanding what is said (auditory-only speech recognition). This is particularly the case when speakers are known audio-visually, that is, after brief voice-face learning. Here we tested whether the left pSTS involvement is causally related to performance in auditory-only speech recognition when speakers are known by face. To test this hypothesis, we applied cathodal transcranial direct current stimulation (tDCS) to the pSTS during (i) visual-only speech recognition of a speaker known only visually to participants and (ii) auditory-only speech recognition of speakers they learned by voice and face. We defined the cathode as active electrode to down-regulate cortical excitability by hyperpolarization of neurons. tDCS to the pSTS interfered with visual-only speech recognition performance compared to a control group without pSTS stimulation (tDCS to BA6/44 or sham). Critically, compared to controls, pSTS stimulation additionally decreased auditory-only speech recognition performance selectively for voice-face learned speakers. These results are important in two ways. First, they provide direct evidence that the pSTS is causally involved in visual-only speech recognition; this confirms a long-standing prediction of current face-processing models. Secondly, they show that visual face-sensitive pSTS is causally involved in optimizing auditory-only speech recognition. These results are in line

  1. Exploring the role of brain oscillations in speech perception in noise: Intelligibility of isochronously retimed speech

    Directory of Open Access Journals (Sweden)

    Vincent Aubanel

    2016-08-01

    Full Text Available A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
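
    The isochronous retiming itself can be pictured as mapping anchor points onto an even temporal grid; a sketch under the assumption that a separate time-stretching step then realizes the computed shifts (the onset values are hypothetical):

      import numpy as np

      def isochronous_grid(anchor_times, rate_hz):
          # Target times are evenly spaced at 1/rate_hz, starting from the
          # first anchor; the returned shifts are what a time-stretching
          # algorithm would have to apply around each anchor.
          anchors = np.asarray(anchor_times, dtype=float)
          period = 1.0 / rate_hz
          grid = anchors[0] + period * np.arange(len(anchors))
          return grid, grid - anchors

      # Hypothetical syllable onsets (s), retimed at the slow ~2.5 Hz scale:
      targets, shifts = isochronous_grid([0.12, 0.55, 0.88, 1.40, 1.71], 2.5)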

  2. Speech Compression

    Directory of Open Access Journals (Sweden)

    Jerry D. Gibson

    2016-06-01

    Full Text Available Speech compression is a key technology underlying digital cellular communications, VoIP, voicemail, and voice response systems. We trace the evolution of speech coding based on the linear prediction model, highlight the key milestones in speech coding, and outline the structures of the most important speech coding standards. Current challenges, future research directions, fundamental limits on performance, and the critical open problem of speech coding for emergency first responders are all discussed.
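
    As a concrete illustration of the linear prediction model underlying these coders, here is a textbook autocorrelation-method LPC analysis (Levinson-Durbin recursion); this is a generic sketch, not the analysis stage of any particular standard:

      import numpy as np

      def lpc(frame, order):
          # Autocorrelation of the (windowed, nonzero) frame at lags 0..order.
          n = len(frame)
          r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
          a = np.zeros(order + 1)
          a[0] = 1.0
          err = r[0]
          for i in range(1, order + 1):
              acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
              k = -acc / err                    # reflection coefficient
              a_prev = a.copy()
              for j in range(1, i):
                  a[j] = a_prev[j] + k * a_prev[i - j]
              a[i] = k
              err *= (1.0 - k * k)              # residual prediction error power
          return a, err

    The decoder side then re-synthesizes speech by exciting the all-pole filter 1/A(z) with a coded residual or a parametric excitation, which is the basic structure the standards surveyed here elaborate on.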

  3. Audio-Visual: Disembodied Voices in Theory

    OpenAIRE

    Le Fèvre-Berthelot, Anaïs

    2013-01-01

    After a survey of the major critical trends since the generalization of synchronized film sound, this bibliographical essay sets out to delineate the way film sound studies have developed around issues of taxonomy, meaning, and reception. Focusing on the treatment of the disembodied voice by various theorists, three trends can be identified: borrowing from semiology and narratology, an essentially descriptive approach first emerges that creates a new vocabulary to talk about sound and analyze...

  4. Audio-Visual Classification of Sports Types

    DEFF Research Database (Denmark)

    Gade, Rikke; Abou-Zleikha, Mohamed; Christensen, Mads Græsbøll

    2015-01-01

    In this work we propose a method for classification of sports types from combined audio and visual features extracted from thermal video. From audio, Mel Frequency Cepstral Coefficients (MFCC) are extracted, and PCA is applied to reduce the feature space to 10 dimensions. From the visual modali...
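
    A sketch of the dimensionality-reduction step described here, assuming MFCC vectors have already been extracted (one row per frame); the SVD-based projection below is a standard PCA, not necessarily the authors' exact implementation:

      import numpy as np

      def pca_reduce(mfcc_frames, n_components=10):
          # Center the data, then project onto the top principal axes
          # (right singular vectors of the centered feature matrix).
          x = mfcc_frames - mfcc_frames.mean(axis=0)
          _, _, vt = np.linalg.svd(x, full_matrices=False)
          return x @ vt[:n_components].T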

  5. Treating speech subsystems in childhood apraxia of speech with tactual input: the PROMPT approach.

    Science.gov (United States)

    Dale, Philip S; Hayden, Deborah A

    2013-11-01

    Prompts for Restructuring Oral Muscular Phonetic Targets (PROMPT; Hayden, 2004; Hayden, Eigen, Walker, & Olsen, 2010)-a treatment approach for the improvement of speech sound disorders in children-uses tactile-kinesthetic-proprioceptive (TKP) cues to support and shape movements of the oral articulators. No research to date has systematically examined the efficacy of PROMPT for children with childhood apraxia of speech (CAS). Four children (ages 3;6 [years;months] to 4;8), all meeting the American Speech-Language-Hearing Association (2007) criteria for CAS, were treated using PROMPT. All children received 8 weeks of 2 × per week treatment, including at least 4 weeks of full PROMPT treatment that included TKP cues. During the first 4 weeks, 2 of the 4 children received treatment that included all PROMPT components except TKP cues. This design permitted both between-subjects and within-subjects comparisons to evaluate the effect of TKP cues. Gains in treatment were measured by standardized tests and by criterion-referenced measures based on the production of untreated probe words, reflecting change in speech movements and auditory perceptual accuracy. All 4 children made significant gains during treatment, but measures of motor speech control and untreated word probes provided evidence for more gain when TKP cues were included. PROMPT as a whole appears to be effective for treating children with CAS, and the inclusion of TKP cues appears to facilitate greater effect.

  6. Tuning Neural Phase Entrainment to Speech.

    Science.gov (United States)

    Falk, Simone; Lanzilotti, Cosima; Schön, Daniele

    2017-08-01

    Musical rhythm positively impacts on subsequent speech processing. However, the neural mechanisms underlying this phenomenon are so far unclear. We investigated whether carryover effects from a preceding musical cue to a speech stimulus result from a continuation of neural phase entrainment to periodicities that are present in both music and speech. Participants listened and memorized French metrical sentences that contained (quasi-)periodic recurrences of accents and syllables. Speech stimuli were preceded by a rhythmically regular or irregular musical cue. Our results show that the presence of a regular cue modulates neural response as estimated by EEG power spectral density, intertrial coherence, and source analyses at critical frequencies during speech processing compared with the irregular condition. Importantly, intertrial coherences for regular cues were indicative of the participants' success in memorizing the subsequent speech stimuli. These findings underscore the highly adaptive nature of neural phase entrainment across fundamentally different auditory stimuli. They also support current models of neural phase entrainment as a tool of predictive timing and attentional selection across cognitive domains.
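
    Intertrial coherence, one of the measures used here, has a compact definition: average the unit-length phase vectors across trials at a frequency of interest. A sketch for a single frequency (array shapes and rates are illustrative):

      import numpy as np

      def intertrial_coherence(trials, fs, freq_hz):
          # trials: array of shape (n_trials, n_samples).
          # Project each trial onto a complex exponential at freq_hz,
          # normalize each coefficient to unit length, and average;
          # |mean| is 1 for perfectly consistent phase across trials
          # and near 0 for random phase.
          t = np.arange(trials.shape[1]) / fs
          coeffs = trials @ np.exp(-2j * np.pi * freq_hz * t)
          return float(np.abs((coeffs / np.abs(coeffs)).mean()))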

  7. Speech Matters

    DEFF Research Database (Denmark)

    Hasse Jørgensen, Stina

    2011-01-01

    About Speech Matters - Katarina Gregos, the Greek curator's exhibition at the Danish Pavilion, the Venice Biennale 2011.

  8. Can you hear me yet? An intracranial investigation of speech and non-speech audiovisual interactions in human cortex.

    Science.gov (United States)

    Rhone, Ariane E; Nourski, Kirill V; Oya, Hiroyuki; Kawasaki, Hiroto; Howard, Matthew A; McMurray, Bob

    In everyday conversation, viewing a talker's face can provide information about the timing and content of an upcoming speech signal, resulting in improved intelligibility. Using electrocorticography, we tested whether human auditory cortex in Heschl's gyrus (HG) and on superior temporal gyrus (STG) and motor cortex on precentral gyrus (PreC) were responsive to visual/gestural information prior to the onset of sound and whether early stages of auditory processing were sensitive to the visual content (speech syllable versus non-speech motion). Event-related band power (ERBP) in the high gamma band was content-specific prior to acoustic onset on STG and PreC, and ERBP in the beta band differed in all three areas. Following sound onset, we found no evidence for content-specificity in HG, evidence for visual specificity in PreC, and specificity for both modalities in STG. These results support models of audio-visual processing in which sensory information is integrated in non-primary cortical areas.

  9. Speech-to-Speech Relay Service

    Science.gov (United States)

    Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that allows persons with hearing and speech disabilities ...

  10. Prediction and constraint in audiovisual speech perception.

    Science.gov (United States)

    Peelle, Jonathan E; Sommers, Mitchell S

    2015-07-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration
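
    The amplitude envelope discussed above, the timing cue that visual speech is argued to help predict, is commonly extracted via the analytic signal; a generic sketch (the cutoff value is an assumption for illustration, not taken from this review):

      import numpy as np
      from scipy.signal import hilbert, butter, filtfilt

      def amplitude_envelope(x, fs, cutoff_hz=10.0):
          # Magnitude of the analytic signal, then low-pass filtered to
          # keep only the slow fluctuations relevant to entrainment.
          env = np.abs(hilbert(x))
          b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
          return filtfilt(b, a, env)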

  11. Prediction and constraint in audiovisual speech perception

    Science.gov (United States)

    Peelle, Jonathan E.; Sommers, Mitchell S.

    2015-01-01

    During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported

  12. Apraxia of Speech

    Science.gov (United States)

    A consumer health information page on apraxia of speech (AOS), also known as acquired apraxia of speech. ...

  13. Introductory speeches

    International Nuclear Information System (INIS)

    2001-01-01

    This CD is a multimedia presentation of the programme for safety upgrading of the Bohunice V1 NPP. This chapter consists of an introductory commentary and 4 introductory speeches (video records): (1) Introductory speech of Vincent Pillar, Board chairman and director general of Slovak electric, Plc. (SE); (2) Introductory speech of Stefan Schmidt, director of SE - Bohunice Nuclear power plants; (3) Introductory speech of Jan Korec, Board chairman and director general of VUJE Trnava, Inc. - Engineering, Design and Research Organisation, Trnava; (4) Introductory speech of Dietrich Kuschel, Senior vice-president of FRAMATOME ANP Project and Engineering

  14. A causal test of the motor theory of speech perception: a case of impaired speech production and spared speech perception.

    Science.gov (United States)

    Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z

    2015-01-01

    The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.

  15. Familiar units prevail over statistical cues in word segmentation.

    Science.gov (United States)

    Poulin-Charronnat, Bénédicte; Perruchet, Pierre; Tillmann, Barbara; Peereman, Ronald

    2017-09-01

    In language acquisition research, the prevailing position is that listeners exploit statistical cues, in particular transitional probabilities between syllables, to discover words of a language. However, other cues are also involved in word discovery. Assessing the weight learners give to these different cues leads to a better understanding of the processes underlying speech segmentation. The present study evaluated whether adult learners preferentially used known units or statistical cues for segmenting continuous speech. Before the exposure phase, participants were familiarized with part-words of a three-word artificial language. This design allowed the dissociation of the influence of statistical cues and familiar units, with statistical cues favoring word segmentation and familiar units favoring (nonoptimal) part-word segmentation. In Experiment 1, performance in a two-alternative forced choice (2AFC) task between words and part-words revealed part-word segmentation (even though part-words were less cohesive in terms of transitional probabilities and less frequent than words). By contrast, an unfamiliarized group exhibited word segmentation, as usually observed in standard conditions. Experiment 2 used a syllable-detection task to remove the likely contamination of performance by memory and strategy effects in the 2AFC task. Overall, the results suggest that familiar units overrode statistical cues, ultimately questioning the need for computation mechanisms of transitional probabilities (TPs) in natural language speech segmentation.
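
    For reference, the statistical cue at issue is easy to state computationally: the transitional probability TP(a -> b) = count(ab) / count(a), with word boundaries posited at local TP dips. A minimal sketch (the toy syllable stream is hypothetical, not the study's artificial language):

      from collections import Counter

      def transitional_probabilities(syllables):
          # TP(a -> b) = count of bigram ab / count of a in first position.
          pairs = Counter(zip(syllables, syllables[1:]))
          firsts = Counter(syllables[:-1])
          return {p: c / firsts[p[0]] for p, c in pairs.items()}

      def boundaries_at_tp_dips(syllables):
          tps = transitional_probabilities(syllables)
          vals = [tps[p] for p in zip(syllables, syllables[1:])]
          # A boundary is posited after syllable i when the TP across that
          # transition is lower than both neighboring transitions.
          return [i + 1 for i in range(1, len(vals) - 1)
                  if vals[i] < vals[i - 1] and vals[i] < vals[i + 1]]

      stream = "tu pi ro go la bu tu pi ro pa do ti go la bu".split()
      print(boundaries_at_tp_dips(stream))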

  16. Emotionally conditioning the target-speech voice enhances recognition of the target speech under "cocktail-party" listening conditions.

    Science.gov (United States)

    Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang

    2018-05-01

    Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, (skin conductance) electrodermal responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening efforts when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.

  17. Testing the influence of external and internal cues on smoking motivation using a community sample.

    Science.gov (United States)

    Litvin, Erika B; Brandon, Thomas H

    2010-02-01

    Exposing smokers to either external cues (e.g., pictures of cigarettes) or internal cues (e.g., negative affect induction) can induce urge to smoke and other behavioral and physiological responses. However, little is known about whether the two types of cues interact when presented in close proximity, as is likely the case in the real world. Additionally, potential moderators of cue reactivity have rarely been examined. Finally, few cue-reactivity studies have used representative samples of smokers. In a randomized 2 x 2 crossed factorial between-subjects design, the current study tested the effects of a negative affect cue intended to produce anxiety (speech preparation task) and an external smoking cue on urge and behavioral reactivity in a community sample of adult smokers (N = 175), and whether trait impulsivity moderated the effects. Both types of cues produced main effects on urges to smoke, despite the speech task failing to increase anxiety significantly. The speech task increased smoking urge related to anticipation of negative affect relief, whereas the external smoking cues increased urges related to anticipation of pleasure; however, the cues did not interact. Impulsivity measures predicted urge and other smoking-related variables, but did not moderate cue-reactivity. Results suggest independent rather than synergistic effects of these contributors to smoking motivation. (PsycINFO Database Record (c) 2010 APA, all rights reserved).

  18. SPEECH ACT ANALYSIS OF IGBO UTTERANCES IN FUNERAL ...

    African Journals Online (AJOL)

    Dean SPGS NAU

    In other words, a speech act is a ... relationship with that one single person and to share those memories ... identifies four conditions or rules for the effective performance of a ... In other words, the rules establish a system for the ... shaped by the interplay of particular speech acts and non-verbal cues. ...

  19. Predicting Intelligibility Gains in Dysarthria through Automated Speech Feature Analysis

    Science.gov (United States)

    Fletcher, Annalise R.; Wisler, Alan A.; McAuliffe, Megan J.; Lansford, Kaitlin L.; Liss, Julie M.

    2017-01-01

    Purpose: Behavioral speech modifications have variable effects on the intelligibility of speakers with dysarthria. In the companion article, a significant relationship was found between measures of speakers' baseline speech and their intelligibility gains following cues to speak louder and reduce rate (Fletcher, McAuliffe, Lansford, Sinex, &…

  20. Recognizing intentions in infant-directed speech: evidence for universals.

    Science.gov (United States)

    Bryant, Gregory A; Barrett, H Clark

    2007-08-01

    In all languages studied to date, distinct prosodic contours characterize different intention categories of infant-directed (ID) speech. This vocal behavior likely exists universally as a species-typical trait, but little research has examined whether listeners can accurately recognize intentions in ID speech using only vocal cues, without access to semantic information. We recorded native-English-speaking mothers producing four intention categories of utterances (prohibition, approval, comfort, and attention) as both ID and adult-directed (AD) speech, and we then presented the utterances to Shuar adults (South American hunter-horticulturalists). Shuar subjects were able to reliably distinguish ID from AD speech and were able to reliably recognize the intention categories in both types of speech, although performance was significantly better with ID speech. This is the first demonstration that adult listeners in an indigenous, nonindustrialized, and nonliterate culture can accurately infer intentions from both ID speech and AD speech in a language they do not speak.

  1. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. Hence, from a transmission point of view, digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques and that is often used interchangeably with speech coding is voice coding. This term is more generic in the sense that the
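
    A small illustration of the waveform-coding side of the distinction drawn above: mu-law companding, the kind of sample-by-sample mapping used in early digital telephony (G.711-style); this is a generic sketch, not a system described in the text:

      import numpy as np

      def mu_law_encode(x, mu=255):
          # Compress samples in [-1, 1] so that quantization noise
          # roughly tracks signal level.
          return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

      def mu_law_decode(y, mu=255):
          # Exact inverse of the companding curve above.
          return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu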

  2. Causal inference of asynchronous audiovisual speech

    Directory of Open Access Journals (Sweden)

    John F Magnotti

    2013-11-01

    Full Text Available During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
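
    The causal inference step can be sketched as a two-hypothesis Bayesian comparison on the measured asynchrony; the Gaussian forms and all parameter values below are illustrative assumptions, not the paper's fitted model:

      from scipy.stats import norm

      def p_common_cause(asynch_s, sigma_common=0.08, sigma_indep=0.4,
                         prior_common=0.5):
          # Under a common cause (same talker), measured asynchronies
          # cluster tightly around zero; under independent causes they
          # are broadly distributed. Bayes' rule weighs the two.
          like_c = norm.pdf(asynch_s, 0.0, sigma_common)
          like_i = norm.pdf(asynch_s, 0.0, sigma_indep)
          post = prior_common * like_c
          return post / (post + (1.0 - prior_common) * like_i)

      print(p_common_cause(0.05))  # small asynchrony: likely same cause
      print(p_common_cause(0.40))  # large asynchrony: likely separate causes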

  3. P2-13: Location word Cues' Effect on Location Discrimination Task: Cross-Modal Study

    Directory of Open Access Journals (Sweden)

    Satoko Ohtsuka

    2012-10-01

    Full Text Available As is well known, participants are slower and make more errors in responding to the display color of an incongruent color word than a congruent one. This traditional Stroop effect is often accounted for with relatively automatic and dominant word processing. Although the word dominance account has been widely supported, it is not clear to what extent it is valid across perceptual tasks. Here we aimed to examine whether the word dominance effect is observed in location Stroop tasks and in audio-visual situations. The participants were required to press a key according to the location of visual (Experiment 1) and audio (Experiment 2) targets, left or right, as soon as possible. A cue of written (Experiments 1a and 2a) or spoken (Experiments 1b and 2b) location words, “left” or “right”, was presented on the left or right side of the fixation with cue lead times (CLT) of 200 ms and 1200 ms. Reaction time from target presentation to key press was recorded as a dependent variable. The results were that the location validity effect was marked in within-modality but less so in cross-modality trials. The word validity effect was strong in within- but not in cross-modality trials. The CLT gave some effect of inhibition of return. So word dominance could be less effective in location tasks and in cross-modal situations. The spatial correspondence seems to overcome the word effect.

  4. Evolution of non-speech sound memory in postlingual deafness: implications for cochlear implant rehabilitation.

    Science.gov (United States)

    Lazard, D S; Giraud, A L; Truy, E; Lee, H J

    2011-07-01

    Neurofunctional patterns assessed before or after cochlear implantation (CI) are informative markers of implantation outcome. Because phonological memory reorganization in post-lingual deafness is predictive of the outcome, we investigated, using a cross-sectional approach, whether memory of non-speech sounds (NSS) produced by animals or objects (i.e. non-human sounds) is also reorganized, and how this relates to speech perception after CI. We used an fMRI auditory imagery task in which sounds were evoked by pictures of noisy items for post-lingual deaf candidates for CI and for normal-hearing subjects. When deaf subjects imagined sounds, the left inferior frontal gyrus, the right posterior temporal gyrus and the right amygdala were less activated compared to controls. Activity levels in these regions decreased with duration of auditory deprivation, indicating declining NSS representations. Whole brain correlations with duration of auditory deprivation and with speech scores after CI showed an activity decline in dorsal, fronto-parietal, cortical regions, and an activity increase in ventral cortical regions, the right anterior temporal pole and the hippocampal gyrus. Both dorsal and ventral reorganizations predicted poor speech perception outcome after CI. These results suggest that post-CI speech perception relies, at least partially, on the integrity of a neural system used for processing NSS that is based on audio-visual and articulatory mapping processes. When this neural system is reorganized, post-lingual deaf subjects resort to inefficient semantic- and memory-based strategies. These results complement those of other studies on speech processing, suggesting that both speech and NSS representations need to be maintained during deafness to ensure the success of CI. Copyright © 2011 Elsevier Ltd. All rights reserved.

  5. How is the McGurk effect modulated by Cued Speech in deaf and hearing adults?

    OpenAIRE

    Bayard, Clémence; Colin, Cécile; Leybaert, Jacqueline

    2014-01-01

    Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combi...

  6. Neural Entrainment to Speech Modulates Speech Intelligibility

    NARCIS (Netherlands)

    Riecke, Lars; Formisano, Elia; Sorger, Bettina; Baskent, Deniz; Gaudrain, Etienne

    2018-01-01

    Speech is crucial for communication in everyday life. Speech-brain entrainment, the alignment of neural activity to the slow temporal fluctuations (envelope) of acoustic speech input, is a ubiquitous element of current theories of speech processing. Associations between speech-brain entrainment and

  7. Speech Research

    Science.gov (United States)

    Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: a biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic frictions.

  8. Hate speech

    Directory of Open Access Journals (Sweden)

    Anne Birgitta Nilsen

    2014-12-01

    Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance. The aim of the article is to contribute to a more thorough understanding of hate speech's nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience. The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the

  9. Speech enhancement

    CERN Document Server

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red

  10. Development in children's interpretation of pitch cues to emotions.

    Science.gov (United States)

    Quam, Carolyn; Swingley, Daniel

    2012-01-01

    Young infants respond to positive and negative speech prosody (A. Fernald, 1993), yet 4-year-olds rely on lexical information when it conflicts with paralinguistic cues to approval or disapproval (M. Friend, 2003). This article explores this surprising phenomenon, testing one hundred eighteen 2- to 5-year-olds' use of isolated pitch cues to emotions in interactive tasks. Only 4- to 5-year-olds consistently interpreted exaggerated, stereotypically happy or sad pitch contours as evidence that a puppet had succeeded or failed to find his toy (Experiment 1) or was happy or sad (Experiments 2, 3). Two- and 3-year-olds exploited facial and body-language cues in the same task. The authors discuss the implications of this late-developing use of pitch cues to emotions, relating them to other functions of pitch. © 2011 The Authors. Child Development © 2011 Society for Research in Child Development, Inc.

  11. Speech Intelligibility

    Science.gov (United States)

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena like the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, benefit using hearing aids or combinations of these things.

  12. Voice-associated static face image releases speech from informational masking.

    Science.gov (United States)

    Gao, Yayue; Cao, Shuyang; Qu, Tianshu; Wu, Xihong; Li, Haifeng; Zhang, Jinsheng; Li, Liang

    2014-06-01

    In noisy, multipeople talking environments such as a cocktail party, listeners can use various perceptual and/or cognitive cues to improve recognition of target speech against masking, particularly informational masking. Previous studies have shown that temporally pre-presented voice cues (voice primes) improve recognition of target speech against speech masking but not noise masking. This study investigated whether static face image primes that have become target-voice associated (i.e., facial images linked through associative learning with voices reciting the target speech) can be used by listeners to unmask speech. The results showed that in 32 normal-hearing younger adults, temporally pre-presenting a voice-priming sentence with the same voice reciting the target sentence significantly improved the recognition of target speech that was masked by irrelevant two-talker speech. When a person's face photograph image became associated with the voice reciting the target speech by learning, temporally pre-presenting the target-voice-associated face image significantly improved recognition of target speech against speech masking, particularly for the last two keywords in the target sentence. Moreover, speech-recognition performance under the voice-priming condition was significantly correlated to that under the face-priming condition. The results suggest that learned facial information on talker identity plays an important role in identifying the target-talker's voice and facilitating selective attention to the target-speech stream against the masking-speech stream. © 2014 The Institute of Psychology, Chinese Academy of Sciences and Wiley Publishing Asia Pty Ltd.

  13. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    ...-Speech Services for Individuals with Hearing and Speech Disabilities, Report and Order (Order), document...] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities...

  14. Multiperson visual focus of attention from head pose and meeting contextual cues.

    Science.gov (United States)

    Ba, Sileye O; Odobez, Jean-Marc

    2011-01-01

    This paper introduces a novel contextual model for the recognition of people's visual focus of attention (VFOA) in meetings from audio-visual perceptual cues. More specifically, instead of independently recognizing the VFOA of each meeting participant from his own head pose, we propose to jointly recognize the participants' visual attention in order to introduce context-dependent interaction models that relate to group activity and the social dynamics of communication. Meeting contextual information is represented by the location of people, conversational events identifying floor holding patterns, and a presentation activity variable. By modeling the interactions between the different contexts and their combined and sometimes contradictory impact on the gazing behavior, our model allows us to handle VFOA recognition in difficult task-based meetings involving artifacts, presentations, and moving people. We validated our model through rigorous evaluation on a publicly available and challenging data set of 12 real meetings (5 hours of data). The results demonstrated that the integration of the presentation and conversation dynamical context using our model can lead to significant performance improvements.

  15. Sound frequency affects speech emotion perception: results from congenital amusia.

    Science.gov (United States)

    Lolli, Sydney L; Lewenstein, Ari D; Basurto, Julian; Winnik, Sean; Loui, Psyche

    2015-01-01

    Congenital amusics, or "tone-deaf" individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech.
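
    The low-pass manipulation used in Experiment 1 can be sketched as a standard Butterworth filter; the cutoff below is an assumed illustrative value, since the study's exact cutoff is not given in this abstract:

      from scipy.signal import butter, sosfiltfilt

      def low_pass_speech(x, fs, cutoff_hz=500.0, order=4):
          # Retains F0 and low harmonics (pitch cues) while removing most
          # higher-frequency spectral detail from the speech waveform x.
          sos = butter(order, cutoff_hz / (fs / 2), btype="low", output="sos")
          return sosfiltfilt(sos, x)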

  16. Zebra finches are sensitive to prosodic features of human speech.

    Science.gov (United States)

    Spierings, Michelle J; ten Cate, Carel

    2014-07-22

    Variation in pitch, amplitude and rhythm adds crucial paralinguistic information to human speech. Such prosodic cues can reveal information about the meaning or emphasis of a sentence or the emotional state of the speaker. To examine the hypothesis that sensitivity to prosodic cues is language independent and not human specific, we tested prosody perception in a controlled experiment with zebra finches. Using a go/no-go procedure, subjects were trained to discriminate between speech syllables arranged in XYXY patterns with prosodic stress on the first syllable and XXYY patterns with prosodic stress on the final syllable. To systematically determine the salience of the various prosodic cues (pitch, duration and amplitude) to the zebra finches, they were subjected to five tests with different combinations of these cues. The zebra finches generalized the prosodic pattern to sequences that consisted of new syllables and used prosodic features over structural ones to discriminate between stimuli. This strong sensitivity to the prosodic pattern was maintained when only a single prosodic cue was available. The change in pitch was treated as more salient than changes in the other prosodic features. These results show that zebra finches are sensitive to the same prosodic cues known to affect human speech perception. © 2014 The Author(s) Published by the Royal Society. All rights reserved.

  17. The Interaction between Prosody and Meaning in Second Language Speech Production

    Science.gov (United States)

    Jackson, Carrie N.; O'Brien, Mary Grantham

    2011-01-01

    Research has shown that English and German native speakers use prosodic cues during speech production to convey the intended meaning of an utterance. However, little is known about whether American L2 learners of German also use such cues during L2 production. The present study shows that intermediate-level L2 learners of German (English L1) use…

  18. Speech disorders - children

    Science.gov (United States)

    ... disorder; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder; Speech disorder - stuttering ... evaluation tools that can help identify and diagnose speech disorders: Denver Articulation Screening Examination Goldman-Fristoe Test of ...

  19. Replacing Maladaptive Speech with Verbal Labeling Responses: An Analysis of Generalized Responding.

    Science.gov (United States)

    Foxx, R. M.; And Others

    1988-01-01

    Three mentally handicapped students (aged 13, 36, and 40) with maladaptive speech received training to answer questions with verbal labels. The results of their cues-pause-point training showed that the students replaced their maladaptive speech with correct labels (answers) to questions in the training setting and three generalization settings.…

  20. The Perception of "Sine-Wave Speech" by Adults with Developmental Dyslexia.

    Science.gov (United States)

    Rosner, Burton S.; Talcott, Joel B.; Witton, Caroline; Hogg, James D.; Richardson, Alexandra J.; Hansen, Peter C.; Stein, John F.

    2003-01-01

    "Sine-wave speech" sentences contain only four frequency-modulated sine waves, lacking many acoustic cues present in natural speech. Adults with (n=19) and without (n=14) dyslexia were asked to reproduce orally sine-wave utterances in successive trials. Results suggest comprehension of sine-wave sentences is impaired in some adults with…

  1. Visual cues and listening effort: individual variability.

    Science.gov (United States)

    Picou, Erin M; Ricketts, Todd A; Hornsby, Benjamin W Y

    2011-10-01

    To investigate the effect of visual cues on listening effort as well as whether predictive variables such as working memory capacity (WMC) and lipreading ability affect the magnitude of listening effort. Twenty participants with normal hearing were tested using a paired-associates recall task in 2 conditions (quiet and noise) and 2 presentation modalities (audio only [AO] and auditory-visual [AV]). Signal-to-noise ratios were adjusted to provide matched speech recognition across audio-only and AV noise conditions. Also measured were subjective perceptions of listening effort and 2 predictive variables: (a) lipreading ability and (b) WMC. Objective and subjective results indicated that listening effort increased in the presence of noise, but on average the addition of visual cues did not significantly affect the magnitude of listening effort. Although there was substantial individual variability, on average participants who were better lipreaders or had larger WMCs demonstrated reduced listening effort in noise in AV conditions. Overall, the results support the hypothesis that integrating auditory and visual cues requires cognitive resources in some participants. The data indicate that low lipreading ability or low WMC is associated with relatively effortful integration of auditory and visual information in noise.

  2. Speech Processing.

    Science.gov (United States)

    1983-05-01

    Extraction fragments from a multi-chapter report on speech processing; only the following content is recoverable. The VDE system developed had the capability of recognizing up to 248 separate words in syntactic structures. The two systems described are isolated... Recoverable chapter titles include: ...and Speaker Recognition (M.J. Hunt); Assessment of Speech Systems (R.K. Moore); A Survey of Current Equipment and Research (J.S. Bridle); ...Technology in Navy Training Systems (R. Breaux, M. Blind and R. Lynchard); General Review of Military Applications of Voice Processing (Dr. Bruno...).

  3. Speech Recognition

    Directory of Open Access Journals (Sweden)

    Adrian Morariu

    2009-01-01

    Full Text Available This paper presents a method of speech recognition based on pattern recognition techniques. Learning consists of determining the unique characteristics of a word (cepstral coefficients) by eliminating those characteristics that differ from one word to another. For learning and recognition, the system builds a dictionary of words by determining the characteristics of each word to be used in recognition. Determining the characteristics of an audio signal consists of the following steps: noise removal, sampling, applying a Hamming window, switching to the frequency domain through the Fourier transform, calculating the magnitude spectrum, filtering the data, and determining the cepstral coefficients.
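
    A compact sketch of the per-frame pipeline enumerated above (window, magnitude spectrum, cepstrum); this is a generic real-cepstrum computation, with noise removal and sampling assumed to have happened upstream:

      import numpy as np

      def cepstral_coefficients(frame, n_coeffs=13):
          # Hamming window -> magnitude spectrum -> log -> inverse FFT.
          windowed = frame * np.hamming(len(frame))
          log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)
          cepstrum = np.fft.irfft(log_mag)
          return cepstrum[:n_coeffs]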

  4. Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion.

    Science.gov (United States)

    Schutz, Michael

    2017-01-01

    Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor), a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally "happy") pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015). Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers "trade off" cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music-widely recognized for its artistic significance-complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech.
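
    The pitch comparison reported here has a simple arithmetic form: a difference of n semitones corresponds to a frequency ratio of 2^(n/12), so a major second (2 semitones) is a ratio of about 1.122. A small check with hypothetical mean pitches (the Hz values are illustrative, not from the corpus):

      import math

      def semitone_difference(f_hi_hz, f_lo_hz):
          # Pitch offset in semitones between two frequencies.
          return 12.0 * math.log2(f_hi_hz / f_lo_hz)

      # Hypothetical mean F0s about a major second apart (~2.0 semitones):
      print(round(semitone_difference(233.1, 207.7), 2))
      # A 29% tempo difference is simply a 1.29x rate ratio.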

  5. Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion

    Directory of Open Access Journals (Sweden)

    Michael Schutz

    2017-11-01

    Full Text Available Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor, a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally “happy” pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015. Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers “trade off” cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music—widely recognized for its artistic significance—complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech.

  6. Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion

    Science.gov (United States)

    Schutz, Michael

    2017-01-01

    Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor), a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally “happy”) pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015). Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers “trade off” cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music—widely recognized for its artistic significance—complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech. PMID:29249997

  7. Improving Understanding of Emotional Speech Acoustic Content

    Science.gov (United States)

    Tinnemore, Anna

    Children with cochlear implants show deficits in identifying emotional intent of utterances without facial or body language cues. A known limitation to cochlear implants is the inability to accurately portray the fundamental frequency contour of speech which carries the majority of information needed to identify emotional intent. Without reliable access to the fundamental frequency, other methods of identifying vocal emotion, if identifiable, could be used to guide therapies for training children with cochlear implants to better identify vocal emotion. The current study analyzed recordings of adults speaking neutral sentences with a set array of emotions in a child-directed and adult-directed manner. The goal was to identify acoustic cues that contribute to emotion identification that may be enhanced in child-directed speech, but are also present in adult-directed speech. Results of this study showed that there were significant differences in the variation of the fundamental frequency, the variation of intensity, and the rate of speech among emotions and between intended audiences.
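
    One of the cues analyzed here, intensity variation, reduces to a simple frame statistic; a rough sketch (the frame length is an assumption, and F0 and rate analyses would need a pitch tracker and syllable timing on top of this):

      import numpy as np

      def intensity_variation_db(x, fs, frame_ms=25):
          # Standard deviation of per-frame RMS level in dB for waveform x.
          n = int(fs * frame_ms / 1000)
          frames = x[: len(x) // n * n].reshape(-1, n)
          rms = np.sqrt((frames ** 2).mean(axis=1)) + 1e-12
          return float(np.std(20.0 * np.log10(rms)))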

  8. Spectro-temporal cues enhance modulation sensitivity in cochlear implant users.

    Science.gov (United States)

    Zheng, Yi; Escabí, Monty; Litovsky, Ruth Y

    2017-08-01

    Although speech understanding is highly variable amongst cochlear implant (CI) subjects, the remarkably high speech recognition performance of many CI users is unexpected and not well understood. Numerous factors, including neural health and degradation of the spectral information in the speech signal of CIs, likely contribute to speech understanding. We studied the ability to use spectro-temporal modulations, which may be critical for speech understanding and discrimination, and hypothesize that CI users adopt a different perceptual strategy than normal-hearing (NH) individuals, whereby they rely more heavily on joint spectro-temporal cues to enhance detection of auditory cues. Modulation detection sensitivity was studied in CI users and NH subjects using broadband "ripple" stimuli that were modulated spectrally, temporally, or jointly, i.e., spectro-temporally. The spectro-temporal modulation transfer functions of CI users and NH subjects were decomposed into spectral and temporal dimensions and compared to those subjects' spectral-only and temporal-only modulation transfer functions. In CI users, the joint spectro-temporal sensitivity was better than that predicted by spectral-only and temporal-only sensitivity, indicating a heightened spectro-temporal sensitivity. Such an enhancement through the combined integration of spectral and temporal cues was not observed in NH subjects. The unique use of spectro-temporal cues by CI patients can yield benefits for use of cues that are important for speech understanding. This finding has implications for developing sound processing strategies that may rely on joint spectro-temporal modulations to improve speech comprehension of CI users, and the findings of this study may be valuable for developing clinical assessment tools to optimize CI processor performance. Copyright © 2017 Elsevier B.V. All rights reserved.
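
    A sketch of a moving spectro-temporal ripple of the kind described: many log-spaced tones whose amplitudes follow a sinusoid drifting jointly in time and log-frequency (all parameter values are illustrative, not the study's stimulus settings):

      import numpy as np

      def ripple_stimulus(dur_s, fs, f_lo=250.0, f_hi=8000.0, n_tones=200,
                          rate_hz=4.0, density_c_per_oct=1.0, depth=0.9):
          t = np.arange(int(dur_s * fs)) / fs
          freqs = np.geomspace(f_lo, f_hi, n_tones)
          octaves = np.log2(freqs / f_lo)
          rng = np.random.default_rng(1)
          sig = np.zeros_like(t)
          for f, x, ph in zip(freqs, octaves,
                              rng.uniform(0, 2 * np.pi, n_tones)):
              # Sinusoidal envelope drifting in time (rate, cycles/s) and
              # in log-frequency (density, cycles/octave); depth sets the
              # modulation strength.
              amp = 1.0 + depth * np.sin(2 * np.pi * (rate_hz * t
                                                      + density_c_per_oct * x))
              sig += amp * np.sin(2 * np.pi * f * t + ph)
          return sig / np.max(np.abs(sig))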

  9. Spectro-temporal cues enhance modulation sensitivity in cochlear implant users

    Science.gov (United States)

    Zheng, Yi; Escabí, Monty; Litovsky, Ruth Y.

    2018-01-01

    Although speech understanding is highly variable amongst cochlear implant (CI) subjects, the remarkably high speech recognition performance of many CI users is unexpected and not well understood. Numerous factors, including neural health and degradation of the spectral information in the speech signal delivered by CIs, likely contribute to speech understanding. We studied the ability to use spectro-temporal modulations, which may be critical for speech understanding and discrimination, and hypothesized that CI users adopt a different perceptual strategy from normal-hearing (NH) individuals, whereby they rely more heavily on joint spectro-temporal cues to enhance detection of auditory cues. Modulation detection sensitivity was studied in CI users and NH subjects using broadband “ripple” stimuli that were modulated spectrally, temporally, or jointly, i.e., spectro-temporally. The spectro-temporal modulation transfer functions of CI users and NH subjects were decomposed into spectral and temporal dimensions and compared to those subjects’ spectral-only and temporal-only modulation transfer functions. In CI users, the joint spectro-temporal sensitivity was better than that predicted by spectral-only and temporal-only sensitivity, indicating a heightened spectro-temporal sensitivity. Such an enhancement through the combined integration of spectral and temporal cues was not observed in NH subjects. This distinctive use of joint spectro-temporal cues by CI patients may benefit the use of cues that are important for speech understanding. This finding has implications for developing sound processing strategies that may rely on joint spectro-temporal modulations to improve the speech comprehension of CI users, and the findings of this study may be valuable for developing clinical assessment tools to optimize CI processor performance. PMID:28601530

  10. Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

    Science.gov (United States)

    Borrie, Stephanie A

    2015-03-01

    This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal (the AV advantage) has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.

  11. Intelligibility for Binaural Speech with Discarded Low-SNR Speech Components.

    Science.gov (United States)

    Schoenmaker, Esther; van de Par, Steven

    2016-01-01

    Speech intelligibility in multitalker settings improves when the target speaker is spatially separated from the interfering speakers. A factor that may contribute to this improvement is the improved detectability of target-speech components due to binaural interaction, in analogy to the Binaural Masking Level Difference (BMLD). This would allow listeners to hear target speech components within specific time-frequency intervals that have a negative SNR, similar to the improvement in the detectability of a tone in noise when tone and noise carry disparate interaural difference cues. To investigate whether these negative-SNR target-speech components indeed contribute to speech intelligibility, a stimulus manipulation was performed in which all target components were removed whenever the local SNR was smaller than a criterion value. It can be expected that, for sufficiently high criterion values, target speech components that do contribute to speech intelligibility will be removed. For spatially separated speakers, assuming that a BMLD-like detection advantage contributes to intelligibility, degradation in intelligibility is expected already at criterion values below 0 dB SNR. For collocated speakers, however, it is expected that higher criterion values can be applied without impairing speech intelligibility. Results show that degradation of intelligibility for separated speakers is only seen for criterion values of 0 dB and above, indicating a negligible contribution of a BMLD-like detection advantage in multitalker settings. These results show that the spatial benefit is related to a spatial separation of speech components at positive local SNRs rather than to a BMLD-like detection improvement for speech components at negative local SNRs.
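    The stimulus manipulation described above can be sketched as a binary mask in the short-time Fourier domain, applied to the target before mixing. The Python sketch below is a minimal illustration under that assumption; the window length, criterion value, and function name are hypothetical.

        import numpy as np
        from scipy.signal import stft, istft

        def discard_low_snr(target, interferer, sr, criterion_db=0.0):
            """Remove target T-F components whose local SNR < criterion."""
            _, _, T = stft(target, sr, nperseg=512)
            _, _, I = stft(interferer, sr, nperseg=512)
            local_snr = 20 * np.log10((np.abs(T) + 1e-12) / (np.abs(I) + 1e-12))
            mask = local_snr >= criterion_db          # keep components at/above criterion
            _, y = istft(T * mask, sr, nperseg=512)
            return y   # processed target, to be mixed with the interferer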

  12. Sequence analysis in multilevel models. A study on different sources of patient cues in medical consultations.

    Science.gov (United States)

    Del Piccolo, Lidia; Mazzi, Maria Angela; Dunn, Graham; Sandri, Marco; Zimmermann, Christa

    2007-12-01

    The aims of the study were to explore the importance of macro (patient, physician, consultation) and micro (doctor-patient speech sequences) variables in promoting patient cues (unsolicited new information or expressions of feelings), and to describe the methodological implications related to the study of speech sequences. Patient characteristics, a consultation index of partnership and doctor-patient speech sequences were recorded for 246 primary care consultations in six primary care surgeries in Verona, Italy. Homogeneity and stationarity conditions of speech sequences allowed the creation of a hierarchy of multilevel logit models including micro and macro level variables, with the presence/absence of cues as the dependent variable. We found that emotional distress of the patient increased cues and that cues appeared among other patient expressions and were preceded by physicians' facilitations and handling of emotion. Partnership, in terms of open-ended inquiry, active listening skills and handling of emotion by the physician and active participation by the patient throughout the consultation, reduced cue frequency.
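    At the micro level, the sequence analysis asks how the probability of a patient cue depends on the immediately preceding physician turn. A toy Python illustration follows; the turn codes are invented for illustration, and the study's own coding scheme and multilevel logit models are far richer than this single-consultation tabulation.

        from collections import Counter, defaultdict

        # One consultation as an alternating sequence of coded turns (invented).
        turns = ["dr_open_question", "pt_cue", "dr_facilitation", "pt_cue",
                 "dr_closed_question", "pt_answer", "dr_handling_emotion", "pt_cue"]

        preceding = defaultdict(Counter)
        for prev, cur in zip(turns, turns[1:]):
            if prev.startswith("dr_"):
                preceding[prev][cur == "pt_cue"] += 1

        for code, counts in preceding.items():
            p_cue = counts[True] / (counts[True] + counts[False])
            print(f"P(cue | {code}) = {p_cue:.2f}")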

  13. Newborn infants' sensitivity to perceptual cues to lexical and grammatical words.

    Science.gov (United States)

    Shi, R; Werker, J F; Morgan, J L

    1999-09-30

    In our study newborn infants were presented with lists of lexical and grammatical words prepared from natural maternal speech. The results show that newborns are able to categorically discriminate these sets of words based on a constellation of perceptual cues that distinguish them. This general ability to detect and categorically discriminate sets of words on the basis of multiple acoustic and phonological cues may provide a perceptual base that can help older infants bootstrap into the acquisition of grammatical categories and syntactic structure.

  14. A configural dominant account of contextual cueing: Configural cues are stronger than colour cues.

    Science.gov (United States)

    Kunar, Melina A; John, Rebecca; Sweetman, Hollie

    2014-01-01

    Previous work has shown that reaction times to find a target in displays that have been repeated are faster than those for displays that have never been seen before. This learning effect, termed "contextual cueing" (CC), has been shown using contexts such as the configuration of the distractors in the display and the background colour. However, it is not clear how these two contexts interact to facilitate search. We investigated this here by comparing the strengths of these two cues when they appeared together. In Experiment 1, participants searched for a target that was cued by both colour and distractor configural cues, compared with when the target was only predicted by configural information. The results showed that the addition of a colour cue did not increase contextual cueing. In Experiment 2, participants searched for a target that was cued by both colour and distractor configuration compared with when the target was only cued by colour. The results showed that adding a predictive configural cue led to a stronger CC benefit. Experiments 3 and 4 tested the disruptive effects of removing either a learned colour cue or a learned configural cue and whether there was cue competition when colour and configural cues were presented together. Removing the configural cue was more disruptive to CC than removing colour, and configural learning was shown to overshadow the learning of colour cues. The data support a configural dominant account of CC, where configural cues act as the stronger cue in comparison to colour when they are presented together.

  15. An analysis of machine translation and speech synthesis in speech-to-speech translation system

    OpenAIRE

    Hashimoto, K.; Yamagishi, J.; Byrne, W.; King, S.; Tokuda, K.

    2011-01-01

    This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, ...

  16. Performance evaluation of a motor-imagery-based EEG-Brain computer interface using a combined cue with heterogeneous training data in BCI-Naive subjects

    Directory of Open Access Journals (Sweden)

    Lee Youngbum

    2011-10-01

    Full Text Available Abstract Background Subjects in an EEG-brain computer interface (EEG-BCI) system experience difficulty obtaining, by motor imagery alone, performance as consistent as that of actual movement. It is necessary to find the optimal combinations of conditions and stimuli that affect the performance factors of the EEG-BCI system, to guarantee equipment safety and trust through performance evaluation of the motor-imagery characteristics that can be utilized in an EEG-BCI testing environment. Methods The experiment was carried out with 10 experienced subjects and 32 naive subjects on an EEG-BCI system. There were three experiments: the experienced homogeneous experiment, the naive homogeneous experiment and the naive heterogeneous experiment. Each experiment compared six audio-visual cue combinations and consisted of 50 trials. For the naive subjects, the EEG data were classified with a least-squares linear classifier after common spatial pattern filtering. Accuracy was calculated using the training and test data sets, and the p-value of the accuracy was obtained through a statistical significance test. Results When a naive subject was trained by a heterogeneous combined cue and tested by a visual cue, the result was the highest accuracy (p … Conclusions We propose this measuring methodology, with a heterogeneous combined cue for training data and a visual cue for test data, applied to the typical EEG-BCI algorithm on the EEG-BCI system, to achieve effectiveness in terms of consistency, stability, cost, time, and resource management without the need for a trial-and-error process.
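    The classification pipeline named in the Methods (a common spatial pattern filter followed by a least-squares linear classifier) can be sketched compactly. The Python sketch below is a generic textbook version, not the paper's code; the array shapes (trials x channels x samples) and the number of filter pairs are assumptions.

        import numpy as np
        from scipy.linalg import eigh

        def csp_filters(X, y, n_pairs=3):
            """CSP filters from the generalized eigenproblem of class covariances.

            X: trials x channels x samples; y: binary labels in {0, 1}.
            """
            covs = []
            for c in (0, 1):
                C = np.mean([np.cov(trial) for trial in X[y == c]], axis=0)
                covs.append(C / np.trace(C))
            _, V = eigh(covs[0], covs[0] + covs[1])   # eigenvalues ascending
            keep = np.r_[:n_pairs, -n_pairs:0]        # both discriminative ends
            return V[:, keep].T

        def log_variance_features(X, W):
            Z = np.einsum("fc,tcs->tfs", W, X)        # filter each trial
            return np.log(np.var(Z, axis=2))          # standard CSP feature

        def train_least_squares(F, y):
            A = np.hstack([F, np.ones((len(F), 1))])  # append bias column
            w, *_ = np.linalg.lstsq(A, 2.0 * y - 1.0, rcond=None)
            return w

        def predict(F, w):
            A = np.hstack([F, np.ones((len(F), 1))])
            return (A @ w > 0).astype(int)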

  17. Spoken Word Recognition of Chinese Words in Continuous Speech

    Science.gov (United States)

    Yip, Michael C. W.

    2015-01-01

    The present study examined the role that the positional probability of syllables plays in the recognition of spoken words in continuous Cantonese speech. Because some sounds occur more frequently at the beginning or ending positions of Cantonese syllables than others, this kind of probabilistic information about syllables may cue the locations…

  18. The Influence of Direct and Indirect Speech on Mental Representations

    NARCIS (Netherlands)

    A. Eerland (Anita); J.A.A. Engelen (Jan A.A.); R.A. Zwaan (Rolf)

    2013-01-01

    Language can be viewed as a set of cues that modulate the comprehender's thought processes. It is a very subtle instrument. For example, the literature suggests that people perceive direct speech (e.g., Joanne said: 'I went out for dinner last night') as more vivid and perceptually

  19. Speech and Language Delay

    Science.gov (United States)

    What is a speech and language delay? A speech and language delay … (Contents of this consumer health page include treatment, everyday life, questions, and resources.)

  20. Reacting to Neighborhood Cues?

    DEFF Research Database (Denmark)

    Danckert, Bolette; Dinesen, Peter Thisted; Sønderskov, Kim Mannemar

    2017-01-01

    is founded on politically sophisticated individuals having a greater comprehension of news and other mass-mediated sources, which makes them less likely to rely on neighborhood cues as sources of information relevant for political attitudes. Based on a unique panel data set with fine-grained information...

  1. Sound frequency affects speech emotion perception: Results from congenital amusia

    Directory of Open Access Journals (Sweden)

    Sydney Lolli

    2015-09-01

    Full Text Available Congenital amusics, or tone-deaf individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying band-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody (MBEP) were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task and an emotion identification task under band-pass and unfiltered speech conditions. Results showed a significant correlation between pitch discrimination threshold and emotion identification accuracy for band-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold > 16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between band-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation.
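    The band-pass manipulation described above can be sketched with a standard Butterworth filter. In the Python sketch below, the passband edges are illustrative assumptions, not the cutoffs used in the study, and the input file name is hypothetical.

        import numpy as np
        from scipy.io import wavfile
        from scipy.signal import butter, sosfiltfilt

        sr, x = wavfile.read("emotional_statement.wav")   # hypothetical mono file
        x = x.astype(float)

        # Zero-phase band-pass filtering; emotion judgments must then rely on
        # whatever cues survive within the passband.
        sos = butter(4, [300.0, 3400.0], btype="bandpass", fs=sr, output="sos")
        filtered = sosfiltfilt(sos, x)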

  2. Dog-directed speech: why do we use it and do dogs pay attention to it?

    Science.gov (United States)

    Ben-Aderet, Tobey; Gallego-Abenza, Mario; Reby, David; Mathevon, Nicolas

    2017-01-11

    Pet-directed speech is strikingly similar to infant-directed speech, a peculiar speaking pattern with higher pitch and slower tempo known to engage infants' attention and promote language learning. Here, we report the first investigation of potential factors modulating the use of dog-directed speech, as well as its immediate impact on dogs' behaviour. We recorded adult participants speaking in front of pictures of puppies, adult and old dogs, and analysed the quality of their speech. We then performed playback experiments to assess dogs' reaction to dog-directed speech compared with normal speech. We found that human speakers used dog-directed speech with dogs of all ages and that the acoustic structure of dog-directed speech was mostly independent of dog age, except for sound pitch which was relatively higher when communicating with puppies. Playback demonstrated that, in the absence of other non-auditory cues, puppies were highly reactive to dog-directed speech, and that the pitch was a key factor modulating their behaviour, suggesting that this specific speech register has a functional value in young dogs. Conversely, older dogs did not react differentially to dog-directed speech compared with normal speech. The fact that speakers continue to use dog-directed speech with older dogs therefore suggests that this speech pattern may mainly be a spontaneous attempt to facilitate interactions with non-verbal listeners. © 2017 The Author(s).

  3. Music and speech prosody: a common rhythm.

    Science.gov (United States)

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).

  4. Music and speech prosody: A common rhythm

    Directory of Open Access Journals (Sweden)

    Maija Hausen

    2013-09-01

    Full Text Available Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).

  5. Music and speech prosody: a common rhythm

    Science.gov (United States)

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R.; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress). PMID:24032022

  6. Markers of Deception in Italian Speech

    Directory of Open Access Journals (Sweden)

    Katelyn Spence

    2012-10-01

    Full Text Available Lying is a universal activity and the detection of lying a universal concern. Presently, there is great interest in determining objective measures of deception. The examination of speech, in particular, holds promise in this regard; yet, most of what we know about the relationship between speech and lying is based on the assessment of English-speaking participants. Few studies have examined indicators of deception in languages other than English. The world’s languages differ in significant ways, and cross-linguistic studies of deceptive communications are a research imperative. Here we review some of these differences amongst the world’s languages, and provide an overview of a number of recent studies demonstrating that cross-linguistic research is a worthwhile endeavour. In addition, we report the results of an empirical investigation of pitch, response latency, and speech rate as cues to deception in Italian speech. True and false opinions were elicited in an audio-taped interview. A within subjects analysis revealed no significant difference between the average pitch of the two conditions; however, speech rate was significantly slower, while response latency was longer, during deception compared with truth-telling. We explore the implications of these findings and propose directions for future research, with the aim of expanding the cross-linguistic branch of research on markers of deception.

  7. Mind your pricing cues.

    Science.gov (United States)

    Anderson, Eric; Simester, Duncan

    2003-09-01

    For most of the items they buy, consumers don't have an accurate sense of what the price should be. Ask them to guess how much a four-pack of 35-mm film costs, and you'll get a variety of wrong answers: Most people will underestimate; many will only shrug. Research shows that consumers' knowledge of the market is so far from perfect that it hardly deserves to be called knowledge at all. Yet people happily buy film and other products every day. Is this because they don't care what kind of deal they're getting? No. Remarkably, it's because they rely on retailers to tell them whether they're getting a good price. In subtle and not-so-subtle ways, retailers send signals to customers, telling them whether a given price is relatively high or low. In this article, the authors review several common pricing cues retailers use--"sale" signs, prices that end in 9, signpost items, and price-matching guarantees. They also offer some surprising facts about how--and how well--those cues work. For instance, the authors' tests with several mail-order catalogs reveal that including the word "sale" beside a price can increase demand by more than 50%. The practice of using a 9 at the end of a price to denote a bargain is so common, you'd think customers would be numb to it. Yet in a study the authors did involving a women's clothing catalog, they increased demand by a third just by changing the price of a dress from $34 to $39. Pricing cues are powerful tools for guiding customers' purchasing decisions, but they must be applied judiciously. Used inappropriately, the cues may breach customers' trust, reduce brand equity, and give rise to lawsuits.

  8. SynFace—Speech-Driven Facial Animation for Virtual Speech-Reading Support

    Directory of Open Access Journals (Sweden)

    Giampiero Salvi

    2009-01-01

    Full Text Available This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling).

  9. Sound of mind : electrophysiological and behavioural evidence for the role of context, variation and informativity in human speech processing

    NARCIS (Netherlands)

    Nixon, Jessie Sophia

    2014-01-01

    Spoken communication involves transmission of a message which takes physical form in acoustic waves. Within any given language, acoustic cues pattern in language-specific ways along language-specific acoustic dimensions to create speech sound contrasts. These cues are utilized by listeners to

  10. Effects of Visual Speech on Early Auditory Evoked Fields - From the Viewpoint of Individual Variance

    Science.gov (United States)

    Yahata, Izumi; Kanno, Akitake; Hidaka, Hiroshi; Sakamoto, Shuichi; Nakasato, Nobukazu; Kawashima, Ryuta; Katori, Yukio

    2017-01-01

    The effects of visual speech (the moving image of the speaker’s face uttering speech sound) on early auditory evoked fields (AEFs) were examined using a helmet-shaped magnetoencephalography system in 12 healthy volunteers (9 males, mean age 35.5 years). AEFs (N100m) in response to the monosyllabic sound /be/ were recorded and analyzed under three different visual stimulus conditions, the moving image of the same speaker’s face uttering /be/ (congruent visual stimuli) or uttering /ge/ (incongruent visual stimuli), and visual noise (still image processed from speaker’s face using a strong Gaussian filter: control condition). On average, latency of N100m was significantly shortened in the bilateral hemispheres for both congruent and incongruent auditory/visual (A/V) stimuli, compared to the control A/V condition. However, the degree of N100m shortening was not significantly different between the congruent and incongruent A/V conditions, despite the significant differences in psychophysical responses between these two A/V conditions. Moreover, analysis of the magnitudes of these visual effects on AEFs in individuals showed that the lip-reading effects on AEFs tended to be well correlated between the two different audio-visual conditions (congruent vs. incongruent visual stimuli) in the bilateral hemispheres but were not significantly correlated between the right and left hemispheres. On the other hand, no significant correlation was observed between the magnitudes of visual speech effects and psychophysical responses. These results may indicate that the auditory-visual interaction observed on the N100m is a fundamental process which does not depend on the congruency of the visual information. PMID:28141836

  11. Secure access to patient's health records using SpeechXRays, a multi-channel biometrics platform for user authentication.

    Science.gov (United States)

    Spanakis, Emmanouil G; Spanakis, Marios; Karantanas, Apostolos; Marias, Kostas

    2016-08-01

    The most commonly used method for user authentication in ICT services or systems is the application of identification tools such as passwords or personal identification numbers (PINs). The rapid development of ICT technology regarding smart devices (laptops, tablets and smartphones) has also allowed the advance of hardware components that capture several biometric traits such as fingerprints and voice. These components aim, among other things, to overcome the weaknesses and flaws of password usage with improved user authentication offering a higher level of security, privacy and usability. In this respect, the potential application of biometrics for secure user authentication regarding access to systems with sensitive data (i.e. patient data from electronic health records) shows great potential. SpeechXRays aims to provide a user recognition platform based on voice acoustics analysis and audio-visual identity verification. Among other uses, the platform is intended as an authentication tool for medical personnel to gain specific access to patients' electronic health records. In this work, a short description of the SpeechXRays implementation regarding eHealth is provided and analyzed. This study explores security and privacy issues and offers a comprehensive overview of biometrics technology applications in addressing e-Health security challenges. We present and describe the necessary requirements for an eHealth platform concerning biometric security.

  12. Speech Recognition with the Advanced Combination Encoder and Transient Emphasis Spectral Maxima Strategies in Nucleus 24 Recipients

    Science.gov (United States)

    Holden, Laura K.; Vandali, Andrew E.; Skinner, Margaret W.; Fourakis, Marios S.; Holden, Timothy A.

    2005-01-01

    One of the difficulties faced by cochlear implant (CI) recipients is perception of low-intensity speech cues. A. E. Vandali (2001) has developed the transient emphasis spectral maxima (TESM) strategy to amplify short-duration, low-level sounds. The aim of the present study was to determine whether speech scores would be significantly higher with…

  13. Weighting of Acoustic Cues to a Manner Distinction by Children with and without Hearing Loss

    Science.gov (United States)

    Nittrouer, Susan; Lowenstein, Joanna H.

    2015-01-01

    Purpose: Children must develop optimal perceptual weighting strategies for processing speech in their first language. Hearing loss can interfere with that development, especially if cochlear implants are required. The three goals of this study were to measure, for children with and without hearing loss: (a) cue weighting for a manner distinction,…

  14. Cross-Linguistic Differences in Prosodic Cues to Syntactic Disambiguation in German and English

    Science.gov (United States)

    O'Brien, Mary Grantham; Jackson, Carrie N.; Gardner, Christine E.

    2014-01-01

    This study examined whether late-learning English-German second language (L2) learners and late-learning German-English L2 learners use prosodic cues to disambiguate temporarily ambiguous first language and L2 sentences during speech production. Experiments 1a and 1b showed that English-German L2 learners and German-English L2 learners used a…

  15. Availability of binaural cues for pediatric bilateral cochlear implant recipients.

    Science.gov (United States)

    Sheffield, Sterling W; Haynes, David S; Wanna, George B; Labadie, Robert F; Gifford, René H

    2015-03-01

    Bilateral implant recipients theoretically have access to binaural cues. Research in postlingually deafened adults with cochlear implants (CIs) indicates minimal evidence for true binaural hearing. Congenitally deafened children who experience spatial hearing with bilateral CIs, however, might perceive binaural cues in the CI signal differently. There is limited research examining binaural hearing in children with CIs, and the few published studies are limited by the use of unrealistic speech stimuli and background noise. The purposes of this study were to (1) replicate our previous study of binaural hearing in postlingually deafened adults with AzBio sentences in prelingually deafened children with the pediatric version of the AzBio sentences, and (2) replicate previous studies of binaural hearing in children with CIs using more open-set sentences and more realistic background noise (i.e., multitalker babble). The study was a within-participant, repeated-measures design. The study sample consisted of 14 children with bilateral CIs with at least 25 mo of listening experience. Speech recognition was assessed using sentences presented in multitalker babble at a fixed signal-to-noise ratio. Test conditions included speech at 0° with noise presented at 0° (S0N0), on the side of the first CI (90° or 270°) (S0N1stCI), and on the side of the second CI (S0N2ndCI) as well as speech presented at 0° with noise presented semidiffusely from eight speakers at 45° intervals. Estimates of summation, head shadow, squelch, and spatial release from masking were calculated. Results of test conditions commonly reported in the literature (S0N0, S0N1stCI, S0N2ndCI) are consistent with results from previous research in adults and children with bilateral CIs, showing minimal summation and squelch but typical head shadow and spatial release from masking. However, bilateral benefit over the better CI with speech at 0° was much larger with semidiffuse noise. Congenitally deafened
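    The four measures estimated in the study are typically derived by differencing percent-correct scores across listening configurations. The Python sketch below uses one commonly used set of definitions (formulations vary across studies) and invented scores; it is an illustration, not the paper's analysis.

        # Invented percent-correct scores for the configurations involved.
        scores = {
            "bilateral_S0N0": 72.0,
            "first_CI_S0N0": 65.0,
            "second_CI_S0N0": 60.0,
            "bilateral_noise_at_2ndCI": 70.0,
            "first_CI_noise_at_2ndCI": 68.0,
            "first_CI_noise_at_1stCI": 50.0,
        }

        # Summation: both CIs vs the better single CI, co-located speech/noise.
        summation = scores["bilateral_S0N0"] - max(scores["first_CI_S0N0"],
                                                   scores["second_CI_S0N0"])

        # Head shadow: one CI, noise on the far side vs noise on the same side.
        head_shadow = (scores["first_CI_noise_at_2ndCI"]
                       - scores["first_CI_noise_at_1stCI"])

        # Squelch: adding the CI on the noise side to the shadowed ear.
        squelch = (scores["bilateral_noise_at_2ndCI"]
                   - scores["first_CI_noise_at_2ndCI"])

        # Spatial release from masking: separated vs co-located noise, bilateral.
        srm = scores["bilateral_noise_at_2ndCI"] - scores["bilateral_S0N0"]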

  16. Speech and Communication Disorders

    Science.gov (United States)

    ... to being completely unable to speak or understand speech. Causes include hearing disorders and deafness; voice problems; ... those caused by cleft lip or palate; speech problems like stuttering; developmental disabilities; learning disorders; and autism ...

  17. Cue conflicts in context

    DEFF Research Database (Denmark)

    Boeg Thomsen, Ditte; Poulsen, Mads

    2015-01-01

    When learning their first language, children develop strategies for assigning semantic roles to sentence structures, depending on morphosyntactic cues such as case and word order. Traditionally, comprehension experiments have presented transitive clauses in isolation, and crosslinguistically … preschoolers. However, object-first clauses may be context-sensitive structures, which are infelicitous in isolation. In a second act-out study we presented OVS clauses in supportive and unsupportive discourse contexts and in isolation and found that five-to-six-year-olds’ OVS comprehension was enhanced

  18. Audio-visual interactions in product sound design

    NARCIS (Netherlands)

    Özcan, E.; Van Egmond, R.

    2010-01-01

    Consistent product experience requires congruity between product properties such as visual appearance and sound. Therefore, for designing appropriate product sounds by manipulating their spectral-temporal structure, product sounds should preferably not be considered in isolation but as an integral

  19. Conditioning Influences Audio-Visual Integration by Increasing Sound Saliency

    Directory of Open Access Journals (Sweden)

    Fabrizio Leo

    2011-10-01

    Full Text Available We investigated the effect of prior conditioning of an auditory stimulus on audiovisual integration in a series of four psychophysical experiments. The experiments factorially manipulated the conditioning procedure (picture vs monetary conditioning) and multisensory paradigm (2AFC visual detection vs redundant target paradigm). In the conditioning sessions, subjects were presented with three pure tones (= conditioned stimulus, CS) that were paired with neutral, positive, or negative unconditioned stimuli (US; monetary: +50 euro cents, –50 cents, 0 cents; pictures: highly pleasant, unpleasant, and neutral IAPS). In a 2AFC visual selective attention paradigm, detection of near-threshold Gabors was improved by concurrent sounds that had previously been paired with a positive (monetary) or negative (picture) outcome relative to neutral sounds. In the redundant target paradigm, sounds previously paired with positive (monetary) or negative (picture) outcomes increased response speed to both auditory and audiovisual targets similarly. Importantly, prior conditioning did not increase the multisensory response facilitation (i.e., (A + V)/2 – AV) or the race model violation. Collectively, our results suggest that prior conditioning primarily increases the saliency of the auditory stimulus per se rather than influencing audiovisual integration directly. In turn, conditioned sounds are rendered more potent for increasing response accuracy or speed in detection of visual targets.

  20. Audio-Visual Aids for Cooperative Education and Training.

    Science.gov (United States)

    Botham, C. N.

    Within the context of cooperative education, audiovisual aids may be used for spreading the idea of cooperatives and helping to consolidate study groups; for the continuous process of education, both formal and informal, within the cooperative movement; for constant follow up purposes; and for promoting loyalty to the movement. Detailed…

  1. An Audio-Visual Presentation of Black Francophone Poetry.

    Science.gov (United States)

    Bruner, Charlotte H.

    1982-01-01

    A college class project to develop a videocassette presentation of African, Caribbean, and Afro-American French poetry is described from its inception through the processes of obtaining copyright and translation permissions, arranging scripts, presenting at various functions, and reception by Francophone and non-Francophone audiences. (MSE)

  2. Audio-visual Training for Lip–reading

    DEFF Research Database (Denmark)

    Gebert, Hermann; Bothe, Hans-Heinrich

    2011-01-01

    This new edited book aims to bring together researchers and developers from various related areas to share their knowledge and experience, to describe the current state of the art in mobile and wireless-based adaptive e-learning, and to present innovative techniques and solutions that support a person

  3. Effect of Audio-Visual Intervention Program on Cognitive ...

    African Journals Online (AJOL)

    Preschool may not be a place where formal education is imparted, but it is definitely a place where children have their first taste of independence. Preschool ... Hence, in many ways the findings of the present study can be beneficial in strengthening the non-formal preschool education component. It can be useful for the ...

  4. [Audio-visual communication in the history of psychiatry].

    Science.gov (United States)

    Farina, B; Remoli, V; Russo, F

    1993-12-01

    The authors analyse the evolution of visual communication in the history of psychiatry. From 18th-century oil paintings and the first daguerreotype prints to cinematography and modern audiovisual systems, they observe an increasing diffusion of new communication techniques in psychiatry, and describe the use of the different techniques in psychiatric practice. The article ends with a brief review of the current applications of audiovisual media in therapy, training, teaching, and research.

  5. preservation of audio-visual records at National Archives

    African Journals Online (AJOL)

    Walter

    ... where the equipment and power supply might not be readily available. ... of information beyond the written word and they serve as direct and powerful ... preservation of mankind's collective memory, and access to information by citizens.

  6. School Building Design and Audio-Visual Resources.

    Science.gov (United States)

    National Committee for Audio-Visual Aids in Education, London (England).

    The design of new schools should facilitate the use of audiovisual resources by ensuring that the materials used in the construction of the buildings provide adequate sound insulation and acoustical and viewing conditions in all learning spaces. The facilities to be considered are: electrical services; electronic services; light control and…

  7. Increasing observer objectivity with audio-visual technology: The Sphygmocorder

    NARCIS (Netherlands)

    Atkins, N.; Brien, E. O; Wesseling, K.H.; Guelen, I.

    1997-01-01

    The most fallible component of blood pressure measurement is the human observer. The traditional technique of measuring blood pressure does not allow the result of the measurement to be checked by independent observers, thereby leaving the method open to bias. In the Sphygmocorder, several

  8. Preattentive processing of audio-visual emotional signals

    DEFF Research Database (Denmark)

    Föcker, J.; Gondan, Matthias; Röder, B.

    2011-01-01

    Previous research has shown that redundant information in faces and voices leads to faster emotional categorization compared to incongruent emotional information even when attending to only one modality. The aim of the present study was to test whether these crossmodal effects are predominantly d...

  9. A Joint Audio-Visual Approach to Audio Localization

    DEFF Research Database (Denmark)

    Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2015-01-01

    Localization of audio sources is an important research problem, e.g., to facilitate noise reduction. In the recent years, the problem has been tackled using distributed microphone arrays (DMA). A common approach is to apply direction-of-arrival (DOA) estimation on each array (denoted as nodes), a...... time-of-flight cameras. Moreover, we propose an optimal method for weighting such DOA and range information for audio localization. Our experiments on both synthetic and real data show that there is a clear, potential advantage of using the joint audiovisual localization framework....

  10. Audio-Visual Equipment Depreciation. RDU-75-07.

    Science.gov (United States)

    Drake, Miriam A.; Baker, Martha

    A study was conducted at Purdue University to gather operational and budgetary planning data for the Libraries and Audiovisual Center. The objectives were: (1) to complete a current inventory of equipment including year of purchase, costs, and salvage value; (2) to determine useful life data for general classes of equipment; and (3) to determine…

  11. Free Speech Yearbook 1978.

    Science.gov (United States)

    Phifer, Gregg, Ed.

    The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…

  12. Cueing listeners to attend to a target talker progressively improves word report as the duration of the cue-target interval lengthens to 2,000 ms.

    Science.gov (United States)

    Holmes, Emma; Kitterick, Padraig T; Summerfield, A Quentin

    2018-04-25

    Endogenous attention is typically studied by presenting instructive cues in advance of a target stimulus array. For endogenous visual attention, task performance improves as the duration of the cue-target interval increases up to 800 ms. Less is known about how endogenous auditory attention unfolds over time or the mechanisms by which an instructive cue presented in advance of an auditory array improves performance. The current experiment used five cue-target intervals (0, 250, 500, 1,000, and 2,000 ms) to compare four hypotheses for how preparatory attention develops over time in a multi-talker listening task. Young adults were cued to attend to a target talker who spoke in a mixture of three talkers. Visual cues indicated the target talker's spatial location or their gender. Participants directed attention to location and gender simultaneously ("objects") at all cue-target intervals. Participants were consistently faster and more accurate at reporting words spoken by the target talker when the cue-target interval was 2,000 ms than 0 ms. In addition, the latency of correct responses progressively shortened as the duration of the cue-target interval increased from 0 to 2,000 ms. These findings suggest that the mechanisms involved in preparatory auditory attention develop gradually over time, taking at least 2,000 ms to reach optimal configuration, yet providing cumulative improvements in speech intelligibility as the duration of the cue-target interval increases from 0 to 2,000 ms. These results demonstrate an improvement in performance for cue-target intervals longer than those that have been reported previously in the visual or auditory modalities.

  13. Cue reactivity towards shopping cues in female participants.

    Science.gov (United States)

    Starcke, Katrin; Schlereth, Berenike; Domass, Debora; Schöler, Tobias; Brand, Matthias

    2013-03-01

    Background and aims It is currently under debate whether pathological buying can be considered as a behavioural addiction. Addictions have often been investigated with cue-reactivity paradigms to assess subjective, physiological and neural craving reactions. The current study aims at testing whether cue reactivity towards shopping cues is related to pathological buying tendencies. Methods A sample of 66 non-clinical female participants rated shopping related pictures concerning valence, arousal, and subjective craving. In a subgroup of 26 participants, electrodermal reactions towards those pictures were additionally assessed. Furthermore, all participants were screened concerning pathological buying tendencies and baseline craving for shopping. Results Results indicate a relationship between the subjective ratings of the shopping cues and pathological buying tendencies, even if baseline craving for shopping was controlled for. Electrodermal reactions were partly related to the subjective ratings of the cues. Conclusions Cue reactivity may be a potential correlate of pathological buying tendencies. Thus, pathological buying may be accompanied by craving reactions towards shopping cues. Results support the assumption that pathological buying can be considered as a behavioural addiction. From a methodological point of view, results support the view that the cue-reactivity paradigm is suited for the investigation of craving reactions in pathological buying and future studies should implement this paradigm in clinical samples.

  14. Individual differences in speech-in-noise perception parallel neural speech processing and attention in preschoolers

    Science.gov (United States)

    Thompson, Elaine C.; Carr, Kali Woodruff; White-Schwoch, Travis; Otto-Meyer, Sebastian; Kraus, Nina

    2016-01-01

    From bustling classrooms to unruly lunchrooms, school settings are noisy. To learn effectively in the unwelcome company of numerous distractions, children must clearly perceive speech in noise. In older children and adults, speech-in-noise perception is supported by sensory and cognitive processes, but the correlates underlying this critical listening skill in young children (3–5 year olds) remain undetermined. Employing a longitudinal design (two evaluations separated by ~12 months), we followed a cohort of 59 preschoolers, ages 3.0–4.9, assessing word-in-noise perception, cognitive abilities (intelligence, short-term memory, attention), and neural responses to speech. Results reveal changes in word-in-noise perception parallel changes in processing of the fundamental frequency (F0), an acoustic cue known for playing a role central to speaker identification and auditory scene analysis. Four unique developmental trajectories (speech-in-noise perception groups) confirm this relationship, in that improvements and declines in word-in-noise perception couple with enhancements and diminishments of F0 encoding, respectively. Improvements in word-in-noise perception also pair with gains in attention. Word-in-noise perception does not relate to strength of neural harmonic representation or short-term memory. These findings reinforce previously-reported roles of F0 and attention in hearing speech in noise in older children and adults, and extend this relationship to preschool children. PMID:27864051

  15. Perceived gender in clear and conversational speech

    Science.gov (United States)

    Booz, Jaime A.

    Although many studies have examined acoustic and sociolinguistic differences between male and female speech, the relationship between talker speaking style and perceived gender has not yet been explored. The present study attempts to determine whether clear speech, a style adopted by talkers who perceive some barrier to effective communication, shifts perceptions of femininity for male and female talkers. Much of our understanding of gender perception in voice and speech is based on sustained vowels or single words, eliminating temporal, prosodic, and articulatory cues available in more naturalistic, connected speech. Thus, clear and conversational sentence stimuli, selected from the 41 talkers of the Ferguson Clear Speech Database (Ferguson, 2004) were presented to 17 normal-hearing listeners, aged 18 to 30. They rated the talkers' gender using a visual analog scale with "masculine" and "feminine" endpoints. This response method was chosen to account for within-category shifts of gender perception by allowing nonbinary responses. Mixed-effects regression analysis of listener responses revealed a small but significant effect of speaking style, and this effect was larger for male talkers than female talkers. Because of the high degree of talker variability observed for talker gender, acoustic analyses of these sentences were undertaken to determine the relationship between acoustic changes in clear and conversational speech and perceived femininity. Results of these analyses showed that mean fundamental frequency (fo) and fo standard deviation were significantly correlated to perceived gender for both male and female talkers, and vowel space was significantly correlated only for male talkers. Speaking rate and breathiness measures (CPPS) were not significantly related for either group. Outcomes of this study indicate that adopting a clear speaking style is correlated with increases in perceived femininity. Although the increase was small, some changes associated

  16. Speech in spinocerebellar ataxia.

    Science.gov (United States)

    Schalling, Ellika; Hartelius, Lena

    2013-12-01

    Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia. Copyright © 2013 Elsevier Inc. All rights reserved.

  17. Training to Improve Hearing Speech in Noise: Biological Mechanisms

    Science.gov (United States)

    Song, Judy H.; Skoe, Erika; Banai, Karen

    2012-01-01

    We investigated training-related improvements in listening in noise and the biological mechanisms mediating these improvements. Training-related malleability was examined using a program that incorporates cognitively based listening exercises to improve speech-in-noise perception. Before and after training, auditory brainstem responses to a speech syllable were recorded in quiet and multitalker noise from adults who ranged in their speech-in-noise perceptual ability. Controls did not undergo training but were tested at intervals equivalent to the trained subjects. Trained subjects exhibited significant improvements in speech-in-noise perception that were retained 6 months later. Subcortical responses in noise demonstrated training-related enhancements in the encoding of pitch-related cues (the fundamental frequency and the second harmonic), particularly for the time-varying portion of the syllable that is most vulnerable to perceptual disruption (the formant transition region). Subjects with the largest strength of pitch encoding at pretest showed the greatest perceptual improvement. Controls exhibited neither neurophysiological nor perceptual changes. We provide the first demonstration that short-term training can improve the neural representation of cues important for speech-in-noise perception. These results implicate and delineate biological mechanisms contributing to learning success, and they provide a conceptual advance to our understanding of the kind of training experiences that can influence sensory processing in adulthood. PMID:21799207
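    The "strength of pitch encoding" measure described above is typically quantified as the spectral amplitude of the averaged response at the fundamental frequency and its harmonics over the formant-transition window. The Python sketch below is a generic illustration; the sampling rate, analysis window, F0 value, and file name are all assumptions, not the study's parameters.

        import numpy as np

        def component_amplitude(resp, sr, freq, bw=5.0):
            """Mean spectral amplitude of resp within freq +/- bw Hz."""
            spec = np.abs(np.fft.rfft(resp)) / len(resp)
            freqs = np.fft.rfftfreq(len(resp), 1.0 / sr)
            band = (freqs >= freq - bw) & (freqs <= freq + bw)
            return spec[band].mean()

        sr = 20000                                  # assumed sampling rate (Hz)
        resp = np.load("avg_response.npy")          # hypothetical averaged response
        transition = resp[int(0.020 * sr):int(0.060 * sr)]  # assumed 20-60 ms window
        f0_strength = component_amplitude(transition, sr, freq=100.0)  # F0
        h2_strength = component_amplitude(transition, sr, freq=200.0)  # 2nd harmonic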

  18. Digital speech processing using Matlab

    CERN Document Server

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.

  19. Human phoneme recognition depending on speech-intrinsic variability.

    Science.gov (United States)

    Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger

    2010-11-01

    The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).
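    The spectral level distance used as a predictor above can be sketched as the mean dB difference between a speech segment's spectrum and the long-term spectrum of the masking noise, estimated on a common frequency grid. The Python sketch below is a minimal illustration; the Welch resolution and function names are assumptions.

        import numpy as np
        from scipy.signal import welch

        def spectrum_db(x, sr, nperseg=512):
            f, pxx = welch(x, sr, nperseg=nperseg)
            return f, 10 * np.log10(pxx + 1e-20)

        def spectral_level_distance(segment, noise, sr):
            """Positive values: the segment sits above the noise long-term spectrum."""
            _, seg_db = spectrum_db(segment, sr)
            _, noise_db = spectrum_db(noise, sr)
            return float(np.mean(seg_db - noise_db))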

  20. Grasp cueing and joint attention.

    Science.gov (United States)

    Tschentscher, Nadja; Fischer, Martin H

    2008-10-01

    We studied how two different hand posture cues affect joint attention in normal observers. Visual targets appeared over lateralized objects, with different delays after centrally presented hand postures. Attention was cued by either hand direction or the congruency between hand aperture and object size. Participants pressed a button when they detected a target. Direction cues alone facilitated target detection following short delays but aperture cues alone were ineffective. In contrast, when hand postures combined direction and aperture cues, aperture congruency effects without directional congruency effects emerged and persisted, but only for power grips. These results suggest that parallel parameter specification makes joint attention mechanisms exquisitely sensitive to the timing and content of contextual cues.

  1. Compound cueing in free recall

    Science.gov (United States)

    Lohnas, Lynn J.; Kahana, Michael J.

    2013-01-01

    According to the retrieved context theory of episodic memory, the cue for recall of an item is a weighted sum of recently activated cognitive states, including previously recalled and studied items as well as their associations. We show that this theory predicts there should be compound cueing in free recall. Specifically, the temporal contiguity effect should be greater when the two most recently recalled items were studied in contiguous list positions. A meta-analysis of published free recall experiments demonstrates evidence for compound cueing in both conditional response probabilities and inter-response times. To help rule out a rehearsal-based account of these compound cueing effects, we conducted an experiment with immediate, delayed and continual-distractor free recall conditions. Consistent with retrieved context theory but not with a rehearsal-based account, compound cueing was present in all conditions, and was not significantly influenced by the presence of interitem distractors. PMID:23957364
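
    The lag-CRP analysis underlying such temporal contiguity results can be sketched as follows. This is a simplified illustration (intrusions and repeated recalls are assumed to have been removed), not the authors' code; the compound-cueing analysis additionally conditions these curves on whether the two preceding recalls were contiguous.

```python
import numpy as np

def lag_crp(recalls, list_length):
    """Conditional response probability by lag. `recalls` holds one recall
    sequence per trial, each a list of studied serial positions (1-based).
    For every recall transition, the actual lag is counted against all lags
    still available (items not yet recalled) at that point."""
    max_lag = list_length - 1
    actual = np.zeros(2 * max_lag + 1)
    possible = np.zeros(2 * max_lag + 1)
    for seq in recalls:
        recalled = set()
        for prev, nxt in zip(seq, seq[1:]):
            recalled.add(prev)
            for cand in range(1, list_length + 1):
                if cand not in recalled:
                    possible[cand - prev + max_lag] += 1
            actual[nxt - prev + max_lag] += 1
    with np.errstate(invalid="ignore"):
        return actual / possible      # index i corresponds to lag i - max_lag
```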

  2. Comparison of different speech tasks among adults who stutter and adults who do not stutter

    Directory of Open Access Journals (Sweden)

    Ana Paula Ritto

    2016-03-01

    Full Text Available OBJECTIVES: In this study, we compared the performance of both fluent speakers and people who stutter in three different speaking situations: monologue speech, oral reading and choral reading. This study follows the assumption that the neuromotor control of speech can be influenced by external auditory stimuli in both speakers who stutter and speakers who do not stutter. METHOD: Seventeen adults who stutter and seventeen adults who do not stutter were assessed in three speaking tasks: monologue, oral reading (solo reading aloud) and choral reading (reading in unison with the evaluator). Speech fluency and rate were measured for each task. RESULTS: The participants who stuttered had a lower frequency of stuttering during choral reading than during monologue and oral reading. CONCLUSIONS: According to the dual premotor system model, choral speech enhanced fluency by providing external cues for the timing of each syllable, compensating for deficient internal cues.

  3. Seeing the talker’s face supports executive processing of speech in steady state noise

    OpenAIRE

    Sushmit eMishra; Thomas eLunner; Thomas eLunner; Thomas eLunner; Stefan eStenfelt; Stefan eStenfelt; Jerker eRönnberg; Mary eRudner

    2013-01-01

    Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT, Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-st...

  4. Using auditory-visual speech to probe the basis of noise-impaired consonant-vowel perception in dyslexia and auditory neuropathy

    Science.gov (United States)

    Ramirez, Joshua; Mann, Virginia

    2005-08-01

    Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.
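
    Speech-shaped noise of the kind used as a masker here is commonly generated by keeping the long-term magnitude spectrum of speech and randomizing the phase; the sketch below illustrates that generic approach (the study's exact masker construction is not specified in the record).

```python
import numpy as np

def speech_shaped_noise(speech, rng=None):
    """Noise with the long-term magnitude spectrum of `speech`: keep the FFT
    magnitudes, replace the phases with random ones."""
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.rfft(speech)
    phases = rng.uniform(0, 2 * np.pi, len(spectrum))
    noise = np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=len(speech))
    return noise / np.std(noise) * np.std(speech)         # match overall level

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mixture has the requested long-term SNR in dB."""
    gain = np.std(speech) / (np.std(noise) * 10 ** (snr_db / 20))
    return speech + gain * noise
```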

  5. Global Repetition Influences Contextual Cueing

    Science.gov (United States)

    Zang, Xuelian; Zinchenko, Artyom; Jia, Lina; Li, Hong

    2018-01-01

    Our visual system has a striking ability to improve visual search based on the learning of repeated ambient regularities, an effect named contextual cueing. Whereas most previous studies investigated the contextual cueing effect with the same number of repeated and non-repeated search displays per block, the current study focused on whether a global repetition frequency, formed by different presentation ratios between the repeated and non-repeated configurations, influences the contextual cueing effect. Specifically, the number of repeated and non-repeated displays presented in each block was manipulated: 12:12, 20:4, 4:20, and 4:4 in Experiments 1–4, respectively. The results revealed a significant contextual cueing effect when the global repetition frequency was high (≥1:1 ratio) in Experiments 1, 2, and 4, in that processing of repeated displays was expedited relative to non-repeated displays. Nevertheless, the contextual cueing effect was reduced to a non-significant level when the repetition ratio was reduced to 4:20 in Experiment 3. These results suggest that the presentation frequency of repeated relative to non-repeated displays can influence the strength of contextual cueing. In other words, global repetition statistics could be a crucial factor mediating the contextual cueing effect. PMID:29636716

  6. Speech Alarms Pilot Study

    Science.gov (United States)

    Sandor, Aniko; Moses, Haifa

    2016-01-01

    Speech alarms have been used extensively in aviation and included in International Building Codes (IBC) and National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.

  7. Individual differences in using geometric and featural cues to maintain spatial orientation: cue quantity and cue ambiguity are more important than cue type.

    Science.gov (United States)

    Kelly, Jonathan W; McNamara, Timothy P; Bodenheimer, Bobby; Carr, Thomas H; Rieser, John J

    2009-02-01

    Two experiments explored the role of environmental cues in maintaining spatial orientation (sense of self-location and direction) during locomotion. Of particular interest was the importance of geometric cues (provided by environmental surfaces) and featural cues (nongeometric properties provided by striped walls) in maintaining spatial orientation. Participants performed a spatial updating task within virtual environments containing geometric or featural cues that were ambiguous or unambiguous indicators of self-location and direction. Cue type (geometric or featural) did not affect performance, but the number and ambiguity of environmental cues did. Gender differences, interpreted as a proxy for individual differences in spatial ability and/or experience, highlight the interaction between cue quantity and ambiguity. When environmental cues were ambiguous, men stayed oriented with either one or two cues, whereas women stayed oriented only with two. When environmental cues were unambiguous, women stayed oriented with one cue.

  8. Ear, Hearing and Speech

    DEFF Research Database (Denmark)

    Poulsen, Torben

    2000-01-01

    An introduction is given to the anatomy and function of the ear, basic psychoacoustic matters (hearing threshold, loudness, masking), the speech signal, and speech intelligibility. The lecture note is written for the course: Fundamentals of Acoustics and Noise Control (51001).

  9. Principles of speech coding

    CERN Document Server

    Ogunfunmi, Tokunbo

    2010-01-01

    It is becoming increasingly apparent that all forms of communication, including voice, will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. Outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networks. Offering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the

  10. Speech disorder prevention

    Directory of Open Access Journals (Sweden)

    Miladis Fornaris-Méndez

    2017-04-01

    Full Text Available Language therapy has shifted from a medical focus to a preventive focus. However, difficulties remain in carrying out this preventive task, because more space is devoted to the correction of language disorders. Because speech disorders are the most frequently occurring dysfunction, the preventive work carried out to avoid their appearance takes on special importance. Speech education from early childhood makes it easier to prevent the appearance of speech disorders in children. The present work aims to offer different activities for the prevention of speech disorders.

  11. Sensory modality of smoking cues modulates neural cue reactivity.

    Science.gov (United States)

    Yalachkov, Yavor; Kaiser, Jochen; Görres, Andreas; Seehaus, Arne; Naumer, Marcus J

    2013-01-01

    Behavioral experiments have demonstrated that the sensory modality of presentation modulates drug cue reactivity. The present study on nicotine addiction tested whether neural responses to smoking cues are modulated by the sensory modality of stimulus presentation. We measured brain activation using functional magnetic resonance imaging (fMRI) in 15 smokers and 15 nonsmokers while they viewed images of smoking paraphernalia and control objects and while they touched the same objects without seeing them. Haptically presented, smoking-related stimuli induced more pronounced neural cue reactivity than visual cues in the left dorsal striatum in smokers compared to nonsmokers. The severity of nicotine dependence correlated positively with the preference for haptically explored smoking cues in the left inferior parietal lobule/somatosensory cortex, right fusiform gyrus/inferior temporal cortex/cerebellum, hippocampus/parahippocampal gyrus, posterior cingulate cortex, and supplementary motor area. These observations are in line with the hypothesized role of the dorsal striatum for the expression of drug habits and the well-established concept of drug-related automatized schemata, since haptic perception is more closely linked to the corresponding object-specific action pattern than visual perception. Moreover, our findings demonstrate that with the growing severity of nicotine dependence, brain regions involved in object perception, memory, self-processing, and motor control exhibit an increasing preference for haptic over visual smoking cues. This difference was not found for control stimuli. Considering the sensory modality of the presented cues could serve to develop more reliable fMRI-specific biomarkers, more ecologically valid experimental designs, and more effective cue-exposure therapies of addiction.

  12. Cognitive-linguistic effort in multidisciplinary stroke rehabilitation: Decreasing vs. increasing cues for word retrieval.

    Science.gov (United States)

    Choe, Yu-Kyong; Foster, Tammie; Asselin, Abigail; LeVander, Meagan; Baird, Jennifer

    2017-04-01

    Approximately 24% of stroke survivors experience co-occurring aphasia and hemiparesis. These individuals typically attend back-to-back therapy sessions. However, sequentially scheduled therapy may trigger physical and mental fatigue and have an adverse impact on treatment outcomes. The current study tested a hypothesis that exerting less effort during a therapy session would reduce overall fatigue and enhance functional recovery. Two stroke survivors chronically challenged by non-fluent aphasia and right hemiparesis sequentially completed verbal naming and upper-limb tasks on their home computers. The level of cognitive-linguistic effort in speech/language practice was manipulated by presenting verbal naming tasks in two conditions: Decreasing cues (i.e., most-to-least support for word retrieval), and Increasing cues (i.e., least-to-most support). The participants completed the same upper-limb exercises throughout the study periods. Both individuals showed a statistically significant advantage of decreasing cues over increasing cues in word retrieval during the practice period, but not at the end of the practice period or thereafter. The participant with moderate aphasia and hemiparesis achieved clinically meaningful gains in upper-limb functions following the decreasing cues condition, but not after the increasing cues condition. Preliminary findings from the current study suggest a positive impact of decreasing cues in the context of multidisciplinary stroke rehabilitation.

  13. Global cue inconsistency diminishes learning of cue validity

    Directory of Open Access Journals (Sweden)

    Tony Wang

    2016-11-01

    Full Text Available We present a novel two-stage probabilistic learning task that examines participants' ability to learn and utilise valid cues across several levels of probabilistic feedback. In the first stage, participants sample from one of three cues that gives predictive information about the outcome of the second stage. Participants are rewarded for correct prediction of the outcome in stage two. Only one of the three cues gives valid predictive information, so participants can maximise their reward by learning to sample from the valid cue. The validity of this predictive information, however, is reinforced across several levels of probabilistic feedback. A second manipulation involved changing the consistency between the predictive information in stage one and the outcome in stage two. The results show that, with higher probabilistic feedback, participants learned to utilise the valid cue. In inconsistent task conditions, however, participants were significantly less successful in utilising higher-validity cues. We interpret this result as implying that learning in probabilistic categorization is based on developing a representation of the task that allows for goal-directed action.

  14. Relative cue encoding in the context of sophisticated models of categorization: Separating information from categorization.

    Science.gov (United States)

    Apfelbaum, Keith S; McMurray, Bob

    2015-08-01

    Traditional studies of human categorization often treat the processes of encoding features and cues as peripheral to the question of how stimuli are categorized. However, in domains where the features and cues are less transparent, how information is encoded prior to categorization may constrain our understanding of the architecture of categorization. This is particularly true in speech perception, where acoustic cues to phonological categories are ambiguous and influenced by multiple factors. Here, it is crucial to consider the joint contributions of the information in the input and the categorization architecture. We contrasted accounts that argue for raw acoustic information encoding with accounts that posit that cues are encoded relative to expectations, and investigated how two categorization architectures-exemplar models and back-propagation parallel distributed processing models-deal with each kind of information. Relative encoding, akin to predictive coding, is a form of noise reduction, so it can be expected to improve model accuracy; however, like predictive coding, the use of relative encoding in speech perception by humans is controversial, so results are compared to patterns of human performance, rather than on the basis of overall accuracy. We found that, for both classes of models, in the vast majority of parameter settings, relative cues greatly helped the models approximate human performance. This suggests that expectation-relative processing is a crucial precursor step in phoneme categorization, and that understanding the information content is essential to understanding categorization processes.

  15. Gaze Cueing by Pareidolia Faces

    Directory of Open Access Journals (Sweden)

    Kohske Takahashi

    2013-12-01

    Full Text Available Visual images that are not faces are sometimes perceived as faces (the pareidolia phenomenon. While the pareidolia phenomenon provides people with a strong impression that a face is present, it is unclear how deeply pareidolia faces are processed as faces. In the present study, we examined whether a shift in spatial attention would be produced by gaze cueing of face-like objects. A robust cueing effect was observed when the face-like objects were perceived as faces. The magnitude of the cueing effect was comparable between the face-like objects and a cartoon face. However, the cueing effect was eliminated when the observer did not perceive the objects as faces. These results demonstrated that pareidolia faces do more than give the impression of the presence of faces; indeed, they trigger an additional face-specific attentional process.

  16. Gaze cueing by pareidolia faces.

    Science.gov (United States)

    Takahashi, Kohske; Watanabe, Katsumi

    2013-01-01

    Visual images that are not faces are sometimes perceived as faces (the pareidolia phenomenon). While the pareidolia phenomenon provides people with a strong impression that a face is present, it is unclear how deeply pareidolia faces are processed as faces. In the present study, we examined whether a shift in spatial attention would be produced by gaze cueing of face-like objects. A robust cueing effect was observed when the face-like objects were perceived as faces. The magnitude of the cueing effect was comparable between the face-like objects and a cartoon face. However, the cueing effect was eliminated when the observer did not perceive the objects as faces. These results demonstrated that pareidolia faces do more than give the impression of the presence of faces; indeed, they trigger an additional face-specific attentional process.

  17. Evaluation of multimodal ground cues

    DEFF Research Database (Denmark)

    Nordahl, Rolf; Lecuyer, Anatole; Serafin, Stefania

    2012-01-01

    This chapter presents an array of results on the perception of ground surfaces via multiple sensory modalities, with special attention to non-visual perceptual cues, notably those arising from audition and haptics, as well as interactions between them. It also reviews approaches to combining synthetic multimodal cues, from vision, haptics, and audition, in order to realize virtual experiences of walking on simulated ground surfaces or other features.

  18. Visual form Cues, Biological Motions, Auditory Cues, and Even Olfactory Cues Interact to Affect Visual Sex Discriminations

    OpenAIRE

    Rick Van Der Zwan; Anna Brooks; Duncan Blair; Coralia Machatch; Graeme Hacker

    2011-01-01

    Johnson and Tassinary (2005) proposed that visually perceived sex is signalled by structural or form cues. They suggested also that biological motion cues signal sex, but do so indirectly. We previously have shown that auditory cues can mediate visual sex perceptions (van der Zwan et al., 2009). Here we demonstrate that structural cues to body shape are alone sufficient for visual sex discriminations but that biological motion cues alone are not. Interestingly, biological motions can resolve ...

  19. Speech recognition in natural background noise.

    Directory of Open Access Journals (Sweden)

    Julien Meyer

    Full Text Available In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listener-to-speaker distance in a typical low-level natural background noise. The noise was combined with the simple spherical amplitude attenuation due to distance, basically changing the signal-to-noise ratio (SNR). Therefore, our study draws attention to some of the most basic environmental constraints that have pervaded spoken communication throughout human history. We evaluated the ability of native French participants to recognize French monosyllabic words (spoken at 65.3 dB(A), reference at 1 meter) at distances between 11 and 33 meters, which corresponded to the SNRs most revealing of the progressive effect of the selected natural noise (-8.8 dB to -18.4 dB). Our results showed that in such conditions, the identity of vowels is mostly preserved, with the striking peculiarity of an absence of vowel confusions. The results also confirmed the functional role of consonants during lexical identification. The extensive analysis of recognition scores, confusion patterns and associated acoustic cues revealed that sonorant, sibilant and burst properties were the most important parameters influencing phoneme recognition. Altogether these analyses allowed us to extract a resistance scale from consonant recognition scores. We also identified specific perceptual consonant confusion groups depending on the place in the words (onset vs. coda). Finally, our data suggested that listeners may access some acoustic cues of the CV transition, opening interesting perspectives for future studies.

  20. Speech recognition in natural background noise.

    Science.gov (United States)

    Meyer, Julien; Dentel, Laure; Meunier, Fanny

    2013-01-01

    In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listener-to-speaker distance in a typical low-level natural background noise. The noise was combined with the simple spherical amplitude attenuation due to distance, basically changing the signal-to-noise ratio (SNR). Therefore, our study draws attention to some of the most basic environmental constraints that have pervaded spoken communication throughout human history. We evaluated the ability of native French participants to recognize French monosyllabic words (spoken at 65.3 dB(A), reference at 1 meter) at distances between 11 and 33 meters, which corresponded to the SNRs most revealing of the progressive effect of the selected natural noise (-8.8 dB to -18.4 dB). Our results showed that in such conditions, the identity of vowels is mostly preserved, with the striking peculiarity of an absence of vowel confusions. The results also confirmed the functional role of consonants during lexical identification. The extensive analysis of recognition scores, confusion patterns and associated acoustic cues revealed that sonorant, sibilant and burst properties were the most important parameters influencing phoneme recognition. Altogether these analyses allowed us to extract a resistance scale from consonant recognition scores. We also identified specific perceptual consonant confusion groups depending on the place in the words (onset vs. coda). Finally, our data suggested that listeners may access some acoustic cues of the CV transition, opening interesting perspectives for future studies.
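
    The reported SNRs follow directly from spherical spreading, under which the speech level falls by 20·log10(distance) relative to the 1-meter reference. A short sketch reproduces the mapping; the ~53.3 dB(A) noise floor is inferred from the record's own numbers rather than stated in it.

```python
import numpy as np

def snr_at_distance(level_at_1m_db, noise_db, distance_m):
    """Speech level falls by 20*log10(d) under spherical spreading; the SNR
    is the attenuated speech level minus a fixed noise floor."""
    return level_at_1m_db - 20 * np.log10(distance_m) - noise_db

# An inferred noise floor of about 53.3 dB(A) reproduces the record's figures:
for d in (11, 33):
    print(d, "m:", round(snr_at_distance(65.3, 53.3, d), 1), "dB SNR")
# 11 m: -8.8 dB SNR; 33 m: -18.4 dB SNR
```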

  1. Collective speech acts

    NARCIS (Netherlands)

    Meijers, A.W.M.; Tsohatzidis, S.L.

    2007-01-01

    From its early development in the 1960s, speech act theory always had an individualistic orientation. It focused exclusively on speech acts performed by individual agents. Paradigmatic examples are ‘I promise that p’, ‘I order that p’, and ‘I declare that p’. There is a single speaker and a single

  2. Private Speech in Ballet

    Science.gov (United States)

    Johnston, Dale

    2006-01-01

    Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…

  3. Free Speech Yearbook 1980.

    Science.gov (United States)

    Kane, Peter E., Ed.

    The 11 articles in this collection deal with theoretical and practical freedom of speech issues. The topics covered are (1) the United States Supreme Court and communication theory; (2) truth, knowledge, and a democratic respect for diversity; (3) denial of freedom of speech in Jock Yablonski's campaign for the presidency of the United Mine…

  4. Illustrated Speech Anatomy.

    Science.gov (United States)

    Shearer, William M.

    Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…

  5. Free Speech. No. 38.

    Science.gov (United States)

    Kane, Peter E., Ed.

    This issue of "Free Speech" contains the following articles: "Daniel Schorr Relieved of Reporting Duties" by Laurence Stern, "The Sellout at CBS" by Michael Harrington, "Defending Dan Schorr" by Tom Wicker, "Speech to the Washington Press Club, February 25, 1976" by Daniel Schorr, "Funds…

  6. Do We Perceive Others Better than Ourselves? A Perceptual Benefit for Noise-Vocoded Speech Produced by an Average Speaker.

    Directory of Open Access Journals (Sweden)

    William L Schuerman

    Full Text Available In different tasks involving action perception, performance has been found to be facilitated when the presented stimuli were produced by the participants themselves rather than by another participant. These results suggest that the same mental representations are accessed during both production and perception. However, with regard to spoken word perception, evidence also suggests that listeners' representations for speech reflect the input from their surrounding linguistic community rather than their own idiosyncratic productions. Furthermore, speech perception is heavily influenced by indexical cues that may lead listeners to frame their interpretations of incoming speech signals with regard to speaker identity. In order to determine whether word recognition evinces similar self-advantages as found in action perception, it was necessary to eliminate indexical cues from the speech signal. We therefore asked participants to identify noise-vocoded versions of Dutch words that were based on either their own recordings or those of a statistically average speaker. The majority of participants were more accurate for the average speaker than for themselves, even after taking into account differences in intelligibility. These results suggest that the speech representations accessed during perception of noise-vocoded speech are more reflective of the input of the speech community, and hence that speech perception is not necessarily based on representations of one's own speech.

  7. The Galker test of speech reception in noise; associations with background variables, middle ear status, hearing, and language in Danish preschool children.

    Science.gov (United States)

    Lauritsen, Maj-Britt Glenn; Söderström, Margareta; Kreiner, Svend; Dørup, Jens; Lous, Jørgen

    2016-01-01

    We tested "the Galker test", a speech reception in noise test developed for primary care for Danish preschool children, to explore if the children's ability to hear and understand speech was associated with gender, age, middle ear status, and the level of background noise. The Galker test is a 35-item audio-visual, computerized word discrimination test in background noise. Included were 370 normally developed children attending day care centers. The children were examined with the Galker test, tympanometry, audiometry, and the Reynell test of verbal comprehension. Parents and daycare teachers completed questionnaires on the children's ability to hear and understand speech. As most of the variables were not assessed using interval scales, non-parametric statistics (Goodman-Kruskal's gamma) were used for analyzing associations with the Galker test score. For comparisons, analysis of variance (ANOVA) was used. Interrelations were adjusted for using a non-parametric graphic model. In unadjusted analyses, the Galker test was associated with gender, age group, language development (Reynell revised scale), audiometry, and tympanometry. The Galker score was also associated with the parents' and day care teachers' reports on the children's vocabulary, sentence construction, and pronunciation. Type B tympanograms were associated with a mean hearing level 5-6 dB below that of type A, C1, or C2. In the graphic analysis, Galker scores were closely and significantly related to Reynell test scores (Gamma (G)=0.35), the children's age group (G=0.33), and the day care teachers' assessment of the children's vocabulary (G=0.26). The Galker test of speech reception in noise appears promising as an easy and quick tool for evaluating preschool children's understanding of spoken words in noise, and it correlated well with the day care teachers' reports and less with the parents' reports. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
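
    Goodman-Kruskal's gamma, the association measure used above, is defined from concordant and discordant pairs of ordinal observations; a self-contained sketch (with hypothetical variable names) follows.

```python
def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal's gamma for paired ordinal variables:
    (concordant - discordant) / (concordant + discordant); tied pairs
    are ignored, which is what distinguishes gamma from Kendall's tau."""
    concordant = discordant = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            d = (x[i] - x[j]) * (y[i] - y[j])
            if d > 0:
                concordant += 1
            elif d < 0:
                discordant += 1
    if concordant + discordant == 0:
        return float("nan")
    return (concordant - discordant) / (concordant + discordant)

# hypothetical usage: g = goodman_kruskal_gamma(galker_scores, age_groups)
```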

  8. Musician advantage for speech-on-speech perception

    NARCIS (Netherlands)

    Başkent, Deniz; Gaudrain, Etienne

    Evidence for transfer of musical training to better perception of speech in noise has been mixed. Unlike speech-in-noise, speech-on-speech perception utilizes many of the skills that musical training improves, such as better pitch perception and stream segregation, as well as use of higher-level

  9. Speech Production and Speech Discrimination by Hearing-Impaired Children.

    Science.gov (United States)

    Novelli-Olmstead, Tina; Ling, Daniel

    1984-01-01

    Seven hearing impaired children (five to seven years old) assigned to the Speakers group made highly significant gains in speech production and auditory discrimination of speech, while Listeners made only slight speech production gains and no gains in auditory discrimination. Combined speech and auditory training was more effective than auditory…

  10. The Production of Emotional Prosody in Varying Degrees of Severity of Apraxia of Speech.

    Science.gov (United States)

    Van Putten, Steffany M.; Walker, Judy P.

    2003-01-01

    A study examined the abilities of three adults with varying degrees of apraxia of speech (AOS) to produce emotional prosody. Acoustic analyses of the subjects' productions revealed that unlike the control subject, the subjects with AOS did not produce differences in duration and amplitude cues to convey different emotions. (Contains references.)…

  11. Children's Responses to Computer-Synthesized Speech in Educational Media: Gender Consistency and Gender Similarity Effects

    Science.gov (United States)

    Lee, Kwan Min; Liao, Katharine; Ryu, Seoungho

    2007-01-01

    This study examines children's social responses to gender cues in synthesized speech in a computer-based instruction setting. Eighty 5th-grade elementary school children were randomly assigned to one of the conditions in a full-factorial 2 (participant gender) x 2 (voice gender) x 2 (content gender) experiment. Results show that children apply…

  12. Use of "um" in the Deceptive Speech of a Convicted Murderer

    Science.gov (United States)

    Villar, Gina; Arciuli, Joanne; Mallard, David

    2012-01-01

    Previous studies have demonstrated a link between language behaviors and deception; however, questions remain about the role of specific linguistic cues, especially in real-life high-stakes lies. This study investigated use of the so-called filler, "um," in externally verifiable truthful versus deceptive speech of a convicted murderer. The data…

  13. Inner Speech's Relationship With Overt Speech in Poststroke Aphasia.

    Science.gov (United States)

    Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A

    2017-09-18

    Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Howard, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95, p < .05), whereas relationships of inner speech with language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech given perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.

  14. Environmental Contamination of Normal Speech.

    Science.gov (United States)

    Harley, Trevor A.

    1990-01-01

    Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…

  15. The effect of filtered speech feedback on the frequency of stuttering

    Science.gov (United States)

    Rami, Manish Krishnakant

    2000-10-01

    whispered speech conditions all decreased the frequency of stuttering, while the approximate glottal source did not. It is suggested that articulatory events, chiefly the encoded speech output of vocal tract origin, afford effective cues and induce fluent speech in people who stutter.

  16. Neural networks supporting audiovisual integration for speech: A large-scale lesion study.

    Science.gov (United States)

    Hickok, Gregory; Rogalsky, Corianne; Matchin, William; Basilakos, Alexandra; Cai, Julia; Pillay, Sara; Ferrill, Michelle; Mickelsen, Soren; Anderson, Steven W; Love, Tracy; Binder, Jeffrey; Fridriksson, Julius

    2018-06-01

    Auditory and visual speech information are often strongly integrated resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched, the McGurk-MacDonald effect. Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure-auditory vs visual capture-can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor related frontal regions do not appear to play a role in AV speech integration. Copyright © 2018 Elsevier Ltd. All rights reserved.

  17. How much does language proficiency by non-native listeners influence speech audiometric tests in noise?

    Science.gov (United States)

    Warzybok, Anna; Brand, Thomas; Wagener, Kirsten C; Kollmeier, Birger

    2015-01-01

    The current study investigates the extent to which the linguistic complexity of three commonly employed speech recognition tests and second language proficiency influence speech recognition thresholds (SRTs) in noise in non-native listeners. SRTs were measured for non-natives and natives using three German speech recognition tests: the digit triplet test (DTT), the Oldenburg sentence test (OLSA), and the Göttingen sentence test (GÖSA). Sixty-four non-native and eight native listeners participated. Non-natives can show native-like SRTs in noise only for the linguistically easy speech material (DTT). Furthermore, the limitation of phonemic-acoustical cues in digit triplets affects speech recognition to the same extent in non-natives and natives. For more complex and less familiar speech materials, non-natives, ranging from basic to advanced proficiency in German, require on average 3-dB better signal-to-noise ratio for the OLSA and 6-dB for the GÖSA to obtain 50% speech recognition compared to native listeners. In clinical audiology, SRT measurements with a closed-set speech test (i.e. DTT for screening or OLSA test for clinical purposes) should be used with non-native listeners rather than open-set speech tests (such as the GÖSA or HINT), especially if a closed-set version in the patient's own native language is available.
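
    SRT measurement with tests such as the OLSA typically tracks the signal-to-noise ratio adaptively toward 50% intelligibility. The sketch below shows a deliberately simplified one-down/one-up track, not the actual OLSA procedure (which adapts step size based on the word score).

```python
def simple_srt_track(present_trial, start_snr_db=0.0, step_db=2.0, n_trials=20):
    """Toy adaptive SRT track: lower the SNR after a correct response, raise
    it after an incorrect one, so the track converges near 50% intelligibility.
    `present_trial(snr_db)` runs one sentence and returns True if correct."""
    snr, history = start_snr_db, []
    for _ in range(n_trials):
        correct = present_trial(snr)
        history.append(snr)
        snr += -step_db if correct else step_db
    tail = history[len(history) // 2:]    # discard the initial approach phase
    return sum(tail) / len(tail)          # SRT estimate in dB SNR
```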

  18. APPRECIATING SPEECH THROUGH GAMING

    Directory of Open Access Journals (Sweden)

    Mario T Carreon

    2014-06-01

    Full Text Available This paper discusses the Speech and Phoneme Recognition as an Educational Aid for the Deaf and Hearing Impaired (SPREAD) application and the ongoing research on its deployment as a tool for motivating deaf and hearing impaired students to learn and appreciate speech. This application uses the Sphinx-4 voice recognition system to analyze the vocalization of the student and provide prompt feedback on their pronunciation. The packaging of the application as an interactive game aims to provide additional motivation for deaf and hearing impaired students to learn and appreciate speech.

  19. Global Freedom of Speech

    DEFF Research Database (Denmark)

    Binderup, Lars Grassme

    2007-01-01

    The paper considers a proposed social norm, as opposed to a legal norm, that curbs exercises of the right to free speech that offend the feelings or beliefs of members from other cultural groups, and rejects the suggestion that acceptance of such a norm is in line with liberal egalitarian thinking. Following a review of the classical liberal egalitarian reasons for free speech - reasons from overall welfare, from autonomy and from respect for the equality of citizens - it is argued that these reasons outweigh the proposed reasons for curbing culturally offensive speech, as illustrated by currently controversial cases such as the Danish Cartoon Controversy.

  20. Gaze Cueing by Pareidolia Faces

    OpenAIRE

    Kohske Takahashi; Katsumi Watanabe

    2013-01-01

    Visual images that are not faces are sometimes perceived as faces (the pareidolia phenomenon). While the pareidolia phenomenon provides people with a strong impression that a face is present, it is unclear how deeply pareidolia faces are processed as faces. In the present study, we examined whether a shift in spatial attention would be produced by gaze cueing of face-like objects. A robust cueing effect was observed when the face-like objects were perceived as faces. The magnitude of the cueing effect was comparable between the face-like objects and a cartoon face. However, the cueing effect was eliminated when the observer did not perceive the objects as faces. These results demonstrated that pareidolia faces do more than give the impression of the presence of faces; indeed, they trigger an additional face-specific attentional process.

  1. Charisma in business speeches

    DEFF Research Database (Denmark)

    Niebuhr, Oliver; Brem, Alexander; Novák-Tót, Eszter

    2016-01-01

    The study extends acoustic analyses of speaker charisma to business speeches. Consistent with the public opinion, the findings are indicative of Steve Jobs being a more charismatic speaker than Mark Zuckerberg. Beyond previous studies, the data suggest that rhythm and emphatic accentuation are also involved in conveying charisma. Furthermore, the differences between Steve Jobs and Mark Zuckerberg and between the investor- and customer-related sections of their speeches support the modern understanding of charisma as a gradual, multiparametric, and context-sensitive concept.

  2. Speech spectrum envelope modeling

    Czech Academy of Sciences Publication Activity Database

    Vích, Robert; Vondra, Martin

    Vol. 4775, - (2007), s. 129-137 ISSN 0302-9743. [COST Action 2102 International Workshop. Vietri sul Mare, 29.03.2007-31.03.2007] R&D Projects: GA AV ČR(CZ) 1ET301710509 Institutional research plan: CEZ:AV0Z20670512 Keywords : speech * speech processing * cepstral analysis Subject RIV: JA - Electronics ; Optoelectronics, Electrical Engineering Impact factor: 0.302, year: 2005

  3. Zebra finches can use positional and transitional cues to distinguish vocal element strings.

    Science.gov (United States)

    Chen, Jiani; Ten Cate, Carel

    2015-08-01

    Learning sequences is of great importance to humans and non-human animals. Many motor and mental actions, such as singing in birds and speech processing in humans, rely on sequential learning. At least two mechanisms are considered to be involved in such learning. The chaining theory proposes that learning of sequences relies on memorizing the transitions between adjacent items, while the positional theory suggests that learners encode the items according to their ordinal position in the sequence. Positional learning is assumed to dominate sequential learning. However, human infants exposed to a string of speech sounds can learn transitional (chaining) cues. So far, it is not clear whether birds, an increasingly important model for examining vocal processing, can do this. In this study we use a Go-Nogo design to examine whether zebra finches can use transitional cues to distinguish artificially constructed strings of song elements. Zebra finches were trained with sequences differing in transitional and positional information and next tested with novel strings sharing positional and transitional similarities with the training strings. The results show that they can attend to both transitional and positional cues and that their sequential coding strategies can be biased toward transitional cues depending on the learning context. This article is part of a Special Issue entitled: In Honor of Jerry Hogan. Copyright © 2014 Elsevier B.V. All rights reserved.
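
    The contrast between positional and transitional coding can be made concrete with a toy encoding of element strings; the strings below are hypothetical, not the study's stimuli.

```python
def positional_features(string):
    """Which element occupies each ordinal position, e.g. ('A', 0)."""
    return {(elem, pos) for pos, elem in enumerate(string)}

def transitional_features(string):
    """Adjacent-element transitions (bigrams), e.g. ('A', 'B')."""
    return set(zip(string, string[1:]))

go = ("A", "B", "C", "D")            # hypothetical Go training string
test = ("A", "B", "D", "C")          # novel string sharing cues with `go`

pos_overlap = len(positional_features(test) & positional_features(go))
trans_overlap = len(transitional_features(test) & transitional_features(go))
print(pos_overlap, trans_overlap)    # 2 1: the two cue types dissociate,
                                     # so responses reveal which one is used
```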

  4. Audiovisual integration in children listening to spectrally degraded speech.

    Science.gov (United States)

    Maidment, David W; Kang, Hi Jee; Stewart, Hannah J; Amitay, Sygal

    2015-02-01

    The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Children (n=69) and adults (n=15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in auditory-only or audiovisual conditions. The number of bands was adaptively varied to modulate the degradation of the auditory signal, with the number of bands required for approximately 79% correct identification calculated as the threshold. The youngest children (4- to 5-year-olds) did not benefit from accompanying visual information, in comparison to 6- to 11-year-old children and adults. Audiovisual gain also increased with age in the child sample. The current data suggest that children younger than 6 years of age do not fully utilize visual speech cues to enhance speech perception when the auditory signal is degraded. This evidence not only has implications for understanding the development of speech perception skills in children with normal hearing but may also inform the development of new treatment and intervention strategies that aim to remediate speech perception difficulties in pediatric cochlear implant users.
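
    Noise vocoding of the kind used here splits speech into frequency bands, extracts each band's envelope, and uses the envelopes to modulate band-limited noise; fewer bands means stronger degradation. A minimal sketch, with illustrative parameters and assuming a sampling rate well above twice the top band edge:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(x, fs, n_bands, fmin=100.0, fmax=7000.0, env_cutoff=30.0):
    """Noise-vocode `x`: split into n_bands log-spaced bands, extract each
    band's envelope, and use it to modulate band-limited noise.
    Assumes fs comfortably above 2*fmax (e.g. 22050 Hz or more)."""
    edges = np.geomspace(fmin, fmax, n_bands + 1)
    env_sos = butter(2, env_cutoff, btype="low", fs=fs, output="sos")
    rng = np.random.default_rng(0)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, x)
        envelope = np.clip(sosfiltfilt(env_sos, np.abs(band)), 0, None)
        carrier = sosfiltfilt(band_sos, rng.standard_normal(len(x)))
        out += envelope * carrier
    return out
```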

  5. Audiovisual integration for speech during mid-childhood: Electrophysiological evidence

    Science.gov (United States)

    Kaganovich, Natalya; Schumaker, Jennifer

    2014-01-01

    Previous studies have demonstrated that the presence of visual speech cues reduces the amplitude and latency of the N1 and P2 event-related potential (ERP) components elicited by speech stimuli. However, the developmental trajectory of this effect is not yet fully mapped. We examined ERP responses to auditory, visual, and audiovisual speech in two groups of school-age children (7–8-year-olds and 10–11-year-olds) and in adults. Audiovisual speech led to the attenuation of the N1 and P2 components in all groups of participants, suggesting that the neural mechanisms underlying these effects are functional by early school years. Additionally, while the reduction in N1 was largest over the right scalp, the P2 attenuation was largest over the left and midline scalp. The difference in the hemispheric distribution of the N1 and P2 attenuation supports the idea that these components index at least somewhat disparate neural processes within the context of audiovisual speech perception. PMID:25463815

  6. Memory for speech and speech for memory.

    Science.gov (United States)

    Locke, J L; Kutz, K J

    1975-03-01

    Thirty kindergarteners, 15 who substituted /w/ for /r/ and 15 with correct articulation, received two perception tests and a memory test that included /w/ and /r/ in minimally contrastive syllables. Although both groups had nearly perfect perception of the experimenter's productions of /w/ and /r/, misarticulating subjects perceived their own tape-recorded w/r productions as /w/. In the memory task these same misarticulating subjects committed significantly more /w/-/r/ confusions in unspoken recall. The discussion considers why people subvocally rehearse; a developmental period in which children do not rehearse; ways subvocalization may aid recall, including motor and acoustic encoding; an echoic store that provides additional recall support if subjects rehearse vocally; and perception of self- and other-produced phonemes by misarticulating children, including its relevance to a motor theory of perception. Evidence is presented that speech for memory can be sufficiently impaired to cause memory disorder. Conceptions that restrict speech disorder to an impairment of communication are challenged.

  7. Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv), which was demonstrated to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating
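
    As a drastically simplified, broadband illustration of the SNRenv idea (the full sEPSM computes it per audio- and modulation-frequency band), one can compare the normalized envelope power of the noisy mixture with that of the noise alone:

```python
import numpy as np
from scipy.signal import hilbert

def snr_env_broadband(noisy_speech, noise):
    """Crude broadband SNRenv: excess normalized envelope power of the noisy
    mixture over that of the noise alone, in dB (illustrative only)."""
    def env_power(x):
        env = np.abs(hilbert(x))                  # Hilbert envelope
        return np.mean((env - env.mean()) ** 2) / (env.mean() ** 2 + 1e-12)
    p_mix, p_noise = env_power(noisy_speech), env_power(noise)
    return 10 * np.log10(max(p_mix - p_noise, 1e-12) / (p_noise + 1e-12))
```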

  8. Using Zebra-speech to study sequential and simultaneous speech segregation in a cochlear-implant simulation.

    Science.gov (United States)

    Gaudrain, Etienne; Carlyon, Robert P

    2013-01-01

    Previous studies have suggested that cochlear implant users may have particular difficulties exploiting opportunities to glimpse clear segments of a target speech signal in the presence of a fluctuating masker. Although it has been proposed that this difficulty is associated with a deficit in linking the glimpsed segments across time, the details of this mechanism are yet to be explained. The present study introduces a method called Zebra-speech developed to investigate the relative contribution of simultaneous and sequential segregation mechanisms in concurrent speech perception, using a noise-band vocoder to simulate cochlear implants. One experiment showed that the saliency of the difference between the target and the masker is a key factor for Zebra-speech perception, as it is for sequential segregation. Furthermore, forward masking played little or no role, confirming that intelligibility was not limited by energetic masking but by across-time linkage abilities. In another experiment, a binaural cue was used to distinguish the target and the masker. It showed that the relative contribution of simultaneous and sequential segregation depended on the spectral resolution, with listeners relying more on sequential segregation when the spectral resolution was reduced. The potential of Zebra-speech as a segregation enhancement strategy for cochlear implants is discussed.
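
    The record describes Zebra-speech as interleaving target and masker; a plausible minimal construction, assuming fixed-length alternating time slots (the published method may differ in detail), is:

```python
import numpy as np

def zebra_mix(target, masker, fs, segment_ms=100):
    """Alternate fixed-length time slots of target and masker, so each source
    is only ever present in its own 'stripes'."""
    seg = int(fs * segment_ms / 1000)
    n = min(len(target), len(masker))
    target, masker = np.asarray(target)[:n], np.asarray(masker)[:n]
    out = np.zeros(n)
    for start in range(0, n, seg):
        src = target if (start // seg) % 2 == 0 else masker
        out[start:start + seg] = src[start:start + seg]
    return out
```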

  9. Analysis of engagement behavior in children during dyadic interactions using prosodic cues.

    Science.gov (United States)

    Gupta, Rahul; Bone, Daniel; Lee, Sungbok; Narayanan, Shrikanth

    2016-05-01

    Child engagement is defined as the interaction of a child with his/her environment in a contextually appropriate manner. Engagement behavior in children is linked to socio-emotional and cognitive state assessment with enhanced engagement identified with improved skills. A vast majority of studies however rely solely, and often implicitly, on subjective perceptual measures of engagement. Access to automatic quantification could assist researchers/clinicians to objectively interpret engagement with respect to a target behavior or condition, and furthermore inform mechanisms for improving engagement in various settings. In this paper, we present an engagement prediction system based exclusively on vocal cues observed during structured interaction between a child and a psychologist involving several tasks. Specifically, we derive prosodic cues that capture engagement levels across the various tasks. Our experiments suggest that a child's engagement is reflected not only in the vocalizations, but also in the speech of the interacting psychologist. Moreover, we show that prosodic cues are informative of the engagement phenomena not only as characterized over the entire task (i.e., global cues), but also in short term patterns (i.e., local cues). We perform a classification experiment assigning the engagement of a child into three discrete levels achieving an unweighted average recall of 55.8% (chance is 33.3%). While the systems using global cues and local level cues are each statistically significant in predicting engagement, we obtain the best results after fusing these two components. We perform further analysis of the cues at local and global levels to achieve insights linking specific prosodic patterns to the engagement phenomenon. We observe that while the performance of our model varies with task setting and interacting psychologist, there exist universal prosodic patterns reflective of engagement.
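
    The distinction between global cues (statistics over a whole task) and local cues (short-term patterns) can be sketched with a simple energy-based prosodic track; a full system would add pitch, speaking rate, and similar features, and the window sizes here are arbitrary.

```python
import numpy as np

def energy_track(x, fs, win_s=0.5):
    """Frame-level log energy as a minimal prosodic track (assumes the
    recording is at least a few windows long)."""
    win = int(fs * win_s)
    frames = [x[i:i + win] for i in range(0, len(x) - win + 1, win)]
    return np.array([np.log(np.sum(f ** 2) + 1e-10) for f in frames])

def global_and_local_cues(x, fs):
    """Global cues summarize the whole interaction; local cues capture
    short-term patterns, mirroring the two feature levels in the record."""
    track = energy_track(x, fs)
    global_cues = {"mean": track.mean(), "std": track.std()}
    local_cues = np.lib.stride_tricks.sliding_window_view(track, 5).std(axis=1)
    return global_cues, local_cues
```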

  10. Temporal predictive mechanisms modulate motor reaction time during initiation and inhibition of speech and hand movement.

    Science.gov (United States)

    Johari, Karim; Behroozmand, Roozbeh

    2017-08-01

    Skilled movement is mediated by motor commands executed with extremely fine temporal precision. The question of how the brain incorporates temporal information to perform motor actions has remained unanswered. This study investigated the effect of stimulus temporal predictability on response timing of speech and hand movement. Subjects performed a randomized vowel vocalization or button press task in two counterbalanced blocks in response to temporally-predictable and unpredictable visual cues. Results indicated that speech and hand reaction time was decreased for predictable compared with unpredictable stimuli. This finding suggests that a temporal predictive code is established to capture temporal dynamics of sensory cues in order to produce faster movements in responses to predictable stimuli. In addition, results revealed a main effect of modality, indicating faster hand movement compared with speech. We suggest that this effect is accounted for by the inherent complexity of speech production compared with hand movement. Lastly, we found that movement inhibition was faster than initiation for both hand and speech, suggesting that movement initiation requires a longer processing time to coordinate activities across multiple regions in the brain. These findings provide new insights into the mechanisms of temporal information processing during initiation and inhibition of speech and hand movement. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Estimating location without external cues.

    Directory of Open Access Journals (Sweden)

    Allen Cheung

    2014-10-01

    Full Text Available The ability to determine one's location is fundamental to spatial navigation. Here, it is shown that localization is theoretically possible without the use of external cues, and without knowledge of initial position or orientation. With only error-prone self-motion estimates as input, a fully disoriented agent can, in principle, determine its location in familiar spaces with 1-fold rotational symmetry. Surprisingly, localization does not require the sensing of any external cue, including the boundary. The combination of self-motion estimates and an internal map of the arena provide enough information for localization. This stands in conflict with the supposition that 2D arenas are analogous to open fields. Using a rodent error model, it is shown that the localization performance which can be achieved is enough to initiate and maintain stable firing patterns like those of grid cells, starting from full disorientation. Successful localization was achieved when the rotational asymmetry was due to the external boundary, an interior barrier or a void space within an arena. Optimal localization performance was found to depend on arena shape, arena size, local and global rotational asymmetry, and the structure of the path taken during localization. Since allothetic cues including visual and boundary contact cues were not present, localization necessarily relied on the fusion of idiothetic self-motion cues and memory of the boundary. Implications for spatial navigation mechanisms are discussed, including possible relationships with place field overdispersion and hippocampal reverse replay. Based on these results, experiments are suggested to identify if and where information fusion occurs in the mammalian spatial memory system.
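
    The error accumulation at issue here can be illustrated with a bare dead-reckoning sketch: integrating noisy self-motion estimates alone, with no external cues, the position estimate drifts, and it is this drifting estimate that an internal map of a rotationally asymmetric boundary can correct. The noise levels below are arbitrary, not the rodent error model used in the paper.

```python
import numpy as np

def path_integrate(speeds, turns, heading_sd=0.05, speed_sd=0.05, rng=None):
    """Dead reckoning from noisy self-motion estimates alone: integrate speed
    and heading step by step, letting error accumulate without correction."""
    rng = np.random.default_rng() if rng is None else rng
    x = y = heading = 0.0
    path = []
    for v, dtheta in zip(speeds, turns):
        heading += dtheta + rng.normal(0, heading_sd)     # noisy turn estimate
        v_hat = v * (1 + rng.normal(0, speed_sd))         # noisy speed estimate
        x += v_hat * np.cos(heading)
        y += v_hat * np.sin(heading)
        path.append((x, y))
    return np.array(path)
```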

  12. Music and Speech Perception in Children Using Sung Speech.

    Science.gov (United States)

    Nie, Yingjiu; Galvin, John J; Morikawa, Michael; André, Victoria; Wheeler, Harley; Fu, Qian-Jie

    2018-01-01

    This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training, participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet were significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners.

  13. Practical speech user interface design

    CERN Document Server

    Lewis, James R

    2010-01-01

    Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR applications…

  14. The Influence of Cue Reliability and Cue Representation on Spatial Reorientation in Young Children

    Science.gov (United States)

    Lyons, Ian M.; Huttenlocher, Janellen; Ratliff, Kristin R.

    2014-01-01

    Previous studies of children's reorientation have focused on cue representation (e.g., whether cues are geometric) as a predictor of performance but have not addressed cue reliability (the regularity of the relation between a given cue and an outcome) as a predictor of performance. Here we address both factors within the same series of…

  15. Cues for localization in the horizontal plane

    DEFF Research Database (Denmark)

    Jeppesen, Jakob; Møller, Henrik

    2005-01-01

    …manipulated in HRTFs used for binaural synthesis of sound in the horizontal plane. The manipulation of cues resulted in HRTFs with cues ranging from correct combinations of spectral information and ITDs to combinations with severely conflicting cues. Both the ITD and the spectral information seem…

  16. The role of spectral and temporal cues in voice gender discrimination by normal-hearing listeners and cochlear implant users.

    Science.gov (United States)

    Fu, Qian-Jie; Chinchilla, Sherol; Galvin, John J

    2004-09-01

    The present study investigated the relative importance of temporal and spectral cues in voice gender discrimination and vowel recognition by normal-hearing subjects listening to an acoustic simulation of cochlear implant speech processing and by cochlear implant users. In the simulation, the number of speech processing channels ranged from 4 to 32, thereby varying the spectral resolution; the cutoff frequencies of the channels' envelope filters ranged from 20 to 320 Hz, thereby manipulating the available temporal cues. For normal-hearing subjects, results showed that both voice gender discrimination and vowel recognition scores improved as the number of spectral channels was increased. When only 4 spectral channels were available, voice gender discrimination significantly improved as the envelope filter cutoff frequency was increased from 20 to 320 Hz. For all spectral conditions, increasing the amount of temporal information had no significant effect on vowel recognition. Both voice gender discrimination and vowel recognition scores were highly variable among implant users. The performance of cochlear implant listeners was similar to that of normal-hearing subjects listening to comparable speech processing (4-8 spectral channels). The results suggest that both spectral and temporal cues contribute to voice gender discrimination and that temporal cues are especially important for cochlear implant users to identify the voice gender when there is reduced spectral resolution.
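
    Because the simulation described above is a standard channel-vocoder design, a short sketch can show where the two manipulations live. This is a generic noise vocoder, not the authors' exact processing; the band spacing, filter orders, and 100-6000 Hz analysis range are assumptions. `n_channels` plays the role of spectral resolution and `env_cutoff_hz` that of the temporal (envelope) cue.

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt

    def noise_vocode(x, fs, n_channels=8, env_cutoff_hz=160,
                     f_lo=100.0, f_hi=6000.0):
        """Generic noise-vocoder sketch; assumes fs > 2 * f_hi."""
        edges = np.geomspace(f_lo, f_hi, n_channels + 1)      # log-spaced bands
        env_b, env_a = butter(2, env_cutoff_hz / (fs / 2))    # envelope smoother
        out = np.zeros(len(x))
        for lo, hi in zip(edges[:-1], edges[1:]):
            b, a = butter(3, [lo / (fs / 2), hi / (fs / 2)], btype="band")
            band = filtfilt(b, a, np.asarray(x, dtype=float))   # analysis band
            env = np.clip(filtfilt(env_b, env_a, np.abs(band)), 0, None)
            carrier = filtfilt(b, a, np.random.randn(len(x)))   # band-limited noise
            out += env * carrier                                # modulate, recombine
        return out
    ```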

  17. Detection of Clinical Depression in Adolescents’ Speech During Family Interactions

    Science.gov (United States)

    Low, Lu-Shih Alex; Maddage, Namunu C.; Lech, Margaret; Sheeber, Lisa B.; Allen, Nicholas B.

    2013-01-01

    The acoustic properties of speech have previously been investigated as possible cues for depression in adults. However, these studies were restricted to small populations of patients, and the speech recordings were made during patients’ clinical interviews or fixed-text reading sessions. Symptoms of depression often first appear during adolescence at a time when the voice is changing, in both males and females, suggesting that specific studies of these phenomena in adolescent populations are warranted. This study investigated acoustic correlates of depression in a large sample of 139 adolescents (68 clinically depressed and 71 controls). Speech recordings were made during naturalistic interactions between adolescents and their parents. Prosodic, cepstral, spectral, and glottal features, as well as features derived from the Teager energy operator (TEO), were tested within a binary classification framework. Strong gender differences in classification accuracy were observed. The TEO-based features clearly outperformed all other features and feature combinations, providing classification accuracy ranging between 81% and 87% for males and 72% and 79% for females. Close, but slightly less accurate, results were obtained by combining glottal features with prosodic and spectral features (67%–69% for males and 70%–75% for females). These findings indicate the importance of nonlinear mechanisms associated with glottal flow formation as cues for clinical depression. PMID:21075715
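
    The Teager energy operator behind the best-performing features has a simple discrete form, psi[n] = x[n]^2 - x[n-1]*x[n+1]. The sketch below implements only this operator; the study's full TEO-based feature set is built on top of it and is more elaborate.

    ```python
    import numpy as np

    def teager_energy(x):
        """Discrete Teager energy operator:
        psi[n] = x[n]**2 - x[n-1] * x[n+1].
        Sensitive to rapid amplitude/frequency changes, which is what makes
        it a candidate marker of glottal-flow irregularities."""
        x = np.asarray(x, dtype=float)
        psi = np.empty_like(x)
        psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
        psi[0], psi[-1] = psi[1], psi[-2]   # simple edge handling
        return psi
    ```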

  18. Under-resourced speech recognition based on the speech manifold

    CSIR Research Space (South Africa)

    Sahraeian, R

    2015-09-01

    Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a...

  19. The role of periodicity in perceiving speech in quiet and in background noise.

    Science.gov (United States)

    Steinmetzger, Kurt; Rosen, Stuart

    2015-12-01

    The ability of normal-hearing listeners to perceive sentences in quiet and in background noise was investigated in a variety of conditions mixing the presence and absence of periodicity (i.e., voicing) in both target and masker. Experiment 1 showed that in quiet, aperiodic noise-vocoded speech and speech with a natural amount of periodicity were equally intelligible, while fully periodic speech was much harder to understand. In Experiments 2 and 3, speech reception thresholds for these targets were measured in the presence of four different maskers: speech-shaped noise, harmonic complexes with a dynamically varying F0 contour, and 10 Hz amplitude-modulated versions of both. For experiment 2, results of experiment 1 were used to identify conditions with equal intelligibility in quiet, while in experiment 3 target intelligibility in quiet was near ceiling. In the presence of a masker, periodicity in the target speech mattered little, but listeners strongly benefited from periodicity in the masker. Substantial fluctuating-masker benefits required the target speech to be almost perfectly intelligible in quiet. In summary, results suggest that the ability to exploit periodicity cues may be an even more important factor when attempting to understand speech embedded in noise than the ability to benefit from masker fluctuations.

  20. The Relationship Between Spectral Modulation Detection and Speech Recognition: Adult Versus Pediatric Cochlear Implant Recipients.

    Science.gov (United States)

    Gifford, René H; Noble, Jack H; Camarata, Stephen M; Sunderhaus, Linsey W; Dwyer, Robert T; Dawant, Benoit M; Dietrich, Mary S; Labadie, Robert F

    2018-01-01

    Adult cochlear implant (CI) recipients demonstrate a reliable relationship between spectral modulation detection and speech understanding. Prior studies documenting this relationship have focused on postlingually deafened adult CI recipients-leaving an open question regarding the relationship between spectral resolution and speech understanding for adults and children with prelingual onset of deafness. Here, we report CI performance on the measures of speech recognition and spectral modulation detection for 578 CI recipients including 477 postlingual adults, 65 prelingual adults, and 36 prelingual pediatric CI users. The results demonstrated a significant correlation between spectral modulation detection and various measures of speech understanding for 542 adult CI recipients. For 36 pediatric CI recipients, however, there was no significant correlation between spectral modulation detection and speech understanding in quiet or in noise nor was spectral modulation detection significantly correlated with listener age or age at implantation. These findings suggest that pediatric CI recipients might not depend upon spectral resolution for speech understanding in the same manner as adult CI recipients. It is possible that pediatric CI users are making use of different cues, such as those contained within the temporal envelope, to achieve high levels of speech understanding. Further investigation is warranted to investigate the relationship between spectral and temporal resolution and speech recognition to describe the underlying mechanisms driving peripheral auditory processing in pediatric CI users.

  1. Speech Alarms Pilot Study

    Science.gov (United States)

    Sandor, A.; Moses, H. R.

    2016-01-01

    Currently on the International Space Station (ISS) and other space vehicles, Caution & Warning (C&W) alerts are represented with various auditory tones that correspond to the type of event. This system relies on the crew's ability to remember what each tone represents in a high-stress, high-workload environment when responding to the alert. Furthermore, crew are trained on the alerts a year or more in advance of the mission, which makes remembering the semantic meaning of the alerts more difficult. The current system works for missions conducted close to Earth, where ground operators can assist as needed. On long-duration missions, however, crews will need to work off-nominal events autonomously. There is evidence that speech alarms may be easier and faster to recognize, especially during an off-nominal event. The Information Presentation Directed Research Project (FY07-FY09) funded by the Human Research Program included several studies investigating C&W alerts. The studies evaluated tone alerts currently in use with NASA flight deck displays along with candidate speech alerts. A follow-on study used four types of speech alerts to investigate how quickly various types of auditory alerts with and without a speech component - either at the beginning or at the end of the tone - can be identified. Even though crew were familiar with the tone alerts from training or direct mission experience, alerts starting with a speech component were identified faster than alerts starting with a tone. The current study replicated the results from the previous study in a more rigorous experimental design to determine whether the candidate speech alarms are ready for transition to operations or whether more research is needed. Four types of alarms (caution, warning, fire, and depressurization) were presented to participants in both tone and speech formats in laboratory settings and later in the Human Exploration Research Analog (HERA). In the laboratory study, the alerts were presented by software and participants were…

  2. Intelligibility of speech of children with speech and sound disorders

    OpenAIRE

    Ivetac, Tina

    2014-01-01

    The purpose of this study is to examine speech intelligibility of children with primary speech and sound disorders aged 3 to 6 years in everyday life. The research problem is based on the degree to which parents or guardians, immediate family members (sister, brother, grandparents), extended family members (aunt, uncle, cousin), child's friends, other acquaintances, child's teachers and strangers understand the speech of children with speech sound disorders. We examined whether the level ...

  3. Robust Speech/Non-Speech Classification in Heterogeneous Multimedia Content

    NARCIS (Netherlands)

    Huijbregts, M.A.H.; de Jong, Franciska M.G.

    In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because…

  4. Tackling the complexity in speech

    DEFF Research Database (Denmark)

    …section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore, we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations…

  5. Visual cues for data mining

    Science.gov (United States)

    Rogowitz, Bernice E.; Rabenhorst, David A.; Gerth, John A.; Kalin, Edward B.

    1996-04-01

    This paper describes a set of visual techniques, based on principles of human perception and cognition, which can help users analyze and develop intuitions about tabular data. Collections of tabular data are widely available, including, for example, multivariate time series data, customer satisfaction data, stock market performance data, multivariate profiles of companies and individuals, and scientific measurements. In our approach, we show how visual cues can help users perform a number of data mining tasks, including identifying correlations and interaction effects, finding clusters and understanding the semantics of cluster membership, identifying anomalies and outliers, and discovering multivariate relationships among variables. These cues are derived from psychological studies on perceptual organization, visual search, perceptual scaling, and color perception. These visual techniques are presented as a complement to the statistical and algorithmic methods more commonly associated with these tasks, and provide an interactive interface for the human analyst.

  6. Eliciting nicotine craving with virtual smoking cues.

    Science.gov (United States)

    Gamito, Pedro; Oliveira, Jorge; Baptista, André; Morais, Diogo; Lopes, Paulo; Rosa, Pedro; Santos, Nuno; Brito, Rodrigo

    2014-08-01

    Craving is a strong desire to consume that emerges in every case of substance addiction. Previous studies have shown that eliciting craving with an exposure cues protocol can be a useful option for the treatment of nicotine dependence. Thus, the main goal of this study was to develop a virtual platform in order to induce craving in smokers. Fifty-five undergraduate students were randomly assigned to two different virtual environments: high arousal contextual cues and low arousal contextual cues scenarios (17 smokers with low nicotine dependency were excluded). An eye-tracker system was used to evaluate attention toward these cues. Eye fixation on smoking-related cues differed between smokers and nonsmokers, indicating that smokers focused more often on smoking-related cues than nonsmokers. Self-reports of craving are in agreement with these results and suggest a significant increase in craving after exposure to smoking cues. In sum, these data support the use of virtual environments for eliciting craving.

  7. Experience with a second language affects the use of fundamental frequency in speech segmentation

    Science.gov (United States)

    Broersma, Mirjam; Cho, Taehong; Kim, Sahyang; Martínez-García, Maria Teresa; Connell, Katrina

    2017-01-01

    This study investigates whether listeners’ experience with a second language learned later in life affects their use of fundamental frequency (F0) as a cue to word boundaries in the segmentation of an artificial language (AL), particularly when the cues to word boundaries conflict between the first language (L1) and second language (L2). F0 signals phrase-final (and thus word-final) boundaries in French but word-initial boundaries in English. Participants were functionally monolingual French listeners, functionally monolingual English listeners, bilingual L1-English L2-French listeners, and bilingual L1-French L2-English listeners. They completed the AL-segmentation task with F0 signaling word-final boundaries or without prosodic cues to word boundaries (monolingual groups only). After listening to the AL, participants completed a forced-choice word-identification task in which the foils were either non-words or part-words. The results show that the monolingual French listeners, but not the monolingual English listeners, performed better in the presence of F0 cues than in the absence of such cues. Moreover, bilingual status modulated listeners’ use of F0 cues to word-final boundaries, with bilingual French listeners performing less accurately than monolingual French listeners on both word types but with bilingual English listeners performing more accurately than monolingual English listeners on non-words. These findings not only confirm that speech segmentation is modulated by the L1, but also newly demonstrate that listeners’ experience with the L2 (French or English) affects their use of F0 cues in speech segmentation. This suggests that listeners’ use of prosodic cues to word boundaries is adaptive and non-selective, and can change as a function of language experience. PMID:28738093

  8. Innovative Speech Reconstructive Surgery

    OpenAIRE

    Hashem Shemshadi

    2003-01-01

    Proper speech functioning in human beings depends on the precise coordination and timing balances in a series of complex neuromuscular movements and actions: starting from the prime organ of energy source, the expelled air from the respiratory system; delivery of such air to trigger the vocal cords; swift changes of this phonatory episode into a comprehensible sound in RESONANCE; and final coordination of all head and neck structures to elicit final speech in …

  9. The chairman's speech

    International Nuclear Information System (INIS)

    Allen, A.M.

    1986-01-01

    The paper contains a transcript of a speech by the chairman of the UKAEA, to mark the publication of the 1985/6 annual report. The topics discussed in the speech include: the Chernobyl accident and its effect on public attitudes to nuclear power, management and disposal of radioactive waste, the operation of UKAEA as a trading fund, and the UKAEA development programmes. The development programmes include work on the following: fast reactor technology, thermal reactors, reactor safety, health and safety aspects of water cooled reactors, the Joint European Torus, and under-lying research. (U.K.)

  10. The influence of masker type on early reflection processing and speech intelligibility (L)

    DEFF Research Database (Denmark)

    Arweiler, Iris; Buchholz, Jörg M.; Dau, Torsten

    2013-01-01

    Arweiler and Buchholz [J. Acoust. Soc. Am. 130, 996-1005 (2011)] showed that, while the energy of early reflections (ERs) in a room improves speech intelligibility, the benefit is smaller than that provided by the energy of the direct sound (DS). In terms of integration of ERs and DS, binaural listening did not provide a benefit from ERs apart from a binaural energy summation, such that monaural auditory processing could account for the data. However, a diffuse speech-shaped noise (SSN) was used in the speech intelligibility experiments, which does not provide distinct binaural cues to the auditory system. In the present study, the monaural and binaural benefit from ERs for speech intelligibility was investigated using three directional maskers presented from 90° azimuth: an SSN, a multi-talker babble, and a reversed two-talker masker. For normal-hearing as well as hearing-impaired listeners…

  11. The minor third communicates sadness in speech, mirroring its use in music.

    Science.gov (United States)

    Curtis, Meagan E; Bharucha, Jamshed J

    2010-06-01

    There is a long history of attempts to explain why music is perceived as expressing emotion. The relationship between pitches serves as an important cue for conveying emotion in music. The musical interval referred to as the minor third is generally thought to convey sadness. We reveal that the minor third also occurs in the pitch contour of speech conveying sadness. Bisyllabic speech samples conveying four emotions were recorded by 9 actresses. Acoustic analyses revealed that the relationship between the 2 salient pitches of the sad speech samples tended to approximate a minor third. Participants rated the speech samples for perceived emotion, and the use of numerous acoustic parameters as cues for emotional identification was modeled using regression analysis. The minor third was the most reliable cue for identifying sadness. Additional participants rated musical intervals for emotion, and their ratings verified the historical association between the musical minor third and sadness. These findings support the theory that human vocal expressions and music share an acoustic code for communicating sadness.

  12. Visualizing structures of speech expressiveness

    DEFF Research Database (Denmark)

    Herbelin, Bruno; Jensen, Karl Kristoffer; Graugaard, Lars

    2008-01-01

    Speech is both beautiful and informative. In this work, a conceptual study of speech, through investigation of the Tower of Babel, the archetypal phonemes, and a study of the reasons for the uses of language, is undertaken in order to create an artistic work investigating the nature of speech. … The artwork is presented at the Re:New festival in May 2008.

  13. Seeing the talker’s face supports executive processing of speech in steady state noise

    Directory of Open Access Journals (Sweden)

    Sushmit eMishra

    2013-11-01

    Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition), and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity. Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.

  14. Seeing the talker’s face supports executive processing of speech in steady state noise

    Science.gov (United States)

    Mishra, Sushmit; Lunner, Thomas; Stenfelt, Stefan; Rönnberg, Jerker; Rudner, Mary

    2013-01-01

    Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity (WMC). Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills. PMID:24324411

  15. Workshop: Welcoming speech

    International Nuclear Information System (INIS)

    Lummerzheim, D.

    1994-01-01

    The welcoming speech underlines the fact that any validation process, starting with calculation methods and ending with studies on the long-term behaviour of a repository system, can only be effected through laboratory, field and natural-analogue studies. Natural analogues (NA) are used to secure the biosphere and to verify whether this safety really exists. (HP) [de]

  16. Hearing speech in music.

    Science.gov (United States)

    Ekström, Seth-Reino; Borg, Erik

    2011-01-01

    The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

  17. Hearing speech in music

    Directory of Open Access Journals (Sweden)

    Seth-Reino Ekström

    2011-01-01

    The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings.

  18. Free Speech Yearbook 1979.

    Science.gov (United States)

    Kane, Peter E., Ed.

    The seven articles in this collection deal with theoretical and practical freedom of speech issues. Topics covered are: the United States Supreme Court, motion picture censorship, and the color line; judicial decision making; the established scientific community's suppression of the ideas of Immanuel Velikovsky; the problems of avant-garde jazz,…

  19. Prosodic cues to word order: what level of representation?

    Directory of Open Access Journals (Sweden)

    Carline eBernard

    2012-10-01

    Within language, systematic correlations exist between syntactic structure and prosody. Prosodic prominence, for instance, falls on the complement and not the head of syntactic phrases, and its realization depends on the phrasal position of the prominent element. Thus, in Japanese, a functor-final language, prominence is phrase-initial and realized as increased pitch (^Tōkyō ni ‘Tokyo to’), whereas in French, English or Italian, functor-initial languages, it manifests itself as phrase-final lengthening (to Rome). Prosody is readily available in the linguistic signal even to the youngest infants. It has, therefore, been proposed that young learners might be able to exploit its correlations with syntax to bootstrap language structure. In this study, we tested this hypothesis, investigating how 8-month-old monolingual French infants processed an artificial grammar manipulating the relative position of prosodic prominence and word frequency. In Condition 1, we created a speech stream in which the two cues, prosody and frequency, were aligned, frequent words being prosodically non-prominent and infrequent ones being prominent, as is the case in natural language (functors are prosodically minimal compared to content words). In Condition 2, the two cues were misaligned, with frequent words carrying prosodic prominence, unlike in natural language. After familiarization with the aligned or the misaligned stream in a headturn preference procedure, we tested infants’ preference for test items having a frequent-word-initial or a frequent-word-final word order. We found that infants familiarized with the aligned stream showed the expected preference for the frequent-word-initial test items, mimicking the functor-initial word order of French. Infants in the misaligned condition showed no preference. These results suggest that infants are able to use word frequency and prosody as early cues to word order and they integrate them into a coherent…

  1. Nobel peace speech

    Directory of Open Access Journals (Sweden)

    Joshua FRYE

    2017-07-01

    The Nobel Peace Prize has long been considered the premier peace prize in the world. According to Geir Lundestad, Secretary of the Nobel Committee, of the some 300 peace prizes awarded worldwide, “none is in any way as well known and as highly respected as the Nobel Peace Prize” (Lundestad, 2001). Nobel peace speech is a unique and significant international site of public discourse committed to articulating the universal grammar of peace. Spanning over 100 years of sociopolitical history on the world stage, Nobel Peace Laureates richly represent an important cross-section of domestic and international issues increasingly germane to many publics. Communication scholars’ interest in this rhetorical genre has increased in the past decade. Yet the norm has been to analyze a single speech artifact from a prestigious or controversial winner rather than examine the collection of speeches for generic commonalities of import. In this essay, we analyze the discourse of Nobel peace speech inductively and argue that the organizing principle of the Nobel peace speech genre is the repetitive form of normative liberal principles and values that function as rhetorical topoi. These topoi include freedom and justice and appeal to the inviolable, inborn right of human beings to exercise certain political and civil liberties and the expectation of equality of protection from totalitarian and tyrannical abuses. The significance of this essay to contemporary communication theory is to expand our theoretical understanding of rhetoric’s role in the maintenance and development of an international and cross-cultural vocabulary for the grammar of peace.

  2. Awareness of rhythm patterns in speech and music in children with specific language impairments

    Directory of Open Access Journals (Sweden)

    Ruth eCumming

    2015-12-01

    Children with specific language impairments (SLIs) show impaired perception and production of language, and also show impairments in perceiving auditory cues to rhythm (amplitude rise time [ART] and sound duration) and in tapping to a rhythmic beat. Here we explore potential links between language development and rhythm perception in 45 children with SLI and 50 age-matched controls. We administered three rhythmic tasks: a musical beat detection task, a tapping-to-music task, and a novel music/speech task, which varied rhythm and pitch cues independently or together in both speech and music. Via low-pass filtering, the music sounded as though it was played from a low-quality radio and the speech sounded as though it was muffled (heard behind a door). We report data for all of the SLI children (N = 45, IQ varying), as well as for two independent subgroupings with intact IQ. One subgroup, Pure SLI, had intact phonology and reading (N = 16); the other, SLI PPR (N = 15), had impaired phonology and reading. When IQ varied (all SLI children), we found significant group differences in all the rhythmic tasks. For the Pure SLI group, there were rhythmic impairments in the tapping task only. For children with SLI and poor phonology (SLI PPR), group differences were found in all of the filtered speech/music AXB tasks. We conclude that difficulties with rhythmic cues in both speech and music are present in children with SLIs, but that some rhythmic measures are more sensitive than others. The data are interpreted within a ‘prosodic phrasing’ hypothesis, and we discuss the potential utility of rhythmic and musical interventions in remediating speech and language difficulties in children.

  3. Retrieval-induced forgetting and interference between cues: Training a cue-outcome association attenuates retrieval by alternative cues

    OpenAIRE

    Ortega-Castro, Nerea; Vadillo Nistal, Miguel

    2013-01-01

    Some researchers have attempted to determine whether situations in which a single cue is paired with several outcomes (A-B, A-C interference or interference between outcomes) involve the same learning and retrieval mechanisms as situations in which several cues are paired with a single outcome (A-B, C-B interference or interference between cues). Interestingly, current research on a related effect, which is known as retrieval-induced forgetting, can illuminate this debate. Most retrieval-induced…

  4. Metaheuristic applications to speech enhancement

    CERN Document Server

    Kunche, Prajna

    2016-01-01

    This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that aid in the quality and intelligibility of degraded speech. They present powerful optimization methods to speech enhancement that can help to solve the noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how the speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.

  5. A glimpsing account of the role of temporal fine structure information in speech recognition.

    Science.gov (United States)

    Apoux, Frédéric; Healy, Eric W

    2013-01-01

    Many behavioral studies have reported a significant decrease in intelligibility when the temporal fine structure (TFS) of a sound mixture is replaced with noise or tones (i.e., vocoder processing). This finding has led to the conclusion that TFS information is critical for speech recognition in noise. How the normal auditory system takes advantage of the original TFS, however, remains unclear. Three experiments on the role of TFS in noise are described. All three experiments measured speech recognition in various backgrounds while manipulating the envelope, TFS, or both. One experiment tested the hypothesis that vocoder processing may artificially increase the apparent importance of TFS cues. Another experiment evaluated the relative contribution of the target and masker TFS by disturbing only the TFS of the target or that of the masker. Finally, a last experiment evaluated the relative contribution of envelope and TFS information. In contrast to previous studies, however, the original envelope and TFS were both preserved - to some extent - in all conditions. Overall, the experiments indicate a limited influence of TFS and suggest that little speech information is extracted from the TFS. Concomitantly, these experiments confirm that most speech information is carried by the temporal envelope in real-world conditions. When interpreted within the framework of the glimpsing model, the results of these experiments suggest that TFS is primarily used as a grouping cue to select the time-frequency regions corresponding to the target speech signal.
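
    The envelope/TFS split manipulated in these experiments is conventionally obtained per frequency band from the analytic signal. The following single-band sketch illustrates only the decomposition, not the experimental processing (which operates on many bands and recombines them):

    ```python
    import numpy as np
    from scipy.signal import hilbert

    def envelope_and_tfs(band):
        """Split one narrow-band signal into temporal envelope and temporal
        fine structure (TFS) via the analytic signal; swapping or replacing
        either part is the basic move behind vocoder-style manipulations."""
        analytic = hilbert(band)
        env = np.abs(analytic)            # slow amplitude modulation
        tfs = np.cos(np.angle(analytic))  # unit-amplitude carrier
        return env, tfs                   # note: env * tfs reconstructs the band
    ```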

  6. When to Take a Gesture Seriously: On How We Use and Prioritize Communicative Cues.

    Science.gov (United States)

    Gunter, Thomas C; Weinbrenner, J E Douglas

    2017-08-01

    When people talk, their speech is often accompanied by gestures. Although it is known that co-speech gestures can influence face-to-face communication, it is currently unclear to what extent they are actively used and under which premises they are prioritized to facilitate communication. We investigated these open questions in two experiments that varied how pointing gestures disambiguate the utterances of an interlocutor. Participants, whose event-related brain responses were measured, watched a video, where an actress was interviewed about, for instance, classical literature (e.g., Goethe and Shakespeare). While responding, the actress pointed systematically to the left side to refer to, for example, Goethe, or to the right to refer to Shakespeare. Her final statement was ambiguous and combined with a pointing gesture. The P600 pattern found in Experiment 1 revealed that, when pointing was unreliable, gestures were only monitored for their cue validity and not used for reference tracking related to the ambiguity. However, when pointing was a valid cue (Experiment 2), it was used for reference tracking, as indicated by a reduced N400 for pointing. In summary, these findings suggest that a general prioritization mechanism is in use that constantly monitors and evaluates the use of communicative cues against communicative priors on the basis of accumulated error information.

  7. Evidence for a perception of prosodic cues in bat communication: contact call classification by Megaderma lyra.

    Science.gov (United States)

    Janssen, Simone; Schmidt, Sabine

    2009-07-01

    The perception of prosodic cues in human speech may be rooted in mechanisms common to mammals. The present study explores to what extent bats use rhythm and frequency, typically carrying prosodic information in human speech, for the classification of communication call series. Using a two-alternative, forced choice procedure, we trained Megaderma lyra to discriminate between synthetic contact call series differing in frequency, rhythm on level of calls and rhythm on level of call series, and measured the classification performance for stimuli differing in only one, or two, of the above parameters. A comparison with predictions from models based on one, combinations of two, or all, parameters revealed that the bats based their decision predominantly on frequency and in addition on rhythm on the level of call series, whereas rhythm on level of calls was not taken into account in this paradigm. Moreover, frequency and rhythm on the level of call series were evaluated independently. Our results show that parameters corresponding to prosodic cues in human languages are perceived and evaluated by bats. Thus, these necessary prerequisites for a communication via prosodic structures in mammals have evolved far before human speech.

  8. A frequency bin-wise nonlinear masking algorithm in convolutive mixtures for speech segregation.

    Science.gov (United States)

    Chi, Tai-Shih; Huang, Ching-Wen; Chou, Wen-Sheng

    2012-05-01

    A frequency bin-wise nonlinear masking algorithm is proposed in the spectrogram domain for speech segregation in convolutive mixtures. The contributive weight from each speech source to a time-frequency unit of the mixture spectrogram is estimated by a nonlinear function based on location cues. For each sound source, a non-binary mask is formed from the estimated weights and is multiplied with the mixture spectrogram to extract the sound. Head-related transfer functions (HRTFs) are used to simulate convolutive sound mixtures perceived by listeners. Simulation results show our proposed method outperforms convolutive independent component analysis and degenerate unmixing estimation technique methods in almost all test conditions.
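
    The description maps naturally onto a few lines of spectrogram arithmetic. The sketch below is a schematic stand-in rather than the proposed algorithm: it derives a non-binary mask from a single location cue, an interaural level difference passed through an assumed sigmoid nonlinearity with steepness `alpha`, and multiplies it with the mixture spectrogram; the paper estimates per-source contributive weights from richer location cues.

    ```python
    import numpy as np
    from scipy.signal import stft, istft

    def soft_location_mask(left, right, fs, alpha=2.0):
        """Extract a left-favoured source via a sigmoid ILD mask."""
        f, t, L = stft(left, fs=fs, nperseg=512)
        _, _, R = stft(right, fs=fs, nperseg=512)
        eps = 1e-12
        ild = 20 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))  # dB per T-F unit
        mask = 1.0 / (1.0 + np.exp(-alpha * ild))   # non-binary weight in (0, 1)
        _, source = istft(mask * L, fs=fs, nperseg=512)
        return source
    ```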

  9. The impact of brief restriction to articulation on children's subsequent speech production.

    Science.gov (United States)

    Seidl, Amanda; Brosseau-Lapré, Françoise; Goffman, Lisa

    2018-02-01

    This project explored whether disruption of articulation during listening impacts subsequent speech production in 4-yr-olds with and without speech sound disorder (SSD). During novel word learning, typically-developing children showed effects of articulatory disruption as revealed by larger differences between two acoustic cues to a sound contrast, but children with SSD were unaffected by articulatory disruption. Findings suggest that, when typically developing 4-yr-olds experience an articulatory disruption during a listening task, the children's subsequent production is affected. Children with SSD show less influence of articulatory experience during perception, which could be the result of impaired or attenuated ties between perception and articulation.

  10. Within-subjects comparison of the HiRes and Fidelity120 speech processing strategies: speech perception and its relation to place-pitch sensitivity.

    Science.gov (United States)

    Donaldson, Gail S; Dawson, Patricia K; Borden, Lamar Z

    2011-01-01

    Previous studies have confirmed that current steering can increase the number of discriminable pitches available to many cochlear implant (CI) users; however, the ability to perceive additional pitches has not been linked to improved speech perception. The primary goals of this study were to determine (1) whether adult CI users can achieve higher levels of spectral cue transmission with a speech processing strategy that implements current steering (Fidelity120) than with a predecessor strategy (HiRes) and, if so, (2) whether the magnitude of improvement can be predicted from individual differences in place-pitch sensitivity. A secondary goal was to determine whether Fidelity120 supports higher levels of speech recognition in noise than HiRes. A within-subjects repeated measures design evaluated speech perception performance with Fidelity120 relative to HiRes in 10 adult CI users. Subjects used the novel strategy (either HiRes or Fidelity120) for 8 wks during the main study; a subset of five subjects used Fidelity120 for three additional months after the main study. Speech perception was assessed for the spectral cues related to vowel F1 frequency, vowel F2 frequency, and consonant place of articulation; overall transmitted information for vowels and consonants; and sentence recognition in noise. Place-pitch sensitivity was measured for electrode pairs in the apical, middle, and basal regions of the implanted array using a psychophysical pitch-ranking task. With one exception, there was no effect of strategy (HiRes versus Fidelity120) on the speech measures tested, either during the main study (N = 10) or after extended use of Fidelity120 (N = 5). The exception was a small but significant advantage for HiRes over Fidelity120 for consonant perception during the main study. Examination of individual subjects' data revealed that 3 of 10 subjects demonstrated improved perception of one or more spectral cues with Fidelity120 relative to HiRes after 8 wks or longer…

  11. Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

    Science.gov (United States)

    van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

    2007-01-01

    Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…

  12. Post-cueing deficits with maintained cueing benefits in patients with Parkinson's disease dementia

    Directory of Open Access Journals (Sweden)

    Susanne eGräber

    2014-11-01

    In Parkinson’s disease (PD), internal cueing mechanisms are impaired, leading to symptoms such as hypokinesia. However, external cues can improve movement execution by using cortical resources. These cortical processes can be affected by cognitive decline in dementia. It is still unclear how dementia in PD influences external cueing. We investigated a group of 25 PD patients with dementia (PDD) and 25 non-demented PD patients (PDnD), matched by age, sex and disease duration, in a simple reaction time (SRT) task using an additional acoustic cue. PDD patients benefited from the additional cue to a similar magnitude as did PDnD patients. However, withdrawal of the cue led to a significantly increased reaction time in the PDD group compared to the PDnD patients. Our results indicate that even PDD patients can benefit from strategies using external cue presentation, but the process of cognitive worsening can reduce the effect when cues are withdrawn.

  13. Cue-reactors: individual differences in cue-induced craving after food or smoking abstinence.

    Science.gov (United States)

    Mahler, Stephen V; de Wit, Harriet

    2010-11-10

    Pavlovian conditioning plays a critical role in both drug addiction and binge eating. Recent animal research suggests that certain individuals are highly sensitive to conditioned cues, whether they signal food or drugs. Are certain humans also more reactive to both food and drug cues? We examined cue-induced craving for both cigarettes and food, in the same individuals (n = 15 adult smokers). Subjects viewed smoking-related or food-related images after abstaining from either smoking or eating. Certain individuals reported strong cue-induced craving after both smoking and food cues. That is, subjects who reported strong cue-induced craving for cigarettes also rated stronger cue-induced food craving. In humans, like in nonhumans, there may be a "cue-reactive" phenotype, consisting of individuals who are highly sensitive to conditioned stimuli. This finding extends recent reports from nonhuman studies. Further understanding this subgroup of smokers may allow clinicians to individually tailor therapies for smoking cessation.

  14. Visible propagation from invisible exogenous cueing.

    Science.gov (United States)

    Lin, Zhicheng; Murray, Scott O

    2013-09-20

    Perception and performance are affected not just by what we see but also by what we do not see: inputs that escape our awareness. While conscious processing and unconscious processing have been assumed to be separate and independent, here we report the propagation of unconscious exogenous cueing as determined by conscious motion perception. In a paradigm combining masked exogenous cueing and apparent motion, we show that, when an onset cue was rendered invisible, the unconscious exogenous cueing effect traveled, manifesting at uncued locations (4° apart) in accordance with conscious perception of visual motion; the effect diminished when the cue-to-target distance was 8°. In contrast, conscious exogenous cueing manifested at both distances. Further evidence reveals that the unconscious and conscious nonretinotopic effects could not be explained by an attentional gradient, nor by bottom-up, energy-based motion mechanisms; rather, they were subserved by top-down, tracking-based motion mechanisms. We thus term these effects mobile cueing. Taken together, unconscious mobile cueing effects (a) demonstrate a previously unknown degree of flexibility of unconscious exogenous attention; (b) embody a simultaneous dissociation and association of attention and consciousness, in which exogenous attention can occur without cue awareness ("dissociation"), yet at the same time its effect is contingent on conscious motion tracking ("association"); and (c) underscore the interaction of conscious and unconscious processing, providing evidence for an unconscious effect that is not automatic but controlled.

  15. Predicting automatic speech recognition performance over communication channels from instrumental speech quality and intelligibility scores

    NARCIS (Netherlands)

    Gallardo, L.F.; Möller, S.; Beerends, J.

    2017-01-01

    The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility

  16. Contextual cueing by global features

    Science.gov (United States)

    Kunar, Melina A.; Flusberg, Stephen J.; Wolfe, Jeremy M.

    2008-01-01

    In visual search tasks, attention can be guided to a target item, appearing amidst distractors, on the basis of simple features (e.g. find the red letter among green). Chun and Jiang’s (1998) “contextual cueing” effect shows that RTs are also speeded if the spatial configuration of items in a scene is repeated over time. In these studies we ask if global properties of the scene can speed search (e.g. if the display is mostly red, then the target is at location X). In Experiment 1a, the overall background color of the display predicted the target location. Here the predictive color could appear 0, 400 or 800 msec in advance of the search array. Mean RTs are faster in predictive than in non-predictive conditions. However, there is little improvement in search slopes. The global color cue did not improve search efficiency. Experiments 1b-1f replicate this effect using different predictive properties (e.g. background orientation/texture, stimuli color etc.). The results show a strong RT effect of predictive background but (at best) only a weak improvement in search efficiency. A strong improvement in efficiency was found, however, when the informative background was presented 1500 msec prior to the onset of the search stimuli and when observers were given explicit instructions to use the cue (Experiment 2). PMID:17355043

  17. Speech is Golden

    DEFF Research Database (Denmark)

    Juel Henrichsen, Peter

    2014-01-01

    Most of the Danish municipalities are ready to begin to adopt automatic speech recognition, but at the same time remain nervous following a long series of bad business cases in the recent past. Complaints are voiced over costly licences and low service levels, typical effects of a de facto monopoly on the supply side. The present article reports on a new public action strategy which has taken shape in the course of 2013-14. While Denmark is a small language area, our public sector is well organised and has considerable purchasing power. Across this past year, Danish local authorities have organised around the speech technology challenge; they have formulated a number of joint questions and new requirements to be met by suppliers and have deliberately worked towards formulating tendering material which will allow fair competition. Public researchers have contributed to this work, including the author…

  18. A configural dominant account of contextual cueing: configural cues are stronger than colour cues

    OpenAIRE

    Kunar, Melina A.; Johnston, Rebecca; Sweetman, Hollie

    2013-01-01

    Previous work has shown that reaction times to find a target in displays that have been repeated are faster than those for displays that have never been seen before. This learning effect, termed “contextual cueing” (CC), has been shown using contexts such as the configuration of the distractors in the display and the background colour. However, it is not clear how these two contexts interact to facilitate search. We investigated this here by comparing the strengths of these two cues when they...

  19. Retrieval of bilingual autobiographical memories: effects of cue language and cue imageability.

    Science.gov (United States)

    Mortensen, Linda; Berntsen, Dorthe; Bohn, Ocke-Schwen

    2015-01-01

    An important issue in theories of bilingual autobiographical memory is whether linguistically encoded memories are represented in language-specific stores or in a common language-independent store. Previous research has found that autobiographical memory retrieval is facilitated when the language of the cue is the same as the language of encoding, consistent with language-specific memory stores. The present study examined whether this language congruency effect is influenced by cue imageability. Danish-English bilinguals retrieved autobiographical memories in response to Danish and English high- or low-imageability cues. Retrieval latencies were shorter to Danish than English cues and shorter to high- than low-imageability cues. Importantly, the cue language effect was stronger for low- than high-imageability cues. To examine the relationship between cue language and the language of internal retrieval, participants identified the language in which the memories were internally retrieved. More memories were retrieved when the cue language was the same as the internal language than when the cue was in the other language, and more memories were identified as being internally retrieved in Danish than English, regardless of the cue language. These results provide further evidence for language congruency effects in bilingual memory and suggest that this effect is influenced by cue imageability.

  20. Multilevel Analysis in Analyzing Speech Data

    Science.gov (United States)

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by the human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…

  1. Speech-Language Therapy (For Parents)

    Science.gov (United States)

  2. The cue is key : design for real-life remembering

    NARCIS (Netherlands)

    Hoven, van den E.A.W.H.; Eggen, J.H.

    2014-01-01

    This paper aims to put the memory cue in the spotlight. We will show how memory cues are incorporated in the area of interaction design. The focus will be on external memory cues: cues that exist outside the human mind but have an internal effect on memory reconstruction. Examples of external cues

  3. [Improving speech comprehension using a new cochlear implant speech processor].

    Science.gov (United States)

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise. In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improvement in the signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg
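
    Speech reception thresholds of this kind are tracked adaptively. A simplified one-up/one-down staircase converging on roughly 50% sentence intelligibility is sketched below; the OLSA's actual word-scoring adaptive rule is more elaborate, and run_trial and the toy listener are assumptions:

    import numpy as np

    def measure_srt(run_trial, start_snr_db=0.0, step_db=2.0, n_trials=30):
        """One-up/one-down staircase targeting ~50% intelligibility.
        run_trial(snr_db) -> True if the sentence was repeated correctly."""
        snr, history = start_snr_db, []
        for _ in range(n_trials):
            history.append(snr)
            snr += -step_db if run_trial(snr) else step_db
        return float(np.mean(history[-10:]))  # estimate from final trials

    # Toy listener with a true SRT of -6 dB SNR (logistic psychometric fn):
    rng = np.random.default_rng(1)
    toy = lambda snr: rng.random() < 1.0 / (1.0 + np.exp(-(snr + 6.0)))
    print(f"estimated SRT: {measure_srt(toy):.1f} dB SNR")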

  4. Neurophysiology of speech differences in childhood apraxia of speech.

    Science.gov (United States)

    Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole

    2014-01-01

    Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.

  5. Noise and pitch interact during the cortical segregation of concurrent speech.

    Science.gov (United States)

    Bidelman, Gavin M; Yellamsetty, Anusha

    2017-08-01

    Behavioral studies reveal listeners exploit intrinsic differences in voice fundamental frequency (F0) to segregate concurrent speech sounds, the so-called "F0-benefit." More favorable signal-to-noise ratio (SNR) in the environment, an extrinsic acoustic factor, similarly benefits the parsing of simultaneous speech. Here, we examined the neurobiological substrates of these two cues in the perceptual segregation of concurrent speech mixtures. We recorded event-related brain potentials (ERPs) while listeners performed a speeded double-vowel identification task. Listeners heard two concurrent vowels whose F0 differed by zero or four semitones presented in either clean (no noise) or noise-degraded (+5 dB SNR) conditions. Behaviorally, listeners were more accurate in correctly identifying both vowels for larger F0 separations, but F0-benefit was more pronounced at more favorable SNRs (i.e., pitch × SNR interaction). Analysis of the ERPs revealed that only the P2 wave (∼200 ms) showed a similar F0 × SNR interaction as behavior and was correlated with listeners' perceptual F0-benefit. Neural classifiers applied to the ERPs further suggested that speech sounds are segregated neurally within 200 ms based on SNR, whereas segregation based on pitch occurs later in time (400-700 ms). The earlier timing of extrinsic SNR compared to intrinsic F0-based segregation implies that the cortical extraction of speech from noise is more efficient than differentiating speech based on pitch cues alone, which may recruit additional cortical processes. Findings indicate that noise and pitch differences interact relatively early in cerebral cortex and that the brain arrives at the identities of concurrent speech mixtures as early as ∼200 ms. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    Science.gov (United States)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require future research.
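
    A minimal sketch of the classification scheme described above: one generative model per class, with a segment labeled by whichever model scores it higher. The MFCC features, model sizes, and the librosa/hmmlearn dependencies are assumptions, not the authors' exact implementation:

    import numpy as np
    import librosa
    from hmmlearn.hmm import GaussianHMM

    def mfcc_frames(y, sr):
        """Frame-level MFCC features, shape (n_frames, 13)."""
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

    def train_class_model(feature_seqs, n_states=5):
        """Fit one HMM on all training sequences of a class (LSS or NLSS)."""
        X = np.vstack(feature_seqs)
        lengths = [len(f) for f in feature_seqs]
        return GaussianHMM(n_components=n_states, covariance_type="diag",
                           n_iter=50, random_state=0).fit(X, lengths)

    def label_segment(feats, hmm_lss, hmm_nlss):
        """Assign the label of the higher log-likelihood model."""
        return "LSS" if hmm_lss.score(feats) > hmm_nlss.score(feats) else "NLSS"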

  7. Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals

    Science.gov (United States)

    Lidestam, Björn; Rönnberg, Jerker

    2016-01-01

    The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for the EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. PMID:27317667
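
    The isolation point can be computed from a gating run as the shortest gate after which identification is correct and stays correct through the longest gate. A minimal sketch with an illustrative trial:

    def isolation_point(gate_ms, responses, target):
        """Shortest gate duration from which identification is correct and
        remains correct for all longer gates; None if never reached."""
        ip = None
        for dur, resp in zip(gate_ms, responses):
            if resp == target:
                if ip is None:
                    ip = dur
            else:
                ip = None  # a later error resets the isolation point
        return ip

    # Toy gating run for the word "boat", gates lengthened in 40-ms steps:
    gates = [40, 80, 120, 160, 200, 240]
    answers = ["bone", "bowl", "boat", "boat", "boat", "boat"]
    print(isolation_point(gates, answers, "boat"))  # -> 120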

  8. Abortion and compelled physician speech.

    Science.gov (United States)

    Orentlicher, David

    2015-01-01

    Informed consent mandates for abortion providers may infringe the First Amendment's freedom of speech. On the other hand, they may reinforce the physician's duty to obtain informed consent. Courts can promote both doctrines by ensuring that compelled physician speech pertains to medical facts about abortion rather than abortion ideology and that compelled speech is truthful and not misleading. © 2015 American Society of Law, Medicine & Ethics, Inc.

  9. Deficits in audiovisual speech perception in normal aging emerge at the level of whole-word recognition.

    Science.gov (United States)

    Stevenson, Ryan A; Nelms, Caitlin E; Baum, Sarah H; Zurkovsky, Lilia; Barense, Morgan D; Newhouse, Paul A; Wallace, Mark T

    2015-01-01

    Over the next 2 decades, a dramatic shift in the demographics of society will take place, with a rapid growth in the population of older adults. One of the most common complaints with healthy aging is a decreased ability to successfully perceive speech, particularly in noisy environments. In such noisy environments, the presence of visual speech cues (i.e., lip movements) provide striking benefits for speech perception and comprehension, but previous research suggests that older adults gain less from such audiovisual integration than their younger peers. To determine at what processing level these behavioral differences arise in healthy-aging populations, we administered a speech-in-noise task to younger and older adults. We compared the perceptual benefits of having speech information available in both the auditory and visual modalities and examined both phoneme and whole-word recognition across varying levels of signal-to-noise ratio. For whole-word recognition, older adults relative to younger adults showed greater multisensory gains at intermediate SNRs but reduced benefit at low SNRs. By contrast, at the phoneme level both younger and older adults showed approximately equivalent increases in multisensory gain as signal-to-noise ratio decreased. Collectively, the results provide important insights into both the similarities and differences in how older and younger adults integrate auditory and visual speech cues in noisy environments and help explain some of the conflicting findings in previous studies of multisensory speech perception in healthy aging. These novel findings suggest that audiovisual processing is intact at more elementary levels of speech perception in healthy-aging populations and that deficits begin to emerge only at the more complex word-recognition level of speech signals. Copyright © 2015 Elsevier Inc. All rights reserved.
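
    One common way to quantify the multisensory gain examined above is the audiovisual improvement normalized by the headroom left by auditory-only performance. A minimal sketch; this particular formula and the numbers are illustrative, not necessarily the study's exact metric:

    def visual_gain(p_av, p_a):
        """Gain from adding visual speech, normalized by available headroom."""
        return (p_av - p_a) / (1.0 - p_a)

    # Hypothetical proportions correct across SNRs (not the study's data):
    for snr, p_a, p_av in [(-4, 0.30, 0.55), (-8, 0.15, 0.45), (-12, 0.05, 0.20)]:
        print(f"SNR {snr:>3} dB: gain = {visual_gain(p_av, p_a):.2f}")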

  10. Prosody production networks are modulated by sensory cues and social context.

    Science.gov (United States)

    Klasen, Martin; von Marschall, Clara; Isman, Güldehen; Zvyagintsev, Mikhail; Gur, Ruben C; Mathiak, Klaus

    2018-03-05

    The neurobiology of emotional prosody production is not well investigated. In particular, the effects of cues and social context are not known. The present study sought to differentiate cued from free emotion generation and the effect of social feedback from a human listener. Online speech filtering enabled fMRI during prosodic communication in 30 participants. Emotional vocalizations were a) free, b) auditorily cued, c) visually cued, or d) with interactive feedback. In addition to distributed language networks, cued emotions increased activity in auditory and, in the case of visual stimuli, visual cortex. Responses were larger in the right-hemisphere pSTG and the ventral striatum when participants were listened to and received feedback from the experimenter. Sensory, language, and reward networks contributed to prosody production and were modulated by cues and social context. The right pSTG is a central hub for communication in social interactions, in particular for interpersonal evaluation of vocal emotions.

  11. The Accuracy Enhancing Effect of Biasing Cues

    NARCIS (Netherlands)

    W. Vanhouche (Wouter); S.M.J. van Osselaer (Stijn)

    2009-01-01

    Extrinsic cues such as price and irrelevant attributes have been shown to bias consumers’ product judgments. Results in this article replicate those findings in pretrial judgments but show that such biasing cues can improve quality judgments at a later point in time. Initially biasing

  12. Auditory Emotional Cues Enhance Visual Perception

    Science.gov (United States)

    Zeelenberg, Rene; Bocanegra, Bruno R.

    2010-01-01

    Recent studies show that emotional stimuli impair performance to subsequently presented neutral stimuli. Here we show a cross-modal perceptual enhancement caused by emotional cues. Auditory cue words were followed by a visually presented neutral target word. Two-alternative forced-choice identification of the visual target was improved by…

  13. Cue Reliance in L2 Written Production

    Science.gov (United States)

    Wiechmann, Daniel; Kerz, Elma

    2014-01-01

    Second language learners reach expert levels in relative cue weighting only gradually. On the basis of ensemble machine learning models fit to naturalistic written productions of German advanced learners of English and expert writers, we set out to reverse engineer differences in the weighting of multiple cues in a clause linearization problem. We…

  14. Contextual Cueing Effects across the Lifespan

    Science.gov (United States)

    Merrill, Edward C.; Conners, Frances A.; Roskos, Beverly; Klinger, Mark R.; Klinger, Laura Grofer

    2013-01-01

    The authors evaluated age-related variations in contextual cueing, which reflects the extent to which visuospatial regularities can facilitate search for a target. Previous research produced inconsistent results regarding contextual cueing effects in young children and in older adults, and no study has investigated the phenomenon across the life…

  15. Cues for haptic perception of compliance

    NARCIS (Netherlands)

    Bergmann Tiest, W.M.; Kappers, A.M.L.

    2009-01-01

    For the perception of the hardness of compliant materials, several cues are available. In this paper, the relative roles of force/displacement and surface deformation cues are investigated. We have measured discrimination thresholds with silicone rubber stimuli of differing thickness and compliance.

  16. Speech Recognition on Mobile Devices

    DEFF Research Database (Denmark)

    Tan, Zheng-Hua; Lindberg, Børge

    2010-01-01

    The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR in the mobile context covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within...

  17. How rats combine temporal cues.

    Science.gov (United States)

    Guilhardi, Paulo; Keen, Richard; MacInnis, Mika L M; Church, Russell M

    2005-05-31

    The procedures for classical and operant conditioning, and for many timing procedures, involve the delivery of reinforcers that may be related to the time of previous reinforcers and responses, and to the time of onsets and terminations of stimuli. The behavior resulting from such procedures can be described as bouts of responding that occur in some pattern at some rate. A packet theory of timing and conditioning is described that accounts for such behavior under a wide range of procedures. Applications include the food searching by rats in Skinner boxes under conditions of fixed and random reinforcement, brief and sustained stimuli, and several response-food contingencies. The approach is used to describe how multiple cues from reinforcers and stimuli combine to determine the rate and pattern of response bouts.

  18. Contribution of Binaural Masking Release to Improved Speech Intelligibility for different Masker types.

    Science.gov (United States)

    Sutojo, Sarinah; van de Par, Steven; Schoenmaker, Esther

    2018-06-01

    In situations with competing talkers or in the presence of masking noise, speech intelligibility can be improved by spatially separating the target speaker from the interferers. This advantage is generally referred to as spatial release from masking (SRM) and different mechanisms have been suggested to explain it. One proposed mechanism to benefit from spatial cues is the binaural masking release, which is purely stimulus driven. According to this mechanism, the spatial benefit results from differences in the binaural cues of target and masker, which need to appear simultaneously in time and frequency to improve the signal detection. In an alternative proposed mechanism, the differences in the interaural cues improve the segregation of auditory streams, a process, which involves top-down processing rather than being purely stimulus driven. Other than the cues that produce binaural masking release, the interaural cue differences between target and interferer required to improve stream segregation do not have to appear simultaneously in time and frequency. This study is concerned with the contribution of binaural masking release to SRM for three masker types that differ with respect to the amount of energetic masking they exert. Speech intelligibility was measured, employing a stimulus manipulation that inhibits binaural masking release, and analyzed with a metric to account for the number of better-ear glimpses. Results indicate that the contribution of the stimulus-driven binaural masking release plays a minor role while binaural stream segregation and the availability of glimpses in the better ear had a stronger influence on improving the speech intelligibility. This article is protected by copyright. All rights reserved.
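
    The better-ear glimpsing metric mentioned above can be approximated by counting time-frequency units in which the better ear's local SNR exceeds a criterion. A minimal sketch, assuming separate access to the target and masker signals at each ear; the window size and criterion are illustrative, and the paper's exact metric may differ:

    import numpy as np
    from scipy.signal import stft

    def better_ear_glimpse_proportion(target_l, target_r, masker_l, masker_r,
                                      fs, criterion_db=0.0):
        """Fraction of time-frequency units whose better-ear local SNR
        exceeds the criterion."""
        def level_db(x):
            _, _, Z = stft(x, fs=fs, nperseg=512)
            return 20.0 * np.log10(np.abs(Z) + 1e-12)
        snr_left = level_db(target_l) - level_db(masker_l)
        snr_right = level_db(target_r) - level_db(masker_r)
        return float(np.mean(np.maximum(snr_left, snr_right) > criterion_db))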

  19. Role of short-time acoustic temporal fine structure cues in sentence recognition for normal-hearing listeners.

    Science.gov (United States)

    Hou, Limin; Xu, Li

    2018-02-01

    Short-time processing was employed to manipulate the amplitude, bandwidth, and temporal fine structure (TFS) in sentences. Fifty-two native-English-speaking, normal-hearing listeners participated in four sentence-recognition experiments. Results showed that recovered envelope (E) played an important role in speech recognition when the bandwidth was > 1 equivalent rectangular bandwidth. Removing TFS drastically reduced sentence recognition. Preserving TFS greatly improved sentence recognition when amplitude information was available at a rate ≥ 10 Hz (i.e., time segment ≤ 100 ms). Therefore, the short-time TFS facilitates speech perception together with the recovered E and works with the coarse amplitude cues to provide useful information for speech recognition.
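
    The envelope/TFS split referred to above is conventionally obtained from the analytic signal within each analysis band. A minimal sketch, assuming the signal has already been band-pass filtered into one band:

    import numpy as np
    from scipy.signal import hilbert

    def envelope_and_tfs(band_signal):
        """Split one analysis band into its Hilbert envelope (E) and
        unit-amplitude temporal fine structure (TFS)."""
        analytic = hilbert(band_signal)
        envelope = np.abs(analytic)
        tfs = np.cos(np.angle(analytic))
        return envelope, tfs

    # envelope * tfs reconstructs the band; replacing the envelope with a
    # coarsely sampled version (e.g. at >= 10 Hz) mimics the short-time
    # amplitude cues described in the abstract.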

  20. Kin-informative recognition cues in ants

    DEFF Research Database (Denmark)

    Nehring, Volker; Evison, Sophie E F; Santorelli, Lorenzo A

    2011-01-01

    ... behaviour is thought to be rare in one of the classic examples of cooperation (social insect colonies) because the colony-level costs of individual selfishness select against cues that would allow workers to recognize their closest relatives. In accord with this, previous studies of wasps and ants have found little or no kin information in recognition cues. Here, we test the hypothesis that social insects do not have kin-informative recognition cues by investigating the recognition cues and relatedness of workers from four colonies of the ant Acromyrmex octospinosus. Contrary to the theoretical prediction, we show that the cuticular hydrocarbons of ant workers in all four colonies are informative enough to allow full-sisters to be distinguished from half-sisters with a high accuracy. These results contradict the hypothesis of non-heritable recognition cues and suggest that there is more potential...

  1. Multiscale Cues Drive Collective Cell Migration

    Science.gov (United States)

    Nam, Ki-Hwan; Kim, Peter; Wood, David K.; Kwon, Sunghoon; Provenzano, Paolo P.; Kim, Deok-Ho

    2016-07-01

    To investigate complex biophysical relationships driving directed cell migration, we developed a biomimetic platform that allows perturbation of microscale geometric constraints with concomitant nanoscale contact guidance architectures. This permits us to elucidate the influence, and parse out the relative contribution, of multiscale features, and define how these physical inputs are jointly processed with oncogenic signaling. We demonstrate that collective cell migration is profoundly enhanced by the addition of contact guidance cues when not otherwise constrained. However, while nanoscale cues promoted migration in all cases, microscale directed migration cues are dominant as the geometric constraint narrows, a behavior that is well explained by stochastic diffusion anisotropy modeling. Further, oncogene activation (i.e. mutant PIK3CA) resulted in profoundly increased migration where extracellular multiscale directed migration cues and intrinsic signaling synergistically conspire to greatly outperform normal cells or any extracellular guidance cues in isolation.

  2. Assessing the contribution of binaural cues for apparent source width perception via a functional model

    DEFF Research Database (Denmark)

    Käsbach, Johannes; Hahmann, Manuel; May, Tobias

    2016-01-01

    In echoic conditions, sound sources are not perceived as point sources but appear to be expanded. The expansion in the horizontal dimension is referred to as apparent source width (ASW). To elicit this perception, the auditory system has access to fluctuations of binaural cues, the interaural time ... a statistical representation of ITDs and ILDs based on percentiles integrated over time and frequency. The model's performance was evaluated against psychoacoustic data obtained with noise, speech and music signals in loudspeaker-based experiments. A robust model prediction of ASW was achieved using a cross...
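
    The ILD part of the model's cue statistics can be illustrated as percentiles of short-time interaural level differences taken over time and frequency. A minimal sketch; the window length and percentile choices are assumptions, and the full model also uses ITD fluctuations (e.g. from running cross-correlation):

    import numpy as np
    from scipy.signal import stft

    def ild_percentiles(left, right, fs, q=(10, 50, 90)):
        """Percentiles of the short-time interaural level difference (dB)
        across all time-frequency units of a binaural signal."""
        _, _, L = stft(left, fs=fs, nperseg=1024)
        _, _, R = stft(right, fs=fs, nperseg=1024)
        ild = 20.0 * np.log10((np.abs(L) + 1e-12) / (np.abs(R) + 1e-12))
        return np.percentile(ild, q)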

  3. Current trends in multilingual speech processing

    Indian Academy of Sciences (India)

    2016-08-26

    Keywords: ... speech-to-speech translation; language identification. ... interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers.

  4. Contribution of auditory working memory to speech understanding in Mandarin-speaking cochlear implant users.

    Science.gov (United States)

    Tao, Duoduo; Deng, Rui; Jiang, Ye; Galvin, John J; Fu, Qian-Jie; Chen, Bing

    2014-01-01

    ... of voice pitch cues (albeit poorly coded by the CI) did not influence the relationship between working memory and speech perception.

  5. Electrophysiological and hemodynamic mismatch responses in rats listening to human speech syllables.

    Directory of Open Access Journals (Sweden)

    Mahdi Mahmoudzadeh

    Speech is a complex auditory stimulus which is processed according to several time-scales. Whereas consonant discrimination is required to resolve rapid acoustic events, voice perception relies on slower cues. Humans, right from preterm ages, are particularly efficient at encoding temporal cues. To compare the capacities of preterms to those observed in other mammals, we tested anesthetized adult rats by using exactly the same paradigm as that used in preterm neonates. We simultaneously recorded neural (using ECoG) and hemodynamic (using fNIRS) responses to series of human speech syllables and investigated the brain response to a change of consonant (ba vs. ga) and to a change of voice (male vs. female). Both methods revealed concordant results, although ECoG measures were more sensitive than fNIRS. Responses to syllables were bilateral, but with marked right-hemispheric lateralization. Responses to voice changes were observed with both methods, while only ECoG was sensitive to consonant changes. These results suggest that rats more effectively processed the speech envelope than fine temporal cues, in contrast with human preterm neonates, in whom the opposite effects were observed. Cross-species comparisons constitute a very valuable tool to define the singularities of the human brain and species-specific biases that may help human infants to learn their native language.

  6. Don't speak too fast! Processing of fast rate speech in children with specific language impairment.

    Directory of Open Access Journals (Sweden)

    Hélène Guiraud

    Perception of speech rhythm requires the auditory system to track temporal envelope fluctuations, which carry syllabic and stress information. Reduced sensitivity to rhythmic acoustic cues has been evidenced in children with Specific Language Impairment (SLI), impeding syllabic parsing and speech decoding. Our study investigated whether these children experience specific difficulties processing fast rate speech as compared with typically developing (TD) children. Sixteen French children with SLI (8-13 years old) with mainly expressive phonological disorders and with preserved comprehension and 16 age-matched TD children performed a judgment task on sentences produced 1) at normal rate, 2) at fast rate or 3) time-compressed. Sensitivity index (d') to semantically incongruent sentence-final words was measured. Overall children with SLI perform significantly worse than TD children. Importantly, as revealed by the significant Group × Speech Rate interaction, children with SLI find it more challenging than TD children to process both naturally and artificially accelerated speech. The two groups do not significantly differ in normal rate speech processing. In agreement with rhythm-processing deficits in atypical language development, our results suggest that children with SLI face difficulties adjusting to rapid speech rate. These findings are interpreted in light of temporal sampling and prosodic phrasing frameworks and of oscillatory mechanisms underlying speech perception.

  7. Don't speak too fast! Processing of fast rate speech in children with specific language impairment.

    Science.gov (United States)

    Guiraud, Hélène; Bedoin, Nathalie; Krifi-Papoz, Sonia; Herbillon, Vania; Caillot-Bascoul, Aurélia; Gonzalez-Monge, Sibylle; Boulenger, Véronique

    2018-01-01

    Perception of speech rhythm requires the auditory system to track temporal envelope fluctuations, which carry syllabic and stress information. Reduced sensitivity to rhythmic acoustic cues has been evidenced in children with Specific Language Impairment (SLI), impeding syllabic parsing and speech decoding. Our study investigated whether these children experience specific difficulties processing fast rate speech as compared with typically developing (TD) children. Sixteen French children with SLI (8-13 years old) with mainly expressive phonological disorders and with preserved comprehension and 16 age-matched TD children performed a judgment task on sentences produced 1) at normal rate, 2) at fast rate or 3) time-compressed. Sensitivity index (d') to semantically incongruent sentence-final words was measured. Overall children with SLI perform significantly worse than TD children. Importantly, as revealed by the significant Group × Speech Rate interaction, children with SLI find it more challenging than TD children to process both naturally and artificially accelerated speech. The two groups do not significantly differ in normal rate speech processing. In agreement with rhythm-processing deficits in atypical language development, our results suggest that children with SLI face difficulties adjusting to rapid speech rate. These findings are interpreted in light of temporal sampling and prosodic phrasing frameworks and of oscillatory mechanisms underlying speech perception.
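
    Time-compressed stimuli of the kind used above can be generated with a phase vocoder, which shortens duration while preserving pitch. A minimal sketch using librosa; the filename and rate are illustrative:

    import librosa

    # Hypothetical stimulus file; rate=1.5 shortens duration by one third
    # while leaving pitch and spectral shape intact (phase-vocoder based).
    y, sr = librosa.load("sentence.wav", sr=None)
    y_compressed = librosa.effects.time_stretch(y, rate=1.5)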

  8. Multimodal Speech Capture System for Speech Rehabilitation and Learning.

    Science.gov (United States)

    Sebkhi, Nordine; Desai, Dhyey; Islam, Mohammad; Lu, Jun; Wilson, Kimberly; Ghovanloo, Maysam

    2017-11-01

    Speech-language pathologists (SLPs) are trained to correct articulation of people diagnosed with motor speech disorders by analyzing articulators' motion and assessing speech outcome while patients speak. To assist SLPs in this task, we present the multimodal speech capture system (MSCS) that records and displays kinematics of key speech articulators, the tongue and lips, along with voice, using unobtrusive methods. Collected speech modalities, tongue motion, lips gestures, and voice are visualized not only in real-time to provide patients with instant feedback but also offline to allow SLPs to perform post-analysis of articulators' motion, particularly the tongue, with its prominent but hardly visible role in articulation. We describe the MSCS hardware and software components, and demonstrate its basic visualization capabilities with a healthy individual repeating the words "Hello World." A proof-of-concept prototype has been successfully developed for this purpose, and will be used in future clinical studies to evaluate its potential impact on accelerating speech rehabilitation by enabling patients to speak naturally. Pattern matching algorithms to be applied to the collected data can provide patients with quantitative and objective feedback on their speech performance, unlike current methods that are mostly subjective and may vary from one SLP to another.

  9. Measurement of speech parameters in casual speech of dementia patients

    NARCIS (Netherlands)

    Ossewaarde, Roelant; Jonkers, Roel; Jalvingh, Fedor; Bastiaanse, Yvonne

    Measurement of speech parameters in casual speech of dementia patients. Authors: Roelant Adriaan Ossewaarde (1,2), Roel Jonkers (1), Fedor Jalvingh (1,3), Roelien Bastiaanse (1). Affiliations: 1. CLCG, University of Groningen (NL); 2. HU University of Applied Sciences Utrecht (NL); 3. St. Marienhospital - Vechta, Geriatric Clinic Vechta.

  10. Alternative Speech Communication System for Persons with Severe Speech Disorders

    Science.gov (United States)

    Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

    2009-12-01

    Assistive speech-enabled systems are proposed to help both French and English speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement in the Perceptual Evaluation of Speech Quality (PESQ) score of 5% and of more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
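
    PESQ scores like those reported above can be computed with the open-source pesq package, an ITU-T P.862 implementation. A minimal sketch; the filenames are illustrative, and the package only accepts 8 kHz (narrowband) or 16 kHz (wideband) input:

    from scipy.io import wavfile
    from pesq import pesq  # open-source ITU-T P.862 implementation

    fs, reference = wavfile.read("original_voice.wav")       # hypothetical file
    _, processed = wavfile.read("resynthesized_voice.wav")   # hypothetical file
    mode = "nb" if fs == 8000 else "wb"  # narrowband at 8 kHz, else wideband
    print(f"PESQ = {pesq(fs, reference, processed, mode):.2f}")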

  11. The effect of viewing speech on auditory speech processing is different in the left and right hemispheres.

    Science.gov (United States)

    Davis, Chris; Kislyuk, Daniel; Kim, Jeesun; Sams, Mikko

    2008-11-25

    We used whole-head magnetoencephalography (MEG) to record changes in neuromagnetic N100m responses generated in the left and right auditory cortex as a function of the match between visual and auditory speech signals. Stimuli were auditory-only (AO) and auditory-visual (AV) presentations of /pi/, /ti/ and /vi/. Three types of intensity-matched auditory stimuli were used: intact speech (Normal), frequency band filtered speech (Band) and speech-shaped white noise (Noise). The behavioural task was to detect the /vi/ syllables which comprised 12% of stimuli. N100m responses were measured to averaged /pi/ and /ti/ stimuli. Behavioural data showed that identification of the stimuli was faster and more accurate for Normal than for Band stimuli, and for Band than for Noise stimuli. Reaction times were faster for AV than AO stimuli. MEG data showed that in the left hemisphere, N100m to both AO and AV stimuli was largest for the Normal, smaller for Band and smallest for Noise stimuli. In the right hemisphere, Normal and Band AO stimuli elicited N100m responses of quite similar amplitudes, but N100m amplitude to Noise was about half of that. There was a reduction in N100m for the AV compared to the AO conditions. The size of this reduction for each stimulus type was same in the left hemisphere but graded in the right (being largest to the Normal, smaller to the Band and smallest to the Noise stimuli). The N100m decrease for the Normal stimuli was significantly larger in the right than in the left hemisphere. We suggest that the effect of processing visual speech seen in the right hemisphere likely reflects suppression of the auditory response based on AV cues for place of articulation.

  12. Speech Perception as a Multimodal Phenomenon

    OpenAIRE

    Rosenblum, Lawrence D.

    2008-01-01

    Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal s...

  13. Cue-induced craving among inhalant users: Development and preliminary validation of a visual cue paradigm.

    Science.gov (United States)

    Jain, Shobhit; Dhawan, Anju; Kumaran, S Senthil; Pattanayak, Raman Deep; Jain, Raka

    2017-12-01

    Cue-induced craving is known to be associated with a higher risk of relapse, wherein drug-specific cues become conditioned stimuli, eliciting conditioned responses. Cue-reactivity paradigms are important tools to study psychological responses and functional neuroimaging changes. However, to date, there has been no specific study or validated paradigm for inhalant cue-induced craving research. The study aimed to develop and validate a visual cue stimulus for inhalant cue-associated craving. The first step (picture selection) involved screening and careful selection of 30 cue- and 30 neutral-pictures based on their relevance for naturalistic settings. In the second step (time optimization), a random selection of ten cue-pictures each was presented for 4s, 6s, and 8s to seven adolescent male inhalant users, and pre-post craving response was compared using a Visual Analogue Scale (VAS) for each picture and duration. In the third step (validation), craving responses for each of the 30 cue- and 30 neutral-pictures were analysed among 20 adolescent inhalant users. Findings revealed a significant difference between before and after craving responses for the cue-pictures, but not the neutral-pictures. Using ROC curves, pictures were arranged in order of craving intensity. Finally, the 20 best cue- and 20 neutral-pictures were used for the development of a 480s visual cue paradigm. This is the first study to systematically develop an inhalant cue picture paradigm which can be used as a tool to examine cue-induced craving in neurobiological studies. Further research, including validation in larger and more diverse samples, is required. Copyright © 2017 Elsevier B.V. All rights reserved.
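
    Per-picture cue reactivity of this kind reduces to a paired comparison of pre- and post-exposure VAS ratings. A minimal sketch with synthetic ratings; the choice of a Wilcoxon signed-rank test is an assumption, and the study additionally reports ROC-based ranking:

    import numpy as np
    from scipy.stats import wilcoxon

    def picture_reactivity(pre_vas, post_vas):
        """Mean craving change and a paired Wilcoxon test for one picture."""
        pre, post = np.asarray(pre_vas), np.asarray(post_vas)
        stat, p = wilcoxon(post, pre)
        return float(np.mean(post - pre)), p

    # Hypothetical VAS (0-100) ratings from 20 users for one cue picture:
    rng = np.random.default_rng(2)
    pre = rng.integers(10, 40, 20)
    post = pre + rng.integers(5, 30, 20)
    delta, p = picture_reactivity(pre, post)
    print(f"mean VAS increase = {delta:.1f}, p = {p:.4f}")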

  14. Retro-dimension-cue benefit in visual working memory

    OpenAIRE

    Ye, Chaoxiong; Hu, Zhonghua; Ristaniemi, Tapani; Gendron, Maria; Liu, Qiang

    2016-01-01

    In visual working memory (VWM) tasks, participants' performance can be improved by a retro-object-cue. However, previous studies have not investigated whether participants' performance can also be improved by a retro-dimension-cue. Three experiments investigated this issue. We used a recall task with a retro-dimension-cue in all experiments. In Experiment 1, we found benefits from retro-dimension-cues compared to neutral cues. This retro-dimension-cue benefit is reflected in an increased prob...

  15. Auditory Modeling for Noisy Speech Recognition

    National Research Council Canada - National Science Library

    2000-01-01

    ... digital filtering for noise cancellation which interfaces to speech recognition software. It uses auditory features in speech recognition training, and provides applications to multilingual spoken language translation...

  16. Teaching Speech Acts

    Directory of Open Access Journals (Sweden)

    Teaching Speech Acts

    2007-01-01

    In this paper I argue that pragmatic ability must become part of what we teach in the classroom if we are to realize the goals of communicative competence for our students. I review the research on pragmatics, especially those articles that point to the effectiveness of teaching pragmatics in an explicit manner, and those that posit methods for teaching. I also note two areas of scholarship that address classroom needs: the use of authentic data and appropriate assessment tools. The essay concludes with a summary of my own experience teaching speech acts in an advanced-level Portuguese class.

  17. A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

    Directory of Open Access Journals (Sweden)

    Dansereau Richard M

    2007-01-01

    We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signals' vocal-tract-related filters. Then, the mean vectors of the PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on an underdetermined blind source separation. We compare our model with both an underdetermined blind source separation and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.
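
    The vocal-tract-related filter in this framework is a spectral envelope. As a much simpler stand-in for the paper's maximum likelihood estimation, the sketch below derives an all-pole (LPC) envelope for a single windowed speech frame; the order and the helper name are illustrative assumptions:

    import numpy as np
    import librosa
    from scipy.signal import freqz

    def vocal_tract_envelope(frame, sr, order=16):
        """All-pole (LPC) spectral envelope of one windowed speech frame,
        a rough stand-in for the vocal-tract-related filter."""
        a = librosa.lpc(frame.astype(float), order=order)  # 1/A(z) model
        w, h = freqz([1.0], a, worN=512, fs=sr)            # H = 1/A
        return w, 20.0 * np.log10(np.abs(h) + 1e-12)       # freqs, dB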

  18. A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

    Directory of Open Access Journals (Sweden)

    Mohammad H. Radfar

    2006-11-01

    We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signals' vocal-tract-related filters. Then, the mean vectors of the PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on an underdetermined blind source separation. We compare our model with both an underdetermined blind source separation and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.

  19. Identification of speech transients using variable frame rate analysis and wavelet packets.

    Science.gov (United States)

    Rasetshwane, Daniel M; Boston, J Robert; Li, Ching-Chung

    2006-01-01

    Speech transients are important cues for identifying and discriminating speech sounds. Yoo et al. and Tantibundhit et al. were successful in identifying speech transients and, by emphasizing them, improving the intelligibility of speech in noise. However, their methods are computationally intensive and unsuitable for real-time applications. This paper presents a method to identify and emphasize speech transients that combines subband decomposition by the wavelet packet transform with variable frame rate (VFR) analysis and unvoiced consonant detection. The VFR analysis is applied to each wavelet packet to define a transitivity function that describes the extent to which the wavelet coefficients of that packet are changing. Unvoiced consonant detection is used to identify unvoiced consonant intervals, and the transitivity function is amplified during these intervals. The wavelet coefficients are multiplied by the transitivity function for that packet, amplifying the coefficients localized at times when they are changing and attenuating coefficients at times when they are steady. Inverse transform of the modified wavelet packet coefficients produces a signal corresponding to speech transients similar to the transients identified by Yoo et al. and Tantibundhit et al. A preliminary implementation of the algorithm runs more efficiently than these earlier methods.
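
    A minimal sketch of the transitivity idea described above: decompose the signal into wavelet packets, measure how fast each packet's coefficients change, weight the coefficients accordingly, and resynthesize. The wavelet, depth, normalization, and clipping range are assumptions, not the authors' parameters:

    import numpy as np
    import pywt

    def emphasize_transients(x, wavelet="db4", level=4):
        """Weight wavelet-packet coefficients by a simple 'transitivity'
        measure (local rate of change) and resynthesize the signal."""
        wp = pywt.WaveletPacket(data=x, wavelet=wavelet, maxlevel=level)
        for node in wp.get_level(level, order="natural"):
            c = node.data
            change = np.abs(np.diff(c, prepend=c[0]))       # local change
            transitivity = change / (np.median(np.abs(c)) + 1e-12)
            node.data = c * np.clip(transitivity, 0.25, 2.0)  # reweight
        return wp.reconstruct(update=False)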

  20. Cross-modal cueing in audiovisual spatial attention

    DEFF Research Database (Denmark)

    Blurton, Steven Paul; Greenlee, Mark W.; Gondan, Matthias

    2015-01-01

    ... effects have been reported for endogenous visual cues, while exogenous cues seem to be mostly ineffective. In three experiments, we investigated cueing effects on the processing of audiovisual signals. In Experiment 1 we used endogenous cues to investigate their effect on the detection of auditory, visual, and audiovisual targets presented with onset asynchrony. Consistent cueing effects were found in all target conditions. In Experiment 2 we used exogenous cues and found cueing effects only for visual target detection, but not auditory target detection. In Experiment 3 we used predictive exogenous cues to examine...