audio-visual speech cue: Topics by WorldWideScience.org

Sample records for audio-visual speech cue

[Intermodal timing cues for audio-visual speech recognition].

Science.gov (United States)

Hashimoto, Masahiro; Kumashiro, Masaharu

2004-06-01

The purpose of this study was to investigate the limitations of lip-reading advantages for Japanese young adults by desynchronizing visual and auditory information in speech. In the experiment, audio-visual speech stimuli were presented under the six test conditions: audio-alone, and audio-visually with either 0, 60, 120, 240 or 480 ms of audio delay. The stimuli were the video recordings of a face of a female Japanese speaking long and short Japanese sentences. The intelligibility of the audio-visual stimuli was measured as a function of audio delays in sixteen untrained young subjects. Speech intelligibility under the audio-delay condition of less than 120 ms was significantly better than that under the audio-alone condition. On the other hand, the delay of 120 ms corresponded to the mean mora duration measured for the audio stimuli. The results implied that audio delays of up to 120 ms would not disrupt lip-reading advantage, because visual and auditory information in speech seemed to be integrated on a syllabic time scale. Potential applications of this research include noisy workplace in which a worker must extract relevant speech from all the other competing noises.
Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features

Directory of Open Access Journals (Sweden)

Petar S. Aleksic

2002-11-01

Full Text Available We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. The principal component analysis (PCA was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR experiments. Both single-stream and multistream hidden Markov models (HMMs were used to model the ASR system, integrate audio and visual information, and perform a relatively large vocabulary (approximately 1000 words speech recognition experiments. The experiments performed use clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER by 20% to 23% relatively to audio-only speech recognition WERs, at various SNRs (0Ã¢Â€Â“30 dB with additive white Gaussian noise, and by 19% relatively to audio-only speech recognition WER under clean audio conditions.
Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?

Directory of Open Access Journals (Sweden)

Magnus eAlm

2015-07-01

Full Text Available Gender and age have been found to affect adults’ audio-visual (AV speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood for cognitive and sensory decline, which may confound positive effects of age-related AV-experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20-30 years and middle-aged adults (50-60 years with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. Contrastingly, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females’ general AV perceptual strategy. Although young females’ speech-reading proficiency may not readily contribute to greater visual influence, between young and middle-adulthood recurrent confirmation of the contribution of visual cues induced by speech-reading proficiency may gradually shift females AV perceptual strategy towards more visually dominated responses.
Audio-Visual and Meaningful Semantic Context Enhancements in Older and Younger Adults.

Directory of Open Access Journals (Sweden)

Kirsten E Smayda

Full Text Available Speech perception is critical to everyday life. Oftentimes noise can degrade a speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18-35 and thirty-three older adults (ages 60-90 to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audio-visual conditions. We hypothesized that young adults would outperform older adults across SNRs, modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audio-visual relative to audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts. However, we predicted that older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audiovisual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and meaningful benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audio-visual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger
Audio-visual onset differences are used to determine syllable identity for ambiguous audio-visual stimulus pairs.

Science.gov (United States)

Ten Oever, Sanne; Sack, Alexander T; Wheat, Katherine L; Bien, Nina; van Atteveldt, Nienke

2013-01-01

Content and temporal cues have been shown to interact during audio-visual (AV) speech identification. Typically, the most reliable unimodal cue is used more strongly to identify specific speech features; however, visual cues are only used if the AV stimuli are presented within a certain temporal window of integration (TWI). This suggests that temporal cues denote whether unimodal stimuli belong together, that is, whether they should be integrated. It is not known whether temporal cues also provide information about the identity of a syllable. Since spoken syllables have naturally varying AV onset asynchronies, we hypothesize that for suboptimal AV cues presented within the TWI, information about the natural AV onset differences can aid in speech identification. To test this, we presented low-intensity auditory syllables concurrently with visual speech signals, and varied the stimulus onset asynchronies (SOA) of the AV pair, while participants were instructed to identify the auditory syllables. We revealed that specific speech features (e.g., voicing) were identified by relying primarily on one modality (e.g., auditory). Additionally, we showed a wide window in which visual information influenced auditory perception, that seemed even wider for congruent stimulus pairs. Finally, we found a specific response pattern across the SOA range for syllables that were not reliably identified by the unimodal cues, which we explained as the result of the use of natural onset differences between AV speech signals. This indicates that temporal cues not only provide information about the temporal integration of AV stimuli, but additionally convey information about the identity of AV pairs. These results provide a detailed behavioral basis for further neuro-imaging and stimulation studies to unravel the neurofunctional mechanisms of the audio-visual-temporal interplay within speech perception.
Classifying laughter and speech using audio-visual feature prediction

NARCIS (Netherlands)

Petridis, Stavros; Asghar, Ali; Pantic, Maja

2010-01-01

In this study, a system that discriminates laughter from speech by modelling the relationship between audio and visual features is presented. The underlying assumption is that this relationship is different between speech and laughter. Neural networks are trained which learn the audio-to-visual and
Audio-visual speech timing sensitivity is enhanced in cluttered conditions.

Directory of Open Access Journals (Sweden)

Warrick Roseboom

2011-04-01

Full Text Available Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.
Fusion of audio and visual cues for laughter detection

NARCIS (Netherlands)

Petridis, Stavros; Pantic, Maja

Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audio- visual approach to distinguishing laughter from speech and we show that integrating the information from audio and video channels leads to improved performance over single-modal
Should visual speech cues (speechreading) be considered when fitting hearing aids?

Science.gov (United States)

Grant, Ken

2002-05-01

When talker and listener are face-to-face, visual speech cues become an important part of the communication environment, and yet, these cues are seldom considered when designing hearing aids. Models of auditory-visual speech recognition highlight the importance of complementary versus redundant speech information for predicting auditory-visual recognition performance. Thus, for hearing aids to work optimally when visual speech cues are present, it is important to know whether the cues provided by amplification and the cues provided by speechreading complement each other. In this talk, data will be reviewed that show nonmonotonicity between auditory-alone speech recognition and auditory-visual speech recognition, suggesting that efforts designed solely to improve auditory-alone recognition may not always result in improved auditory-visual recognition. Data will also be presented showing that one of the most important speech cues for enhancing auditory-visual speech recognition performance, voicing, is often the cue that benefits least from amplification.
Effects of Audio-Visual Information on the Intelligibility of Alaryngeal Speech

Science.gov (United States)

Evitts, Paul M.; Portugal, Lindsay; Van Dine, Ami; Holler, Aline

2010-01-01

Background: There is minimal research on the contribution of visual information on speech intelligibility for individuals with a laryngectomy (IWL). Aims: The purpose of this project was to determine the effects of mode of presentation (audio-only, audio-visual) on alaryngeal speech intelligibility. Method: Twenty-three naive listeners were…
Audio-Visual Speech Perception: A Developmental ERP Investigation

Science.gov (United States)

Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.

2014-01-01

Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…
ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION

Directory of Open Access Journals (Sweden)

D.V. Ivanko

2016-05-01

Full Text Available The paper deals with analytical review, covering the latest achievements in the field of audio-visual (AV fusion (integration of multimodal information. We discuss the main challenges and report on approaches to address them. One of the most important tasks of the AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review we set out the basic principles of AV speech recognition and give the classification of audio and visual features of speech. Special attention is paid to the systematization of the existing techniques and the AV data fusion methods. In the second part we provide a consolidated list of tasks and applications that use the AV fusion based on carried out analysis of research area. We also indicate used methods, techniques, audio and video features. We propose classification of the AV integration, and discuss the advantages and disadvantages of different approaches. We draw conclusions and offer our assessment of the future in the field of AV fusion. In the further research we plan to implement a system of audio-visual Russian continuous speech recognition using advanced methods of multimodal fusion.
Preschoolers Benefit from Visually Salient Speech Cues

Science.gov (United States)

Lalonde, Kaylah; Holt, Rachael Frush

2015-01-01

Purpose: This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. They also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method: Twelve adults and 27 typically developing 3-…
Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation.

Science.gov (United States)

Banks, Briony; Gowen, Emma; Munro, Kevin J; Adank, Patti

2015-01-01

Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants' eye gaze was recorded to verify that they looked at the speaker's face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation.
The effect of combined sensory and semantic components on audio-visual speech perception in older adults

Directory of Open Access Journals (Sweden)

Corrina eMaguinness

2011-12-01

Full Text Available Previous studies have found that perception in older people benefits from multisensory over uni-sensory information. As normal speech recognition is affected by both the auditory input and the visual lip-movements of the speaker, we investigated the efficiency of audio and visual integration in an older population by manipulating the relative reliability of the auditory and visual information in speech. We also investigated the role of the semantic context of the sentence to assess whether audio-visual integration is affected by top-down semantic processing. We presented participants with audio-visual sentences in which the visual component was either blurred or not blurred. We found that there was a greater cost in recall performance for semantically meaningless speech in the audio-visual blur compared to audio-visual no blur condition and this effect was specific to the older group. Our findings have implications for understanding how aging affects efficient multisensory integration for the perception of speech and suggests that multisensory inputs may benefit speech perception in older adults when the semantic content of the speech is unpredictable.
Robust audio-visual speech recognition under noisy audio-video conditions.

Science.gov (United States)

Stewart, Darryl; Seymour, Rowan; Pass, Adrian; Ming, Ji

2014-02-01

This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments, where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances with corruption added in either/both the video and audio streams using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and also compared to any fixed-weighted integration approach in both clean conditions or when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.
Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition

NARCIS (Netherlands)

Jesse, A.; McQueen, J.M.

2014-01-01

Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes
Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

Directory of Open Access Journals (Sweden)

Koji Iwano

2007-03-01

Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.
Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

Directory of Open Access Journals (Sweden)

Yue Zhao

2012-12-01

Full Text Available Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi-stream Dynamic Bayesian Network and coupled HMM are widely used for audio-visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN to perform unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and build an accurate audio-visual speech recognition model under a no frame-independency assumption. The experiment results on Tibetan speech data from some real-world environments showed the proposed DDBN outperforms the state-of-art methods in word recognition accuracy.
Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition

OpenAIRE

Jesse, A.; McQueen, J.

2014-01-01

Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes in duration, amplitude, and pitch) in spoken-word recognition. We asked here whether they can also use visual suprasegmental cues. In two categorization experiments, Dutch participants saw a speaker...

The contribution of dynamic visual cues to audiovisual speech perception.

Science.gov (United States)

Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador

2015-08-01

Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues; two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas, some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this aim, we measured word identification performance in noise using unimodal auditory stimuli, and with audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point light displays achieved via motion capture of the original talker. Point light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech. Copyright © 2015 Elsevier Ltd. All rights reserved.
No, there is no 150 ms lead of visual speech on auditory speech, but a range of audiovisual asynchronies varying from small audio lead to large audio lag.

Directory of Open Access Journals (Sweden)

Jean-Luc Schwartz

2014-07-01

Full Text Available An increasing number of neuroscience papers capitalize on the assumption published in this journal that visual speech would be typically 150 ms ahead of auditory speech. It happens that the estimation of audiovisual asynchrony in the reference paper is valid only in very specific cases, for isolated consonant-vowel syllables or at the beginning of a speech utterance, in what we call "preparatory gestures". However, when syllables are chained in sequences, as they are typically in most parts of a natural speech utterance, asynchrony should be defined in a different way. This is what we call "comodulatory gestures" providing auditory and visual events more or less in synchrony. We provide audiovisual data on sequences of plosive-vowel syllables (pa, ta, ka, ba, da, ga, ma, na showing that audiovisual synchrony is actually rather precise, varying between 20 ms audio lead and 70 ms audio lag. We show how more complex speech material should result in a range typically varying between 40 ms audio lead and 200 ms audio lag, and we discuss how this natural coordination is reflected in the so-called temporal integration window for audiovisual speech perception. Finally we present a toy model of auditory and audiovisual predictive coding, showing that visual lead is actually not necessary for visual prediction.
Audio-visual temporal recalibration can be constrained by content cues regardless of spatial overlap

Directory of Open Access Journals (Sweden)

Warrick eRoseboom

2013-04-01

Full Text Available It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated, and opposing, estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possible to maintain a temporal relationship distinct from other pairs. It has been suggested that spatial separation of the different audio-visual pairs is necessary to achieve multiple distinct audio-visual synchrony estimates. Here we investigated if this was necessarily true. Specifically, we examined whether it is possible to obtain two distinct temporal recalibrations for stimuli that differed only in featural content. Using both complex (audio visual speech; Experiment 1 and simple stimuli (high and low pitch audio matched with either vertically or horizontally oriented Gabors; Experiment 2 we found concurrent, and opposite, recalibrations despite there being no spatial difference in presentation location at any point throughout the experiment. This result supports the notion that the content of an audio-visual pair can be used to constrain distinct audio-visual synchrony estimates regardless of spatial overlap.
The Neural Basis of Speech Perception through Lipreading and Manual Cues: Evidence from Deaf Native Users of Cued Speech

Science.gov (United States)

Aparicio, Mario; Peigneux, Philippe; Charlier, Brigitte; Balériaux, Danielle; Kavec, Martin; Leybaert, Jacqueline

2017-01-01

We present here the first neuroimaging data for perception of Cued Speech (CS) by deaf adults who are native users of CS. CS is a visual mode of communicating a spoken language through a set of manual cues which accompany lipreading and disambiguate it. With CS, sublexical units of the oral language are conveyed clearly and completely through the visual modality without requiring hearing. The comparison of neural processing of CS in deaf individuals with processing of audiovisual (AV) speech in normally hearing individuals represents a unique opportunity to explore the similarities and differences in neural processing of an oral language delivered in a visuo-manual vs. an AV modality. The study included deaf adult participants who were early CS users and native hearing users of French who process speech audiovisually. Words were presented in an event-related fMRI design. Three conditions were presented to each group of participants. The deaf participants saw CS words (manual + lipread), words presented as manual cues alone, and words presented to be lipread without manual cues. The hearing group saw AV spoken words, audio-alone and lipread-alone. Three findings are highlighted. First, the middle and superior temporal gyrus (excluding Heschl’s gyrus) and left inferior frontal gyrus pars triangularis constituted a common, amodal neural basis for AV and CS perception. Second, integration was inferred in posterior parts of superior temporal sulcus for audio and lipread information in AV speech, but in the occipito-temporal junction, including MT/V5, for the manual cues and lipreading in CS. Third, the perception of manual cues showed a much greater overlap with the regions activated by CS (manual + lipreading) than lipreading alone did. This supports the notion that manual cues play a larger role than lipreading for CS processing. The present study contributes to a better understanding of the role of manual cues as support of visual speech perception in the framework
Visual cues and listening effort: individual variability.

Science.gov (United States)

Picou, Erin M; Ricketts, Todd A; Hornsby, Benjamin W Y

2011-10-01

To investigate the effect of visual cues on listening effort as well as whether predictive variables such as working memory capacity (WMC) and lipreading ability affect the magnitude of listening effort. Twenty participants with normal hearing were tested using a paired-associates recall task in 2 conditions (quiet and noise) and 2 presentation modalities (audio only [AO] and auditory-visual [AV]). Signal-to-noise ratios were adjusted to provide matched speech recognition across audio-only and AV noise conditions. Also measured were subjective perceptions of listening effort and 2 predictive variables: (a) lipreading ability and (b) WMC. Objective and subjective results indicated that listening effort increased in the presence of noise, but on average the addition of visual cues did not significantly affect the magnitude of listening effort. Although there was substantial individual variability, on average participants who were better lipreaders or had larger WMCs demonstrated reduced listening effort in noise in AV conditions. Overall, the results support the hypothesis that integrating auditory and visual cues requires cognitive resources in some participants. The data indicate that low lipreading ability or low WMC is associated with relatively effortful integration of auditory and visual information in noise.
Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

Science.gov (United States)

Borrie, Stephanie A

2015-03-01

This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal-the AV advantage-has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.
Audio-visual identification of place of articulation and voicing in white and babble noise.

Science.gov (United States)

Alm, Magnus; Behne, Dawn M; Wang, Yue; Eg, Ragnhild

2009-07-01

Research shows that noise and phonetic attributes influence the degree to which auditory and visual modalities are used in audio-visual speech perception (AVSP). Research has, however, mainly focused on white noise and single phonetic attributes, thus neglecting the more common babble noise and possible interactions between phonetic attributes. This study explores whether white and babble noise differentially influence AVSP and whether these differences depend on phonetic attributes. White and babble noise of 0 and -12 dB signal-to-noise ratio were added to congruent and incongruent audio-visual stop consonant-vowel stimuli. The audio (A) and video (V) of incongruent stimuli differed either in place of articulation (POA) or voicing. Responses from 15 young adults show that, compared to white noise, babble resulted in more audio responses for POA stimuli, and fewer for voicing stimuli. Voiced syllables received more audio responses than voiceless syllables. Results can be attributed to discrepancies in the acoustic spectra of both the noise and speech target. Voiced consonants may be more auditorily salient than voiceless consonants which are more spectrally similar to white noise. Visual cues contribute to identification of voicing, but only if the POA is visually salient and auditorily susceptible to the noise type.
Cortical Integration of Audio-Visual Information

Science.gov (United States)

Vander Wyk, Brent C.; Ramsay, Gordon J.; Hudac, Caitlin M.; Jones, Warren; Lin, David; Klin, Ami; Lee, Su Mei; Pelphrey, Kevin A.

2013-01-01

We investigated the neural basis of audio-visual processing in speech and non-speech stimuli. Physically identical auditory stimuli (speech and sinusoidal tones) and visual stimuli (animated circles and ellipses) were used in this fMRI experiment. Relative to unimodal stimuli, each of the multimodal conjunctions showed increased activation in largely non-overlapping areas. The conjunction of Ellipse and Speech, which most resembles naturalistic audiovisual speech, showed higher activation in the right inferior frontal gyrus, fusiform gyri, left posterior superior temporal sulcus, and lateral occipital cortex. The conjunction of Circle and Tone, an arbitrary audio-visual pairing with no speech association, activated middle temporal gyri and lateral occipital cortex. The conjunction of Circle and Speech showed activation in lateral occipital cortex, and the conjunction of Ellipse and Tone did not show increased activation relative to unimodal stimuli. Further analysis revealed that middle temporal regions, although identified as multimodal only in the Circle-Tone condition, were more strongly active to Ellipse-Speech or Circle-Speech, but regions that were identified as multimodal for Ellipse-Speech were always strongest for Ellipse-Speech. Our results suggest that combinations of auditory and visual stimuli may together be processed by different cortical networks, depending on the extent to which speech or non-speech percepts are evoked. PMID:20709442
Neural entrainment to rhythmically-presented auditory, visual and audio-visual speech in children

Directory of Open Access Journals (Sweden)

Alan James Power

2012-07-01

Full Text Available Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal ‘samples’ of information from the speech stream at different rates, phase-resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (‘phase locking’. Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase-locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically-developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate based on repetition of the syllable ba, presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a talking head. To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the ba stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a ba in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling
Learning to Match Auditory and Visual Speech Cues: Social Influences on Acquisition of Phonological Categories

Science.gov (United States)

Altvater-Mackensen, Nicole; Grossmann, Tobias

2015-01-01

Infants' language exposure largely involves face-to-face interactions providing acoustic and visual speech cues but also social cues that might foster language learning. Yet, both audiovisual speech information and social information have so far received little attention in research on infants' early language development. Using a preferential…
Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition.

Science.gov (United States)

Jesse, Alexandra; McQueen, James M

2014-01-01

Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes in duration, amplitude, and pitch) in spoken-word recognition. We asked here whether they can also use visual suprasegmental cues. In two categorization experiments, Dutch participants saw a speaker say fragments of word pairs that were segmentally identical but differed in their stress realization (e.g., 'ca-vi from cavia "guinea pig" vs. 'ka-vi from kaviaar "caviar"). Participants were able to distinguish between these pairs from seeing a speaker alone. Only the presence of primary stress in the fragment, not its absence, was informative. Participants were able to distinguish visually primary from secondary stress on first syllables, but only when the fragment-bearing target word carried phrase-level emphasis. Furthermore, participants distinguished fragments with primary stress on their second syllable from those with secondary stress on their first syllable (e.g., pro-'jec from projector "projector" vs. 'pro-jec from projectiel "projectile"), independently of phrase-level emphasis. Seeing a speaker thus contributes to spoken-word recognition by providing suprasegmental information about the presence of primary lexical stress.
Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues

Directory of Open Access Journals (Sweden)

W. H. Adams

2003-02-01

Full Text Available We present a learning-based approach to the semantic indexing of multimedia content using cues derived from audio, visual, and text features. We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of the concepts in the lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, namely, audio, video, and text. Concept representations are modeled using Gaussian mixture models (GMM, hidden Markov models (HMM, and support vector machines (SVM. Models such as Bayesian networks and SVMs are used in a late-fusion approach to model concepts that are not explicitly modeled in terms of features. Our experiments indicate promise in the proposed classification and fusion methodologies: our proposed fusion scheme achieves more than 10% relative improvement over the best unimodal concept detector.
Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.

Science.gov (United States)

Gebru, Israel D; Ba, Sileye; Li, Xiaofei; Horaud, Radu

2018-05-01

Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes in a principled way speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset, that contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the art diarization algorithms.
Crossmodal and incremental perception of audiovisual cues to emotional speech.

Science.gov (United States)

Barkhuysen, Pashiera; Krahmer, Emiel; Swerts, Marc

2010-01-01

In this article we report on two experiments about the perception of audiovisual cues to emotional speech. The article addresses two questions: 1) how do visual cues from a speaker's face to emotion relate to auditory cues, and (2) what is the recognition speed for various facial cues to emotion? Both experiments reported below are based on tests with video clips of emotional utterances collected via a variant of the well-known Velten method. More specifically, we recorded speakers who displayed positive or negative emotions, which were congruent or incongruent with the (emotional) lexical content of the uttered sentence. In order to test this, we conducted two experiments. The first experiment is a perception experiment in which Czech participants, who do not speak Dutch, rate the perceived emotional state of Dutch speakers in a bimodal (audiovisual) or a unimodal (audio- or vision-only) condition. It was found that incongruent emotional speech leads to significantly more extreme perceived emotion scores than congruent emotional speech, where the difference between congruent and incongruent emotional speech is larger for the negative than for the positive conditions. Interestingly, the largest overall differences between congruent and incongruent emotions were found for the audio-only condition, which suggests that posing an incongruent emotion has a particularly strong effect on the spoken realization of emotions. The second experiment uses a gating paradigm to test the recognition speed for various emotional expressions from a speaker's face. In this experiment participants were presented with the same clips as experiment I, but this time presented vision-only. The clips were shown in successive segments (gates) of increasing duration. Results show that participants are surprisingly accurate in their recognition of the various emotions, as they already reach high recognition scores in the first gate (after only 160 ms). Interestingly, the recognition scores
The Effect of Visual Cueing and Control Design on Children's Reading Achievement of Audio E-Books with Tablet Computers

Science.gov (United States)

Wang, Pei-Yu; Huang, Chung-Kai

2015-01-01

This study aims to explore the impact of learner grade, visual cueing, and control design on children's reading achievement of audio e-books with tablet computers. This research was a three-way factorial design where the first factor was learner grade (grade four and six), the second factor was e-book visual cueing (word-based, line-based, and…
Large Scale Functional Brain Networks Underlying Temporal Integration of Audio-Visual Speech Perception: An EEG Study.

Science.gov (United States)

Kumar, G Vinodh; Halder, Tamesh; Jaiswal, Amit K; Mukherjee, Abhishek; Roy, Dipanjan; Banerjee, Arpan

2016-01-01

Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk-effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, if and how does the network across the whole brain participates during multisensory perception processing remains an open question. We posit that a large-scale functional connectivity among the neural population situated in distributed brain sites may provide valuable insights involved in processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent audio-visual (AV) speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs was computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception around a temporal window of 300-600 ms following onset of stimuli. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times along with pre-stimulus decreases of lower frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus, our
Speech cues contribute to audiovisual spatial integration.

Directory of Open Access Journals (Sweden)

Christopher W Bishop

Full Text Available Speech is the most important form of human communication but ambient sounds and competing talkers often degrade its acoustics. Fortunately the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech-cues interact with audiovisual spatial integration mechanisms. Here, we combine two well established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech-cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral 'what' and dorsal 'where' pathways.
Modeling the Development of Audiovisual Cue Integration in Speech Perception.

Science.gov (United States)

Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C

2017-03-21

Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.
Listeners' expectation of room acoustical parameters based on visual cues

Science.gov (United States)

Valente, Daniel L.

Despite many studies investigating auditory spatial impressions in rooms, few have addressed the impact of simultaneous visual cues on localization and the perception of spaciousness. The current research presents an immersive audio-visual study, in which participants are instructed to make spatial congruency and quantity judgments in dynamic cross-modal environments. The results of these psychophysical tests suggest the importance of consilient audio-visual presentation to the legibility of an auditory scene. Several studies have looked into audio-visual interaction in room perception in recent years, but these studies rely on static images, speech signals, or photographs alone to represent the visual scene. Building on these studies, the aim is to propose a testing method that uses monochromatic compositing (blue-screen technique) to position a studio recording of a musical performance in a number of virtual acoustical environments and ask subjects to assess these environments. In the first experiment of the study, video footage was taken from five rooms varying in physical size from a small studio to a small performance hall. Participants were asked to perceptually align two distinct acoustical parameters---early-to-late reverberant energy ratio and reverberation time---of two solo musical performances in five contrasting visual environments according to their expectations of how the room should sound given its visual appearance. In the second experiment in the study, video footage shot from four different listening positions within a general-purpose space was coupled with sounds derived from measured binaural impulse responses (IRs). The relationship between the presented image, sound, and virtual receiver position was examined. It was found that many visual cues caused different perceived events of the acoustic environment. This included the visual attributes of the space in which the performance was located as well as the visual attributes of the performer
SynFace—Speech-Driven Facial Animation for Virtual Speech-Reading Support

Directory of Open Access Journals (Sweden)

Giampiero Salvi

2009-01-01

Full Text Available This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling.

Atypical audio-visual speech perception and McGurk effects in children with specific language impairment.

Science.gov (United States)

Leybaert, Jacqueline; Macchi, Lucie; Huyse, Aurélie; Champoux, François; Bayard, Clémence; Colin, Cécile; Berthommier, Frédéric

2014-01-01

Audiovisual speech perception of children with specific language impairment (SLI) and children with typical language development (TLD) was compared in two experiments using /aCa/ syllables presented in the context of a masking release paradigm. Children had to repeat syllables presented in auditory alone, visual alone (speechreading), audiovisual congruent and incongruent (McGurk) conditions. Stimuli were masked by either stationary (ST) or amplitude modulated (AM) noise. Although children with SLI were less accurate in auditory and audiovisual speech perception, they showed similar auditory masking release effect than children with TLD. Children with SLI also had less correct responses in speechreading than children with TLD, indicating impairment in phonemic processing of visual speech information. In response to McGurk stimuli, children with TLD showed more fusions in AM noise than in ST noise, a consequence of the auditory masking release effect and of the influence of visual information. Children with SLI did not show this effect systematically, suggesting they were less influenced by visual speech. However, when the visual cues were easily identified, the profile of responses to McGurk stimuli was similar in both groups, suggesting that children with SLI do not suffer from an impairment of audiovisual integration. An analysis of percent of information transmitted revealed a deficit in the children with SLI, particularly for the place of articulation feature. Taken together, the data support the hypothesis of an intact peripheral processing of auditory speech information, coupled with a supra modal deficit of phonemic categorization in children with SLI. Clinical implications are discussed.
Audio-Visual Temporal Recalibration Can be Constrained by Content Cues Regardless of Spatial Overlap

OpenAIRE

Roseboom, Warrick; Kawabe, Takahiro; Nishida, Shin?Ya

2013-01-01

It has now been well established that the point of subjective synchrony for audio and visual events can be shifted following exposure to asynchronous audio-visual presentations, an effect often referred to as temporal recalibration. Recently it was further demonstrated that it is possible to concurrently maintain two such recalibrated, and opposing, estimates of audio-visual temporal synchrony. However, it remains unclear precisely what defines a given audio-visual pair such that it is possib...
Deep Complementary Bottleneck Features for Visual Speech Recognition

NARCIS (Netherlands)

Petridis, Stavros; Pantic, Maja

Deep bottleneck features (DBNFs) have been used successfully in the past for acoustic speech recognition from audio. However, research on extracting DBNFs for visual speech recognition is very limited. In this work, we present an approach to extract deep bottleneck visual features based on deep
News video story segmentation method using fusion of audio-visual features

Science.gov (United States)

Wen, Jun; Wu, Ling-da; Zeng, Pu; Luan, Xi-dao; Xie, Yu-xiang

2007-11-01

News story segmentation is an important aspect for news video analysis. This paper presents a method for news video story segmentation. Different form prior works, which base on visual features transform, the proposed technique uses audio features as baseline and fuses visual features with it to refine the results. At first, it selects silence clips as audio features candidate points, and selects shot boundaries and anchor shots as two kinds of visual features candidate points. Then this paper selects audio feature candidates as cues and develops different fusion method, which effectively using diverse type visual candidates to refine audio candidates, to get story boundaries. Experiment results show that this method has high efficiency and adaptability to different kinds of news video.
Audio-Visual Speech in Noise Perception in Dyslexia

Science.gov (United States)

van Laarhoven, Thijs; Keetels, Mirjam; Schakel, Lemmy; Vroomen, Jean

2018-01-01

Individuals with developmental dyslexia (DD) may experience, besides reading problems, other speech-related processing deficits. Here, we examined the influence of visual articulatory information (lip-read speech) at various levels of background noise on auditory word recognition in children and adults with DD. We found that children with a…
On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech

Directory of Open Access Journals (Sweden)

Wesley Mattheyses

2009-01-01

Full Text Available Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either natural or synthesized speech. However, the perception of mismatches between these two information streams requires experimental exploration since it could degrade the quality of the output. In order to increase the intermodal coherence in synthetic 2D photorealistic speech, we extended the well-known unit selection audio synthesis technique to work with multimodal segments containing original combinations of audio and video. Subjective experiments confirm that the audiovisual signals created by our multimodal synthesis strategy are indeed perceived as being more synchronous than those of systems in which both modes are not intrinsically coherent. Furthermore, it is shown that the degree of coherence between the auditory mode and the visual mode has an influence on the perceived quality of the synthetic visual speech fragment. In addition, the audio quality was found to have only a minor influence on the perceived visual signal's quality.
Audio-visual Classification and Fusion of Spontaneous Affect Data in Likelihood Space

NARCIS (Netherlands)

Nicolaou, Mihalis A.; Gunes, Hatice; Pantic, Maja

2010-01-01

This paper focuses on audio-visual (using facial expression, shoulder and audio cues) classification of spontaneous affect, utilising generative models for classification (i) in terms of Maximum Likelihood Classification with the assumption that the generative model structure in the classifier is
Multisensory and Modality Specific Processing of Visual Speech in Different Regions of the Premotor Cortex

Directory of Open Access Journals (Sweden)

Daniel eCallan

2014-05-01

Full Text Available Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex has been shown to be active during both observation and execution of action (‘Mirror System’ properties, and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI study, participants identified vowels produced by a speaker in audio-visual (saw the speaker’s articulating face and heard her voice, visual only (only saw the speaker’s articulating face, and audio only (only heard the speaker’s voice conditions with varying audio signal-to-noise ratios in order to determine the regions of the premotor cortex involved with multisensory and modality specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli to control for task difficulty and differences in intelligibility. The results of the fMRI analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and premotor cortex. The left ventral inferior premotor cortex showed properties of multimodal (audio-visual enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas, more superior and dorsal regions of the premotor cortex are involved with mapping unimodal (in this case visual sensory features of the speech signal with
Acoustic cues identifying phonetic transitions for speech segmentation

CSIR Research Space (South Africa)

Van Niekerk, DR

2008-11-01

Full Text Available The quality of corpus-based text-to-speech (TTS) systems depends strongly on the consistency of boundary placements during phonetic alignments. Expert human transcribers use visually represented acoustic cues in order to consistently place...
Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals: Effects of Adding Visual Cues to Auditory Speech Stimuli.

Science.gov (United States)

Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

2016-06-17

The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. © The Author(s) 2016.
Auditory and audio-visual processing in patients with cochlear, auditory brainstem, and auditory midbrain implants: An EEG study.

Science.gov (United States)

Schierholz, Irina; Finke, Mareike; Kral, Andrej; Büchner, Andreas; Rach, Stefan; Lenarz, Thomas; Dengler, Reinhard; Sandmann, Pascale

2017-04-01

There is substantial variability in speech recognition ability across patients with cochlear implants (CIs), auditory brainstem implants (ABIs), and auditory midbrain implants (AMIs). To better understand how this variability is related to central processing differences, the current electroencephalography (EEG) study compared hearing abilities and auditory-cortex activation in patients with electrical stimulation at different sites of the auditory pathway. Three different groups of patients with auditory implants (Hannover Medical School; ABI: n = 6, CI: n = 6; AMI: n = 2) performed a speeded response task and a speech recognition test with auditory, visual, and audio-visual stimuli. Behavioral performance and cortical processing of auditory and audio-visual stimuli were compared between groups. ABI and AMI patients showed prolonged response times on auditory and audio-visual stimuli compared with NH listeners and CI patients. This was confirmed by prolonged N1 latencies and reduced N1 amplitudes in ABI and AMI patients. However, patients with central auditory implants showed a remarkable gain in performance when visual and auditory input was combined, in both speech and non-speech conditions, which was reflected by a strong visual modulation of auditory-cortex activation in these individuals. In sum, the results suggest that the behavioral improvement for audio-visual conditions in central auditory implant patients is based on enhanced audio-visual interactions in the auditory cortex. Their findings may provide important implications for the optimization of electrical stimulation and rehabilitation strategies in patients with central auditory prostheses. Hum Brain Mapp 38:2206-2225, 2017. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Model-Based Synthesis of Visual Speech Movements from 3D Video

Directory of Open Access Journals (Sweden)

Edge JamesD

2009-01-01

Full Text Available We describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach. Speech lip movements are captured using a 3D stereo face capture system and split up into phonetic units. A dynamic parameterisation of this data is constructed which maintains the relationship between lip shapes and velocities; within this parameterisation a model of how lips move is built and is used in the animation of visual speech movements from speech audio input. The mapping from audio parameters to lip movements is disambiguated by selecting only the most similar stored phonetic units to the target utterance during synthesis. By combining properties of model-based synthesis (e.g., HMMs, neural nets with unit selection we improve the quality of our speech synthesis.
Visual-Auditory Integration during Speech Imitation in Autism

Science.gov (United States)

Williams, Justin H. G.; Massaro, Dominic W.; Peel, Natalie J.; Bosseler, Alexis; Suddendorf, Thomas

2004-01-01

Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional "mirror neuron" systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a "virtual" head (Baldi), delivered speech stimuli for…
Computationally Efficient Clustering of Audio-Visual Meeting Data

Science.gov (United States)

Hung, Hayley; Friedland, Gerald; Yeo, Chuohao

This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors, comprising a limited number of cameras and microphones. We first demonstrate computationally efficient algorithms that can identify who spoke and when, a problem in speech processing known as speaker diarization. We also extract visual activity features efficiently from MPEG4 video by taking advantage of the processing that was already done for video compression. Then, we present a method of associating the audio-visual data together so that the content of each participant can be managed individually. The methods presented in this article can be used as a principal component that enables many higher-level semantic analysis tasks needed in search, retrieval, and navigation.
Audio-visual speech perception in infants and toddlers with Down syndrome, fragile X syndrome, and Williams syndrome.

Science.gov (United States)

D'Souza, Dean; D'Souza, Hana; Johnson, Mark H; Karmiloff-Smith, Annette

2016-08-01

Typically-developing (TD) infants can construct unified cross-modal percepts, such as a speaking face, by integrating auditory-visual (AV) information. This skill is a key building block upon which higher-level skills, such as word learning, are built. Because word learning is seriously delayed in most children with neurodevelopmental disorders, we assessed the hypothesis that this delay partly results from a deficit in integrating AV speech cues. AV speech integration has rarely been investigated in neurodevelopmental disorders, and never previously in infants. We probed for the McGurk effect, which occurs when the auditory component of one sound (/ba/) is paired with the visual component of another sound (/ga/), leading to the perception of an illusory third sound (/da/ or /tha/). We measured AV integration in 95 infants/toddlers with Down, fragile X, or Williams syndrome, whom we matched on Chronological and Mental Age to 25 TD infants. We also assessed a more basic AV perceptual ability: sensitivity to matching vs. mismatching AV speech stimuli. Infants with Williams syndrome failed to demonstrate a McGurk effect, indicating poor AV speech integration. Moreover, while the TD children discriminated between matching and mismatching AV stimuli, none of the other groups did, hinting at a basic deficit or delay in AV speech processing, which is likely to constrain subsequent language development. Copyright © 2016 Elsevier Inc. All rights reserved.
Audio-Visual Perception System for a Humanoid Robotic Head

Directory of Open Access Journals (Sweden)

Raquel Viciana-Abad

2014-05-01

Full Text Available One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, there is a lack in terms of evaluating the benefits of audio-visual attention mechanisms, compared to only audio or visual approaches, in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with a Bayes inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared via considering the technical limitations of unimodal systems. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.
Audio-visual speech perception in prelingually deafened Japanese children following sequential bilateral cochlear implantation.

Science.gov (United States)

Yamamoto, Ryosuke; Naito, Yasushi; Tona, Risa; Moroto, Saburo; Tamaya, Rinko; Fujiwara, Keizo; Shinohara, Shogo; Takebayashi, Shinji; Kikuchi, Masahiro; Michida, Tetsuhiko

2017-11-01

An effect of audio-visual (AV) integration is observed when the auditory and visual stimuli are incongruent (the McGurk effect). In general, AV integration is helpful especially in subjects wearing hearing aids or cochlear implants (CIs). However, the influence of AV integration on spoken word recognition in individuals with bilateral CIs (Bi-CIs) has not been fully investigated so far. In this study, we investigated AV integration in children with Bi-CIs. The study sample included thirty one prelingually deafened children who underwent sequential bilateral cochlear implantation. We assessed their responses to congruent and incongruent AV stimuli with three CI-listening modes: only the 1st CI, only the 2nd CI, and Bi-CIs. The responses were assessed in the whole group as well as in two sub-groups: a proficient group (syllable intelligibility ≥80% with the 1st CI) and a non-proficient group (syllable intelligibility effect in each of the three CI-listening modes. AV integration responses were observed in a subset of incongruent AV stimuli, and the patterns observed with the 1st CI and with Bi-CIs were similar. In the proficient group, the responses with the 2nd CI were not significantly different from those with the 1st CI whereas in the non-proficient group the responses with the 2nd CI were driven by visual stimuli more than those with the 1st CI. Our results suggested that prelingually deafened Japanese children who underwent sequential bilateral cochlear implantation exhibit AV integration abilities, both in monaural listening as well as in binaural listening. We also observed a higher influence of visual stimuli on speech perception with the 2nd CI in the non-proficient group, suggesting that Bi-CIs listeners with poorer speech recognition rely on visual information more compared to the proficient subjects to compensate for poorer auditory input. Nevertheless, poorer quality auditory input with the 2nd CI did not interfere with AV integration with binaural
pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis.

Science.gov (United States)

Giannakopoulos, Theodoros

2015-01-01

Audio information plays a rather important role in the increasing digital content that is available today, resulting in a need for methodologies that automatically analyze such content: audio event recognition for home automations and surveillance systems, speech recognition, music information retrieval, multimodal analysis (e.g. audio-visual analysis of online videos for content-based recommendation), etc. This paper presents pyAudioAnalysis, an open-source Python library that provides a wide range of audio analysis procedures including: feature extraction, classification of audio signals, supervised and unsupervised segmentation and content visualization. pyAudioAnalysis is licensed under the Apache License and is available at GitHub (https://github.com/tyiannak/pyAudioAnalysis/). Here we present the theoretical background behind the wide range of the implemented methodologies, along with evaluation metrics for some of the methods. pyAudioAnalysis has been already used in several audio analysis research applications: smart-home functionalities through audio event detection, speech emotion recognition, depression classification based on audio-visual features, music segmentation, multimodal content-based movie recommendation and health applications (e.g. monitoring eating habits). The feedback provided from all these particular audio applications has led to practical enhancement of the library.
Temporal visual cues aid speech recognition

DEFF Research Database (Denmark)

Zhou, Xiang; Ross, Lars; Lehn-Schiøler, Tue

2006-01-01

of audio to generate an artificial talking-face video and measured word recognition performance on simple monosyllabic words. RESULTS: When presenting words together with the artificial video we find that word recognition is improved over purely auditory presentation. The effect is significant (p......BACKGROUND: It is well known that under noisy conditions, viewing a speaker's articulatory movement aids the recognition of spoken words. Conventionally it is thought that the visual input disambiguates otherwise confusing auditory input. HYPOTHESIS: In contrast we hypothesize...... that it is the temporal synchronicity of the visual input that aids parsing of the auditory stream. More specifically, we expected that purely temporal information, which does not convey information such as place of articulation may facility word recognition. METHODS: To test this prediction we used temporal features...
Audiovisual Speech Synchrony Measure: Application to Biometrics

Directory of Open Access Journals (Sweden)

Gérard Chollet

2007-01-01

Full Text Available Speech is a means of communication which is intrinsically bimodal: the audio signal originates from the dynamics of the articulators. This paper reviews recent works in the field of audiovisual speech, and more specifically techniques developed to measure the level of correspondence between audio and visual speech. It overviews the most common audio and visual speech front-end processing, transformations performed on audio, visual, or joint audiovisual feature spaces, and the actual measure of correspondence between audio and visual speech. Finally, the use of synchrony measure for biometric identity verification based on talking faces is experimented on the BANCA database.

Speech and audio processing for coding, enhancement and recognition

CERN Document Server

Togneri, Roberto; Narasimha, Madihally

2015-01-01

This book describes the basic principles underlying the generation, coding, transmission and enhancement of speech and audio signals, including advanced statistical and machine learning techniques for speech and speaker recognition with an overview of the key innovations in these areas. Key research undertaken in speech coding, speech enhancement, speech recognition, emotion recognition and speaker diarization are also presented, along with recent advances and new paradigms in these areas. · Offers readers a single-source reference on the significant applications of speech and audio processing to speech coding, speech enhancement and speech/speaker recognition. Enables readers involved in algorithm development and implementation issues for speech coding to understand the historical development and future challenges in speech coding research; · Discusses speech coding methods yielding bit-streams that are multi-rate and scalable for Voice-over-IP (VoIP) Networks; · �...
Discriminating native from non-native speech using fusion of visual cues

NARCIS (Netherlands)

Georgakis, Christos; Petridis, Stavros; Pantic, Maja

2014-01-01

The task of classifying accent, as belonging to a native language speaker or a foreign language speaker, has been so far addressed by means of the audio modality only. However, features extracted from the visual modality have been successfully used to extend or substitute audio-only approaches
Joint evaluation of communication quality and user experience in an audio-visual virtual reality meeting

DEFF Research Database (Denmark)

Møller, Anders Kalsgaard; Hoffmann, Pablo F.; Carrozzino, Marcello

2013-01-01

The state-of-the-art speech intelligibility tests are created with the purpose of evaluating acoustic communication devices and not for evaluating audio-visual virtual reality systems. This paper present a novel method to evaluate a communication situation based on both the speech intelligibility...
Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

Science.gov (United States)

Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

2015-05-01

The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although the brain network is supposed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering, such as single linkage distance, to analyze the connected component of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white-noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationship, similar connected components were observed in bimodal and unimodal speech conditions during filtration. However, during speech perception by congruent audiovisual stimuli, the tighter couplings of left anterior temporal gyrus-anterior insula component and right premotor-visual components were observed than auditory or visual speech cue conditions, respectively. Interestingly, visual speech is perceived under white noise by tight negative coupling in the left inferior frontal region-right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus, right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.
Atypical audiovisual speech integration in infants at risk for autism.

Directory of Open Access Journals (Sweden)

Jeanne A Guiraud

Full Text Available The language difficulties often seen in individuals with autism might stem from an inability to integrate audiovisual information, a skill important for language development. We investigated whether 9-month-old siblings of older children with autism, who are at an increased risk of developing autism, are able to integrate audiovisual speech cues. We used an eye-tracker to record where infants looked when shown a screen displaying two faces of the same model, where one face is articulating/ba/and the other/ga/, with one face congruent with the syllable sound being presented simultaneously, the other face incongruent. This method was successful in showing that infants at low risk can integrate audiovisual speech: they looked for the same amount of time at the mouths in both the fusible visual/ga/- audio/ba/and the congruent visual/ba/- audio/ba/displays, indicating that the auditory and visual streams fuse into a McGurk-type of syllabic percept in the incongruent condition. It also showed that low-risk infants could perceive a mismatch between auditory and visual cues: they looked longer at the mouth in the mismatched, non-fusible visual/ba/- audio/ga/display compared with the congruent visual/ga/- audio/ga/display, demonstrating that they perceive an uncommon, and therefore interesting, speech-like percept when looking at the incongruent mouth (repeated ANOVA: displays x fusion/mismatch conditions interaction: F(1,16 = 17.153, p = 0.001. The looking behaviour of high-risk infants did not differ according to the type of display, suggesting difficulties in matching auditory and visual information (repeated ANOVA, displays x conditions interaction: F(1,25 = 0.09, p = 0.767, in contrast to low-risk infants (repeated ANOVA: displays x conditions x low/high-risk groups interaction: F(1,41 = 4.466, p = 0.041. In some cases this reduced ability might lead to the poor communication skills characteristic of autism.
Using auditory-visual speech to probe the basis of noise-impaired consonant-vowel perception in dyslexia and auditory neuropathy

Science.gov (United States)

Ramirez, Joshua; Mann, Virginia

2005-08-01

Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.
Visual face-movement sensitive cortex is relevant for auditory-only speech recognition.

Science.gov (United States)

Riedel, Philipp; Ragert, Patrick; Schelinski, Stefanie; Kiebel, Stefan J; von Kriegstein, Katharina

2015-07-01

It is commonly assumed that the recruitment of visual areas during audition is not relevant for performing auditory tasks ('auditory-only view'). According to an alternative view, however, the recruitment of visual cortices is thought to optimize auditory-only task performance ('auditory-visual view'). This alternative view is based on functional magnetic resonance imaging (fMRI) studies. These studies have shown, for example, that even if there is only auditory input available, face-movement sensitive areas within the posterior superior temporal sulcus (pSTS) are involved in understanding what is said (auditory-only speech recognition). This is particularly the case when speakers are known audio-visually, that is, after brief voice-face learning. Here we tested whether the left pSTS involvement is causally related to performance in auditory-only speech recognition when speakers are known by face. To test this hypothesis, we applied cathodal transcranial direct current stimulation (tDCS) to the pSTS during (i) visual-only speech recognition of a speaker known only visually to participants and (ii) auditory-only speech recognition of speakers they learned by voice and face. We defined the cathode as active electrode to down-regulate cortical excitability by hyperpolarization of neurons. tDCS to the pSTS interfered with visual-only speech recognition performance compared to a control group without pSTS stimulation (tDCS to BA6/44 or sham). Critically, compared to controls, pSTS stimulation additionally decreased auditory-only speech recognition performance selectively for voice-face learned speakers. These results are important in two ways. First, they provide direct evidence that the pSTS is causally involved in visual-only speech recognition; this confirms a long-standing prediction of current face-processing models. Secondly, they show that visual face-sensitive pSTS is causally involved in optimizing auditory-only speech recognition. These results are in line
Design guidelines for the use of audio cues in computer interfaces

Energy Technology Data Exchange (ETDEWEB)

Sumikawa, D.A.; Blattner, M.M.; Joy, K.I.; Greenberg, R.M.

1985-07-01

A logical next step in the evolution of the computer-user interface is the incorporation of sound thereby using our senses of ''hearing'' in our communication with the computer. This allows our visual and auditory capacities to work in unison leading to a more effective and efficient interpretation of information received from the computer than by sight alone. In this paper we examine earcons, which are audio cues, used in the computer-user interface to provide information and feedback to the user about computer entities (these include messages and functions, as well as states and labels). The material in this paper is part of a larger study that recommends guidelines for the design and use of audio cues in the computer-user interface. The complete work examines the disciplines of music, psychology, communication theory, advertising, and psychoacoustics to discover how sound is utilized and analyzed in those areas. The resulting information is organized according to the theory of semiotics, the theory of signs, into the syntax, semantics, and pragmatics of communication by sound. Here we present design guidelines for the syntax of earcons. Earcons are constructed from motives, short sequences of notes with a specific rhythm and pitch, embellished by timbre, dynamics, and register. Compound earcons and family earcons are introduced. These are related motives that serve to identify a family of related cues. Examples of earcons are given.
Overview of the 2015 Workshop on Speech, Language and Audio in Multimedia

NARCIS (Netherlands)

Gravier, Guillaume; Jones, Gareth J.F.; Larson, Martha; Ordelman, Roeland J.F.

2015-01-01

The Workshop on Speech, Language and Audio in Multimedia (SLAM) positions itself at at the crossroad of multiple scientific fields - music and audio processing, speech processing, natural language processing and multimedia - to discuss and stimulate research results, projects, datasets and
Audiovisual Cues and Perceptual Learning of Spectrally Distorted Speech

Science.gov (United States)

Pilling, Michael; Thomas, Sharon

2011-01-01

Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…
Audio-visual biofeedback for respiratory-gated radiotherapy: Impact of audio instruction and audio-visual biofeedback on respiratory-gated radiotherapy

International Nuclear Information System (INIS)

George, Rohini; Chung, Theodore D.; Vedam, Sastry S.; Ramakrishnan, Viswanathan; Mohan, Radhe; Weiss, Elisabeth; Keall, Paul J.

2006-01-01

Purpose: Respiratory gating is a commercially available technology for reducing the deleterious effects of motion during imaging and treatment. The efficacy of gating is dependent on the reproducibility within and between respiratory cycles during imaging and treatment. The aim of this study was to determine whether audio-visual biofeedback can improve respiratory reproducibility by decreasing residual motion and therefore increasing the accuracy of gated radiotherapy. Methods and Materials: A total of 331 respiratory traces were collected from 24 lung cancer patients. The protocol consisted of five breathing training sessions spaced about a week apart. Within each session the patients initially breathed without any instruction (free breathing), with audio instructions and with audio-visual biofeedback. Residual motion was quantified by the standard deviation of the respiratory signal within the gating window. Results: Audio-visual biofeedback significantly reduced residual motion compared with free breathing and audio instruction. Displacement-based gating has lower residual motion than phase-based gating. Little reduction in residual motion was found for duty cycles less than 30%; for duty cycles above 50% there was a sharp increase in residual motion. Conclusions: The efficiency and reproducibility of gating can be improved by: incorporating audio-visual biofeedback, using a 30-50% duty cycle, gating during exhalation, and using displacement-based gating
Perception of audio-visual speech synchrony in Spanish-speaking children with and without specific language impairment.

Science.gov (United States)

Pons, Ferran; Andreu, Llorenç; Sanz-Torrent, Monica; Buil-Legaz, Lucía; Lewkowicz, David J

2013-06-01

Speech perception involves the integration of auditory and visual articulatory information, and thus requires the perception of temporal synchrony between this information. There is evidence that children with specific language impairment (SLI) have difficulty with auditory speech perception but it is not known if this is also true for the integration of auditory and visual speech. Twenty Spanish-speaking children with SLI, twenty typically developing age-matched Spanish-speaking children, and twenty Spanish-speaking children matched for MLU-w participated in an eye-tracking study to investigate the perception of audiovisual speech synchrony. Results revealed that children with typical language development perceived an audiovisual asynchrony of 666 ms regardless of whether the auditory or visual speech attribute led the other one. Children with SLI only detected the 666 ms asynchrony when the auditory component preceded [corrected] the visual component. None of the groups perceived an audiovisual asynchrony of 366 ms. These results suggest that the difficulty of speech processing by children with SLI would also involve difficulties in integrating auditory and visual aspects of speech perception.
Influences of selective adaptation on perception of audiovisual speech

Science.gov (United States)

Dias, James W.; Cook, Theresa C.; Rosenblum, Lawrence D.

2016-01-01

Research suggests that selective adaptation in speech is a low-level process dependent on sensory-specific information shared between the adaptor and test-stimuli. However, previous research has only examined how adaptors shift perception of unimodal test stimuli, either auditory or visual. In the current series of experiments, we investigated whether adaptation to cross-sensory phonetic information can influence perception of integrated audio-visual phonetic information. We examined how selective adaptation to audio and visual adaptors shift perception of speech along an audiovisual test continuum. This test-continuum consisted of nine audio-/ba/-visual-/va/ stimuli, ranging in visual clarity of the mouth. When the mouth was clearly visible, perceivers “heard” the audio-visual stimulus as an integrated “va” percept 93.7% of the time (e.g., McGurk & MacDonald, 1976). As visibility of the mouth became less clear across the nine-item continuum, the audio-visual “va” percept weakened, resulting in a continuum ranging in audio-visual percepts from /va/ to /ba/. Perception of the test-stimuli was tested before and after adaptation. Changes in audiovisual speech perception were observed following adaptation to visual-/va/ and audiovisual-/va/, but not following adaptation to auditory-/va/, auditory-/ba/, or visual-/ba/. Adaptation modulates perception of integrated audio-visual speech by modulating the processing of sensory-specific information. The results suggest that auditory and visual speech information are not completely integrated at the level of selective adaptation. PMID:27041781
Advances in audio source seperation and multisource audio content retrieval

Science.gov (United States)

Vincent, Emmanuel

2012-06-01

Audio source separation aims to extract the signals of individual sound sources from a given recording. In this paper, we review three recent advances which improve the robustness of source separation in real-world challenging scenarios and enable its use for multisource content retrieval tasks, such as automatic speech recognition (ASR) or acoustic event detection (AED) in noisy environments. We present a Flexible Audio Source Separation Toolkit (FASST) and discuss its advantages compared to earlier approaches such as independent component analysis (ICA) and sparse component analysis (SCA). We explain how cues as diverse as harmonicity, spectral envelope, temporal fine structure or spatial location can be jointly exploited by this toolkit. We subsequently present the uncertainty decoding (UD) framework for the integration of audio source separation and audio content retrieval. We show how the uncertainty about the separated source signals can be accurately estimated and propagated to the features. Finally, we explain how this uncertainty can be efficiently exploited by a classifier, both at the training and the decoding stage. We illustrate the resulting performance improvements in terms of speech separation quality and speaker recognition accuracy.
Automated Speech and Audio Analysis for Semantic Access to Multimedia

NARCIS (Netherlands)

Jong, F.M.G. de; Ordelman, R.; Huijbregts, M.

2006-01-01

The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content, and as a consequence, improve the effectiveness of conceptual access tools. This paper overviews the various ways in which automatic speech and audio analysis can contribute to
Automated speech and audio analysis for semantic access to multimedia

NARCIS (Netherlands)

de Jong, Franciska M.G.; Ordelman, Roeland J.F.; Huijbregts, M.A.H.; Avrithis, Y.; Kompatsiaris, Y.; Staab, S.; O' Connor, N.E.

2006-01-01

The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content, and as a consequence, improve the effectiveness of conceptual access tools. This paper overviews the various ways in which automatic speech and audio analysis can contribute to
Sensitivity to audio-visual synchrony and its relation to language abilities in children with and without ASD.

Science.gov (United States)

Righi, Giulia; Tenenbaum, Elena J; McCormick, Carolyn; Blossom, Megan; Amso, Dima; Sheinkopf, Stephen J

2018-04-01

Autism Spectrum Disorder (ASD) is often accompanied by deficits in speech and language processing. Speech processing relies heavily on the integration of auditory and visual information, and it has been suggested that the ability to detect correspondence between auditory and visual signals helps to lay the foundation for successful language development. The goal of the present study was to examine whether young children with ASD show reduced sensitivity to temporal asynchronies in a speech processing task when compared to typically developing controls, and to examine how this sensitivity might relate to language proficiency. Using automated eye tracking methods, we found that children with ASD failed to demonstrate sensitivity to asynchronies of 0.3s, 0.6s, or 1.0s between a video of a woman speaking and the corresponding audio track. In contrast, typically developing children who were language-matched to the ASD group, were sensitive to both 0.6s and 1.0s asynchronies. We also demonstrated that individual differences in sensitivity to audiovisual asynchronies and individual differences in orientation to relevant facial features were both correlated with scores on a standardized measure of language abilities. Results are discussed in the context of attention to visual language and audio-visual processing as potential precursors to language impairment in ASD. Autism Res 2018, 11: 645-653. © 2018 International Society for Autism Research, Wiley Periodicals, Inc. Speech processing relies heavily on the integration of auditory and visual information, and it has been suggested that the ability to detect correspondence between auditory and visual signals helps to lay the foundation for successful language development. The goal of the present study was to explore whether children with ASD process audio-visual synchrony in ways comparable to their typically developing peers, and the relationship between preference for synchrony and language ability. Results showed that
Working Memory and Speech Recognition in Noise under Ecologically Relevant Listening Conditions: Effects of Visual Cues and Noise Type among Adults with Hearing Loss

Science.gov (United States)

Miller, Christi W.; Stewart, Erin K.; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A.; Tremblay, Kelly

2017-01-01

Purpose: This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Method: Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2…
Spatio-temporal distribution of brain activity associated with audio-visually congruent and incongruent speech and the McGurk Effect.

Science.gov (United States)

Pratt, Hillel; Bleich, Naomi; Mittelman, Nomi

2015-11-01

Spatio-temporal distributions of cortical activity to audio-visual presentations of meaningless vowel-consonant-vowels and the effects of audio-visual congruence/incongruence, with emphasis on the McGurk effect, were studied. The McGurk effect occurs when a clearly audible syllable with one consonant, is presented simultaneously with a visual presentation of a face articulating a syllable with a different consonant and the resulting percept is a syllable with a consonant other than the auditorily presented one. Twenty subjects listened to pairs of audio-visually congruent or incongruent utterances and indicated whether pair members were the same or not. Source current densities of event-related potentials to the first utterance in the pair were estimated and effects of stimulus-response combinations, brain area, hemisphere, and clarity of visual articulation were assessed. Auditory cortex, superior parietal cortex, and middle temporal cortex were the most consistently involved areas across experimental conditions. Early (visual cortex. Clarity of visual articulation impacted activity in secondary visual cortex and Wernicke's area. McGurk perception was associated with decreased activity in primary and secondary auditory cortices and Wernicke's area before 100 msec, increased activity around 100 msec which decreased again around 180 msec. Activity in Broca's area was unaffected by McGurk perception and was only increased to congruent audio-visual stimuli 30-70 msec following consonant onset. The results suggest left hemisphere prominence in the effects of stimulus and response conditions on eight brain areas involved in dynamically distributed parallel processing of audio-visual integration. Initially (30-70 msec) subcortical contributions to auditory cortex, superior parietal cortex, and middle temporal cortex occur. During 100-140 msec, peristriate visual influences and Wernicke's area join in the processing. Resolution of incongruent audio-visual inputs is then
Segmentation cues in conversational speech: Robust semantics and fragile phonotactics

Directory of Open Access Journals (Sweden)

Laurence eWhite

2012-10-01

Full Text Available Multiple cues influence listeners’ segmentation of connected speech into words, but most previous studies have used stimuli elicited in careful readings rather than natural conversation. Discerning word boundaries in conversational speech may differ from the laboratory setting. In particular, a speaker’s articulatory effort – hyperarticulation vs hypoarticulation (H&H – may vary according to communicative demands, suggesting a compensatory relationship whereby acoustic-phonetic cues are attenuated when other information sources strongly guide segmentation. We examined how listeners’ interpretation of segmentation cues is affected by speech style (spontaneous conversation vs read, using cross-modal identity priming. To elicit spontaneous stimuli, we used a map task in which speakers discussed routes around stylised landmarks. These landmarks were two-word phrases in which the strength of potential segmentation cues – semantic likelihood and cross-boundary diphone phonotactics – was systematically varied. Landmark-carrying utterances were transcribed and later re-recorded as read speech.Independent of speech style, we found an interaction between cue valence (favourable/unfavourable and cue type (phonotactics/semantics. Thus, there was an effect of semantic plausibility, but no effect of cross-boundary phonotactics, indicating that the importance of phonotactic segmentation may have been overstated in studies where lexical information was artificially suppressed. These patterns were unaffected by whether the stimuli were elicited in a spontaneous or read context, even though the difference in speech styles was evident in a main effect. Durational analyses suggested speaker-driven cue trade-offs congruent with an H&H account, but these modulations did not impact on listener behaviour. We conclude that previous research exploiting read speech is reliable in indicating the primacy of lexically-based cues in the segmentation of natural

Consequence of audio visual collection in school libraries

OpenAIRE

Kuri, Ramesh

2016-01-01

The collection of Audio-Visual in library plays important role in teaching and learning. The importance of audio visual (AV) technology in education should not be underestimated. If audio-visual collection in library is carefully planned and designed, it can provide a rich learning environment. In this article, an author discussed the consequences of Audio-Visual collection in libraries especially for students of school library
Visual communication and the content and style of conversation.

Science.gov (United States)

Rutter, D R; Stephenson, G M; Dewey, M E

1981-02-01

Previous research suggests that visual communication plays a number of important roles in social interaction. In particular, it appears to influence the content of what people say in discussions, the style of their speech, and the outcomes they reach. However, the findings are based exclusively on comparisons between face-to-face conversations and audio conversations, in which subjects sit in separate rooms and speak over a microphone-headphone intercom which precludes visual communication. Interpretation is difficult, because visual communication is confounded with physical presence, which itself makes available certain cues denied to audio subjects. The purpose of this paper is to report two experiments in which the variables were separated and content and style were re-examined. The first made use of blind subjects, and again compared the face-to-face and audio conditions. The second returned to sighted subjects, and examined four experimental conditions: face-to-face; audio; a curtain condition in which subjects sat in the same room but without visual communication; and a video condition in which they sat in separate rooms and communicated over a television link. Neither visual communication nor physical presence proved to be critical variable. Instead, the two sources of cues combined, such that content and style were influenced by the aggregate of available cues. The more cueless the settings, the more task-oriented, depersonalized and unspontaneous the conversation. The findings also suggested that the primary effect of cuelessness is to influence verbal content, and that its influence on both style and outcome occurs indirectly, through the mediation of content.
Visual form Cues, Biological Motions, Auditory Cues, and Even Olfactory Cues Interact to Affect Visual Sex Discriminations

OpenAIRE

Rick Van Der Zwan; Anna Brooks; Duncan Blair; Coralia Machatch; Graeme Hacker

2011-01-01

Johnson and Tassinary (2005) proposed that visually perceived sex is signalled by structural or form cues. They suggested also that biological motion cues signal sex, but do so indirectly. We previously have shown that auditory cues can mediate visual sex perceptions (van der Zwan et al., 2009). Here we demonstrate that structural cues to body shape are alone sufficient for visual sex discriminations but that biological motion cues alone are not. Interestingly, biological motions can resolve ...
Fusion for Audio-Visual Laughter Detection

NARCIS (Netherlands)

Reuderink, B.

2007-01-01

Laughter is a highly variable signal, and can express a spectrum of emotions. This makes the automatic detection of laughter a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is performed
Perception of co-speech gestures in aphasic patients: a visual exploration study during the observation of dyadic conversations.

Science.gov (United States)

Preisig, Basil C; Eggenberger, Noëmi; Zito, Giuseppe; Vanbellingen, Tim; Schumacher, Rahel; Hopfner, Simone; Nyffeler, Thomas; Gutbrod, Klemens; Annoni, Jean-Marie; Bohlhalter, Stephan; Müri, René M

2015-03-01

Co-speech gestures are part of nonverbal communication during conversations. They either support the verbal message or provide the interlocutor with additional information. Furthermore, they prompt as nonverbal cues the cooperative process of turn taking. In the present study, we investigated the influence of co-speech gestures on the perception of dyadic dialogue in aphasic patients. In particular, we analysed the impact of co-speech gestures on gaze direction (towards speaker or listener) and fixation of body parts. We hypothesized that aphasic patients, who are restricted in verbal comprehension, adapt their visual exploration strategies. Sixteen aphasic patients and 23 healthy control subjects participated in the study. Visual exploration behaviour was measured by means of a contact-free infrared eye-tracker while subjects were watching videos depicting spontaneous dialogues between two individuals. Cumulative fixation duration and mean fixation duration were calculated for the factors co-speech gesture (present and absent), gaze direction (to the speaker or to the listener), and region of interest (ROI), including hands, face, and body. Both aphasic patients and healthy controls mainly fixated the speaker's face. We found a significant co-speech gesture × ROI interaction, indicating that the presence of a co-speech gesture encouraged subjects to look at the speaker. Further, there was a significant gaze direction × ROI × group interaction revealing that aphasic patients showed reduced cumulative fixation duration on the speaker's face compared to healthy controls. Co-speech gestures guide the observer's attention towards the speaker, the source of semantic input. It is discussed whether an underlying semantic processing deficit or a deficit to integrate audio-visual information may cause aphasic patients to explore less the speaker's face. Copyright © 2014 Elsevier Ltd. All rights reserved.
A scheme for racquet sports video analysis with the combination of audio-visual information

Science.gov (United States)

Xing, Liyuan; Ye, Qixiang; Zhang, Weigang; Huang, Qingming; Yu, Hua

2005-07-01

As a very important category in sports video, racquet sports video, e.g. table tennis, tennis and badminton, has been paid little attention in the past years. Considering the characteristics of this kind of sports video, we propose a new scheme for structure indexing and highlight generating based on the combination of audio and visual information. Firstly, a supervised classification method is employed to detect important audio symbols including impact (ball hit), audience cheers, commentator speech, etc. Meanwhile an unsupervised algorithm is proposed to group video shots into various clusters. Then, by taking advantage of temporal relationship between audio and visual signals, we can specify the scene clusters with semantic labels including rally scenes and break scenes. Thirdly, a refinement procedure is developed to reduce false rally scenes by further audio analysis. Finally, an exciting model is proposed to rank the detected rally scenes from which many exciting video clips such as game (match) points can be correctly retrieved. Experiments on two types of representative racquet sports video, table tennis video and tennis video, demonstrate encouraging results.
Effects of Audio-Visual Integration on the Detection of Masked Speech and Non-Speech Sounds

Science.gov (United States)

Eramudugolla, Ranmalee; Henderson, Rachel; Mattingley, Jason B.

2011-01-01

Integration of simultaneous auditory and visual information about an event can enhance our ability to detect that event. This is particularly evident in the perception of speech, where the articulatory gestures of the speaker's lips and face can significantly improve the listener's detection and identification of the message, especially when that…
Perception of the Multisensory Coherence of Fluent Audiovisual Speech in Infancy: Its Emergence & the Role of Experience

Science.gov (United States)

Lewkowicz, David J.; Minar, Nicholas J.; Tift, Amy H.; Brandon, Melissa

2014-01-01

To investigate the developmental emergence of the ability to perceive the multisensory coherence of native and non-native audiovisual fluent speech, we tested 4-, 8–10, and 12–14 month-old English-learning infants. Infants first viewed two identical female faces articulating two different monologues in silence and then in the presence of an audible monologue that matched the visible articulations of one of the faces. Neither the 4-month-old nor the 8–10 month-old infants exhibited audio-visual matching in that neither group exhibited greater looking at the matching monologue. In contrast, the 12–14 month-old infants exhibited matching and, consistent with the emergence of perceptual expertise for the native language, they perceived the multisensory coherence of native-language monologues earlier in the test trials than of non-native language monologues. Moreover, the matching of native audible and visible speech streams observed in the 12–14 month olds did not depend on audio-visual synchrony whereas the matching of non-native audible and visible speech streams did depend on synchrony. Overall, the current findings indicate that the perception of the multisensory coherence of fluent audiovisual speech emerges late in infancy, that audio-visual synchrony cues are more important in the perception of the multisensory coherence of non-native than native audiovisual speech, and that the emergence of this skill most likely is affected by perceptual narrowing. PMID:25462038
Audio-Visual Classification of Sports Types

DEFF Research Database (Denmark)

Gade, Rikke; Abou-Zleikha, Mohamed; Christensen, Mads Græsbøll

2015-01-01

In this work we propose a method for classification of sports types from combined audio and visual features ex- tracted from thermal video. From audio Mel Frequency Cepstral Coefficients (MFCC) are extracted, and PCA are applied to reduce the feature space to 10 dimensions. From the visual modali...
Effects of virtual speaker density and room reverberation on spatiotemporal thresholds of audio-visual motion coherence.

Directory of Open Access Journals (Sweden)

Narayan Sankaran

Full Text Available The present study examined the effects of spatial sound-source density and reverberation on the spatiotemporal window for audio-visual motion coherence. Three different acoustic stimuli were generated in Virtual Auditory Space: two acoustically "dry" stimuli via the measurement of anechoic head-related impulse responses recorded at either 1° or 5° spatial intervals (Experiment 1, and a reverberant stimulus rendered from binaural room impulse responses recorded at 5° intervals in situ in order to capture reverberant acoustics in addition to head-related cues (Experiment 2. A moving visual stimulus with invariant localization cues was generated by sequentially activating LED's along the same radial path as the virtual auditory motion. Stimuli were presented at 25°/s, 50°/s and 100°/s with a random spatial offset between audition and vision. In a 2AFC task, subjects made a judgment of the leading modality (auditory or visual. No significant differences were observed in the spatial threshold based on the point of subjective equivalence (PSE or the slope of psychometric functions (β across all three acoustic conditions. Additionally, both the PSE and β did not significantly differ across velocity, suggesting a fixed spatial window of audio-visual separation. Findings suggest that there was no loss in spatial information accompanying the reduction in spatial cues and reverberation levels tested, and establish a perceptual measure for assessing the veracity of motion generated from discrete locations and in echoic environments.
Design guidelines for audio presentation of graphs and tables

OpenAIRE

Brown, L.M.; Brewster, S.A.; Ramloll, S.A.; Burton, R.; Riedel, B.

2003-01-01

Audio can be used to make visualisations accessible to blind and visually impaired people. The MultiVis Project has carried out research into suitable methods for presenting graphs and tables to blind people through the use of both speech and non-speech audio. This paper presents guidelines extracted from this research. These guidelines will enable designers to implement visualisation systems for blind and visually impaired users, and will provide a framework for researchers wishing to invest...
Experience with speech sounds is not necessary for cue trading by budgerigars (Melopsittacus undulatus.

Directory of Open Access Journals (Sweden)

Mary Flaherty

Full Text Available The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT and the first formant onset frequency when identifying syllable initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated, Passive speech exposure (regular exposure to human speech, and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with "d" or "t" and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal.
Auditory cross-modal reorganization in cochlear implant users indicates audio-visual integration.

Science.gov (United States)

Stropahl, Maren; Debener, Stefan

2017-01-01

There is clear evidence for cross-modal cortical reorganization in the auditory system of post-lingually deafened cochlear implant (CI) users. A recent report suggests that moderate sensori-neural hearing loss is already sufficient to initiate corresponding cortical changes. To what extend these changes are deprivation-induced or related to sensory recovery is still debated. Moreover, the influence of cross-modal reorganization on CI benefit is also still unclear. While reorganization during deafness may impede speech recovery, reorganization also has beneficial influences on face recognition and lip-reading. As CI users were observed to show differences in multisensory integration, the question arises if cross-modal reorganization is related to audio-visual integration skills. The current electroencephalography study investigated cortical reorganization in experienced post-lingually deafened CI users ( n = 18), untreated mild to moderately hearing impaired individuals (n = 18) and normal hearing controls ( n = 17). Cross-modal activation of the auditory cortex by means of EEG source localization in response to human faces and audio-visual integration, quantified with the McGurk illusion, were measured. CI users revealed stronger cross-modal activations compared to age-matched normal hearing individuals. Furthermore, CI users showed a relationship between cross-modal activation and audio-visual integration strength. This may further support a beneficial relationship between cross-modal activation and daily-life communication skills that may not be fully captured by laboratory-based speech perception tests. Interestingly, hearing impaired individuals showed behavioral and neurophysiological results that were numerically between the other two groups, and they showed a moderate relationship between cross-modal activation and the degree of hearing loss. This further supports the notion that auditory deprivation evokes a reorganization of the auditory system
Auditory cross-modal reorganization in cochlear implant users indicates audio-visual integration

Directory of Open Access Journals (Sweden)

Maren Stropahl

2017-01-01

Full Text Available There is clear evidence for cross-modal cortical reorganization in the auditory system of post-lingually deafened cochlear implant (CI users. A recent report suggests that moderate sensori-neural hearing loss is already sufficient to initiate corresponding cortical changes. To what extend these changes are deprivation-induced or related to sensory recovery is still debated. Moreover, the influence of cross-modal reorganization on CI benefit is also still unclear. While reorganization during deafness may impede speech recovery, reorganization also has beneficial influences on face recognition and lip-reading. As CI users were observed to show differences in multisensory integration, the question arises if cross-modal reorganization is related to audio-visual integration skills. The current electroencephalography study investigated cortical reorganization in experienced post-lingually deafened CI users (n = 18, untreated mild to moderately hearing impaired individuals (n = 18 and normal hearing controls (n = 17. Cross-modal activation of the auditory cortex by means of EEG source localization in response to human faces and audio-visual integration, quantified with the McGurk illusion, were measured. CI users revealed stronger cross-modal activations compared to age-matched normal hearing individuals. Furthermore, CI users showed a relationship between cross-modal activation and audio-visual integration strength. This may further support a beneficial relationship between cross-modal activation and daily-life communication skills that may not be fully captured by laboratory-based speech perception tests. Interestingly, hearing impaired individuals showed behavioral and neurophysiological results that were numerically between the other two groups, and they showed a moderate relationship between cross-modal activation and the degree of hearing loss. This further supports the notion that auditory deprivation evokes a reorganization of the
Simulator study of the effect of visual-motion time delays on pilot tracking performance with an audio side task

Science.gov (United States)

Riley, D. R.; Miller, G. K., Jr.

1978-01-01

The effect of time delay was determined in the visual and motion cues in a flight simulator on pilot performance in tracking a target aircraft that was oscillating sinusoidally in altitude only. An audio side task was used to assure the subject was fully occupied at all times. The results indicate that, within the test grid employed, about the same acceptable time delay (250 msec) was obtained for a single aircraft (fighter type) by each of two subjects for both fixed-base and motion-base conditions. Acceptable time delay is defined as the largest amount of delay that can be inserted simultaneously into the visual and motion cues before performance degradation occurs. A statistical analysis of the data was made to establish this value of time delay. Audio side task provided quantitative data that documented the subject's work level.
Brain responses and looking behaviour during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life.

Directory of Open Access Journals (Sweden)

Elena V Kushnerenko

2013-07-01

Full Text Available The use of visual cues during the processing of audiovisual speech is known to be less efficient in children and adults with language difficulties and difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6-9 months to 14-16 months of age. We used eye-tracking to examine whether individual differences in visual attention during audiovisual processing of speech in 6 to 9 month old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6-9 month old infants also participated in an event-related potential (ERP audiovisual task within the same experimental session. Language development was then followed-up at the age of 14-16 months, using two measures of language development, the Preschool Language Scale (PLS and the Oxford Communicative Development Inventory (CDI. The results show that those infants who were less efficient in auditory speech processing at the age of 6-9 months had lower receptive language scores at 14-16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audio-visually incongruent stimuli at 6-9 months were both significantly associated with language development at 14-16 months. These findings add to the understanding of individual differences in neural signatures of audiovisual processing and associated looking behaviour in infants.
Auditory, Visual and Audiovisual Speech Processing Streams in Superior Temporal Sulcus.

Science.gov (United States)

Venezia, Jonathan H; Vaden, Kenneth I; Rong, Feng; Maddox, Dale; Saberi, Kourosh; Hickok, Gregory

2017-01-01

The human superior temporal sulcus (STS) is responsive to visual and auditory information, including sounds and facial cues during speech recognition. We investigated the functional organization of STS with respect to modality-specific and multimodal speech representations. Twenty younger adult participants were instructed to perform an oddball detection task and were presented with auditory, visual, and audiovisual speech stimuli, as well as auditory and visual nonspeech control stimuli in a block fMRI design. Consistent with a hypothesized anterior-posterior processing gradient in STS, auditory, visual and audiovisual stimuli produced the largest BOLD effects in anterior, posterior and middle STS (mSTS), respectively, based on whole-brain, linear mixed effects and principal component analyses. Notably, the mSTS exhibited preferential responses to multisensory stimulation, as well as speech compared to nonspeech. Within the mid-posterior and mSTS regions, response preferences changed gradually from visual, to multisensory, to auditory moving posterior to anterior. Post hoc analysis of visual regions in the posterior STS revealed that a single subregion bordering the mSTS was insensitive to differences in low-level motion kinematics yet distinguished between visual speech and nonspeech based on multi-voxel activation patterns. These results suggest that auditory and visual speech representations are elaborated gradually within anterior and posterior processing streams, respectively, and may be integrated within the mSTS, which is sensitive to more abstract speech information within and across presentation modalities. The spatial organization of STS is consistent with processing streams that are hypothesized to synthesize perceptual speech representations from sensory signals that provide convergent information from visual and auditory modalities.
Speech entrainment enables patients with Broca’s aphasia to produce fluent speech

Science.gov (United States)

Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

2012-01-01

A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production
Voice activity detection using audio-visual information

DEFF Research Database (Denmark)

Petsatodis, Theodore; Pnevmatikakis, Aristodemos; Boukis, Christos

2009-01-01

An audio-visual voice activity detector that uses sensors positioned distantly from the speaker is presented. Its constituting unimodal detectors are based on the modeling of the temporal variation of audio and visual features using Hidden Markov Models; their outcomes are fused using a post...
Neurophysiological evidence for the interplay of speech segmentation and word-referent mapping during novel word learning.

Science.gov (United States)

François, Clément; Cunillera, Toni; Garcia, Enara; Laine, Matti; Rodriguez-Fornells, Antoni

2017-04-01

Learning a new language requires the identification of word units from continuous speech (the speech segmentation problem) and mapping them onto conceptual representation (the word to world mapping problem). Recent behavioral studies have revealed that the statistical properties found within and across modalities can serve as cues for both processes. However, segmentation and mapping have been largely studied separately, and thus it remains unclear whether both processes can be accomplished at the same time and if they share common neurophysiological features. To address this question, we recorded EEG of 20 adult participants during both an audio alone speech segmentation task and an audiovisual word-to-picture association task. The participants were tested for both the implicit detection of online mismatches (structural auditory and visual semantic violations) as well as for the explicit recognition of words and word-to-picture associations. The ERP results from the learning phase revealed a delayed learning-related fronto-central negativity (FN400) in the audiovisual condition compared to the audio alone condition. Interestingly, while online structural auditory violations elicited clear MMN/N200 components in the audio alone condition, visual-semantic violations induced meaning-related N400 modulations in the audiovisual condition. The present results support the idea that speech segmentation and meaning mapping can take place in parallel and act in synergy to enhance novel word learning. Copyright © 2016 Elsevier Ltd. All rights reserved.

Voice over: Audio-visual congruency and content recall in the gallery setting.

Science.gov (United States)

Fairhurst, Merle T; Scott, Minnie; Deroy, Ophelia

2017-01-01

Experimental research has shown that pairs of stimuli which are congruent and assumed to 'go together' are recalled more effectively than an item presented in isolation. Will this multisensory memory benefit occur when stimuli are richer and longer, in an ecological setting? In the present study, we focused on an everyday situation of audio-visual learning and manipulated the relationship between audio guide tracks and viewed portraits in the galleries of the Tate Britain. By varying the gender and narrative style of the voice-over, we examined how the perceived congruency and assumed unity of the audio guide track with painted portraits affected subsequent recall. We show that tracks perceived as best matching the viewed portraits led to greater recall of both sensory and linguistic content. We provide the first evidence that manipulating crossmodal congruence and unity assumptions can effectively impact memory in a multisensory ecological setting, even in the absence of precise temporal alignment between sensory cues.
Estimating the relative weights of visual and auditory tau versus heuristic-based cues for time-to-contact judgments in realistic, familiar scenes by older and younger adults.

Science.gov (United States)

Keshavarz, Behrang; Campos, Jennifer L; DeLucia, Patricia R; Oberfeld, Daniel

2017-04-01

Estimating time to contact (TTC) involves multiple sensory systems, including vision and audition. Previous findings suggested that the ratio of an object's instantaneous optical size/sound intensity to its instantaneous rate of change in optical size/sound intensity (τ) drives TTC judgments. Other evidence has shown that heuristic-based cues are used, including final optical size or final sound pressure level. Most previous studies have used decontextualized and unfamiliar stimuli (e.g., geometric shapes on a blank background). Here we evaluated TTC estimates by using a traffic scene with an approaching vehicle to evaluate the weights of visual and auditory TTC cues under more realistic conditions. Younger (18-39 years) and older (65+ years) participants made TTC estimates in three sensory conditions: visual-only, auditory-only, and audio-visual. Stimuli were presented within an immersive virtual-reality environment, and cue weights were calculated for both visual cues (e.g., visual τ, final optical size) and auditory cues (e.g., auditory τ, final sound pressure level). The results demonstrated the use of visual τ as well as heuristic cues in the visual-only condition. TTC estimates in the auditory-only condition, however, were primarily based on an auditory heuristic cue (final sound pressure level), rather than on auditory τ. In the audio-visual condition, the visual cues dominated overall, with the highest weight being assigned to visual τ by younger adults, and a more equal weighting of visual τ and heuristic cues in older adults. Overall, better characterizing the effects of combined sensory inputs, stimulus characteristics, and age on the cues used to estimate TTC will provide important insights into how these factors may affect everyday behavior.
Neural networks supporting audiovisual integration for speech: A large-scale lesion study.

Science.gov (United States)

Hickok, Gregory; Rogalsky, Corianne; Matchin, William; Basilakos, Alexandra; Cai, Julia; Pillay, Sara; Ferrill, Michelle; Mickelsen, Soren; Anderson, Steven W; Love, Tracy; Binder, Jeffrey; Fridriksson, Julius

2018-06-01

Auditory and visual speech information are often strongly integrated resulting in perceptual enhancements for audiovisual (AV) speech over audio alone and sometimes yielding compelling illusory fusion percepts when AV cues are mismatched, the McGurk-MacDonald effect. Previous research has identified three candidate regions thought to be critical for AV speech integration: the posterior superior temporal sulcus (STS), early auditory cortex, and the posterior inferior frontal gyrus. We assess the causal involvement of these regions (and others) in the first large-scale (N = 100) lesion-based study of AV speech integration. Two primary findings emerged. First, behavioral performance and lesion maps for AV enhancement and illusory fusion measures indicate that classic metrics of AV speech integration are not necessarily measuring the same process. Second, lesions involving superior temporal auditory, lateral occipital visual, and multisensory zones in the STS are the most disruptive to AV speech integration. Further, when AV speech integration fails, the nature of the failure-auditory vs visual capture-can be predicted from the location of the lesions. These findings show that AV speech processing is supported by unimodal auditory and visual cortices as well as multimodal regions such as the STS at their boundary. Motor related frontal regions do not appear to play a role in AV speech integration. Copyright © 2018 Elsevier Ltd. All rights reserved.
Contribution of Prosody in Audio-Visual Integration to Emotional Perception of Virtual Characters

Directory of Open Access Journals (Sweden)

Ekaterina Volkova

2011-10-01

Full Text Available Recent technology provides us with realistic looking virtual characters. Motion capture and elaborate mathematical models supply data for natural looking, controllable facial and bodily animations. With the help of computational linguistics and artificial intelligence, we can automatically assign emotional categories to appropriate stretches of text for a simulation of those social scenarios where verbal communication is important. All this makes virtual characters a valuable tool for creation of versatile stimuli for research on the integration of emotion information from different modalities. We conducted an audio-visual experiment to investigate the differential contributions of emotional speech and facial expressions on emotion identification. We used recorded and synthesized speech as well as dynamic virtual faces, all enhanced for seven emotional categories. The participants were asked to recognize the prevalent emotion of paired faces and audio. Results showed that when the voice was recorded, the vocalized emotion influenced participants' emotion identification more than the facial expression. However, when the voice was synthesized, facial expression influenced participants' emotion identification more than vocalized emotion. Additionally, individuals did worse on identifying either the facial expression or vocalized emotion when the voice was synthesized. Our experimental method can help to determine how to improve synthesized emotional speech.
Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?

Directory of Open Access Journals (Sweden)

Héctor Delgado

2015-06-01

This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used in order to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.
The Fungible Audio-Visual Mapping and its Experience

Directory of Open Access Journals (Sweden)

Adriana Sa

2014-12-01

Full Text Available This article draws a perceptual approach to audio-visual mapping. Clearly perceivable cause and effect relationships can be problematic if one desires the audience to experience the music. Indeed perception would bias those sonic qualities that fit previous concepts of causation, subordinating other sonic qualities, which may form the relations between the sounds themselves. The question is, how can an audio-visual mapping produce a sense of causation, and simultaneously confound the actual cause-effect relationships. We call this a fungible audio-visual mapping. Our aim here is to glean its constitution and aspect. We will report a study, which draws upon methods from experimental psychology to inform audio-visual instrument design and composition. The participants are shown several audio-visual mapping prototypes, after which we pose quantitative and qualitative questions regarding their sense of causation, and their sense of understanding the cause-effect relationships. The study shows that a fungible mapping requires both synchronized and seemingly non-related components – sufficient complexity to be confusing. As the specific cause-effect concepts remain inconclusive, the sense of causation embraces the whole.
How hearing aids, background noise, and visual cues influence objective listening effort.

Science.gov (United States)

Picou, Erin M; Ricketts, Todd A; Hornsby, Benjamin W Y

2013-09-01

The purpose of this article was to evaluate factors that influence the listening effort experienced when processing speech for people with hearing loss. Specifically, the change in listening effort resulting from introducing hearing aids, visual cues, and background noise was evaluated. An additional exploratory aim was to investigate the possible relationships between the magnitude of listening effort change and individual listeners' working memory capacity, verbal processing speed, or lipreading skill. Twenty-seven participants with bilateral sensorineural hearing loss were fitted with linear behind-the-ear hearing aids and tested using a dual-task paradigm designed to evaluate listening effort. The primary task was monosyllable word recognition and the secondary task was a visual reaction time task. The test conditions varied by hearing aids (unaided, aided), visual cues (auditory-only, auditory-visual), and background noise (present, absent). For all participants, the signal to noise ratio was set individually so that speech recognition performance in noise was approximately 60% in both the auditory-only and auditory-visual conditions. In addition to measures of listening effort, working memory capacity, verbal processing speed, and lipreading ability were measured using the Automated Operational Span Task, a Lexical Decision Task, and the Revised Shortened Utley Lipreading Test, respectively. In general, the effects measured using the objective measure of listening effort were small (~10 msec). Results indicated that background noise increased listening effort, and hearing aids reduced listening effort, while visual cues did not influence listening effort. With regard to the individual variables, verbal processing speed was negatively correlated with hearing aid benefit for listening effort; faster processors were less likely to derive benefit. Working memory capacity, verbal processing speed, and lipreading ability were related to benefit from visual cues. No
Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?

Directory of Open Access Journals (Sweden)

Héctor Delgado

2015-12-01

Full Text Available This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used in order to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.
Neural pathways for visual speech perception

Directory of Open Access Journals (Sweden)

Lynne E Bernstein

2014-12-01

Full Text Available This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1 The visual perception of speech relies on visual pathway representations of speech qua speech. (2 A proposed site of these representations, the temporal visual speech area (TVSA has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS. (3 Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA.
Use of amplitude modulation cues recovered from frequency modulation for cochlear implant users when original speech cues are severely degraded.

Science.gov (United States)

Won, Jong Ho; Shim, Hyun Joon; Lorenzi, Christian; Rubinstein, Jay T

2014-06-01

Won et al. (J Acoust Soc Am 132:1113-1119, 2012) reported that cochlear implant (CI) speech processors generate amplitude-modulation (AM) cues recovered from broadband speech frequency modulation (FM) and that CI users can use these cues for speech identification in quiet. The present study was designed to extend this finding for a wide range of listening conditions, where the original speech cues were severely degraded by manipulating either the acoustic signals or the speech processor. The manipulation of the acoustic signals included the presentation of background noise, simulation of reverberation, and amplitude compression. The manipulation of the speech processor included changing the input dynamic range and the number of channels. For each of these conditions, multiple levels of speech degradation were tested. Speech identification was measured for CI users and compared for stimuli having both AM and FM information (intact condition) or FM information only (FM condition). Each manipulation degraded speech identification performance for both intact and FM conditions. Performance for the intact and FM conditions became similar for stimuli having the most severe degradations. Identification performance generally overlapped for the intact and FM conditions. Moreover, identification performance for the FM condition was better than chance performance even at the maximum level of distortion. Finally, significant correlations were found between speech identification scores for the intact and FM conditions. Altogether, these results suggest that despite poor frequency selectivity, CI users can make efficient use of AM cues recovered from speech FM in difficult listening situations.
Decision-level fusion for audio-visual laughter detection

NARCIS (Netherlands)

Reuderink, B.; Poel, M.; Truong, K.; Poppe, R.; Pantic, M.

2008-01-01

Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is
Decision-Level Fusion for Audio-Visual Laughter Detection

NARCIS (Netherlands)

Reuderink, B.; Poel, Mannes; Truong, Khiet Phuong; Poppe, Ronald Walter; Pantic, Maja; Popescu-Belis, Andrei; Stiefelhagen, Rainer

Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laugh- ter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio- visual laughter detection is
'You see?' Teaching and learning how to interpret visual cues during surgery.

Science.gov (United States)

Cope, Alexandra C; Bezemer, Jeff; Kneebone, Roger; Lingard, Lorelei

2015-11-01

The ability to interpret visual cues is important in many medical specialties, including surgery, in which poor outcomes are largely attributable to errors of perception rather than poor motor skills. However, we know little about how trainee surgeons learn to make judgements in the visual domain. We explored how trainees learn visual cue interpretation in the operating room. A multiple case study design was used. Participants were postgraduate surgical trainees and their trainers. Data included observer field notes, and integrated video- and audio-recordings from 12 cases representing more than 11 hours of observation. A constant comparative methodology was used to identify dominant themes. Visual cue interpretation was a recurrent feature of trainer-trainee interactions and was achieved largely through the pedagogic mechanism of co-construction. Co-construction was a dialogic sequence between trainer and trainee in which they explored what they were looking at together to identify and name structures or pathology. Co-construction took two forms: 'guided co-construction', in which the trainer steered the trainee to see what the trainer was seeing, and 'authentic co-construction', in which neither trainer nor trainee appeared certain of what they were seeing and pieced together the information collaboratively. Whether the co-construction activity was guided or authentic appeared to be influenced by case difficulty and trainee seniority. Co-construction was shown to occur verbally, through discussion, and also through non-verbal exchanges in which gestures made with laparoscopic instruments contributed to the co-construction discourse. In the training setting, learning visual cue interpretation occurs in part through co-construction. Co-construction is a pedagogic phenomenon that is well recognised in the context of learning to interpret verbal information. In articulating the features of co-construction in the visual domain, this work enables the development of
Internet Video Telephony Allows Speech Reading by Deaf Individuals and Improves Speech Perception by Cochlear Implant Users

Science.gov (United States)

Mantokoudis, Georgios; Dähler, Claudia; Dubach, Patrick; Kompis, Martin; Caversaccio, Marco D.; Senn, Pascal

2013-01-01

Objective To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. Methods Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280×720, 640×480, 320×240, 160×120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0–500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. Results Higher frame rate (>7 fps), higher camera resolution (>640×480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). Conclusion Webcameras have the potential to improve telecommunication of hearing-impaired individuals. PMID:23359119
Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

Directory of Open Access Journals (Sweden)

Georgios Mantokoudis

Full Text Available OBJECTIVE: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI users. METHODS: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px, frame rates (30, 20, 10, 7, 5 frames per second (fps, speech velocities (three different speakers, webcameras (Logitech Pro9000, C600 and C500 and image/sound delays (0-500 ms. All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. RESULTS: Higher frame rate (>7 fps, higher camera resolution (>640 × 480 px and shorter picture/sound delay (<100 ms were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009 in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11 showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032. CONCLUSION: Webcameras have the potential to improve telecommunication of hearing-impaired individuals.
Talker Variability in Audiovisual Speech Perception

Directory of Open Access Journals (Sweden)

Shannon eHeald

2014-07-01

Full Text Available A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition. So far, this talker-variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts have shown, however, that when listeners are able to see a talker’s face, speech recognition is improved under adverse listening (e.g., noise or distortion conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target-word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to audio-only condition. These results suggest that seeing a talker’s face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener a change in talker has occurred.
Rehabilitation of balance-impaired stroke patients through audio-visual biofeedback

DEFF Research Database (Denmark)

Gheorghe, Cristina; Nissen, Thomas; Juul Rosengreen Christensen, Daniel

2015-01-01

This study explored how audio-visual biofeedback influences physical balance of seven balance-impaired stroke patients, between 33–70 years-of-age. The setup included a bespoke balance board and a music rhythm game. The procedure was designed as follows: (1) a control group who performed a balance...... training exercise without any technological input, (2) a visual biofeedback group, performing via visual input, and (3) an audio-visual biofeedback group, performing via audio and visual input. Results retrieved from comparisons between the data sets (2) and (3) suggested superior postural stability...
Perception of Speech Modulation Cues by 6-Month-Old Infants

Science.gov (United States)

Cabrera, Laurianne; Bertoncini, Josiane; Lorenzi, Christian

2013-01-01

Purpose: The capacity of 6-month-old infants to discriminate a voicing contrast (/aba/--/apa/) on the basis of "amplitude modulation (AM) cues" and "frequency modulation (FM) cues" was evaluated. Method: Several vocoded speech conditions were designed to either degrade FM cues in 4 or 32 bands or degrade AM in 32 bands. Infants…
Visual Temporal Acuity Is Related to Auditory Speech Perception Abilities in Cochlear Implant Users.

Science.gov (United States)

Jahn, Kelly N; Stevenson, Ryan A; Wallace, Mark T

Despite significant improvements in speech perception abilities following cochlear implantation, many prelingually deafened cochlear implant (CI) recipients continue to rely heavily on visual information to develop speech and language. Increased reliance on visual cues for understanding spoken language could lead to the development of unique audiovisual integration and visual-only processing abilities in these individuals. Brain imaging studies have demonstrated that good CI performers, as indexed by auditory-only speech perception abilities, have different patterns of visual cortex activation in response to visual and auditory stimuli as compared with poor CI performers. However, no studies have examined whether speech perception performance is related to any type of visual processing abilities following cochlear implantation. The purpose of the present study was to provide a preliminary examination of the relationship between clinical, auditory-only speech perception tests, and visual temporal acuity in prelingually deafened adult CI users. It was hypothesized that prelingually deafened CI users, who exhibit better (i.e., more acute) visual temporal processing abilities would demonstrate better auditory-only speech perception performance than those with poorer visual temporal acuity. Ten prelingually deafened adult CI users were recruited for this study. Participants completed a visual temporal order judgment task to quantify visual temporal acuity. To assess auditory-only speech perception abilities, participants completed the consonant-nucleus-consonant word recognition test and the AzBio sentence recognition test. Results were analyzed using two-tailed partial Pearson correlations, Spearman's rho correlations, and independent samples t tests. Visual temporal acuity was significantly correlated with auditory-only word and sentence recognition abilities. In addition, proficient CI users, as assessed via auditory-only speech perception performance, demonstrated
Tools for signal compression applications to speech and audio coding

CERN Document Server

Moreau, Nicolas

2013-01-01

This book presents tools and algorithms required to compress/uncompress signals such as speech and music. These algorithms are largely used in mobile phones, DVD players, HDTV sets, etc. In a first rather theoretical part, this book presents the standard tools used in compression systems: scalar and vector quantization, predictive quantization, transform quantization, entropy coding. In particular we show the consistency between these different tools. The second part explains how these tools are used in the latest speech and audio coders. The third part gives Matlab programs simulating t

Measuring 3D Audio Localization Performance and Speech Quality of Conferencing Calls for a Multiparty Communication System

Directory of Open Access Journals (Sweden)

Mansoor Hyder

2013-07-01

Full Text Available Communication systems which support 3D (Three Dimensional audio offer a couple of advantages to the users/customers. Firstly, within the virtual acoustic environments all participants could easily be recognized through their placement/sitting positions. Secondly, all participants can turn their focus on any particular talker when multiple participants start talking at the same time by taking advantage of the natural listening tendency which is called the Cocktail Party Effect. On the other hand, 3D audio is known as a decreasing factor for overall speech quality because of the commencement of reverberations and echoes within the listening environment. In this article, we study the tradeoff between speech quality and human natural ability of localizing audio events/or talkers within our three dimensional audio supported telephony and teleconferencing solution. Further, we performed subjective user studies by incorporating two different HRTFs (Head Related Transfer Functions, different placements of the teleconferencing participants and different layouts of the virtual environments. Moreover, subjective user studies results for audio event localization and subjective speech quality are presented in this article. This subjective user study would help the research community to optimize the existing 3D audio systems and to design new 3D audio supported teleconferencing solutions based on the quality of experience requirements of the users/customers for agriculture personal in particular and for all potential users in general.
Measuring 3D Audio Localization Performance and Speech Quality of Conferencing Calls for a Multiparty Communication System

International Nuclear Information System (INIS)

Hyder, M.; Menghwar, G.D.; Qureshi, A.

2013-01-01

Communication systems which support 3D (Three Dimensional) audio offer a couple of advantages to the users/customers. Firstly, within the virtual acoustic environments all participants could easily be recognized through their placement/sitting positions. Secondly, all participants can turn their focus on any particular talker when multiple participants start talking at the same time by taking advantage of the natural listening tendency which is called the Cocktail Party Effect. On the other hand, 3D audio is known as a decreasing factor for overall speech quality because of the commencement of reverberations and echoes within the listening environment. In this article, we study the tradeoff between speech quality and human natural ability of localizing audio events/or talkers within our three dimensional audio supported telephony and teleconferencing solution. Further, we performed subjective user studies by incorporating two different HRTFs (Head Related Transfer Functions), different placements of the teleconferencing participants and different layouts of the virtual environments. Moreover, subjective user studies results for audio event localization and subjective speech quality are presented in this article. This subjective user study would help the research community to optimize the existing 3D audio systems and to design new 3D audio supported teleconferencing solutions based on the quality of experience requirements of the users/customers for agriculture personal in particular and for all potential users in general. (author)
Comparative evaluation of audio and audio - tactile methods to improve oral hygiene status of visually impaired school children

OpenAIRE

R Krishnakumar; Swarna Swathi Silla; Sugumaran K Durai; Mohan Govindarajan; Syed Shaheed Ahamed; Logeshwari Mathivanan

2016-01-01

Background: Visually impaired children are unable to maintain good oral hygiene, as their tactile abilities are often underdeveloped owing to their visual disturbances. Conventional brushing techniques are often poorly comprehended by these children and hence, it was decided to evaluate the effectiveness of audio and audio-tactile methods in improving the oral hygiene of these children. Objective: To evaluate and compare the effectiveness of audio and audio-tactile methods in improving oral h...
Prediction and constraint in audiovisual speech perception

Science.gov (United States)

Peelle, Jonathan E.; Sommers, Mitchell S.

2015-01-01

During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing precision of prediction. Electrophysiological studies demonstrate oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported
Prediction and constraint in audiovisual speech perception.

Science.gov (United States)

Peelle, Jonathan E; Sommers, Mitchell S

2015-07-01

During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration
Documentary management of the sport audio-visual information in the generalist televisions

OpenAIRE

Jorge Caldera Serrano; Felipe Alonso

2007-01-01

The management of the sport audio-visual documentation of the Information Systems of the state, zonal and local chains is analyzed within the framework. For it it is made makes a route by the documentary chain that makes the sport audio-visual information with the purpose of being analyzing each one of the parameters, showing therefore a series of recommendations and norms for the preparation of the sport audio-visual registry. Evidently the audio-visual sport documentation difference i...
Haptic and Audio-visual Stimuli: Enhancing Experiences and Interaction

NARCIS (Netherlands)

Nijholt, Antinus; Dijk, Esko O.; Lemmens, Paul M.C.; Luitjens, S.B.

2010-01-01

The intention of the symposium on Haptic and Audio-visual stimuli at the EuroHaptics 2010 conference is to deepen the understanding of the effect of combined Haptic and Audio-visual stimuli. The knowledge gained will be used to enhance experiences and interactions in daily life. To this end, a
Visual Grouping in Accordance With Utterance Planning Facilitates Speech Production

Directory of Open Access Journals (Sweden)

Liming Zhao

2018-03-01

Full Text Available Research on language production has focused on the process of utterance planning and involved studying the synchronization between visual gaze and the production of sentences that refer to objects in the immediate visual environment. However, it remains unclear how the visual grouping of these objects might influence this process. To shed light on this issue, the present research examined the effects of the visual grouping of objects in a visual display on utterance planning in two experiments. Participants produced utterances of the form “The snail and the necklace are above/below/on the left/right side of the toothbrush” for objects containing these referents (e.g., a snail, a necklace and a toothbrush. These objects were grouped using classic Gestalt principles of color similarity (Experiment 1 and common region (Experiment 2 so that the induced perceptual grouping was congruent or incongruent with the required phrasal organization. The results showed that speech onset latencies were shorter in congruent than incongruent conditions. The findings therefore reveal that the congruency between the visual grouping of referents and the required phrasal organization can influence speech production. Such findings suggest that, when language is produced in a visual context, speakers make use of both visual and linguistic cues to plan utterances.
Visual Grouping in Accordance With Utterance Planning Facilitates Speech Production.

Science.gov (United States)

Zhao, Liming; Paterson, Kevin B; Bai, Xuejun

2018-01-01

Research on language production has focused on the process of utterance planning and involved studying the synchronization between visual gaze and the production of sentences that refer to objects in the immediate visual environment. However, it remains unclear how the visual grouping of these objects might influence this process. To shed light on this issue, the present research examined the effects of the visual grouping of objects in a visual display on utterance planning in two experiments. Participants produced utterances of the form "The snail and the necklace are above/below/on the left/right side of the toothbrush" for objects containing these referents (e.g., a snail, a necklace and a toothbrush). These objects were grouped using classic Gestalt principles of color similarity (Experiment 1) and common region (Experiment 2) so that the induced perceptual grouping was congruent or incongruent with the required phrasal organization. The results showed that speech onset latencies were shorter in congruent than incongruent conditions. The findings therefore reveal that the congruency between the visual grouping of referents and the required phrasal organization can influence speech production. Such findings suggest that, when language is produced in a visual context, speakers make use of both visual and linguistic cues to plan utterances.
Deep Visual Attributes vs. Hand-Crafted Audio Features on Multidomain Speech Emotion Recognition

Directory of Open Access Journals (Sweden)

Michalis Papakostas

2017-06-01

Full Text Available Emotion recognition from speech may play a crucial role in many applications related to human–computer interaction or understanding the affective state of users in certain tasks, where other modalities such as video or physiological parameters are unavailable. In general, a human’s emotions may be recognized using several modalities such as analyzing facial expressions, speech, physiological parameters (e.g., electroencephalograms, electrocardiograms etc. However, measuring of these modalities may be difficult, obtrusive or require expensive hardware. In that context, speech may be the best alternative modality in many practical applications. In this work we present an approach that uses a Convolutional Neural Network (CNN functioning as a visual feature extractor and trained using raw speech information. In contrast to traditional machine learning approaches, CNNs are responsible for identifying the important features of the input thus, making the need of hand-crafted feature engineering optional in many tasks. In this paper no extra features are required other than the spectrogram representations and hand-crafted features were only extracted for validation purposes of our method. Moreover, it does not require any linguistic model and is not specific to any particular language. We compare the proposed approach using cross-language datasets and demonstrate that it is able to provide superior results vs. traditional ones that use hand-crafted features.
Audio-visual synchrony and feature-selective attention co-amplify early visual processing.

Science.gov (United States)

Keitel, Christian; Müller, Matthias M

2016-05-01

Our brain relies on neural mechanisms of selective attention and converging sensory processing to efficiently cope with rich and unceasing multisensory inputs. One prominent assumption holds that audio-visual synchrony can act as a strong attractor for spatial attention. Here, we tested for a similar effect of audio-visual synchrony on feature-selective attention. We presented two superimposed Gabor patches that differed in colour and orientation. On each trial, participants were cued to selectively attend to one of the two patches. Over time, spatial frequencies of both patches varied sinusoidally at distinct rates (3.14 and 3.63 Hz), giving rise to pulse-like percepts. A simultaneously presented pure tone carried a frequency modulation at the pulse rate of one of the two visual stimuli to introduce audio-visual synchrony. Pulsed stimulation elicited distinct time-locked oscillatory electrophysiological brain responses. These steady-state responses were quantified in the spectral domain to examine individual stimulus processing under conditions of synchronous versus asynchronous tone presentation and when respective stimuli were attended versus unattended. We found that both, attending to the colour of a stimulus and its synchrony with the tone, enhanced its processing. Moreover, both gain effects combined linearly for attended in-sync stimuli. Our results suggest that audio-visual synchrony can attract attention to specific stimulus features when stimuli overlap in space.
Audio-visual feedback improves the BCI performance in the navigational control of a humanoid robot

Directory of Open Access Journals (Sweden)

Emmanuele eTidoni

2014-06-01

Full Text Available Advancement in brain computer interfaces (BCI technology allows people to actively interact in the world through surrogates. Controlling real humanoid robots using BCI as intuitively as we control our body represents a challenge for current research in robotics and neuroscience. In order to successfully interact with the environment the brain integrates multiple sensory cues to form a coherent representation of the world. Cognitive neuroscience studies demonstrate that multisensory integration may imply a gain with respect to a single modality and ultimately improve the overall sensorimotor performance. For example, reactivity to simultaneous visual and auditory stimuli may be higher than to the sum of the same stimuli delivered in isolation or in temporal sequence. Yet, knowledge about whether audio-visual integration may improve the control of a surrogate is meager. To explore this issue, we provided human footstep sounds as audio feedback to BCI users while controlling a humanoid robot. Participants were asked to steer their robot surrogate and perform a pick-and-place task through BCI-SSVEPs. We found that audio-visual synchrony between footsteps sound and actual humanoid’s walk reduces the time required for steering the robot. Thus, auditory feedback congruent with the humanoid actions may improve motor decisions of the BCI’s user and help in the feeling of control over it. Our results shed light on the possibility to increase robot’s control through the combination of multisensory feedback to a BCI user.
Selective attention modulates the direction of audio-visual temporal recalibration.

Science.gov (United States)

Ikumi, Nara; Soto-Faraco, Salvador

2014-01-01

Temporal recalibration of cross-modal synchrony has been proposed as a mechanism to compensate for timing differences between sensory modalities. However, far from the rich complexity of everyday life sensory environments, most studies to date have examined recalibration on isolated cross-modal pairings. Here, we hypothesize that selective attention might provide an effective filter to help resolve which stimuli are selected when multiple events compete for recalibration. We addressed this question by testing audio-visual recalibration following an adaptation phase where two opposing audio-visual asynchronies were present. The direction of voluntary visual attention, and therefore to one of the two possible asynchronies (flash leading or flash lagging), was manipulated using colour as a selection criterion. We found a shift in the point of subjective audio-visual simultaneity as a function of whether the observer had focused attention to audio-then-flash or to flash-then-audio groupings during the adaptation phase. A baseline adaptation condition revealed that this effect of endogenous attention was only effective toward the lagging flash. This hints at the role of exogenous capture and/or additional endogenous effects producing an asymmetry toward the leading flash. We conclude that selective attention helps promote selected audio-visual pairings to be combined and subsequently adjusted in time but, stimulus organization exerts a strong impact on recalibration. We tentatively hypothesize that the resolution of recalibration in complex scenarios involves the orchestration of top-down selection mechanisms and stimulus-driven processes.
Selective attention modulates the direction of audio-visual temporal recalibration.

Directory of Open Access Journals (Sweden)

Nara Ikumi

Full Text Available Temporal recalibration of cross-modal synchrony has been proposed as a mechanism to compensate for timing differences between sensory modalities. However, far from the rich complexity of everyday life sensory environments, most studies to date have examined recalibration on isolated cross-modal pairings. Here, we hypothesize that selective attention might provide an effective filter to help resolve which stimuli are selected when multiple events compete for recalibration. We addressed this question by testing audio-visual recalibration following an adaptation phase where two opposing audio-visual asynchronies were present. The direction of voluntary visual attention, and therefore to one of the two possible asynchronies (flash leading or flash lagging, was manipulated using colour as a selection criterion. We found a shift in the point of subjective audio-visual simultaneity as a function of whether the observer had focused attention to audio-then-flash or to flash-then-audio groupings during the adaptation phase. A baseline adaptation condition revealed that this effect of endogenous attention was only effective toward the lagging flash. This hints at the role of exogenous capture and/or additional endogenous effects producing an asymmetry toward the leading flash. We conclude that selective attention helps promote selected audio-visual pairings to be combined and subsequently adjusted in time but, stimulus organization exerts a strong impact on recalibration. We tentatively hypothesize that the resolution of recalibration in complex scenarios involves the orchestration of top-down selection mechanisms and stimulus-driven processes.
Aurally Aided Visual Search Performance Comparing Virtual Audio Systems

DEFF Research Database (Denmark)

Larsen, Camilla Horne; Lauritsen, David Skødt; Larsen, Jacob Junker

2014-01-01

Due to increased computational power, reproducing binaural hearing in real-time applications, through usage of head-related transfer functions (HRTFs), is now possible. This paper addresses the differences in aurally-aided visual search performance between a HRTF enhanced audio system (3D) and an...... with white dots. The results indicate that 3D audio yields faster search latencies than panning audio, especially with larger amounts of distractors. The applications of this research could fit virtual environments such as video games or virtual simulations.......Due to increased computational power, reproducing binaural hearing in real-time applications, through usage of head-related transfer functions (HRTFs), is now possible. This paper addresses the differences in aurally-aided visual search performance between a HRTF enhanced audio system (3D...
Aurally Aided Visual Search Performance Comparing Virtual Audio Systems

DEFF Research Database (Denmark)

Larsen, Camilla Horne; Lauritsen, David Skødt; Larsen, Jacob Junker

2014-01-01

Due to increased computational power reproducing binaural hearing in real-time applications, through usage of head-related transfer functions (HRTFs), is now possible. This paper addresses the differences in aurally-aided visual search performance between an HRTF enhanced audio system (3D) and an...... with white dots. The results indicate that 3D audio yields faster search latencies than panning audio, especially with larger amounts of distractors. The applications of this research could fit virtual environments such as video games or virtual simulations.......Due to increased computational power reproducing binaural hearing in real-time applications, through usage of head-related transfer functions (HRTFs), is now possible. This paper addresses the differences in aurally-aided visual search performance between an HRTF enhanced audio system (3D...
The role of continuous low-frequency harmonicity cues for interrupted speech perception in bimodal hearing.

Science.gov (United States)

Oh, Soo Hee; Donaldson, Gail S; Kong, Ying-Yee

2016-04-01

Low-frequency acoustic cues have been shown to enhance speech perception by cochlear-implant users, particularly when target speech occurs in a competing background. The present study examined the extent to which a continuous representation of low-frequency harmonicity cues contributes to bimodal benefit in simulated bimodal listeners. Experiment 1 examined the benefit of restoring a continuous temporal envelope to the low-frequency ear while the vocoder ear received a temporally interrupted stimulus. Experiment 2 examined the effect of providing continuous harmonicity cues in the low-frequency ear as compared to restoring a continuous temporal envelope in the vocoder ear. Findings indicate that bimodal benefit for temporally interrupted speech increases when continuity is restored to either or both ears. The primary benefit appears to stem from the continuous temporal envelope in the low-frequency region providing additional phonetic cues related to manner and F1 frequency; a secondary contribution is provided by low-frequency harmonicity cues when a continuous representation of the temporal envelope is present in the low-frequency, or both ears. The continuous temporal envelope and harmonicity cues of low-frequency speech are thought to support bimodal benefit by facilitating identification of word and syllable boundaries, and by restoring partial phonetic cues that occur during gaps in the temporally interrupted stimulus.
Susceptibility to a multisensory speech illusion in older persons is driven by perceptual processes

Directory of Open Access Journals (Sweden)

Annalisa eSetti

2013-09-01

Full Text Available Recent studies suggest that multisensory integration is enhanced in older adults but it is not known whether this enhancement is solely driven by perceptual processes or affected by cognitive processes. Using the ‘McGurk illusion’, in Experiment 1 we found that audio-visual integration of incongruent audio-visual words was higher in older adults than in younger adults, although the recognition of either audio- or visual-only presented words was the same across groups. In Experiment 2 we tested recall of sentences within which an incongruent audio-visual speech word was embedded. The overall semantic meaning of the sentence was compatible with either one of the unisensory components of the target word and/or with the illusory percept. Older participants recalled more illusory audio-visual words in sentences than younger adults, however, there was no differential effect of word compatibility on recall for the two groups. Our findings suggest that the relatively high susceptibility to the audio-visual speech illusion in older participants is due more to perceptual than cognitive processing.
Speed on the dance floor: Auditory and visual cues for musical tempo.

Science.gov (United States)

London, Justin; Burger, Birgitta; Thompson, Marc; Toiviainen, Petri

2016-02-01

Musical tempo is most strongly associated with the rate of the beat or "tactus," which may be defined as the most prominent rhythmic periodicity present in the music, typically in a range of 1.67-2 Hz. However, other factors such as rhythmic density, mean rhythmic inter-onset interval, metrical (accentual) structure, and rhythmic complexity can affect perceived tempo (Drake, Gros, & Penel, 1999; London, 2011 Drake, Gros, & Penel, 1999; London, 2011). Visual information can also give rise to a perceived beat/tempo (Iversen, et al., 2015), and auditory and visual temporal cues can interact and mutually influence each other (Soto-Faraco & Kingstone, 2004; Spence, 2015). A five-part experiment was performed to assess the integration of auditory and visual information in judgments of musical tempo. Participants rated the speed of six classic R&B songs on a seven point scale while observing an animated figure dancing to them. Participants were presented with original and time-stretched (±5%) versions of each song in audio-only, audio+video (A+V), and video-only conditions. In some videos the animations were of spontaneous movements to the different time-stretched versions of each song, and in other videos the animations were of "vigorous" versus "relaxed" interpretations of the same auditory stimulus. Two main results were observed. First, in all conditions with audio, even though participants were able to correctly rank the original vs. time-stretched versions of each song, a song-specific tempo-anchoring effect was observed, such that sped-up versions of slower songs were judged to be faster than slowed-down versions of faster songs, even when their objective beat rates were the same. Second, when viewing a vigorous dancing figure in the A+V condition, participants gave faster tempo ratings than from the audio alone or when viewing the same audio with a relaxed dancing figure. The implications of this illusory tempo percept for cross-modal sensory integration and
Effects of noise and audiovisual cues on speech processing in adults with and without ADHD.

Science.gov (United States)

Michalek, Anne M P; Watson, Silvana M; Ash, Ivan; Ringleb, Stacie; Raymer, Anastasia

2014-03-01

This study examined the interplay among internal (e.g. attention, working memory abilities) and external (e.g. background noise, visual information) factors in individuals with and without ADHD. A 2 × 2 × 6 mixed design with correlational analyses was used to compare participant results on a standardized listening in noise sentence repetition task (QuickSin; Killion et al, 2004 ), presented in an auditory and an audiovisual condition as signal-to-noise ratio (SNR) varied from 25-0 dB and to determine individual differences in working memory capacity and short-term recall. Thirty-eight young adults without ADHD and twenty-five young adults with ADHD. Diagnosis, modality, and signal-to-noise ratio all affected the ability to process speech in noise. The interaction between the diagnosis of ADHD, the presence of visual cues, and the level of noise had an effect on a person's ability to process speech in noise. conclusion: Young adults with ADHD benefited less from visual information during noise than young adults without ADHD, an effect influenced by working memory abilities.

Audio-Visual Fusion for Sound Source Localization and Improved Attention

Energy Technology Data Exchange (ETDEWEB)

Lee, Byoung Gi; Choi, Jong Suk; Yoon, Sang Suk; Choi, Mun Taek; Kim, Mun Sang [Korea Institute of Science and Technology, Daejeon (Korea, Republic of); Kim, Dai Jin [Pohang University of Science and Technology, Pohang (Korea, Republic of)

2011-07-15

Service robots are equipped with various sensors such as vision camera, sonar sensor, laser scanner, and microphones. Although these sensors have their own functions, some of them can be made to work together and perform more complicated functions. AudioFvisual fusion is a typical and powerful combination of audio and video sensors, because audio information is complementary to visual information and vice versa. Human beings also mainly depend on visual and auditory information in their daily life. In this paper, we conduct two studies using audioFvision fusion: one is on enhancing the performance of sound localization, and the other is on improving robot attention through sound localization and face detection.
Audio-Visual Fusion for Sound Source Localization and Improved Attention

International Nuclear Information System (INIS)

Lee, Byoung Gi; Choi, Jong Suk; Yoon, Sang Suk; Choi, Mun Taek; Kim, Mun Sang; Kim, Dai Jin

2011-01-01

Service robots are equipped with various sensors such as vision camera, sonar sensor, laser scanner, and microphones. Although these sensors have their own functions, some of them can be made to work together and perform more complicated functions. AudioFvisual fusion is a typical and powerful combination of audio and video sensors, because audio information is complementary to visual information and vice versa. Human beings also mainly depend on visual and auditory information in their daily life. In this paper, we conduct two studies using audioFvision fusion: one is on enhancing the performance of sound localization, and the other is on improving robot attention through sound localization and face detection
Listening to an audio drama activates two processing networks, one for all sounds, another exclusively for speech.

Directory of Open Access Journals (Sweden)

Robert Boldt

Full Text Available Earlier studies have shown considerable intersubject synchronization of brain activity when subjects watch the same movie or listen to the same story. Here we investigated the across-subjects similarity of brain responses to speech and non-speech sounds in a continuous audio drama designed for blind people. Thirteen healthy adults listened for ∼19 min to the audio drama while their brain activity was measured with 3 T functional magnetic resonance imaging (fMRI. An intersubject-correlation (ISC map, computed across the whole experiment to assess the stimulus-driven extrinsic brain network, indicated statistically significant ISC in temporal, frontal and parietal cortices, cingulate cortex, and amygdala. Group-level independent component (IC analysis was used to parcel out the brain signals into functionally coupled networks, and the dependence of the ICs on external stimuli was tested by comparing them with the ISC map. This procedure revealed four extrinsic ICs of which two-covering non-overlapping areas of the auditory cortex-were modulated by both speech and non-speech sounds. The two other extrinsic ICs, one left-hemisphere-lateralized and the other right-hemisphere-lateralized, were speech-related and comprised the superior and middle temporal gyri, temporal poles, and the left angular and inferior orbital gyri. In areas of low ISC four ICs that were defined intrinsic fluctuated similarly as the time-courses of either the speech-sound-related or all-sounds-related extrinsic ICs. These ICs included the superior temporal gyrus, the anterior insula, and the frontal, parietal and midline occipital cortices. Taken together, substantial intersubject synchronization of cortical activity was observed in subjects listening to an audio drama, with results suggesting that speech is processed in two separate networks, one dedicated to the processing of speech sounds and the other to both speech and non-speech sounds.
A conceptual framework for audio-visual museum media

DEFF Research Database (Denmark)

Kirkedahl Lysholm Nielsen, Mikkel

2017-01-01

In today's history museums, the past is communicated through many other means than original artefacts. This interdisciplinary and theoretical article suggests a new approach to studying the use of audio-visual media, such as film, video and related media types, in a museum context. The centre...... and museum studies, existing case studies, and real life observations, the suggested framework instead stress particular characteristics of contextual use of audio-visual media in history museums, such as authenticity, virtuality, interativity, social context and spatial attributes of the communication...
The role of automated speech and audio analysis in semantic multimedia annotation

NARCIS (Netherlands)

de Jong, Franciska M.G.; Ordelman, Roeland J.F.; van Hessen, Adrianus J.

This paper overviews the various ways in which automatic speech and audio analysis can be deployed to enhance the semantic annotation of multimedia content, and as a consequence to improve the effectiveness of conceptual access tools. A number of techniques will be presented, including the alignment
The role of reverberation-related binaural cues in the externalization of speech

DEFF Research Database (Denmark)

Catic, Jasmina; Santurette, Sébastien; Dau, Torsten

2015-01-01

The perception of externalization of speech sounds was investigated with respect to the monaural and binaural cues available at the listeners’ ears in a reverberant environment. Individualized binaural room impulse responses (BRIRs) were used to simulate externalized sound sources via headphones....... The measured BRIRs were subsequently modified such that the proportion of the response containing binaural vs monaural information was varied. Normal-hearing listeners were presented with speech sounds convolved with such modified BRIRs. Monaural reverberation cues were found to be sufficient...
Auditory Emotional Cues Enhance Visual Perception

Science.gov (United States)

Zeelenberg, Rene; Bocanegra, Bruno R.

2010-01-01

Recent studies show that emotional stimuli impair performance to subsequently presented neutral stimuli. Here we show a cross-modal perceptual enhancement caused by emotional cues. Auditory cue words were followed by a visually presented neutral target word. Two-alternative forced-choice identification of the visual target was improved by…
The role of reverberation-related binaural cues in the externalization of speech.

Science.gov (United States)

Catic, Jasmina; Santurette, Sébastien; Dau, Torsten

2015-08-01

The perception of externalization of speech sounds was investigated with respect to the monaural and binaural cues available at the listeners' ears in a reverberant environment. Individualized binaural room impulse responses (BRIRs) were used to simulate externalized sound sources via headphones. The measured BRIRs were subsequently modified such that the proportion of the response containing binaural vs monaural information was varied. Normal-hearing listeners were presented with speech sounds convolved with such modified BRIRs. Monaural reverberation cues were found to be sufficient for the externalization of a lateral sound source. In contrast, for a frontal source, an increased amount of binaural cues from reflections was required in order to obtain well externalized sound images. It was demonstrated that the interaction between the interaural cues of the direct sound and the reverberation strongly affects the perception of externalization. An analysis of the short-term binaural cues showed that the amount of fluctuations of the binaural cues corresponded well to the externalization ratings obtained in the listening tests. The results further suggested that the precedence effect is involved in the auditory processing of the dynamic binaural cues that are utilized for externalization perception.
Hierarchical acquisition of visual specificity in spatial contextual cueing.

Science.gov (United States)

Lie, Kin-Pou

2015-01-01

Spatial contextual cueing refers to visual search performance's being improved when invariant associations between target locations and distractor spatial configurations are learned incidentally. Using the instance theory of automatization and the reverse hierarchy theory of visual perceptual learning, this study explores the acquisition of visual specificity in spatial contextual cueing. Two experiments in which detailed visual features were irrelevant for distinguishing between spatial contexts found that spatial contextual cueing was visually generic in difficult trials when the trials were not preceded by easy trials (Experiment 1) but that spatial contextual cueing progressed to visual specificity when difficult trials were preceded by easy trials (Experiment 2). These findings support reverse hierarchy theory, which predicts that even when detailed visual features are irrelevant for distinguishing between spatial contexts, spatial contextual cueing can progress to visual specificity if the stimuli remain constant, the task is difficult, and difficult trials are preceded by easy trials. However, these findings are inconsistent with instance theory, which predicts that when detailed visual features are irrelevant for distinguishing between spatial contexts, spatial contextual cueing will not progress to visual specificity. This study concludes that the acquisition of visual specificity in spatial contextual cueing is more plausibly hierarchical, rather than instance-based.
Cross-language differences in cue use for speech segmentation

NARCIS (Netherlands)

Tyler, M.D.; Cutler, A.

2009-01-01

Two artificial-language learning experiments directly compared English, French, and Dutch listeners' use of suprasegmental cues for continuous-speech segmentation. In both experiments, listeners heard unbroken sequences of consonant-vowel syllables, composed of recurring three- and four-syllable
BILINGUAL MULTIMODAL SYSTEM FOR TEXT-TO-AUDIOVISUAL SPEECH AND SIGN LANGUAGE SYNTHESIS

Directory of Open Access Journals (Sweden)

A. A. Karpov

2014-09-01

Full Text Available We present a conceptual model, architecture and software of a multimodal system for audio-visual speech and sign language synthesis by the input text. The main components of the developed multimodal synthesis system (signing avatar are: automatic text processor for input text analysis; simulation 3D model of human's head; computer text-to-speech synthesizer; a system for audio-visual speech synthesis; simulation 3D model of human’s hands and upper body; multimodal user interface integrating all the components for generation of audio, visual and signed speech. The proposed system performs automatic translation of input textual information into speech (audio information and gestures (video information, information fusion and its output in the form of multimedia information. A user can input any grammatically correct text in Russian or Czech languages to the system; it is analyzed by the text processor to detect sentences, words and characters. Then this textual information is converted into symbols of the sign language notation. We apply international «Hamburg Notation System» - HamNoSys, which describes the main differential features of each manual sign: hand shape, hand orientation, place and type of movement. On their basis the 3D signing avatar displays the elements of the sign language. The virtual 3D model of human’s head and upper body has been created using VRML virtual reality modeling language, and it is controlled by the software based on OpenGL graphical library. The developed multimodal synthesis system is a universal one since it is oriented for both regular users and disabled people (in particular, for the hard-of-hearing and visually impaired, and it serves for multimedia output (by audio and visual modalities of input textual information.
Training the Brain to Weight Speech Cues Differently: A Study of Finnish Second-language Users of English

Science.gov (United States)

Ylinen, Sari; Uther, Maria; Latvala, Antti; Vepsalainen, Sara; Iverson, Paul; Akahane-Yamada, Reiko; Naatanen, Risto

2010-01-01

Foreign-language learning is a prime example of a task that entails perceptual learning. The correct comprehension of foreign-language speech requires the correct recognition of speech sounds. The most difficult speech-sound contrasts for foreign-language learners often are the ones that have multiple phonetic cues, especially if the cues are…
Blind speech separation system for humanoid robot with FastICA for audio filtering and separation

Science.gov (United States)

Budiharto, Widodo; Santoso Gunawan, Alexander Agung

2016-07-01

Nowadays, there are many developments in building intelligent humanoid robot, mainly in order to handle voice and image. In this research, we propose blind speech separation system using FastICA for audio filtering and separation that can be used in education or entertainment. Our main problem is to separate the multi speech sources and also to filter irrelevant noises. After speech separation step, the results will be integrated with our previous speech and face recognition system which is based on Bioloid GP robot and Raspberry Pi 2 as controller. The experimental results show the accuracy of our blind speech separation system is about 88% in command and query recognition cases.
Computationally efficient clustering of audio-visual meeting data

NARCIS (Netherlands)

Hung, H.; Friedland, G.; Yeo, C.; Shao, L.; Shan, C.; Luo, J.; Etoh, M.

2010-01-01

This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors,
Investigating the impact of audio instruction and audio-visual biofeedback for lung cancer radiation therapy

Science.gov (United States)

George, Rohini

Lung cancer accounts for 13% of all cancers in the Unites States and is the leading cause of deaths among both men and women. The five-year survival for lung cancer patients is approximately 15%.(ACS facts & figures) Respiratory motion decreases accuracy of thoracic radiotherapy during imaging and delivery. To account for respiration, generally margins are added during radiation treatment planning, which may cause a substantial dose delivery to normal tissues and increase the normal tissue toxicity. To alleviate the above-mentioned effects of respiratory motion, several motion management techniques are available which can reduce the doses to normal tissues, thereby reducing treatment toxicity and allowing dose escalation to the tumor. This may increase the survival probability of patients who have lung cancer and are receiving radiation therapy. However the accuracy of these motion management techniques are inhibited by respiration irregularity. The rationale of this thesis was to study the improvement in regularity of respiratory motion by breathing coaching for lung cancer patients using audio instructions and audio-visual biofeedback. A total of 331 patient respiratory motion traces, each four minutes in length, were collected from 24 lung cancer patients enrolled in an IRB-approved breathing-training protocol. It was determined that audio-visual biofeedback significantly improved the regularity of respiratory motion compared to free breathing and audio instruction, thus improving the accuracy of respiratory gated radiotherapy. It was also observed that duty cycles below 30% showed insignificant reduction in residual motion while above 50% there was a sharp increase in residual motion. The reproducibility of exhale based gating was higher than that of inhale base gating. Modeling the respiratory cycles it was found that cosine and cosine 4 models had the best correlation with individual respiratory cycles. The overall respiratory motion probability distribution
Assessment of Spectral and Temporal Resolution in Cochlear Implant Users Using Psychoacoustic Discrimination and Speech Cue Categorization.

Science.gov (United States)

Winn, Matthew B; Won, Jong Ho; Moon, Il Joon

This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). The authors hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. The authors further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Nineteen cochlear implant listeners and 10 listeners with normal hearing participated in a suite of tasks that included spectral ripple discrimination, temporal modulation detection, and syllable categorization, which was split into a spectral cue-based task (targeting the /ba/-/da/ contrast) and a timing cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for cochlear implant listeners. Cochlear implant users were generally less successful at utilizing both spectral and temporal cues for categorization compared with listeners with normal hearing. For the cochlear implant listener group, spectral ripple discrimination was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. Temporal modulation detection using 100- and 10-Hz-modulated noise was not correlated either with the cochlear implant subjects' categorization of
Pengaruh layanan informasi bimbingan konseling berbantuan media audio visual terhadap empati siswa

Directory of Open Access Journals (Sweden)

Rita Kumalasari

2017-05-01

The results of research effective of audio-visual media counseling techniques effective and practical to increase the empathy of students are rational design, key concepts, understanding, purpose, content models, the role and qualifications tutor (counselor is expected, procedures or steps in the implementation of the audio-visual, evaluation, follow-up, support system. This research is proven effective in improving student behavior. Empathy behavior of students increases 28.9% from the previous 45.08% increase to 73.98%. This increase occurred in all aspects of empathy Keywords: Effective, Audio visual, Empathy
Selected Audio-Visual Materials for Consumer Education. [New Version.

Science.gov (United States)

Johnston, William L.

Ninety-two films, filmstrips, multi-media kits, slides, and audio cassettes, produced between 1964 and 1974, are listed in this selective annotated bibliography on consumer education. The major portion of the bibliography is devoted to films and filmstrips. The main topics of the audio-visual materials include purchasing, advertising, money…
Basic to Applied Research: The Benefits of Audio-Visual Speech Perception Research in Teaching Foreign Languages

Science.gov (United States)

Erdener, Dogu

2016-01-01

Traditionally, second language (L2) instruction has emphasised auditory-based instruction methods. However, this approach is restrictive in the sense that speech perception by humans is not just an auditory phenomenon but a multimodal one, and specifically, a visual one as well. In the past decade, experimental studies have shown that the…
Visual cue-specific craving is diminished in stressed smokers.

Science.gov (United States)

Cochran, Justinn R; Consedine, Nathan S; Lee, John M J; Pandit, Chinmay; Sollers, John J; Kydd, Robert R

2017-09-01

Craving among smokers is increased by stress and exposure to smoking-related visual cues. However, few experimental studies have tested both elicitors concurrently and considered how exposures may interact to influence craving. The current study examined craving in response to stress and visual cue exposure, separately and in succession, in order to better understand the relationship between craving elicitation and the elicitor. Thirty-nine smokers (21 males) who forwent smoking for 30 minutes were randomized to complete a stress task and a visual cue task in counterbalanced orders (creating the experimental groups); for the cue task, counterbalanced blocks of neutral, motivational control, and smoking images were presented. Self-reported craving was assessed after each block of visual stimuli and stress task, and after a recovery period following each task. As expected, the stress and smoking images generated greater craving than neutral or motivational control images (p smokers are stressed, visual cues have little additive effect on craving, and different types of visual cues elicit comparable craving. These findings may imply that once stressed, smokers will crave cigarettes comparably notwithstanding whether they are exposed to smoking image cues.

Effects of Visual Speech on Early Auditory Evoked Fields - From the Viewpoint of Individual Variance

Science.gov (United States)

Yahata, Izumi; Kanno, Akitake; Hidaka, Hiroshi; Sakamoto, Shuichi; Nakasato, Nobukazu; Kawashima, Ryuta; Katori, Yukio

2017-01-01

The effects of visual speech (the moving image of the speaker’s face uttering speech sound) on early auditory evoked fields (AEFs) were examined using a helmet-shaped magnetoencephalography system in 12 healthy volunteers (9 males, mean age 35.5 years). AEFs (N100m) in response to the monosyllabic sound /be/ were recorded and analyzed under three different visual stimulus conditions, the moving image of the same speaker’s face uttering /be/ (congruent visual stimuli) or uttering /ge/ (incongruent visual stimuli), and visual noise (still image processed from speaker’s face using a strong Gaussian filter: control condition). On average, latency of N100m was significantly shortened in the bilateral hemispheres for both congruent and incongruent auditory/visual (A/V) stimuli, compared to the control A/V condition. However, the degree of N100m shortening was not significantly different between the congruent and incongruent A/V conditions, despite the significant differences in psychophysical responses between these two A/V conditions. Moreover, analysis of the magnitudes of these visual effects on AEFs in individuals showed that the lip-reading effects on AEFs tended to be well correlated between the two different audio-visual conditions (congruent vs. incongruent visual stimuli) in the bilateral hemispheres but were not significantly correlated between right and left hemisphere. On the other hand, no significant correlation was observed between the magnitudes of visual speech effects and psychophysical responses. These results may indicate that the auditory-visual interaction observed on the N100m is a fundamental process which does not depend on the congruency of the visual information. PMID:28141836
Visual Speech Fills in Both Discrimination and Identification of Non-Intact Auditory Speech in Children

Science.gov (United States)

Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Herve

2018-01-01

To communicate, children must discriminate and identify speech sounds. Because visual speech plays an important role in this process, we explored how visual speech influences phoneme discrimination and identification by children. Critical items had intact visual speech (e.g. baez) coupled to non-intact (excised onsets) auditory speech (signified…
Visual Input Enhances Selective Speech Envelope Tracking in Auditory Cortex at a ‘Cocktail Party’

Science.gov (United States)

Golumbic, Elana Zion; Cogan, Gregory B.; Schroeder, Charles E.; Poeppel, David

2013-01-01

Our ability to selectively attend to one auditory signal amidst competing input streams, epitomized by the ‘Cocktail Party’ problem, continues to stimulate research from various approaches. How this demanding perceptual feat is achieved from a neural systems perspective remains unclear and controversial. It is well established that neural responses to attended stimuli are enhanced compared to responses to ignored ones, but responses to ignored stimuli are nonetheless highly significant, leading to interference in performance. We investigated whether congruent visual input of an attended speaker enhances cortical selectivity in auditory cortex, leading to diminished representation of ignored stimuli. We recorded magnetoencephalographic (MEG) signals from human participants as they attended to segments of natural continuous speech. Using two complementary methods of quantifying the neural response to speech, we found that viewing a speaker’s face enhances the capacity of auditory cortex to track the temporal speech envelope of that speaker. This mechanism was most effective in a ‘Cocktail Party’ setting, promoting preferential tracking of the attended speaker, whereas without visual input no significant attentional modulation was observed. These neurophysiological results underscore the importance of visual input in resolving perceptual ambiguity in a noisy environment. Since visual cues in speech precede the associated auditory signals, they likely serve a predictive role in facilitating auditory processing of speech, perhaps by directing attentional resources to appropriate points in time when to-be-attended acoustic input is expected to arrive. PMID:23345218
Feature Fusion Based Audio-Visual Speaker Identification Using Hidden Markov Model under Different Lighting Variations

Directory of Open Access Journals (Sweden)

Md. Rabiul Islam

2014-01-01

Full Text Available The aim of the paper is to propose a feature fusion based Audio-Visual Speaker Identification (AVSI system with varied conditions of illumination environments. Among the different fusion strategies, feature level fusion has been used for the proposed AVSI system where Hidden Markov Model (HMM is used for learning and classification. Since the feature set contains richer information about the raw biometric data than any other levels, integration at feature level is expected to provide better authentication results. In this paper, both Mel Frequency Cepstral Coefficients (MFCCs and Linear Prediction Cepstral Coefficients (LPCCs are combined to get the audio feature vectors and Active Shape Model (ASM based appearance and shape facial features are concatenated to take the visual feature vectors. These combined audio and visual features are used for the feature-fusion. To reduce the dimension of the audio and visual feature vectors, Principal Component Analysis (PCA method is used. The VALID audio-visual database is used to measure the performance of the proposed system where four different illumination levels of lighting conditions are considered. Experimental results focus on the significance of the proposed audio-visual speaker identification system with various combinations of audio and visual features.
PERANCANGAN MEDIA PEMBELAJARAN BERBASIS AUDIO VISUAL UNTUK MATA KULIAH TIPOGRAFI PADA PROGRAM STUDI DESAIN KOMUNIKASI VISUAL UNIVERSITAS DIAN NUSWANTORO

Directory of Open Access Journals (Sweden)

Puri Sulistiyawati

2017-02-01

Full Text Available Abstrak Tipografi merupakan salah satu mata kuliah pada bidang desain komunikasi visual yang mengutamakan aspek visual. Namun berdasarkan hasil observasi diketahui bahwa media pembelajaran yang selama ini digunakan kurang efektif karena kurangnya pemanfaatan teknologi informasi, sehingga mahasiswa kurang maksimal dalam memahami materi kuliah yang disampaikan oleh pengajar. Perkembangan teknologi informasi saat ini banyak memberikan dampak positif bagi kemajuan bidang pendidikan diantaranya dapat digunakan untuk mendukung media dalam proses pembelajaran. Tujuan penelitian ini adalah merancang media pembelajaran untuk mata kuliah tipografi dengan memanfaatkan teknologi informasi yaitu media audio visual. Metode yang digunakan dalam penelitian ini adalah Research and Development dengan pendekatan model ADDIE (Analysis, Design, Development, Implementation, Evaluation. Dengan diciptakannya media pembelajaran audio visual ini diharapkan proses pembelajaran mata kuliah Tipografi dapat lebih efektif dan materi kuliah lebih mudah dipahami oleh mahasiswa. Kata Kunci : audio visual, media pembelajaran, tipografi Abstract Typography is one of the subjects in the field of visual communication design that prioritizes the visual aspect. However, based on the observation note that the media has been used less effective because the lack of use information technology, so students can't understand the course material that explained by lecturers. Today, the development of information technology is being positive impact for the advancement of education which can be used to support the media in the learning process. The purpose of this research is to design learning media for the course of typography by utilizing information technology, called audio-visual media. The method that used in this research is Research and Development with ADDIE model (Analysis, Design, Development, Implementation, Evaluation. With the creation of audio-visual learning media is expected
Visual cues for data mining

Science.gov (United States)

Rogowitz, Bernice E.; Rabenhorst, David A.; Gerth, John A.; Kalin, Edward B.

1996-04-01

This paper describes a set of visual techniques, based on principles of human perception and cognition, which can help users analyze and develop intuitions about tabular data. Collections of tabular data are widely available, including, for example, multivariate time series data, customer satisfaction data, stock market performance data, multivariate profiles of companies and individuals, and scientific measurements. In our approach, we show how visual cues can help users perform a number of data mining tasks, including identifying correlations and interaction effects, finding clusters and understanding the semantics of cluster membership, identifying anomalies and outliers, and discovering multivariate relationships among variables. These cues are derived from psychological studies on perceptual organization, visual search, perceptual scaling, and color perception. These visual techniques are presented as a complement to the statistical and algorithmic methods more commonly associated with these tasks, and provide an interactive interface for the human analyst.
Encoding Specificity and Nonverbal Cue Context: An Expansion of Episodic Memory Research.

Science.gov (United States)

Woodall, W. Gill; Folger, Joseph P.

1981-01-01

Reports two studies demonstrating the ability of nonverbal contextual cues to act as retrieval mechanisms for co-occurring language. Suggests that visual contextual cues, such as speech primacy and motor primacy gestures, can access linguistic target information. Motor primacy cues are shown to act as stronger retrieval cues. (JMF)
Audio-visual materials usage preference among agricultural ...

African Journals Online (AJOL)

It was found that respondents preferred radio, television, poster, advert, photographs, specimen, bulletin, magazine, cinema, videotape, chalkboard, and bulletin board as audio-visual materials for extension work. These are the materials that can easily be manipulated and utilized for extension work. Nigerian Journal of ...
DEVELOPING VISUAL NOVEL GAME WITH SPEECH-RECOGNITION INTERACTIVITY TO ENHANCE STUDENTS’ MASTERY ON ENGLISH EXPRESSIONS

Directory of Open Access Journals (Sweden)

Elizabeth Anggraeni Amalo

2017-11-01

Full Text Available The teaching of English-expressions has always been done through conversation samples in form of written texts, audio recordings, and videos. In the meantime, the development of computer-aided learning technology has made autonomous language learning possible. Game, as one of computer-aided learning technology products, can serve as a medium to provide educational contents like that of language teaching and learning. Visual Novel is considered as a conversational game that is suitable to be combined with English-expressions material. Unlike the other click-based interaction Visual Novel Games, the visual novel game in this research implements speech recognition as the interaction trigger. Hence, this paper aims at elaborating how visual novel games are utilized to deliver English-expressions with speech recognition command for the interaction. This research used Research and Development (R&D method with Experimental design through control and experimental groups to measure its effectiveness in enhancing students’ English-expressions mastery. ANOVA was utilized to prove the significant differences between the control and experimental groups. It is expected that the result of this development and experiment can devote benefits to the English teaching and learning, especially on English-expressions.
Tiger salamanders' (Ambystoma tigrinum) response learning and usage of visual cues.

Science.gov (United States)

Kundey, Shannon M A; Millar, Roberto; McPherson, Justin; Gonzalez, Maya; Fitz, Aleyna; Allen, Chadbourne

2016-05-01

We explored tiger salamanders' (Ambystoma tigrinum) learning to execute a response within a maze as proximal visual cue conditions varied. In Experiment 1, salamanders learned to turn consistently in a T-maze for reinforcement before the maze was rotated. All learned the initial task and executed the trained turn during test, suggesting that they learned to demonstrate the reinforced response during training and continued to perform it during test. In a second experiment utilizing a similar procedure, two visual cues were placed consistently at the maze junction. Salamanders were reinforced for turning towards one cue. Cue placement was reversed during test. All learned the initial task, but executed the trained turn rather than turning towards the visual cue during test, evidencing response learning. In Experiment 3, we investigated whether a compound visual cue could control salamanders' behaviour when it was the only cue predictive of reinforcement in a cross-maze by varying start position and cue placement. All learned to turn in the direction indicated by the compound visual cue, indicating that visual cues can come to control their behaviour. Following training, testing revealed that salamanders attended to stimuli foreground over background features. Overall, these results suggest that salamanders learn to execute responses over learning to use visual cues but can use visual cues if required. Our success with this paradigm offers the potential in future studies to explore salamanders' cognition further, as well as to shed light on how features of the tiger salamanders' life history (e.g. hibernation and metamorphosis) impact cognition.
Getting more from visual working memory: Retro-cues enhance retrieval and protect from visual interference.

Science.gov (United States)

Souza, Alessandra S; Rerko, Laura; Oberauer, Klaus

2016-06-01

Visual working memory (VWM) has a limited capacity. This limitation can be mitigated by the use of focused attention: if attention is drawn to the relevant working memory content before test, performance improves (the so-called retro-cue benefit). This study tests 2 explanations of the retro-cue benefit: (a) Focused attention protects memory representations from interference by visual input at test, and (b) focusing attention enhances retrieval. Across 6 experiments using color recognition and color reproduction tasks, we varied the amount of color interference at test, and the delay between a retrieval cue (i.e., the retro-cue) and the memory test. Retro-cue benefits were larger when the memory test introduced interfering visual stimuli, showing that the retro-cue effect is in part because of protection from visual interference. However, when visual interference was held constant, retro-cue benefits were still obtained whenever the retro-cue enabled retrieval of an object from VWM but delayed response selection. Our results show that accessible information in VWM might be lost in the processes of testing memory because of visual interference and incomplete retrieval. This is not an inevitable state of affairs, though: Focused attention can be used to get the most out of VWM. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Semantic congruency but not temporal synchrony enhances long-term memory performance for audio-visual scenes.

Science.gov (United States)

Meyerhoff, Hauke S; Huff, Markus

2016-04-01

Human long-term memory for visual objects and scenes is tremendous. Here, we test how auditory information contributes to long-term memory performance for realistic scenes. In a total of six experiments, we manipulated the presentation modality (auditory, visual, audio-visual) as well as semantic congruency and temporal synchrony between auditory and visual information of brief filmic clips. Our results show that audio-visual clips generally elicit more accurate memory performance than unimodal clips. This advantage even increases with congruent visual and auditory information. However, violations of audio-visual synchrony hardly have any influence on memory performance. Memory performance remained intact even with a sequential presentation of auditory and visual information, but finally declined when the matching tracks of one scene were presented separately with intervening tracks during learning. With respect to memory performance, our results therefore show that audio-visual integration is sensitive to semantic congruency but remarkably robust against asymmetries between different modalities.
Common cues to emotion in the dynamic facial expressions of speech and song.

Science.gov (United States)

Livingstone, Steven R; Thompson, William F; Wanderley, Marcelo M; Palmer, Caroline

2015-01-01

Speech and song are universal forms of vocalization that may share aspects of emotional expression. Research has focused on parallels in acoustic features, overlooking facial cues to emotion. In three experiments, we compared moving facial expressions in speech and song. In Experiment 1, vocalists spoke and sang statements each with five emotions. Vocalists exhibited emotion-dependent movements of the eyebrows and lip corners that transcended speech-song differences. Vocalists' jaw movements were coupled to their acoustic intensity, exhibiting differences across emotion and speech-song. Vocalists' emotional movements extended beyond vocal sound to include large sustained expressions, suggesting a communicative function. In Experiment 2, viewers judged silent videos of vocalists' facial expressions prior to, during, and following vocalization. Emotional intentions were identified accurately for movements during and after vocalization, suggesting that these movements support the acoustic message. Experiment 3 compared emotional identification in voice-only, face-only, and face-and-voice recordings. Emotion judgements for voice-only singing were poorly identified, yet were accurate for all other conditions, confirming that facial expressions conveyed emotion more accurately than the voice in song, yet were equivalent in speech. Collectively, these findings highlight broad commonalities in the facial cues to emotion in speech and song, yet highlight differences in perception and acoustic-motor production.
Alfasecuencialización: la enseñanza del cine en la era del audiovisual Sequential literacy: the teaching of cinema in the age of audio-visual speech

Directory of Open Access Journals (Sweden)

José Antonio Palao Errando

2007-10-01

Full Text Available En la llamada «sociedad de la información» los estudios sobre cine se han visto diluidos en el abordaje pragmático y tecnológico del discurso audiovisual, así como la propia fruición del cine se ha visto atrapada en la red del DVD y del hipertexto. El propio cine reacciona ante ello a través de estructuras narrativas complejas que lo alejan del discurso audiovisual estándar. La función de los estudios sobre cine y de su enseñanza universitaria debe ser la reintroducción del sujeto rechazado del saber informativo por medio de la interpretación del texto fílmico. In the so called «information society», film studies have been diluted in the pragmatic and technological approaching of the audiovisual speech, as well as the own fruition of the cinema has been caught in the net of DVD and hypertext. The cinema itself reacts in the face of it through complex narrative structures that take it away from the standard audio-visual speech. The function of film studies at the university education should be the reintroduction of the rejected subject of the informative knowledge by means of the interpretation of film text.
Multiperson visual focus of attention from head pose and meeting contextual cues.

Science.gov (United States)

Ba, Sileye O; Odobez, Jean-Marc

2011-01-01

This paper introduces a novel contextual model for the recognition of people's visual focus of attention (VFOA) in meetings from audio-visual perceptual cues. More specifically, instead of independently recognizing the VFOA of each meeting participant from his own head pose, we propose to jointly recognize the participants' visual attention in order to introduce context-dependent interaction models that relate to group activity and the social dynamics of communication. Meeting contextual information is represented by the location of people, conversational events identifying floor holding patterns, and a presentation activity variable. By modeling the interactions between the different contexts and their combined and sometimes contradictory impact on the gazing behavior, our model allows us to handle VFOA recognition in difficult task-based meetings involving artifacts, presentations, and moving people. We validated our model through rigorous evaluation on a publicly available and challenging data set of 12 real meetings (5 hours of data). The results demonstrated that the integration of the presentation and conversation dynamical context using our model can lead to significant performance improvements.
Causal inference of asynchronous audiovisual speech

Directory of Open Access Journals (Sweden)

John F Magnotti

2013-11-01

Full Text Available During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between two cues and the reliability of the cues. This allows the model to make predictions abut the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
The audio and visual communication systems for suited engineering activities on JET

International Nuclear Information System (INIS)

Pearce, R.J.H.; Bruce, J.; Callaghan, C.; Hart, M.; Martin, P.; Middleton, R.; Tait, J.

2001-01-01

The beryllium and/or tritium contamination of the JET tokamak and auxiliary systems necessitates that many activities are carried out in air line fed pressurised suits. To enable often complex engineering activities to be performed, a number of novel audio and visual and communications systems have been designed. The paper describes these systems which give freedom of visual and audio communication between suited personnel, supervisors, operators and engineers. The system enhances the safety of the working environment as well as helping to minimise the radiation dose to personnel. It is concluded, from a number of years experience of using the audio and visual communications systems for suited operations, that safety and the progress of complex engineering tasks have been significantly enhanced
The audio and visual communication systems for suited engineering activities on JET

Energy Technology Data Exchange (ETDEWEB)

Pearce, R.J.H. E-mail: robert.pearce@jet.uk; Bruce, J.; Callaghan, C.; Hart, M.; Martin, P.; Middleton, R.; Tait, J

2001-11-01

The beryllium and/or tritium contamination of the JET tokamak and auxiliary systems necessitates that many activities are carried out in air line fed pressurised suits. To enable often complex engineering activities to be performed, a number of novel audio and visual and communications systems have been designed. The paper describes these systems which give freedom of visual and audio communication between suited personnel, supervisors, operators and engineers. The system enhances the safety of the working environment as well as helping to minimise the radiation dose to personnel. It is concluded, from a number of years experience of using the audio and visual communications systems for suited operations, that safety and the progress of complex engineering tasks have been significantly enhanced.
Market potential for interactive audio-visual media

NARCIS (Netherlands)

Leurdijk, A.; Limonard, S.

2005-01-01

NM2 (New Media for a New Millennium) develops tools for interactive, personalised and non-linear audio-visual content that will be tested in seven pilot productions. This paper looks at the market potential for these productions from a technological, a business and a users' perspective. It shows
Enhanced audio-visual interactions in the auditory cortex of elderly cochlear-implant users.

Science.gov (United States)

Schierholz, Irina; Finke, Mareike; Schulte, Svenja; Hauthal, Nadine; Kantzke, Christoph; Rach, Stefan; Büchner, Andreas; Dengler, Reinhard; Sandmann, Pascale

2015-10-01

Auditory deprivation and the restoration of hearing via a cochlear implant (CI) can induce functional plasticity in auditory cortical areas. How these plastic changes affect the ability to integrate combined auditory (A) and visual (V) information is not yet well understood. In the present study, we used electroencephalography (EEG) to examine whether age, temporary deafness and altered sensory experience with a CI can affect audio-visual (AV) interactions in post-lingually deafened CI users. Young and elderly CI users and age-matched NH listeners performed a speeded response task on basic auditory, visual and audio-visual stimuli. Regarding the behavioral results, a redundant signals effect, that is, faster response times to cross-modal (AV) than to both of the two modality-specific stimuli (A, V), was revealed for all groups of participants. Moreover, in all four groups, we found evidence for audio-visual integration. Regarding event-related responses (ERPs), we observed a more pronounced visual modulation of the cortical auditory response at N1 latency (approximately 100 ms after stimulus onset) in the elderly CI users when compared with young CI users and elderly NH listeners. Thus, elderly CI users showed enhanced audio-visual binding which may be a consequence of compensatory strategies developed due to temporary deafness and/or degraded sensory input after implantation. These results indicate that the combination of aging, sensory deprivation and CI facilitates the coupling between the auditory and the visual modality. We suggest that this enhancement in multisensory interactions could be used to optimize auditory rehabilitation, especially in elderly CI users, by the application of strong audio-visually based rehabilitation strategies after implant switch-on. Copyright © 2015 Elsevier B.V. All rights reserved.

Cue-induced craving among inhalant users: Development and preliminary validation of a visual cue paradigm.

Science.gov (United States)

Jain, Shobhit; Dhawan, Anju; Kumaran, S Senthil; Pattanayak, Raman Deep; Jain, Raka

2017-12-01

Cue-induced craving is known to be associated with a higher risk of relapse, wherein drug-specific cues become conditioned stimuli, eliciting conditioned responses. Cue-reactivity paradigm are important tools to study psychological responses and functional neuroimaging changes. However, till date, there has been no specific study or a validated paradigm for inhalant cue-induced craving research. The study aimed to develop and validate visual cue stimulus for inhalant cue-associated craving. The first step (picture selection) involved screening and careful selection of 30 cue- and 30 neutral-pictures based on their relevance for naturalistic settings. In the second step (time optimization), a random selection of ten cue-pictures each was presented for 4s, 6s, and 8s to seven adolescent male inhalant users, and pre-post craving response was compared using a Visual Analogue Scale(VAS) for each of the picture and time. In the third step (validation), craving response for each of 30 cue- and 30 neutral-pictures were analysed among 20 adolescent inhalant users. Findings revealed a significant difference in before and after craving response for the cue-pictures, but not neutral-pictures. Using ROC-curve, pictures were arranged in order of craving intensity. Finally, 20 best cue- and 20 neutral-pictures were used for the development of a 480s visual cue paradigm. This is the first study to systematically develop an inhalant cue picture paradigm which can be used as a tool to examine cue induced craving in neurobiological studies. Further research, including its further validation in larger study and diverse samples, is required. Copyright © 2017 Elsevier B.V. All rights reserved.
Audiovisual discrimination between speech and laughter: Why and when visual information might help

NARCIS (Netherlands)

Petridis, Stavros; Pantic, Maja

Past research on automatic laughter classification/detection has focused mainly on audio-based approaches. Here we present an audiovisual approach to distinguishing laughter from speech, and we show that integrating the information from audio and video channels may lead to improved performance over
Audio/visual analysis for high-speed TV advertisement detection from MPEG bitstream

OpenAIRE

Sadlier, David A.

2002-01-01

Advertisement breaks dunng or between television programmes are typically flagged by senes of black-and-silent video frames, which recurrendy occur in order to audio-visually separate individual advertisement spots from one another. It is the regular prevalence of these flags that enables automatic differentiauon between what is programme content and what is advertisement break. Detection of these audio-visual depressions within broadcast television content provides a basis on which advertise...
Modulation of auditory spatial attention by visual emotional cues: differential effects of attentional engagement and disengagement for pleasant and unpleasant cues.

Science.gov (United States)

Harrison, Neil R; Woodhouse, Rob

2016-05-01

Previous research has demonstrated that threatening, compared to neutral pictures, can bias attention towards non-emotional auditory targets. Here we investigated which subcomponents of attention contributed to the influence of emotional visual stimuli on auditory spatial attention. Participants indicated the location of an auditory target, after brief (250 ms) presentation of a spatially non-predictive peripheral visual cue. Responses to targets were faster at the location of the preceding visual cue, compared to at the opposite location (cue validity effect). The cue validity effect was larger for targets following pleasant and unpleasant cues compared to neutral cues, for right-sided targets. For unpleasant cues, the crossmodal cue validity effect was driven by delayed attentional disengagement, and for pleasant cues, it was driven by enhanced engagement. We conclude that both pleasant and unpleasant visual cues influence the distribution of attention across modalities and that the associated attentional mechanisms depend on the valence of the visual cue.
Anthropomorphic Coding of Speech and Audio: A Model Inversion Approach

Directory of Open Access Journals (Sweden)

W. Bastiaan Kleijn

2005-06-01

Full Text Available Auditory modeling is a well-established methodology that provides insight into human perception and that facilitates the extraction of signal features that are most relevant to the listener. The aim of this paper is to provide a tutorial on perceptual speech and audio coding using an invertible auditory model. In this approach, the audio signal is converted into an auditory representation using an invertible auditory model. The auditory representation is quantized and coded. Upon decoding, it is then transformed back into the acoustic domain. This transformation converts a complex distortion criterion into a simple one, thus facilitating quantization with low complexity. We briefly review past work on auditory models and describe in more detail the components of our invertible model and its inversion procedure, that is, the method to reconstruct the signal from the output of the auditory model. We summarize attempts to use the auditory representation for low-bit-rate coding. Our approach also allows the exploitation of the inherent redundancy of the human auditory system for the purpose of multiple description (joint source-channel coding.
Retro-dimension-cue benefit in visual working memory

OpenAIRE

Ye, Chaoxiong; Hu, Zhonghua; Ristaniemi, Tapani; Gendron, Maria; Liu, Qiang

2016-01-01

In visual working memory (VWM) tasks, participants? performance can be improved by a retro-object-cue. However, previous studies have not investigated whether participants? performance can also be improved by a retro-dimension-cue. Three experiments investigated this issue. We used a recall task with a retro-dimension-cue in all experiments. In Experiment 1, we found benefits from retro-dimension-cues compared to neutral cues. This retro-dimension-cue benefit is reflected in an increased prob...
Automatic lip reading by using multimodal visual features

Science.gov (United States)

Takahashi, Shohei; Ohya, Jun

2013-12-01

Since long time ago, speech recognition has been researched, though it does not work well in noisy places such as in the car or in the train. In addition, people with hearing-impaired or difficulties in hearing cannot receive benefits from speech recognition. To recognize the speech automatically, visual information is also important. People understand speeches from not only audio information, but also visual information such as temporal changes in the lip shape. A vision based speech recognition method could work well in noisy places, and could be useful also for people with hearing disabilities. In this paper, we propose an automatic lip-reading method for recognizing the speech by using multimodal visual information without using any audio information such as speech recognition. First, the ASM (Active Shape Model) is used to track and detect the face and lip in a video sequence. Second, the shape, optical flow and spatial frequencies of the lip features are extracted from the lip detected by ASM. Next, the extracted multimodal features are ordered chronologically so that Support Vector Machine is performed in order to learn and classify the spoken words. Experiments for classifying several words show promising results of this proposed method.
Towards Unification of Methods for Speech, Audio, Picture and Multimedia Quality Assessment

DEFF Research Database (Denmark)

Zielinski, S.; Rumsey, F.; Bech, Søren

2015-01-01

attempting to “bridge the gap” between the quality assessment methods used in various disciplines are indicated. Prospective challenges faced by researchers in the unification process are outlined. They include development of unified scales, defining unified anchors, integration of objective models......The paper addresses the need to develop unified methods for subjective and objective quality assessment across speech, audio, picture, and multimedia applications. Commonalities and differences between the currently used standards are overviewed. Examples of the already undertaken research...
Neuromorphic Audio-Visual Sensor Fusion on a Sound-Localising Robot

Directory of Open Access Journals (Sweden)

Vincent Yue-Sek Chan

2012-02-01

Full Text Available This paper presents the first robotic system featuring audio-visual sensor fusion with neuromorphic sensors. We combine a pair of silicon cochleae and a silicon retina on a robotic platform to allow the robot to learn sound localisation through self-motion and visual feedback, using an adaptive ITD-based sound localisation algorithm. After training, the robot can localise sound sources (white or pink noise in a reverberant environment with an RMS error of 4 to 5 degrees in azimuth. In the second part of the paper, we investigate the source binding problem. An experiment is conducted to test the effectiveness of matching an audio event with a corresponding visual event based on their onset time. The results show that this technique can be quite effective, despite its simplicity.
When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.

Science.gov (United States)

Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola

2017-11-01

Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.
Treatment of Speech Anxiety by Cue-Controlled Relaxation and Desensitization with Professional and Paraprofessional Counselors

Science.gov (United States)

Russell, Richard K.; Wise, Fred

1976-01-01

This investigation compared the relative effectiveness of group-administered cue-controlled relaxation and group systematic desensitization in the treatment of speech anxiety. Also examined was the role of professional versus paraprofessional counselors in implementing the treatment program. A description of the cue-controlled relaxation technique…
Proper Use of Audio-Visual Aids: Essential for Educators.

Science.gov (United States)

Dejardin, Conrad

1989-01-01

Criticizes educators as the worst users of audio-visual aids and among the worst public speakers. Offers guidelines for the proper use of an overhead projector and the development of transparencies. (DMM)
Audio-Visual Aid in Teaching "Fatty Liver"

Science.gov (United States)

Dash, Sambit; Kamath, Ullas; Rao, Guruprasad; Prakash, Jay; Mishra, Snigdha

2016-01-01

Use of audio visual tools to aid in medical education is ever on a rise. Our study intends to find the efficacy of a video prepared on "fatty liver," a topic that is often a challenge for pre-clinical teachers, in enhancing cognitive processing and ultimately learning. We prepared a video presentation of 11:36 min, incorporating various…
Age-related sensitive periods influence visual language discrimination in adults.

Science.gov (United States)

Weikum, Whitney M; Vouloumanos, Athena; Navarra, Jordi; Soto-Faraco, Salvador; Sebastián-Gallés, Núria; Werker, Janet F

2013-01-01

Adults as well as infants have the capacity to discriminate languages based on visual speech alone. Here, we investigated whether adults' ability to discriminate languages based on visual speech cues is influenced by the age of language acquisition. Adult participants who had all learned English (as a first or second language) but did not speak French were shown faces of bilingual (French/English) speakers silently reciting sentences in either language. Using only visual speech information, adults who had learned English from birth or as a second language before the age of 6 could discriminate between French and English significantly better than chance. However, adults who had learned English as a second language after age 6 failed to discriminate these two languages, suggesting that early childhood exposure is crucial for using relevant visual speech information to separate languages visually. These findings raise the possibility that lowered sensitivity to non-native visual speech cues may contribute to the difficulties encountered when learning a new language in adulthood.
The role of visual spatial attention in audiovisual speech perception

DEFF Research Database (Denmark)

Andersen, Tobias; Tiippana, K.; Laarni, J.

2009-01-01

Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre-attentive b......Auditory and visual information is integrated when perceiving speech, as evidenced by the McGurk effect in which viewing an incongruent talking face categorically alters auditory speech perception. Audiovisual integration in speech perception has long been considered automatic and pre...... from each of the faces and from the voice on the auditory speech percept. We found that directing visual spatial attention towards a face increased the influence of that face on auditory perception. However, the influence of the voice on auditory perception did not change suggesting that audiovisual...... integration did not change. Visual spatial attention was also able to select between the faces when lip reading. This suggests that visual spatial attention acts at the level of visual speech perception prior to audiovisual integration and that the effect propagates through audiovisual integration...
Language/Culture Modulates Brain and Gaze Processes in Audiovisual Speech Perception.

Science.gov (United States)

Hisanaga, Satoko; Sekiyama, Kaoru; Igasaki, Tomohiko; Murayama, Nobuki

2016-10-13

Several behavioural studies have shown that the interplay between voice and face information in audiovisual speech perception is not universal. Native English speakers (ESs) are influenced by visual mouth movement to a greater degree than native Japanese speakers (JSs) when listening to speech. However, the biological basis of these group differences is unknown. Here, we demonstrate the time-varying processes of group differences in terms of event-related brain potentials (ERP) and eye gaze for audiovisual and audio-only speech perception. On a behavioural level, while congruent mouth movement shortened the ESs' response time for speech perception, the opposite effect was observed in JSs. Eye-tracking data revealed a gaze bias to the mouth for the ESs but not the JSs, especially before the audio onset. Additionally, the ERP P2 amplitude indicated that ESs processed multisensory speech more efficiently than auditory-only speech; however, the JSs exhibited the opposite pattern. Taken together, the ESs' early visual attention to the mouth was likely to promote phonetic anticipation, which was not the case for the JSs. These results clearly indicate the impact of language and/or culture on multisensory speech processing, suggesting that linguistic/cultural experiences lead to the development of unique neural systems for audiovisual speech perception.
Guidelines for the integration of audio cues into computer user interfaces

Energy Technology Data Exchange (ETDEWEB)

Sumikawa, D.A.

1985-06-01

Throughout the history of computers, vision has been the main channel through which information is conveyed to the computer user. As the complexities of man-machine interactions increase, more and more information must be transferred from the computer to the user and then successfully interpreted by the user. A logical next step in the evolution of the computer-user interface is the incorporation of sound and thereby using the sense of ''hearing'' in the computer experience. This allows our visual and auditory capabilities to work naturally together in unison leading to more effective and efficient interpretation of all information received by the user from the computer. This thesis presents an initial set of guidelines to assist interface developers in designing an effective sight and sound user interface. This study is a synthesis of various aspects of sound, human communication, computer-user interfaces, and psychoacoustics. We introduce the notion of an earcon. Earcons are audio cues used in the computer-user interface to provide information and feedback to the user about some computer object, operation, or interaction. A possible construction technique for earcons, the use of earcons in the interface, how earcons are learned and remembered, and the affects of earcons on their users are investigated. This study takes the point of view that earcons are a language and human/computer communication issue and are therefore analyzed according to the three dimensions of linguistics; syntactics, semantics, and pragmatics.
Speech misperception: speaking and seeing interfere differently with hearing.

Directory of Open Access Journals (Sweden)

Takemi Mochida

Full Text Available Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener's own speech action and the effects of viewing another's speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another's mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] were degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue as the heard syllables. But they were not affected by articulating [pa], which is associated with a different primary articulator (lips from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech.
Exclusively visual analysis of classroom group interactions

Science.gov (United States)

Tucker, Laura; Scherr, Rachel E.; Zickler, Todd; Mazur, Eric

2016-12-01

Large-scale audiovisual data that measure group learning are time consuming to collect and analyze. As an initial step towards scaling qualitative classroom observation, we qualitatively coded classroom video using an established coding scheme with and without its audio cues. We find that interrater reliability is as high when using visual data only—without audio—as when using both visual and audio data to code. Also, interrater reliability is high when comparing use of visual and audio data to visual-only data. We see a small bias to code interactions as group discussion when visual and audio data are used compared with video-only data. This work establishes that meaningful educational observation can be made through visual information alone. Further, it suggests that after initial work to create a coding scheme and validate it in each environment, computer-automated visual coding could drastically increase the breadth of qualitative studies and allow for meaningful educational analysis on a far greater scale.
Exclusively visual analysis of classroom group interactions

Directory of Open Access Journals (Sweden)

Laura Tucker

2016-11-01

Full Text Available Large-scale audiovisual data that measure group learning are time consuming to collect and analyze. As an initial step towards scaling qualitative classroom observation, we qualitatively coded classroom video using an established coding scheme with and without its audio cues. We find that interrater reliability is as high when using visual data only—without audio—as when using both visual and audio data to code. Also, interrater reliability is high when comparing use of visual and audio data to visual-only data. We see a small bias to code interactions as group discussion when visual and audio data are used compared with video-only data. This work establishes that meaningful educational observation can be made through visual information alone. Further, it suggests that after initial work to create a coding scheme and validate it in each environment, computer-automated visual coding could drastically increase the breadth of qualitative studies and allow for meaningful educational analysis on a far greater scale.

Subconscious visual cues during movement execution allow correct online choice reactions.

Directory of Open Access Journals (Sweden)

Christian Leukel

Full Text Available Part of the sensory information is processed by our central nervous system without conscious perception. Subconscious processing has been shown to be capable of triggering motor reactions. In the present study, we asked the question whether visual information, which is not consciously perceived, could influence decision-making in a choice reaction task. Ten healthy subjects (28 ± 5 years executed two different experimental protocols. In the Motor reaction protocol, a visual target cue was shown on a computer screen. Depending on the displayed cue, subjects had to either complete a reaching movement (go-condition or had to abort the movement (stop-condition. The cue was presented with different display durations (20-160 ms. In the second Verbalization protocol, subjects verbalized what they experienced on the screen. Again, the cue was presented with different display durations. This second protocol tested for conscious perception of the visual cue. The results of this study show that subjects achieved significantly more correct responses in the Motor reaction protocol than in the Verbalization protocol. This difference was only observed at the very short display durations of the visual cue. Since correct responses in the Verbalization protocol required conscious perception of the visual information, our findings imply that the subjects performed correct motor responses to visual cues, which they were not conscious about. It is therefore concluded that humans may reach decisions based on subconscious visual information in a choice reaction task.
Seeing the talker's face supports executive processing of speech in steady state noise.

Science.gov (United States)

Mishra, Sushmit; Lunner, Thomas; Stenfelt, Stefan; Rönnberg, Jerker; Rudner, Mary

2013-01-01

Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was the always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity (WMC). Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.
Phonetic Category Cues in Adult-Directed Speech: Evidence from Three Languages with Distinct Vowel Characteristics

Science.gov (United States)

Pons, Ferran; Biesanz, Jeremy C.; Kajikawa, Sachiyo; Fais, Laurel; Narayan, Chandan R.; Amano, Shigeaki; Werker, Janet F.

2012-01-01

Using an artificial language learning manipulation, Maye, Werker, and Gerken (2002) demonstrated that infants' speech sound categories change as a function of the distributional properties of the input. In a recent study, Werker et al. (2007) showed that Infant-directed Speech (IDS) input contains reliable acoustic cues that support distributional…
Motor Training: Comparison of Visual and Auditory Coded Proprioceptive Cues

Directory of Open Access Journals (Sweden)

Philip Jepson

2012-05-01

Full Text Available Self-perception of body posture and movement is achieved through multi-sensory integration, particularly the utilisation of vision, and proprioceptive information derived from muscles and joints. Disruption to these processes can occur following a neurological accident, such as stroke, leading to sensory and physical impairment. Rehabilitation can be helped through use of augmented visual and auditory biofeedback to stimulate neuro-plasticity, but the effective design and application of feedback, particularly in the auditory domain, is non-trivial. Simple auditory feedback was tested by comparing the stepping accuracy of normal subjects when given a visual spatial target (step length and an auditory temporal target (step duration. A baseline measurement of step length and duration was taken using optical motion capture. Subjects (n=20 took 20 ‘training’ steps (baseline ±25% using either an auditory target (950 Hz tone, bell-shaped gain envelope or visual target (spot marked on the floor and were then asked to replicate the target step (length or duration corresponding to training with all feedback removed. Visual cues (mean percentage error=11.5%; SD ± 7.0%; auditory cues (mean percentage error = 12.9%; SD ± 11.8%. Visual cues elicit a high degree of accuracy both in training and follow-up un-cued tasks; despite the novelty of the auditory cues present for subjects, the mean accuracy of subjects approached that for visual cues, and initial results suggest a limited amount of practice using auditory cues can improve performance.
Self-organizing maps for measuring similarity of audiovisual speech percepts

DEFF Research Database (Denmark)

Bothe, Hans-Heinrich

The goal of this work is to find a way to measure similarity of audiovisual speech percepts. Phoneme-related self-organizing maps (SOM) with a rectangular basis are trained with data material from a (labeled) video film. For the training, a combination of auditory speech features and corresponding....... Dependent on the training data, these other units may also be contextually immediate neighboring units. The poster demonstrates the idea with text material spoken by one individual subject using a set of simple audio-visual features. The data material for the training process consists of 44 labeled...... sentences in German with a balanced phoneme repertoire. As a result it can be stated that (i) the SOM can be trained to map auditory and visual features in a topology-preserving way and (ii) they show strain due to the influence of other audio-visual units. The SOM can be used to measure similarity amongst...
Making the invisible visible: verbal but not visual cues enhance visual detection.

Science.gov (United States)

Lupyan, Gary; Spivey, Michael J

2010-07-07

Can hearing a word change what one sees? Although visual sensitivity is known to be enhanced by attending to the location of the target, perceptual enhancements of following cues to the identity of an object have been difficult to find. Here, we show that perceptual sensitivity is enhanced by verbal, but not visual cues. Participants completed an object detection task in which they made an object-presence or -absence decision to briefly-presented letters. Hearing the letter name prior to the detection task increased perceptual sensitivity (d'). A visual cue in the form of a preview of the to-be-detected letter did not. Follow-up experiments found that the auditory cuing effect was specific to validly cued stimuli. The magnitude of the cuing effect positively correlated with an individual measure of vividness of mental imagery; introducing uncertainty into the position of the stimulus did not reduce the magnitude of the cuing effect, but eliminated the correlation with mental imagery. Hearing a word made otherwise invisible objects visible. Interestingly, seeing a preview of the target stimulus did not similarly enhance detection of the target. These results are compatible with an account in which auditory verbal labels modulate lower-level visual processing. The findings show that a verbal cue in the form of hearing a word can influence even the most elementary visual processing and inform our understanding of how language affects perception.
Making the invisible visible: verbal but not visual cues enhance visual detection.

Directory of Open Access Journals (Sweden)

Gary Lupyan

Full Text Available BACKGROUND: Can hearing a word change what one sees? Although visual sensitivity is known to be enhanced by attending to the location of the target, perceptual enhancements of following cues to the identity of an object have been difficult to find. Here, we show that perceptual sensitivity is enhanced by verbal, but not visual cues. METHODOLOGY/PRINCIPAL FINDINGS: Participants completed an object detection task in which they made an object-presence or -absence decision to briefly-presented letters. Hearing the letter name prior to the detection task increased perceptual sensitivity (d'. A visual cue in the form of a preview of the to-be-detected letter did not. Follow-up experiments found that the auditory cuing effect was specific to validly cued stimuli. The magnitude of the cuing effect positively correlated with an individual measure of vividness of mental imagery; introducing uncertainty into the position of the stimulus did not reduce the magnitude of the cuing effect, but eliminated the correlation with mental imagery. CONCLUSIONS/SIGNIFICANCE: Hearing a word made otherwise invisible objects visible. Interestingly, seeing a preview of the target stimulus did not similarly enhance detection of the target. These results are compatible with an account in which auditory verbal labels modulate lower-level visual processing. The findings show that a verbal cue in the form of hearing a word can influence even the most elementary visual processing and inform our understanding of how language affects perception.
The effect of a concurrent working memory task and temporal offsets on the integration of auditory and visual speech information.

Science.gov (United States)

Buchan, Julie N; Munhall, Kevin G

2012-01-01

Audiovisual speech perception is an everyday occurrence of multisensory integration. Conflicting visual speech information can influence the perception of acoustic speech (namely the McGurk effect), and auditory and visual speech are integrated over a rather wide range of temporal offsets. This research examined whether the addition of a concurrent cognitive load task would affect the audiovisual integration in a McGurk speech task and whether the cognitive load task would cause more interference at increasing offsets. The amount of integration was measured by the proportion of responses in incongruent trials that did not correspond to the audio (McGurk response). An eye-tracker was also used to examine whether the amount of temporal offset and the presence of a concurrent cognitive load task would influence gaze behavior. Results from this experiment show a very modest but statistically significant decrease in the number of McGurk responses when subjects also perform a cognitive load task, and that this effect is relatively constant across the various temporal offsets. Participant's gaze behavior was also influenced by the addition of a cognitive load task. Gaze was less centralized on the face, less time was spent looking at the mouth and more time was spent looking at the eyes, when a concurrent cognitive load task was added to the speech task.
Real-time decreased sensitivity to an audio-visual illusion during goal-directed reaching.

Directory of Open Access Journals (Sweden)

Luc Tremblay

Full Text Available In humans, sensory afferences are combined and integrated by the central nervous system (Ernst MO, Bülthoff HH (2004 Trends Cogn. Sci. 8: 162-169 and appear to provide a holistic representation of the environment. Empirical studies have repeatedly shown that vision dominates the other senses, especially for tasks with spatial demands. In contrast, it has also been observed that sound can strongly alter the perception of visual events. For example, when presented with 2 flashes and 1 beep in a very brief period of time, humans often report seeing 1 flash (i.e. fusion illusion, Andersen TS, Tiippana K, Sams M (2004 Brain Res. Cogn. Brain Res. 21: 301-308. However, it is not known how an unfolding movement modulates the contribution of vision to perception. Here, we used the audio-visual illusion to demonstrate that goal-directed movements can alter visual information processing in real-time. Specifically, the fusion illusion was linearly reduced as a function of limb velocity. These results suggest that cue combination and integration can be modulated in real-time by goal-directed behaviors; perhaps through sensory gating (Chapman CE, Beauchamp E (2006 J. Neurophysiol. 96: 1664-1675 and/or altered sensory noise (Ernst MO, Bülthoff HH (2004 Trends Cogn. Sci. 8: 162-169 during limb movements.
Seeing the talker’s face supports executive processing of speech in steady state noise

Directory of Open Access Journals (Sweden)

Sushmit eMishra

2013-11-01

Full Text Available Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT, Mishra et al., 2013 along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition and in high load conditions the participants were additionally instructed to recall one extra number, which was the always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity. Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.
Seeing the talker’s face supports executive processing of speech in steady state noise

Science.gov (United States)

Mishra, Sushmit; Lunner, Thomas; Stenfelt, Stefan; Rönnberg, Jerker; Rudner, Mary

2013-01-01

Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition) and in high load conditions the participants were additionally instructed to recall one extra number, which was the always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity (WMC). Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills. PMID:24324411
Automatic summarization of soccer highlights using audio-visual descriptors.

Science.gov (United States)

Raventós, A; Quijada, R; Torres, Luis; Tarrés, Francesc

2015-01-01

Automatic summarization generation of sports video content has been object of great interest for many years. Although semantic descriptions techniques have been proposed, many of the approaches still rely on low-level video descriptors that render quite limited results due to the complexity of the problem and to the low capability of the descriptors to represent semantic content. In this paper, a new approach for automatic highlights summarization generation of soccer videos using audio-visual descriptors is presented. The approach is based on the segmentation of the video sequence into shots that will be further analyzed to determine its relevance and interest. Of special interest in the approach is the use of the audio information that provides additional robustness to the overall performance of the summarization system. For every video shot a set of low and mid level audio-visual descriptors are computed and lately adequately combined in order to obtain different relevance measures based on empirical knowledge rules. The final summary is generated by selecting those shots with highest interest according to the specifications of the user and the results of relevance measures. A variety of results are presented with real soccer video sequences that prove the validity of the approach.
Effectiveness of auditory and tactile crossmodal cues in a dual-task visual and auditory scenario.

Science.gov (United States)

Hopkins, Kevin; Kass, Steven J; Blalock, Lisa Durrance; Brill, J Christopher

2017-05-01

In this study, we examined how spatially informative auditory and tactile cues affected participants' performance on a visual search task while they simultaneously performed a secondary auditory task. Visual search task performance was assessed via reaction time and accuracy. Tactile and auditory cues provided the approximate location of the visual target within the search display. The inclusion of tactile and auditory cues improved performance in comparison to the no-cue baseline conditions. In comparison to the no-cue conditions, both tactile and auditory cues resulted in faster response times in the visual search only (single task) and visual-auditory (dual-task) conditions. However, the effectiveness of auditory and tactile cueing for visual task accuracy was shown to be dependent on task-type condition. Crossmodal cueing remains a viable strategy for improving task performance without increasing attentional load within a singular sensory modality. Practitioner Summary: Crossmodal cueing with dual-task performance has not been widely explored, yet has practical applications. We examined the effects of auditory and tactile crossmodal cues on visual search performance, with and without a secondary auditory task. Tactile cues aided visual search accuracy when also engaged in a secondary auditory task, whereas auditory cues did not.
SPEECH VISUALIZATION SISTEM AS A BASIS FOR SPEECH TRAINING AND COMMUNICATION AIDS

Directory of Open Access Journals (Sweden)

Oliana KRSTEVA

1997-09-01

Full Text Available One receives much more information through a visual sense than through a tactile one. However, most visual aids for hearing-impaired persons are not wearable because it is difficult to make them compact and it is not a best way to mask always their vision.Generally it is difficult to get the integrated patterns by a single mathematical transform of signals, such as a Foruier transform. In order to obtain the integrated pattern speech parameters should be carefully extracted by an analysis according as each parameter, and a visual pattern, which can intuitively be understood by anyone, must be synthesized from them. Successful integration of speech parameters will never disturb understanding of individual features, so that the system can be used for speech training and communication.
Can you hear me yet? An intracranial investigation of speech and non-speech audiovisual interactions in human cortex.

Science.gov (United States)

Rhone, Ariane E; Nourski, Kirill V; Oya, Hiroyuki; Kawasaki, Hiroto; Howard, Matthew A; McMurray, Bob

In everyday conversation, viewing a talker's face can provide information about the timing and content of an upcoming speech signal, resulting in improved intelligibility. Using electrocorticography, we tested whether human auditory cortex in Heschl's gyrus (HG) and on superior temporal gyrus (STG) and motor cortex on precentral gyrus (PreC) were responsive to visual/gestural information prior to the onset of sound and whether early stages of auditory processing were sensitive to the visual content (speech syllable versus non-speech motion). Event-related band power (ERBP) in the high gamma band was content-specific prior to acoustic onset on STG and PreC, and ERBP in the beta band differed in all three areas. Following sound onset, we found with no evidence for content-specificity in HG, evidence for visual specificity in PreC, and specificity for both modalities in STG. These results support models of audio-visual processing in which sensory information is integrated in non-primary cortical areas.
The relationship between level of autistic traits and local bias in the context of the McGurk effect.

Directory of Open Access Journals (Sweden)

Yuta eUjiie

2015-06-01

Full Text Available The McGurk effect is a well-known illustration that demonstrates the influence of visual information on hearing in the context of speech perception. Some studies have reported that individuals with autism spectrum disorder (ASD display abnormal processing of audio-visual speech integration, while other studies showed contradictory results. Based on the dimensional model of ASD, we administered two analog studies to examine the link between level of autistic traits, as assessed by the Autism Spectrum Quotient (AQ, and the McGurk effect among a sample of university students. In the first experiment, we found that autistic traits correlated negatively with fused (McGurk responses. Then, we manipulated presentation types of visual stimuli to examine whether the local bias toward visual speech cues modulated individual differences in the McGurk effect. The presentation included four types of visual images, comprising no image, mouth only, mouth and eyes, and full face. The results revealed that global facial information facilitates the influence of visual speech cues on McGurk stimuli. Moreover, individual differences between groups with low and high levels of autistic traits appeared when the full-face visual speech cue with an incongruent voice condition was presented. These results suggest that individual differences in the McGurk effect might be due to a weak ability to process global facial information in individuals with high levels of autistic traits.
Detecting Parkinson's disease from sustained phonation and speech signals.

Directory of Open Access Journals (Sweden)

Evaldas Vaiciukynas

Full Text Available This study investigates signals from sustained phonation and text-dependent speech modalities for Parkinson's disease screening. Phonation corresponds to the vowel /a/ voicing task and speech to the pronunciation of a short sentence in Lithuanian language. Signals were recorded through two channels simultaneously, namely, acoustic cardioid (AC and smart phone (SP microphones. Additional modalities were obtained by splitting speech recording into voiced and unvoiced parts. Information in each modality is summarized by 18 well-known audio feature sets. Random forest (RF is used as a machine learning algorithm, both for individual feature sets and for decision-level fusion. Detection performance is measured by the out-of-bag equal error rate (EER and the cost of log-likelihood-ratio. Essentia audio feature set was the best using the AC speech modality and YAAFE audio feature set was the best using the SP unvoiced modality, achieving EER of 20.30% and 25.57%, respectively. Fusion of all feature sets and modalities resulted in EER of 19.27% for the AC and 23.00% for the SP channel. Non-linear projection of a RF-based proximity matrix into the 2D space enriched medical decision support by visualization.
Visual-vestibular cue integration for heading perception: applications of optimal cue integration theory.

Science.gov (United States)

Fetsch, Christopher R; Deangelis, Gregory C; Angelaki, Dora E

2010-05-01

The perception of self-motion is crucial for navigation, spatial orientation and motor control. In particular, estimation of one's direction of translation, or heading, relies heavily on multisensory integration in most natural situations. Visual and nonvisual (e.g., vestibular) information can be used to judge heading, but each modality alone is often insufficient for accurate performance. It is not surprising, then, that visual and vestibular signals converge frequently in the nervous system, and that these signals interact in powerful ways at the level of behavior and perception. Early behavioral studies of visual-vestibular interactions consisted mainly of descriptive accounts of perceptual illusions and qualitative estimation tasks, often with conflicting results. In contrast, cue integration research in other modalities has benefited from the application of rigorous psychophysical techniques, guided by normative models that rest on the foundation of ideal-observer analysis and Bayesian decision theory. Here we review recent experiments that have attempted to harness these so-called optimal cue integration models for the study of self-motion perception. Some of these studies used nonhuman primate subjects, enabling direct comparisons between behavioral performance and simultaneously recorded neuronal activity. The results indicate that humans and monkeys can integrate visual and vestibular heading cues in a manner consistent with optimal integration theory, and that single neurons in the dorsal medial superior temporal area show striking correlates of the behavioral effects. This line of research and other applications of normative cue combination models should continue to shed light on mechanisms of self-motion perception and the neuronal basis of multisensory integration.
Extraction of Information of Audio-Visual Contents

Directory of Open Access Journals (Sweden)

Carlos Aguilar

2011-10-01

Full Text Available In this article we show how it is possible to use Channel Theory (Barwise and Seligman, 1997 for modeling the process of information extraction realized by audiences of audio-visual contents. To do this, we rely on the concepts pro- posed by Channel Theory and, especially, its treatment of representational systems. We then show how the information that an agent is capable of extracting from the content depends on the number of channels he is able to establish between the content and the set of classifications he is able to discriminate. The agent can endeavor the extraction of information through these channels from the totality of content; however, we discuss the advantages of extracting from its constituents in order to obtain a greater number of informational items that represent it. After showing how the extraction process is endeavored for each channel, we propose a method of representation of all the informative values an agent can obtain from a content using a matrix constituted by the channels the agent is able to establish on the content (source classifications, and the ones he can understand as individual (destination classifications. We finally show how this representation allows reflecting the evolution of the informative items through the evolution of audio-visual content.
Visual information constrains early and late stages of spoken-word recognition in sentence context.

Science.gov (United States)

Brunellière, Angèle; Sánchez-García, Carolina; Ikumi, Nara; Soto-Faraco, Salvador

2013-07-01

Audiovisual speech perception has been frequently studied considering phoneme, syllable and word processing levels. Here, we examined the constraints that visual speech information might exert during the recognition of words embedded in a natural sentence context. We recorded event-related potentials (ERPs) to words that could be either strongly or weakly predictable on the basis of the prior semantic sentential context and, whose initial phoneme varied in the degree of visual saliency from lip movements. When the sentences were presented audio-visually (Experiment 1), words weakly predicted from semantic context elicited a larger long-lasting N400, compared to strongly predictable words. This semantic effect interacted with the degree of visual saliency over a late part of the N400. When comparing audio-visual versus auditory alone presentation (Experiment 2), the typical amplitude-reduction effect over the auditory-evoked N100 response was observed in the audiovisual modality. Interestingly, a specific benefit of high- versus low-visual saliency constraints occurred over the early N100 response and at the late N400 time window, confirming the result of Experiment 1. Taken together, our results indicate that the saliency of visual speech can exert an influence over both auditory processing and word recognition at relatively late stages, and thus suggest strong interactivity between audio-visual integration and other (arguably higher) stages of information processing during natural speech comprehension. Copyright © 2013 Elsevier B.V. All rights reserved.

Concurrent audio-visual feedback for supporting drivers at intersections: A study using two linked driving simulators.

Science.gov (United States)

Houtenbos, M; de Winter, J C F; Hale, A R; Wieringa, P A; Hagenzieker, M P

2017-04-01

A large portion of road traffic crashes occur at intersections for the reason that drivers lack necessary visual information. This research examined the effects of an audio-visual display that provides real-time sonification and visualization of the speed and direction of another car approaching the crossroads on an intersecting road. The location of red blinking lights (left vs. right on the speedometer) and the lateral input direction of beeps (left vs. right ear in headphones) corresponded to the direction from where the other car approached, and the blink and beep rates were a function of the approaching car's speed. Two driving simulators were linked so that the participant and the experimenter drove in the same virtual world. Participants (N = 25) completed four sessions (two with the audio-visual display on, two with the audio-visual display off), each session consisting of 22 intersections at which the experimenter approached from the left or right and either maintained speed or slowed down. Compared to driving with the display off, the audio-visual display resulted in enhanced traffic efficiency (i.e., greater mean speed, less coasting) while not compromising safety (i.e., the time gap between the two vehicles was equivalent). A post-experiment questionnaire showed that the beeps were regarded as more useful than the lights. It is argued that the audio-visual display is a promising means of supporting drivers until fully automated driving is technically feasible. Copyright © 2016. Published by Elsevier Ltd.
Discriminating individually considerate and authoritarian leaders by speech activity cues

OpenAIRE

Feese, Sebastian; Muaremi, Amir; Arnrich, Bert; Tröster, Gerhard; Meyer, Bertolt; Jonas, Klaus

2011-01-01

Effective leadership can increase team performance, however up to now the influence of specific micro-level behavioral patterns on team performance is unclear. At the same time, current behavior observation methods in social psychology mostly rely on manual video annotations that impede research. In our work, we follow a sensor-based approach to automatically extract speech activity cues to discriminate individualized considerate from authoritarian leadership. On a subset of 35 selected...
First-Pass Processing of Value Cues in the Ventral Visual Pathway.

Science.gov (United States)

Sasikumar, Dennis; Emeric, Erik; Stuphorn, Veit; Connor, Charles E

2018-02-19

Real-world value often depends on subtle, continuously variable visual cues specific to particular object categories, like the tailoring of a suit, the condition of an automobile, or the construction of a house. Here, we used microelectrode recording in behaving monkeys to test two possible mechanisms for category-specific value-cue processing: (1) previous findings suggest that prefrontal cortex (PFC) identifies object categories, and based on category identity, PFC could use top-down attentional modulation to enhance visual processing of category-specific value cues, providing signals to PFC for calculating value, and (2) a faster mechanism would be first-pass visual processing of category-specific value cues, immediately providing the necessary visual information to PFC. This, however, would require learned mechanisms for processing the appropriate cues in a given object category. To test these hypotheses, we trained monkeys to discriminate value in four letter-like stimulus categories. Each category had a different, continuously variable shape cue that signified value (liquid reward amount) as well as other cues that were irrelevant. Monkeys chose between stimuli of different reward values. Consistent with the first-pass hypothesis, we found early signals for category-specific value cues in area TE (the final stage in monkey ventral visual pathway) beginning 81 ms after stimulus onset-essentially at the start of TE responses. Task-related activity emerged in lateral PFC approximately 40 ms later and consisted mainly of category-invariant value tuning. Our results show that, for familiar, behaviorally relevant object categories, high-level ventral pathway cortex can implement rapid, first-pass processing of category-specific value cues. Copyright © 2018 Elsevier Ltd. All rights reserved.
Changes of the Prefrontal EEG (Electroencephalogram) Activities According to the Repetition of Audio-Visual Learning.

Science.gov (United States)

Kim, Yong-Jin; Chang, Nam-Kee

2001-01-01

Investigates the changes of neuronal response according to a four time repetition of audio-visual learning. Obtains EEG data from the prefrontal (Fp1, Fp2) lobe from 20 subjects at the 8th grade level. Concludes that the habituation of neuronal response shows up in repetitive audio-visual learning and brain hemisphericity can be changed by…
Competition between auditory and visual spatial cues during visual task performance

NARCIS (Netherlands)

Koelewijn, T.; Bronkhorst, A.; Theeuwes, J.

2009-01-01

There is debate in the crossmodal cueing literature as to whether capture of visual attention by means of sound is a fully automatic process. Recent studies show that when visual attention is endogenously focused sound still captures attention. The current study investigated whether there is
Comparison of Cue-Controlled Desensitization, Rational Restructuring, and a Credible Placebo in the Treatment of Speech Anxiety.

Science.gov (United States)

Lent, Robert W.; And Others

1981-01-01

The efficacy of cue-controlled desensitization and systematic rational restructuring was compared with a placebo method and a waiting-list control in reducing public speaking and nontargeted anxieties. Cue-controlled desensitization was generally more effective than the other groups in reducing subjective speech anxiety. (Author)
Intelligent audio analysis

CERN Document Server

Schuller, Björn W

2013-01-01

This book provides the reader with the knowledge necessary for comprehension of the field of Intelligent Audio Analysis. It firstly introduces standard methods and discusses the typical Intelligent Audio Analysis chain going from audio data to audio features to audio recognition. Further, an introduction to audio source separation, and enhancement and robustness are given. After the introductory parts, the book shows several applications for the three types of audio: speech, music, and general sound. Each task is shortly introduced, followed by a description of the specific data and methods applied, experiments and results, and a conclusion for this specific task. The books provides benchmark results and standardized test-beds for a broader range of audio analysis tasks. The main focus thereby lies on the parallel advancement of realism in audio analysis, as too often today’s results are overly optimistic owing to idealized testing conditions, and it serves to stimulate synergies arising from transfer of ...
Early, but not late visual distractors affect movement synchronization to a temporal-spatial visual cue

Directory of Open Access Journals (Sweden)

Ashley J Booth

2015-06-01

Full Text Available The ease of synchronising movements to a rhythmic cue is dependent on the modality of the cue presentation: timing accuracy is much higher when synchronising with discrete auditory rhythms than an equivalent visual stimulus presented through flashes. However, timing accuracy is improved if the visual cue presents spatial as well as temporal information (e.g. a dot following an oscillatory trajectory. Similarly, when synchronising with an auditory target metronome in the presence of a second visual distracting metronome, the distraction is stronger when the visual cue contains spatial-temporal information rather than temporal only. The present study investigates individuals’ ability to synchronise movements to a temporal-spatial visual cue in the presence of same-modality temporal-spatial distractors. Moreover, we investigated how increasing the number of distractor stimuli impacted on maintaining synchrony with the target cue. Participants made oscillatory vertical arm movements in time with a vertically oscillating white target dot centred on a large projection screen. The target dot was surrounded by 2, 8 or 14 distractor dots, which had an identical trajectory to the target but at a phase lead or lag of 0, 100 or 200ms. We found participants’ timing performance was only affected in the phase-lead conditions and when there were large numbers of distractors present (8 and 14. This asymmetry suggests participants still rely on salient events in the stimulus trajectory to synchronise movements. Subsequently, distractions occurring in the window of attention surrounding those events have the maximum impact on timing performance.
A psychophysical imaging method evidencing auditory cue extraction during speech perception: a group analysis of auditory classification images.

Science.gov (United States)

Varnet, Léo; Knoblauch, Kenneth; Serniclaes, Willy; Meunier, Fanny; Hoen, Michel

2015-01-01

Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remains undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique that allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses on the listening strategies of participants at the group level. We exemplify this approach by identifying the acoustic cues involved in da/ga categorization with two phonetic contexts, Al- or Ar-. The application of Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme.
Head orientation of walking blowflies is controlled by visual and mechanical cues

OpenAIRE

Monteagudo Ibarreta, José; Lindemann, Jens Peter; Egelhaaf, Martin

2017-01-01

During locomotion, animals employ visual and mechanical cues in order to establish the orientation of their head, which reflects the orientation of the visual coordinate system. However, in certain situations, contradictory cues may suggest different orientations relative to the environment. We recorded blowflies walking on a horizontal or tilted surface surrounded by visual cues suggesting a variety of orientations.We found that the different orientations relative to gra...
Teacher’s Voice on Metacognitive Strategy Based Instruction Using Audio Visual Aids for Listening

Directory of Open Access Journals (Sweden)

Salasiah Salasiah

2018-02-01

Full Text Available The paper primarily stresses on exploring the teacher’s voice toward the application of metacognitive strategy with audio-visual aid in improving listening comprehension. The metacognitive strategy model applied in the study was inspired from Vandergrift and Tafaghodtari (2010 instructional model. Thus it is modified in the procedure and applied with audio-visual aids for improving listening comprehension. The study’s setting was at SMA Negeri 2 Parepare, South Sulawesi Province, Indonesia. The population of the research was the teacher of English at tenth grade at SMAN 2. The sample was taken by using random sampling technique. The data was collected by using in depth interview during the research, recorded, and analyzed using qualitative analysis. This study explored the teacher’s response toward the modified model of metacognitive strategy with audio visual aids in class of listening which covers positive and negative response toward the strategy applied during the teaching of listening. The result of data showed that this strategy helped the teacher a lot in teaching listening comprehension as the procedure has systematic steps toward students’ listening comprehension. Also, it eases the teacher to teach listening by empowering audio visual aids such as video taken from youtube.
CREATING AUDIO VISUAL DIALOGUE TASK AS STUDENTS’ SELF ASSESSMENT TO ENHANCE THEIR SPEAKING ABILITY

Directory of Open Access Journals (Sweden)

Novia Trisanti

2017-04-01

Full Text Available The study is about giving overview of employing audio visual dialogue task as students creativity task and self assessment in EFL speaking class of tertiary education to enhance the students speaking ability. The qualitative research was done in one of the speaking classes at English Department, Semarang State University, Central Java, Indonesia. The results that can be seen from the rubric of self assessment show that the oral performance through audio visual recorded tasks done by the students as their self assessment gave positive evidences. The audio visual dialogue task can be very beneficial since it can motivate the students learning and increase their learning experiences. The self-assessment can be a valuable additional means to improve their speaking ability since it is one of the motives that drive self- evaluatioan, along with self- verification and self- enhancement.
Lip movements affect infants' audiovisual speech perception.

Science.gov (United States)

Yeung, H Henny; Werker, Janet F

2013-05-01

Speech is robustly audiovisual from early in infancy. Here we show that audiovisual speech perception in 4.5-month-old infants is influenced by sensorimotor information related to the lip movements they make while chewing or sucking. Experiment 1 consisted of a classic audiovisual matching procedure, in which two simultaneously displayed talking faces (visual [i] and [u]) were presented with a synchronous vowel sound (audio /i/ or /u/). Infants' looking patterns were selectively biased away from the audiovisual matching face when the infants were producing lip movements similar to those needed to produce the heard vowel. Infants' looking patterns returned to those of a baseline condition (no lip movements, looking longer at the audiovisual matching face) when they were producing lip movements that did not match the heard vowel. Experiment 2 confirmed that these sensorimotor effects interacted with the heard vowel, as looking patterns differed when infants produced these same lip movements while seeing and hearing a talking face producing an unrelated vowel (audio /a/). These findings suggest that the development of speech perception and speech production may be mutually informative.
Speech endpoint detection with non-language speech sounds for generic speech processing applications

Science.gov (United States)

McClain, Matthew; Romanowski, Brian

2009-05-01

Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known apriori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden-Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detection certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.
Mobile video-to-audio transducer and motion detection for sensory substitution

Directory of Open Access Journals (Sweden)

Maxime eAmbard

2015-10-01

Full Text Available Visuo-auditory sensory substitution systems are augmented reality devices that translate a video stream into an audio stream in order to help the blind in daily tasks requiring visuo-spatial information. In this work, we present both a new mobile device and a transcoding method specifically designed to sonify moving objects. Frame differencing is used to extract spatial features from the video stream and two-dimensional spatial information is converted into audio cues using pitch, interaural time difference and interaural level difference. Using numerical methods, we attempt to reconstruct visuo-spatial information based on audio signals generated from various video stimuli. We show that despite a contrasted visual background and a highly lossy encoding method, the information in the audio signal is sufficient to allow object localization, object trajectory evaluation, object approach detection, and spatial separation of multiple objects. We also show that this type of audio signal can be interpreted by human users by asking ten subjects to discriminate trajectories based on generated audio signals.
Audio Classification in Speech and Music: A Comparison between a Statistical and a Neural Approach

Directory of Open Access Journals (Sweden)

Alessandro Bugatti

2002-04-01

Full Text Available We focus the attention on the problem of audio classification in speech and music for multimedia applications. In particular, we present a comparison between two different techniques for speech/music discrimination. The first method is based on Zero crossing rate and Bayesian classification. It is very simple from a computational point of view, and gives good results in case of pure music or speech. The simulation results show that some performance degradation arises when the music segment contains also some speech superimposed on music, or strong rhythmic components. To overcome these problems, we propose a second method, that uses more features, and is based on neural networks (specifically a multi-layer Perceptron. In this case we obtain better performance, at the expense of a limited growth in the computational complexity. In practice, the proposed neural network is simple to be implemented if a suitable polynomial is used as the activation function, and a real-time implementation is possible even if low-cost embedded systems are used.
Effects of congruent and incongruent visual cues on speech perception and brain activity in cochlear implant users.

Science.gov (United States)

Song, Jae-Jin; Lee, Hyo-Jeong; Kang, Hyejin; Lee, Dong Soo; Chang, Sun O; Oh, Seung Ha

2015-03-01

While deafness-induced plasticity has been investigated in the visual and auditory domains, not much is known about language processing in audiovisual multimodal environments for patients with restored hearing via cochlear implant (CI) devices. Here, we examined the effect of agreeing or conflicting visual inputs on auditory processing in deaf patients equipped with degraded artificial hearing. Ten post-lingually deafened CI users with good performance, along with matched control subjects, underwent H 2 (15) O-positron emission tomography scans while carrying out a behavioral task requiring the extraction of speech information from unimodal auditory stimuli, bimodal audiovisual congruent stimuli, and incongruent stimuli. Regardless of congruency, the control subjects demonstrated activation of the auditory and visual sensory cortices, as well as the superior temporal sulcus, the classical multisensory integration area, indicating a bottom-up multisensory processing strategy. Compared to CI users, the control subjects exhibited activation of the right ventral premotor-supramarginal pathway. In contrast, CI users activated primarily the visual cortices more in the congruent audiovisual condition than in the null condition. In addition, compared to controls, CI users displayed an activation focus in the right amygdala for congruent audiovisual stimuli. The most notable difference between the two groups was an activation focus in the left inferior frontal gyrus in CI users confronted with incongruent audiovisual stimuli, suggesting top-down cognitive modulation for audiovisual conflict. Correlation analysis revealed that good speech performance was positively correlated with right amygdala activity for the congruent condition, but negatively correlated with bilateral visual cortices regardless of congruency. Taken together these results suggest that for multimodal inputs, cochlear implant users are more vision-reliant when processing congruent stimuli and are disturbed
Applications in accessibility of text-to-speech synthesis for South African languages: Initial system integration and user engagement

CSIR Research Space (South Africa)

Schlünz, Georg I

2017-09-01

Full Text Available with little or no functional speech to speak out loud. Screen readers and accessible e-books allow a print-disabled (visually-impaired, partially-sighted or dyslexic) individual to read text material by listening to audio versions. Text-to-speech synthesis...
The Auditory-Visual Speech Benefit on Working Memory in Older Adults with Hearing Impairment

Directory of Open Access Journals (Sweden)

Jana B. Frtusova

2016-04-01

Full Text Available This study examined the effect of auditory-visual (AV speech stimuli on working memory in hearing impaired participants (HIP in comparison to age- and education-matched normal elderly controls (NEC. Participants completed a working memory n-back task (0- to 2-back in which sequences of digits were presented in visual-only (i.e., speech-reading, auditory-only (A-only, and AV conditions. Auditory event-related potentials (ERP were collected to assess the relationship between perceptual and working memory processing. The behavioural results showed that both groups were faster in the AV condition in comparison to the unisensory conditions. The ERP data showed perceptual facilitation in the AV condition, in the form of reduced amplitudes and latencies of the auditory N1 and/or P1 components, in the HIP group. Furthermore, a working memory ERP component, the P3, peaked earlier for both groups in the AV condition compared to the A-only condition. In general, the HIP group showed a more robust AV benefit; however, the NECs showed a dose-response relationship between perceptual facilitation and working memory improvement, especially for facilitation of processing speed. Two measures, reaction time and P3 amplitude, suggested that the presence of visual speech cues may have helped the HIP to counteract the demanding auditory processing, to the level that no group differences were evident during the AV modality despite lower performance during the A-only condition. Overall, this study provides support for the theory of an integrated perceptual-cognitive system. The practical significance of these findings is also discussed.
Anemonefishes rely on visual and chemical cues to correctly identify conspecifics

Science.gov (United States)

Johnston, Nicole K.; Dixson, Danielle L.

2017-09-01

Organisms rely on sensory cues to interpret their environment and make important life-history decisions. Accurate recognition is of particular importance in diverse reef environments. Most evidence on the use of sensory cues focuses on those used in predator avoidance or habitat recognition, with little information on their role in conspecific recognition. Yet conspecific recognition is essential for life-history decisions including settlement, mate choice, and dominance interactions. Using a sensory manipulated tank and a two-chamber choice flume, anemonefish conspecific response was measured in the presence and absence of chemical and/or visual cues. Experiments were then repeated in the presence or absence of two heterospecific species to evaluate whether a heterospecific fish altered the conspecific response. Anemonefishes responded to both the visual and chemical cues of conspecifics, but relied on the combination of the two cues to recognize conspecifics inside the sensory manipulated tank. These results contrast previous studies focusing on predator detection where anemonefishes were found to compensate for the loss of one sensory cue (chemical) by utilizing a second cue (visual). This lack of sensory compensation may impact the ability of anemonefishes to acclimate to changing reef environments in the future.

Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech.

Science.gov (United States)

Poon, Matthew; Schutz, Michael

2015-01-01

Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound "happier" than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here, we describe a novel, score-based exploration of the use of pitch height and timing in a set of "balanced" major and minor key compositions. Our analysis included all 24 Preludes and 24 Fugues from Bach's Well-Tempered Clavier (book 1), as well as all 24 of Chopin's Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma ("A," "B," "C," etc.). Consistent with predictions derived from speech, we found major-key (nominally "happy") pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally "sad") pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post hoc analyses illustrate interesting trade-offs, with sets featuring greater emphasis on timing distinctions between modalities exhibiting the least pitch distinction, and vice-versa. We discuss these findings in the broader context of speech-music research, as well as recent scholarship exploring the historical evolution of cue use in Western music.
Optimized Audio Classification and Segmentation Algorithm by Using Ensemble Methods

Directory of Open Access Journals (Sweden)

Saadia Zahid

2015-01-01

Full Text Available Audio segmentation is a basis for multimedia content analysis which is the most important and widely used application nowadays. An optimized audio classification and segmentation algorithm is presented in this paper that segments a superimposed audio stream on the basis of its content into four main audio types: pure-speech, music, environment sound, and silence. An algorithm is proposed that preserves important audio content and reduces the misclassification rate without using large amount of training data, which handles noise and is suitable for use for real-time applications. Noise in an audio stream is segmented out as environment sound. A hybrid classification approach is used, bagged support vector machines (SVMs with artificial neural networks (ANNs. Audio stream is classified, firstly, into speech and nonspeech segment by using bagged support vector machines; nonspeech segment is further classified into music and environment sound by using artificial neural networks and lastly, speech segment is classified into silence and pure-speech segments on the basis of rule-based classifier. Minimum data is used for training classifier; ensemble methods are used for minimizing misclassification rate and approximately 98% accurate segments are obtained. A fast and efficient algorithm is designed that can be used with real-time multimedia applications.
N1 enhancement in synesthesia during visual and audio-visual perception in semantic cross-modal conflict situations: an ERP study

Directory of Open Access Journals (Sweden)

Christopher eSinke

2014-01-01

Full Text Available Synesthesia entails a special kind of sensory perception, where stimulation in one sensory modality leads to an internally generated perceptual experience of another, not stimulated sensory modality. This phenomenon can be viewed as an abnormal multisensory integration process as here the synesthetic percept is aberrantly fused with the stimulated modality. Indeed, recent synesthesia research has focused on multimodal processing even outside of the specific synesthesia-inducing context and has revealed changed multimodal integration, thus suggesting perceptual alterations at a global level. Here, we focused on audio-visual processing in synesthesia using a semantic classification task in combination with visually or auditory-visually presented animated and inanimated objects in an audio-visual congruent and incongruent manner. Fourteen subjects with auditory-visual and/or grapheme-color synesthesia and 14 control subjects participated in the experiment. During presentation of the stimuli, event-related potentials were recorded from 32 electrodes. The analysis of reaction times and error rates revealed no group differences with best performance for audio-visually congruent stimulation indicating the well-known multimodal facilitation effect. We found an enhanced amplitude of the N1 component over occipital electrode sites for synesthetes compared to controls. The differences occurred irrespective of the experimental condition and therefore suggest a global influence on early sensory processing in synesthetes.
The Role of Visual Cues in Microgravity Spatial Orientation

Science.gov (United States)

Oman, Charles M.; Howard, Ian P.; Smith, Theodore; Beall, Andrew C.; Natapoff, Alan; Zacher, James E.; Jenkin, Heather L.

2003-01-01

In weightlessness, astronauts must rely on vision to remain spatially oriented. Although gravitational down cues are missing, most astronauts maintain a subjective vertical -a subjective sense of which way is up. This is evidenced by anecdotal reports of crewmembers feeling upside down (inversion illusions) or feeling that a floor has become a ceiling and vice versa (visual reorientation illusions). Instability in the subjective vertical direction can trigger disorientation and space motion sickness. On Neurolab, a virtual environment display system was used to conduct five interrelated experiments, which quantified: (a) how the direction of each person's subjective vertical depends on the orientation of the surrounding visual environment, (b) whether rolling the virtual visual environment produces stronger illusions of circular self-motion (circular vection) and more visual reorientation illusions than on Earth, (c) whether a virtual scene moving past the subject produces a stronger linear self-motion illusion (linear vection), and (d) whether deliberate manipulation of the subjective vertical changes a crewmember's interpretation of shading or the ability to recognize objects. None of the crew's subjective vertical indications became more independent of environmental cues in weightlessness. Three who were either strongly dependent on or independent of stationary visual cues in preflight tests remained so inflight. One other became more visually dependent inflight, but recovered postflight. Susceptibility to illusions of circular self-motion increased in flight. The time to the onset of linear self-motion illusions decreased and the illusion magnitude significantly increased for most subjects while free floating in weightlessness. These decreased toward one-G levels when the subject 'stood up' in weightlessness by wearing constant force springs. For several subjects, changing the relative direction of the subjective vertical in weightlessness-either by body
Visually induced gains in pitch discrimination: Linking audio-visual processing with auditory abilities.

Science.gov (United States)

Møller, Cecilie; Højlund, Andreas; Bærentsen, Klaus B; Hansen, Niels Chr; Skewes, Joshua C; Vuust, Peter

2018-05-01

Perception is fundamentally a multisensory experience. The principle of inverse effectiveness (PoIE) states how the multisensory gain is maximal when responses to the unisensory constituents of the stimuli are weak. It is one of the basic principles underlying multisensory processing of spatiotemporally corresponding crossmodal stimuli that are well established at behavioral as well as neural levels. It is not yet clear, however, how modality-specific stimulus features influence discrimination of subtle changes in a crossmodally corresponding feature belonging to another modality. Here, we tested the hypothesis that reliance on visual cues to pitch discrimination follow the PoIE at the interindividual level (i.e., varies with varying levels of auditory-only pitch discrimination abilities). Using an oddball pitch discrimination task, we measured the effect of varying visually perceived vertical position in participants exhibiting a wide range of pitch discrimination abilities (i.e., musicians and nonmusicians). Visual cues significantly enhanced pitch discrimination as measured by the sensitivity index d', and more so in the crossmodally congruent than incongruent condition. The magnitude of gain caused by compatible visual cues was associated with individual pitch discrimination thresholds, as predicted by the PoIE. This was not the case for the magnitude of the congruence effect, which was unrelated to individual pitch discrimination thresholds, indicating that the pitch-height association is robust to variations in auditory skills. Our findings shed light on individual differences in multisensory processing by suggesting that relevant multisensory information that crucially aids some perceivers' performance may be of less importance to others, depending on their unisensory abilities.
Infants' preference for native audiovisual speech dissociated from congruency preference.

Directory of Open Access Journals (Sweden)

Kathleen Shaw

Full Text Available Although infant speech perception in often studied in isolated modalities, infants' experience with speech is largely multimodal (i.e., speech sounds they hear are accompanied by articulating faces. Across two experiments, we tested infants' sensitivity to the relationship between the auditory and visual components of audiovisual speech in their native (English and non-native (Spanish language. In Experiment 1, infants' looking times were measured during a preferential looking task in which they saw two simultaneous visual speech streams articulating a story, one in English and the other in Spanish, while they heard either the English or the Spanish version of the story. In Experiment 2, looking times from another group of infants were measured as they watched single displays of congruent and incongruent combinations of English and Spanish audio and visual speech streams. Findings demonstrated an age-related increase in looking towards the native relative to non-native visual speech stream when accompanied by the corresponding (native auditory speech. This increase in native language preference did not appear to be driven by a difference in preference for native vs. non-native audiovisual congruence as we observed no difference in looking times at the audiovisual streams in Experiment 2.
Visual feedback of tongue movement for novel speech sound learning

Directory of Open Access Journals (Sweden)

William F Katz

2015-11-01

Full Text Available Pronunciation training studies have yielded important information concerning the processing of audiovisual (AV information. Second language (L2 learners show increased reliance on bottom-up, multimodal input for speech perception (compared to monolingual individuals. However, little is known about the role of viewing one’s own speech articulation processes during speech training. The current study investigated whether real-time, visual feedback for tongue movement can improve a speaker’s learning of non-native speech sounds. An interactive 3D tongue visualization system based on electromagnetic articulography (EMA was used in a speech training experiment. Native speakers of American English produced a novel speech sound (/ɖ̠/; a voiced, coronal, palatal stop before, during, and after trials in which they viewed their own speech movements using the 3D model. Talkers’ productions were evaluated using kinematic (tongue-tip spatial positioning and acoustic (burst spectra measures. The results indicated a rapid gain in accuracy associated with visual feedback training. The findings are discussed with respect to neural models for multimodal speech processing.
A psychophysical imaging method evidencing auditory cue extraction during speech perception: a group analysis of auditory classification images.

Directory of Open Access Journals (Sweden)

Léo Varnet

Full Text Available Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remains undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique that allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses on the listening strategies of participants at the group level. We exemplify this approach by identifying the acoustic cues involved in da/ga categorization with two phonetic contexts, Al- or Ar-. The application of Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme.
Facial Speech Gestures: The Relation between Visual Speech Processing, Phonological Awareness, and Developmental Dyslexia in 10-Year-Olds

Science.gov (United States)

Schaadt, Gesa; Männel, Claudia; van der Meer, Elke; Pannekamp, Ann; Friederici, Angela D.

2016-01-01

Successful communication in everyday life crucially involves the processing of auditory and visual components of speech. Viewing our interlocutor and processing visual components of speech facilitates speech processing by triggering auditory processing. Auditory phoneme processing, analyzed by event-related brain potentials (ERP), has been shown…
Computerized Audio-Visual Instructional Sequences (CAVIS): A Versatile System for Listening Comprehension in Foreign Language Teaching.

Science.gov (United States)

Aleman-Centeno, Josefina R.

1983-01-01

Discusses the development and evaluation of CAVIS, which consists of an Apple microcomputer used with audiovisual dialogs. Includes research on the effects of three conditions: (1) computer with audio and visual, (2) computer with audio alone and (3) audio alone in short-term and long-term recall. (EKN)
Retro-dimension-cue benefit in visual working memory.

Science.gov (United States)

Ye, Chaoxiong; Hu, Zhonghua; Ristaniemi, Tapani; Gendron, Maria; Liu, Qiang

2016-10-24

In visual working memory (VWM) tasks, participants' performance can be improved by a retro-object-cue. However, previous studies have not investigated whether participants' performance can also be improved by a retro-dimension-cue. Three experiments investigated this issue. We used a recall task with a retro-dimension-cue in all experiments. In Experiment 1, we found benefits from retro-dimension-cues compared to neutral cues. This retro-dimension-cue benefit is reflected in an increased probability of reporting the target, but not in the probability of reporting the non-target, as well as increased precision with which this item is remembered. Experiment 2 replicated the retro-dimension-cue benefit and showed that the length of the blank interval after the cue disappeared did not influence recall performance. Experiment 3 replicated the results of Experiment 2 with a lower memory load. Our studies provide evidence that there is a robust retro-dimension-cue benefit in VWM. Participants can use internal attention to flexibly allocate cognitive resources to a particular dimension of memory representations. The results also support the feature-based storing hypothesis.
Attentional bias to food-related visual cues: is there a role in obesity?

Science.gov (United States)

Doolan, K J; Breslin, G; Hanna, D; Gallagher, A M

2015-02-01

The incentive sensitisation model of obesity suggests that modification of the dopaminergic associated reward systems in the brain may result in increased awareness of food-related visual cues present in the current food environment. Having a heightened awareness of these visual food cues may impact on food choices and eating behaviours with those being most aware of or demonstrating greater attention to food-related stimuli potentially being at greater risk of overeating and subsequent weight gain. To date, research related to attentional responses to visual food cues has been both limited and conflicting. Such inconsistent findings may in part be explained by the use of different methodological approaches to measure attentional bias and the impact of other factors such as hunger levels, energy density of visual food cues and individual eating style traits that may influence visual attention to food-related cues outside of weight status alone. This review examines the various methodologies employed to measure attentional bias with a particular focus on the role that attentional processing of food-related visual cues may have in obesity. Based on the findings of this review, it appears that it may be too early to clarify the role visual attention to food-related cues may have in obesity. Results however highlight the importance of considering the most appropriate methodology to use when measuring attentional bias and the characteristics of the study populations targeted while interpreting results to date and in designing future studies.
The location but not the attributes of visual cues are automatically encoded into working memory.

Science.gov (United States)

Chen, Hui; Wyble, Brad

2015-02-01

Although it has been well known that visual cues affect the perception of subsequent visual stimuli, relatively little is known about how the cues themselves are processed. The present study attempted to characterize the processing of a visual cue by investigating what information about the cue is stored in terms of both location ("where" is the cue) and attributes ("what" are the attributes of the cue). In 11 experiments subjects performed several trials of reporting a target letter and then answered an unexpected question about the cue (e.g., the location, color, or identity of the cue). This surprise question revealed that participants could report the location of the cue even when the cue never indicated the target location and they were explicitly told to ignore it. Furthermore, the memory trace of this location information endured during encoding of the subsequent target. In contrast to location, attributes of the cue (e.g., color) were poorly reported, even for attributes that were used by subjects to perform the task. These results shed new light on the mechanisms underlying cueing effects and suggest also that the visual system may create empty object files in response to visual cues. Copyright © 2014 Elsevier Ltd. All rights reserved.
Audio Frequency Analysis in Mobile Phones

Science.gov (United States)

Aguilar, Horacio Munguía

2016-01-01

A new experiment using mobile phones is proposed in which its audio frequency response is analyzed using the audio port for inputting external signal and getting a measurable output. This experiment shows how the limited audio bandwidth used in mobile telephony is the main cause of the poor speech quality in this service. A brief discussion is…
Visual context enhanced. The joint contribution of iconic gestures and visible speech to degraded speech comprehension.

NARCIS (Netherlands)

Drijvers, L.; Özyürek, A.

2017-01-01

Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech
Visual speech alters the discrimination and identification of non-intact auditory speech in children with hearing loss.

Science.gov (United States)

Jerger, Susan; Damian, Markus F; McAlpine, Rachel P; Abdi, Hervé

2017-03-01

Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets) as, for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to non-intact onset/rhyme in the auditory track (/-B/aa or/-B/az). The items started with an easy-to-speechread/B/or difficult-to-speechread/G/onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/-B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more same-as opposed to different-responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g.,/-B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz-as opposed to az- responses in the audiovisual than auditory mode. Performance in the audiovisual mode showed more same
Visual Speech Alters the Discrimination and Identification of Non-Intact Auditory Speech in Children with Hearing Loss

Science.gov (United States)

Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Hervé

2017-01-01

Objectives Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. Methods Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets) as, for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to non-intact onset/rhyme in the auditory track (/–B/aa or /–B/az). The items started with an easy-to-speechread /B/ or difficult-to-speechread /G/ onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/–B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more same—as opposed to different—responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g., /–B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz—as opposed to az— responses in the audiovisual than auditory mode. Results
Improving visual spatial working memory in younger and older adults: effects of cross-modal cues.

Science.gov (United States)

Curtis, Ashley F; Turner, Gary R; Park, Norman W; Murtha, Susan J E

2017-11-06

Spatially informative auditory and vibrotactile (cross-modal) cues can facilitate attention but little is known about how similar cues influence visual spatial working memory (WM) across the adult lifespan. We investigated the effects of cues (spatially informative or alerting pre-cues vs. no cues), cue modality (auditory vs. vibrotactile vs. visual), memory array size (four vs. six items), and maintenance delay (900 vs. 1800 ms) on visual spatial location WM recognition accuracy in younger adults (YA) and older adults (OA). We observed a significant interaction between spatially informative pre-cue type, array size, and delay. OA and YA benefitted equally from spatially informative pre-cues, suggesting that attentional orienting prior to WM encoding, regardless of cue modality, is preserved with age. Contrary to predictions, alerting pre-cues generally impaired performance in both age groups, suggesting that maintaining a vigilant state of arousal by facilitating the alerting attention system does not help visual spatial location WM.
Visual and Auditory Input in Second-Language Speech Processing

Science.gov (United States)

Hardison, Debra M.

2010-01-01

The majority of studies in second-language (L2) speech processing have involved unimodal (i.e., auditory) input; however, in many instances, speech communication involves both visual and auditory sources of information. Some researchers have argued that multimodal speech is the primary mode of speech perception (e.g., Rosenblum 2005). Research on…
The Auditory-Visual Speech Benefit on Working Memory in Older Adults with Hearing Impairment.

Science.gov (United States)

Frtusova, Jana B; Phillips, Natalie A

2016-01-01

This study examined the effect of auditory-visual (AV) speech stimuli on working memory in older adults with poorer-hearing (PH) in comparison to age- and education-matched older adults with better hearing (BH). Participants completed a working memory n-back task (0- to 2-back) in which sequences of digits were presented in visual-only (i.e., speech-reading), auditory-only (A-only), and AV conditions. Auditory event-related potentials (ERP) were collected to assess the relationship between perceptual and working memory processing. The behavioral results showed that both groups were faster in the AV condition in comparison to the unisensory conditions. The ERP data showed perceptual facilitation in the AV condition, in the form of reduced amplitudes and latencies of the auditory N1 and/or P1 components, in the PH group. Furthermore, a working memory ERP component, the P3, peaked earlier for both groups in the AV condition compared to the A-only condition. In general, the PH group showed a more robust AV benefit; however, the BH group showed a dose-response relationship between perceptual facilitation and working memory improvement, especially for facilitation of processing speed. Two measures, reaction time and P3 amplitude, suggested that the presence of visual speech cues may have helped the PH group to counteract the demanding auditory processing, to the level that no group differences were evident during the AV modality despite lower performance during the A-only condition. Overall, this study provides support for the theory of an integrated perceptual-cognitive system. The practical significance of these findings is also discussed.

Audio Recording of Children with Dyslalia

OpenAIRE

Stefan Gheorghe Pentiuc; Maria D. Schipor; Ovidiu A. Schipor

2008-01-01

In this paper we present our researches regarding automat parsing of audio recordings. These recordings are obtained from children with dyslalia and are necessary for an accurate identification of speech problems. We develop a software application that helps parsing audio, real time, recordings.
Attentional reorienting triggers spatial asymmetries in a search task with cross-modal spatial cueing.

Directory of Open Access Journals (Sweden)

Rebecca E Paladini

Full Text Available Cross-modal spatial cueing can affect performance in a visual search task. For example, search performance improves if a visual target and an auditory cue originate from the same spatial location, and it deteriorates if they originate from different locations. Moreover, it has recently been postulated that multisensory settings, i.e., experimental settings, in which critical stimuli are concurrently presented in different sensory modalities (e.g., visual and auditory, may trigger asymmetries in visuospatial attention. Thereby, a facilitation has been observed for visual stimuli presented in the right compared to the left visual space. However, it remains unclear whether auditory cueing of attention differentially affects search performance in the left and the right hemifields in audio-visual search tasks. The present study investigated whether spatial asymmetries would occur in a search task with cross-modal spatial cueing. Participants completed a visual search task that contained no auditory cues (i.e., unimodal visual condition, spatially congruent, spatially incongruent, and spatially non-informative auditory cues. To further assess participants' accuracy in localising the auditory cues, a unimodal auditory spatial localisation task was also administered. The results demonstrated no left/right asymmetries in the unimodal visual search condition. Both an additional incongruent, as well as a spatially non-informative, auditory cue resulted in lateral asymmetries. Thereby, search times were increased for targets presented in the left compared to the right hemifield. No such spatial asymmetry was observed in the congruent condition. However, participants' performance in the congruent condition was modulated by their tone localisation accuracy. The findings of the present study demonstrate that spatial asymmetries in multisensory processing depend on the validity of the cross-modal cues, and occur under specific attentional conditions, i.e., when
Tactical decisions for changeable cuttlefish camouflage: visual cues for choosing masquerade are relevant from a greater distance than visual cues used for background matching.

Science.gov (United States)

Buresch, Kendra C; Ulmer, Kimberly M; Cramer, Corinne; McAnulty, Sarah; Davison, William; Mäthger, Lydia M; Hanlon, Roger T

2015-10-01

Cuttlefish use multiple camouflage tactics to evade their predators. Two common tactics are background matching (resembling the background to hinder detection) and masquerade (resembling an uninteresting or inanimate object to impede detection or recognition). We investigated how the distance and orientation of visual stimuli affected the choice of these two camouflage tactics. In the current experiments, cuttlefish were presented with three visual cues: 2D horizontal floor, 2D vertical wall, and 3D object. Each was placed at several distances: directly beneath (in a circle whose diameter was one body length (BL); at zero BL [(0BL); i.e., directly beside, but not beneath the cuttlefish]; at 1BL; and at 2BL. Cuttlefish continued to respond to 3D visual cues from a greater distance than to a horizontal or vertical stimulus. It appears that background matching is chosen when visual cues are relevant only in the immediate benthic surroundings. However, for masquerade, objects located multiple body lengths away remained relevant for choice of camouflage. © 2015 Marine Biological Laboratory.
Integration of auditory and visual speech information

NARCIS (Netherlands)

Hall, M.; Smeele, P.M.T.; Kuhl, P.K.

1998-01-01

The integration of auditory and visual speech is observed when modes specify different places of articulation. Influences of auditory variation on integration were examined using consonant identifi-cation, plus quality and similarity ratings. Auditory identification predicted auditory-visual
The use of auditory and visual context in speech perception by listeners with normal hearing and listeners with cochlear implants

Directory of Open Access Journals (Sweden)

Matthew eWinn

2013-11-01

Full Text Available There is a wide range of acoustic and visual variability across different talkers and different speaking contexts. Listeners with normal hearing accommodate that variability in ways that facilitate efficient perception, but it is not known whether listeners with cochlear implants can do the same. In this study, listeners with normal hearing (NH and listeners with cochlear implants (CIs were tested for accommodation to auditory and visual phonetic contexts created by gender-driven speech differences as well as vowel coarticulation and lip rounding in both consonants and vowels. Accommodation was measured as the shifting of perceptual boundaries between /s/ and /ʃ/ sounds in various contexts, as modeled by mixed-effects logistic regression. Owing to the spectral contrasts thought to underlie these context effects, CI listeners were predicted to perform poorly, but showed considerable success. Listeners with cochlear implants not only showed sensitivity to auditory cues to gender, they were also able to use visual cues to gender (i.e. faces as a supplement or proxy for information in the acoustic domain, in a pattern that was not observed for listeners with normal hearing. Spectrally-degraded stimuli heard by listeners with normal hearing generally did not elicit strong context effects, underscoring the limitations of noise vocoders and/or the importance of experience with electric hearing. Visual cues for consonant lip rounding and vowel lip rounding were perceived in a manner consistent with coarticulation and were generally used more heavily by listeners with CIs. Results suggest that listeners with cochlear implants are able to accommodate various sources of acoustic variability either by attending to appropriate acoustic cues or by inferring them via the visual signal.
Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech

Directory of Open Access Journals (Sweden)

Matthew ePoon

2015-11-01

Full Text Available Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound happier than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here we describe a novel, score-based exploration of the use of pitch height and timing in a set of balanced major and minor key compositions. Our corpus contained all 24 Preludes and 24 Fugues from Bach’s Well Tempered Clavier (book 1, as well as all 24 of Chopin’s Preludes for piano. These three sets are balanced with respect to both modality (major/minor and key chroma (A, B, C, etc.. Consistent with predictions derived from speech, we found major-key (nominally happy pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally sad pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post-hoc analyses illustrate interesting trade-offs, with
Effects of audio-visual aids on foreign language test anxiety, reading and listening comprehension, and retention in EFL learners.

Science.gov (United States)

Lee, Shu-Ping; Lee, Shin-Da; Liao, Yuan-Lin; Wang, An-Chi

2015-04-01

This study examined the effects of audio-visual aids on anxiety, comprehension test scores, and retention in reading and listening to short stories in English as a Foreign Language (EFL) classrooms. Reading and listening tests, general and test anxiety, and retention were measured in English-major college students in an experimental group with audio-visual aids (n=83) and a control group without audio-visual aids (n=94) with similar general English proficiency. Lower reading test anxiety, unchanged reading comprehension scores, and better reading short-term and long-term retention after four weeks were evident in the audiovisual group relative to the control group. In addition, lower listening test anxiety, higher listening comprehension scores, and unchanged short-term and long-term retention were found in the audiovisual group relative to the control group after the intervention. Audio-visual aids may help to reduce EFL learners' listening test anxiety and enhance their listening comprehension scores without facilitating retention of such materials. Although audio-visual aids did not increase reading comprehension scores, they helped reduce EFL learners' reading test anxiety and facilitated retention of reading materials.
Visual Context Enhanced: The Joint Contribution of Iconic Gestures and Visible Speech to Degraded Speech Comprehension

Science.gov (United States)

Drijvers, Linda; Ozyurek, Asli

2017-01-01

Purpose: This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech comprehension have only been performed separately. Method:…
An Annotated Guide to Audio-Visual Materials for Teaching Shakespeare.

Science.gov (United States)

Albert, Richard N.

Audio-visual materials, found in a variety of periodicals, catalogs, and reference works, are listed in this guide to expedite the process of finding appropriate classroom materials for a study of William Shakespeare in the classroom. Separate listings of films, filmstrips, and recordings are provided, with subdivisions for "The Plays"…
Saccade frequency response to visual cues during gait in Parkinson's disease: the selective role of attention.

Science.gov (United States)

Stuart, Samuel; Lord, Sue; Galna, Brook; Rochester, Lynn

2018-04-01

Gait impairment is a core feature of Parkinson's disease (PD) with implications for falls risk. Visual cues improve gait in PD, but the underlying mechanisms are unclear. Evidence suggests that attention and vision play an important role; however, the relative contribution from each is unclear. Measurement of visual exploration (specifically saccade frequency) during gait allows for real-time measurement of attention and vision. Understanding how visual cues influence visual exploration may allow inferences of the underlying mechanisms to response which could help to develop effective therapeutics. This study aimed to examine saccade frequency during gait in response to a visual cue in PD and older adults and investigate the roles of attention and vision in visual cue response in PD. A mobile eye-tracker measured saccade frequency during gait in 55 people with PD and 32 age-matched controls. Participants walked in a straight line with and without a visual cue (50 cm transverse lines) presented under single task and dual-task (concurrent digit span recall). Saccade frequency was reduced when walking in PD compared to controls; however, visual cues ameliorated saccadic deficit. Visual cues significantly increased saccade frequency in both PD and controls under both single task and dual-task. Attention rather than visual function was central to saccade frequency and gait response to visual cues in PD. In conclusion, this study highlights the impact of visual cues on visual exploration when walking and the important role of attention in PD. Understanding these complex features will help inform intervention development. © 2018 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Audio-Visual Processing in Meetings: Seven Questions and Current AMI Answers

NARCIS (Netherlands)

Al-Hamas, Marc; Hain, Thomas; Cernocky, Jan; Schreiber, Sascha; Poel, Mannes; Rienks, R.J.

2007-01-01

The project Augmented Multi-party Interaction (AMI) is concerned with the development of meeting browsers and remote meeting assistants for instrumented meeting rooms – and the required component technologies R&D themes: group dynamics, audio, visual, and multimodal processing, content abstraction,
Speech to Text Translation for Malay Language

Science.gov (United States)

Al-khulaidi, Rami Ali; Akmeliawati, Rini

2017-11-01

The speech recognition system is a front end and a back-end process that receives an audio signal uttered by a speaker and converts it into a text transcription. The speech system can be used in several fields including: therapeutic technology, education, social robotics and computer entertainments. In most cases in control tasks, which is the purpose of proposing our system, wherein the speed of performance and response concern as the system should integrate with other controlling platforms such as in voiced controlled robots. Therefore, the need for flexible platforms, that can be easily edited to jibe with functionality of the surroundings, came to the scene; unlike other software programs that require recording audios and multiple training for every entry such as MATLAB and Phoenix. In this paper, a speech recognition system for Malay language is implemented using Microsoft Visual Studio C#. 90 (ninety) Malay phrases were tested by 10 (ten) speakers from both genders in different contexts. The result shows that the overall accuracy (calculated from Confusion Matrix) is satisfactory as it is 92.69%.
Activation and Functional Connectivity of the Left Inferior Temporal Gyrus during Visual Speech Priming in Healthy Listeners and Listeners with Schizophrenia.

Science.gov (United States)

Wu, Chao; Zheng, Yingjun; Li, Juanhua; Zhang, Bei; Li, Ruikeng; Wu, Haibo; She, Shenglin; Liu, Sha; Peng, Hongjun; Ning, Yuping; Li, Liang

2017-01-01

Under a "cocktail-party" listening condition with multiple-people talking, compared to healthy people, people with schizophrenia benefit less from the use of visual-speech (lipreading) priming (VSP) cues to improve speech recognition. The neural mechanisms underlying the unmasking effect of VSP remain unknown. This study investigated the brain substrates underlying the unmasking effect of VSP in healthy listeners and the schizophrenia-induced changes in the brain substrates. Using functional magnetic resonance imaging, brain activation and functional connectivity for the contrasts of the VSP listening condition vs. the visual non-speech priming (VNSP) condition were examined in 16 healthy listeners (27.4 ± 8.6 years old, 9 females and 7 males) and 22 listeners with schizophrenia (29.0 ± 8.1 years old, 8 females and 14 males). The results showed that in healthy listeners, but not listeners with schizophrenia, the VSP-induced activation (against the VNSP condition) of the left posterior inferior temporal gyrus (pITG) was significantly correlated with the VSP-induced improvement in target-speech recognition against speech masking. Compared to healthy listeners, listeners with schizophrenia showed significantly lower VSP-induced activation of the left pITG and reduced functional connectivity of the left pITG with the bilateral Rolandic operculum, bilateral STG, and left insular. Thus, the left pITG and its functional connectivity may be the brain substrates related to the unmasking effect of VSP, assumedly through enhancing both the processing of target visual-speech signals and the inhibition of masking-speech signals. In people with schizophrenia, the reduced unmasking effect of VSP on speech recognition may be associated with a schizophrenia-related reduction of VSP-induced activation and functional connectivity of the left pITG.
Modular Sensor Environment : Audio Visual Industry Monitoring Applications

OpenAIRE

Guillot, Calvin

2017-01-01

This work was made for Electro Waves Oy. The company specializes in Audio-visual services and interactive systems. The purpose of this work is to design and implement a modular sensor environment for the company, which will be used for developing automated systems. This thesis begins with an introduction to sensor systems and their different topologies. It is followed by an introduction to the technologies used in this project. The system is divided in three parts. The client, tha...
The Dynamics and Neural Correlates of Audio-Visual Integration Capacity as Determined by Temporal Unpredictability, Proactive Interference, and SOA.

Science.gov (United States)

Wilbiks, Jonathan M P; Dyson, Benjamin J

2016-01-01

Over 5 experiments, we challenge the idea that the capacity of audio-visual integration need be fixed at 1 item. We observe that the conditions under which audio-visual integration is most likely to exceed 1 occur when stimulus change operates at a slow rather than fast rate of presentation and when the task is of intermediate difficulty such as when low levels of proactive interference (3 rather than 8 interfering visual presentations) are combined with the temporal unpredictability of the critical frame (Experiment 2), or, high levels of proactive interference are combined with the temporal predictability of the critical frame (Experiment 4). Neural data suggest that capacity might also be determined by the quality of perceptual information entering working memory. Experiment 5 supported the proposition that audio-visual integration was at play during the previous experiments. The data are consistent with the dynamic nature usually associated with cross-modal binding, and while audio-visual integration capacity likely cannot exceed uni-modal capacity estimates, performance may be better than being able to associate only one visual stimulus with one auditory stimulus.
The development of visual speech perception in Mandarin Chinese-speaking children.

Science.gov (United States)

Chen, Liang; Lei, Jianghua

2017-01-01

The present study aimed to investigate the development of visual speech perception in Chinese-speaking children. Children aged 7, 13 and 16 were asked to visually identify both consonant and vowel sounds in Chinese as quickly and accurately as possible. Results revealed (1) an increase in accuracy of visual speech perception between ages 7 and 13 after which the accuracy rate either stagnates or drops; and (2) a U-shaped development pattern in speed of perception with peak performance in 13-year olds. Results also showed that across all age groups, the overall levels of accuracy rose, whereas the response times fell for simplex finals, complex finals and initials. These findings suggest that (1) visual speech perception in Chinese is a developmental process that is acquired over time and is still fine-tuned well into late adolescence; (2) factors other than cross-linguistic differences in phonological complexity and degrees of reliance on visual information are involved in development of visual speech perception.
Audio segmentation of broadcast news in the Albayzin-2010 evaluation: overview, results, and discussion

Directory of Open Access Journals (Sweden)

Butko Taras

2011-01-01

Full Text Available Abstract Recently, audio segmentation has attracted research interest because of its usefulness in several applications like audio indexing and retrieval, subtitling, monitoring of acoustic scenes, etc. Moreover, a previous audio segmentation stage may be useful to improve the robustness of speech technologies like automatic speech recognition and speaker diarization. In this article, we present the evaluation of broadcast news audio segmentation systems carried out in the context of the Albayzín-2010 evaluation campaign. That evaluation consisted of segmenting audio from the 3/24 Catalan TV channel into five acoustic classes: music, speech, speech over music, speech over noise, and the other. The evaluation results displayed the difficulty of this segmentation task. In this article, after presenting the database and metric, as well as the feature extraction methods and segmentation techniques used by the submitted systems, the experimental results are analyzed and compared, with the aim of gaining an insight into the proposed solutions, and looking for directions which are promising.
Independent Interactive Inquiry-Based Learning Modules Using Audio-Visual Instruction In Statistics

OpenAIRE

McDaniel, Scott N.; Green, Lisa

2012-01-01

Simulations can make complex ideas easier for students to visualize and understand. It has been shown that guidance in the use of these simulations enhances students’ learning. This paper describes the implementation and evaluation of the Independent Interactive Inquiry-based (I3) Learning Modules, which use existing open-source Java applets, combined with audio-visual instruction. Students are guided to discover and visualize important concepts in post-calculus and algebra-based courses in p...
Audio Recording of Children with Dyslalia

Directory of Open Access Journals (Sweden)

Stefan Gheorghe Pentiuc

2008-01-01

Full Text Available In this paper we present our researches regarding automat parsing of audio recordings. These recordings are obtained from children with dyslalia and are necessary for an accurate identification of speech problems. We develop a software application that helps parsing audio, real time, recordings.
Impact of audio-visual storytelling in simulation learning experiences of undergraduate nursing students.

Science.gov (United States)

Johnston, Sandra; Parker, Christina N; Fox, Amanda

2017-09-01

Use of high fidelity simulation has become increasingly popular in nursing education to the extent that it is now an integral component of most nursing programs. Anecdotal evidence suggests that students have difficulty engaging with simulation manikins due to their unrealistic appearance. Introduction of the manikin as a 'real patient' with the use of an audio-visual narrative may engage students in the simulated learning experience and impact on their learning. A paucity of literature currently exists on the use of audio-visual narratives to enhance simulated learning experiences. This study aimed to determine if viewing an audio-visual narrative during a simulation pre-brief altered undergraduate nursing student perceptions of the learning experience. A quasi-experimental post-test design was utilised. A convenience sample of final year baccalaureate nursing students at a large metropolitan university. Participants completed a modified version of the Student Satisfaction with Simulation Experiences survey. This 12-item questionnaire contained questions relating to the ability to transfer skills learned in simulation to the real clinical world, the realism of the simulation and the overall value of the learning experience. Descriptive statistics were used to summarise demographic information. Two tailed, independent group t-tests were used to determine statistical differences within the categories. Findings indicated that students reported high levels of value, realism and transferability in relation to the viewing of an audio-visual narrative. Statistically significant results (t=2.38, psimulation to clinical practice. The subgroups of age and gender although not significant indicated some interesting results. High satisfaction with simulation was indicated by all students in relation to value and realism. There was a significant finding in relation to transferability on knowledge and this is vital to quality educational outcomes. Copyright © 2017. Published by

Tracing Trajectories of Audio-Visual Learning in the Infant Brain

Science.gov (United States)

Kersey, Alyssa J.; Emberson, Lauren L.

2017-01-01

Although infants begin learning about their environment before they are born, little is known about how the infant brain changes during learning. Here, we take the initial steps in documenting how the neural responses in the brain change as infants learn to associate audio and visual stimuli. Using functional near-infrared spectroscopy (fNRIS) to…
Spectacular Attractions: Museums, Audio-Visuals and the Ghosts of Memory

Directory of Open Access Journals (Sweden)

Mandelli Elisa

2015-12-01

Full Text Available In the last decades, moving images have become a common feature not only in art museums, but also in a wide range of institutions devoted to the conservation and transmission of memory. This paper focuses on the role of audio-visuals in the exhibition design of history and memory museums, arguing that they are privileged means to achieve the spectacular effects and the visitors’ emotional and “experiential” engagement that constitute the main objective of contemporary museums. I will discuss this topic through the concept of “cinematic attraction,” claiming that when embedded in displays, films and moving images often produce spectacular mises en scène with immersive effects, creating wonder and astonishment, and involving visitors on an emotional, visceral and physical level. Moreover, I will consider the diffusion of audio-visual witnesses of real or imaginary historical characters, presented in Phantasmagoria-like displays that simulate ghostly and uncanny apparitions, creating an ambiguous and often problematic coexistence of truth and illusion, subjectivity and objectivity, facts and imagination.
Finding the Correspondence of Audio-Visual Events by Object Manipulation

Science.gov (United States)

Nishibori, Kento; Takeuchi, Yoshinori; Matsumoto, Tetsuya; Kudo, Hiroaki; Ohnishi, Noboru

A human being understands the objects in the environment by integrating information obtained by the senses of sight, hearing and touch. In this integration, active manipulation of objects plays an important role. We propose a method for finding the correspondence of audio-visual events by manipulating an object. The method uses the general grouping rules in Gestalt psychology, i.e. “simultaneity” and “similarity” among motion command, sound onsets and motion of the object in images. In experiments, we used a microphone, a camera, and a robot which has a hand manipulator. The robot grasps an object like a bell and shakes it or grasps an object like a stick and beat a drum in a periodic, or non-periodic motion. Then the object emits periodical/non-periodical events. To create more realistic scenario, we put other event source (a metronome) in the environment. As a result, we had a success rate of 73.8 percent in finding the correspondence between audio-visual events (afferent signal) which are relating to robot motion (efferent signal).
A Joint Audio-Visual Approach to Audio Localization

DEFF Research Database (Denmark)

Jensen, Jesper Rindom; Christensen, Mads Græsbøll

2015-01-01

Localization of audio sources is an important research problem, e.g., to facilitate noise reduction. In the recent years, the problem has been tackled using distributed microphone arrays (DMA). A common approach is to apply direction-of-arrival (DOA) estimation on each array (denoted as nodes), a...... time-of-flight cameras. Moreover, we propose an optimal method for weighting such DOA and range information for audio localization. Our experiments on both synthetic and real data show that there is a clear, potential advantage of using the joint audiovisual localization framework....
High-Order Sparse Linear Predictors for Audio Processing

DEFF Research Database (Denmark)

Giacobello, Daniele; van Waterschoot, Toon; Christensen, Mads Græsbøll

2010-01-01

Linear prediction has generally failed to make a breakthrough in audio processing, as it has done in speech processing. This is mostly due to its poor modeling performance, since an audio signal is usually an ensemble of different sources. Nevertheless, linear prediction comes with a whole set...... of interesting features that make the idea of using it in audio processing not far fetched, e.g., the strong ability of modeling the spectral peaks that play a dominant role in perception. In this paper, we provide some preliminary conjectures and experiments on the use of high-order sparse linear predictors...... in audio processing. These predictors, successfully implemented in modeling the short-term and long-term redundancies present in speech signals, will be used to model tonal audio signals, both monophonic and polyphonic. We will show how the sparse predictors are able to model efﬁciently the different...
Influence of combined visual and vestibular cues on human perception and control of horizontal rotation

Science.gov (United States)

Zacharias, G. L.; Young, L. R.

1981-01-01

Measurements are made of manual control performance in the closed-loop task of nulling perceived self-rotation velocity about an earth-vertical axis. Self-velocity estimation is modeled as a function of the simultaneous presentation of vestibular and peripheral visual field motion cues. Based on measured low-frequency operator behavior in three visual field environments, a parallel channel linear model is proposed which has separate visual and vestibular pathways summing in a complementary manner. A dual-input describing function analysis supports the complementary model; vestibular cues dominate sensation at higher frequencies. The describing function model is extended by the proposal of a nonlinear cue conflict model, in which cue weighting depends on the level of agreement between visual and vestibular cues.
The Dynamics and Neural Correlates of Audio-Visual Integration Capacity as Determined by Temporal Unpredictability, Proactive Interference, and SOA.

Directory of Open Access Journals (Sweden)

Jonathan M P Wilbiks

Full Text Available Over 5 experiments, we challenge the idea that the capacity of audio-visual integration need be fixed at 1 item. We observe that the conditions under which audio-visual integration is most likely to exceed 1 occur when stimulus change operates at a slow rather than fast rate of presentation and when the task is of intermediate difficulty such as when low levels of proactive interference (3 rather than 8 interfering visual presentations are combined with the temporal unpredictability of the critical frame (Experiment 2, or, high levels of proactive interference are combined with the temporal predictability of the critical frame (Experiment 4. Neural data suggest that capacity might also be determined by the quality of perceptual information entering working memory. Experiment 5 supported the proposition that audio-visual integration was at play during the previous experiments. The data are consistent with the dynamic nature usually associated with cross-modal binding, and while audio-visual integration capacity likely cannot exceed uni-modal capacity estimates, performance may be better than being able to associate only one visual stimulus with one auditory stimulus.
Haptic Cues Used for Outdoor Wayfinding by Individuals with Visual Impairments

Science.gov (United States)

Koutsoklenis, Athanasios; Papadopoulos, Konstantinos

2014-01-01

Introduction: The study presented here examines which haptic cues individuals with visual impairments use more frequently and determines which of these cues are deemed by these individuals to be the most important for way-finding in urban environments. It also investigates the ways in which these haptic cues are used by individuals with visual…
Multisensory training can promote or impede visual perceptual learning of speech stimuli: visual-tactile vs. visual-auditory training.

Science.gov (United States)

Eberhardt, Silvio P; Auer, Edward T; Bernstein, Lynne E

2014-01-01

In a series of studies we have been investigating how multisensory training affects unisensory perceptual learning with speech stimuli. Previously, we reported that audiovisual (AV) training with speech stimuli can promote auditory-only (AO) perceptual learning in normal-hearing adults but can impede learning in congenitally deaf adults with late-acquired cochlear implants. Here, impeder and promoter effects were sought in normal-hearing adults who participated in lipreading training. In Experiment 1, visual-only (VO) training on paired associations between CVCVC nonsense word videos and nonsense pictures demonstrated that VO words could be learned to a high level of accuracy even by poor lipreaders. In Experiment 2, visual-auditory (VA) training in the same paradigm but with the addition of synchronous vocoded acoustic speech impeded VO learning of the stimuli in the paired-associates paradigm. In Experiment 3, the vocoded AO stimuli were shown to be less informative than the VO speech. Experiment 4 combined vibrotactile speech stimuli with the visual stimuli during training. Vibrotactile stimuli were shown to promote visual perceptual learning. In Experiment 5, no-training controls were used to show that training with visual speech carried over to consonant identification of untrained CVCVC stimuli but not to lipreading words in sentences. Across this and previous studies, multisensory training effects depended on the functional relationship between pathways engaged during training. Two principles are proposed to account for stimulus effects: (1) Stimuli presented to the trainee's primary perceptual pathway will impede learning by a lower-rank pathway. (2) Stimuli presented to the trainee's lower rank perceptual pathway will promote learning by a higher-rank pathway. The mechanisms supporting these principles are discussed in light of multisensory reverse hierarchy theory (RHT).
When Meaning Is Not Enough: Distributional and Semantic Cues to Word Categorization in Child Directed Speech.

Science.gov (United States)

Feijoo, Sara; Muñoz, Carmen; Amadó, Anna; Serrat, Elisabet

2017-01-01

One of the most important tasks in first language development is assigning words to their grammatical category. The Semantic Bootstrapping Hypothesis postulates that, in order to accomplish this task, children are guided by a neat correspondence between semantic and grammatical categories, since nouns typically refer to objects and verbs to actions. It is this correspondence that guides children's initial word categorization. Other approaches, on the other hand, suggest that children might make use of distributional cues and word contexts to accomplish the word categorization task. According to such approaches, the Semantic Bootstrapping assumption offers an important limitation, as it might not be true that all the nouns that children hear refer to specific objects or people. In order to explore that, we carried out two studies based on analyses of children's linguistic input. We analyzed child-directed speech addressed to four children under the age of 2;6, taken from the CHILDES database. The corpora were selected from the Manchester corpus. The corpora from the four selected children contained a total of 10,681 word types and 364,196 word tokens. In our first study, discriminant analyses were performed using semantic cues alone. The results show that many of the nouns found in parents' speech do not relate to specific objects and that semantic information alone might not be sufficient for successful word categorization. Given that there must be an additional source of information which, alongside with semantics, might assist young learners in word categorization, our second study explores the availability of both distributional and semantic cues in child-directed speech. Our results confirm that this combination might yield better results for word categorization. These results are in line with theories that suggest the need for an integration of multiple cues from different sources in language development.
Interaction between visual and chemical cues in a Liolaemus lizard: a multimodal approach.

Science.gov (United States)

Vicente, Natalin S; Halloy, Monique

2017-12-01

Multimodal communication involves the use of signals and cues across two or more sensory modalities. The genus Liolaemus (Iguania: Liolaemidae) offers a great potential for studies on the ecology and evolution of multimodal communication, including visual and chemical signals. In this study, we analyzed the response of male and female Liolaemus pacha to chemical, visual and combined (multimodal) stimuli. Using cue-isolation tests, we registered the number of tongue flicks and headbob displays from exposure to signals in each modality. Number of tongue flicks was greater when a chemical stimulus was presented alone than in the presence of visual or multimodal stimuli. In contrast, headbob displays were fewer in number with visual and chemical stimuli alone, but significantly higher in number when combined. Female signallers triggered significantly more tongue flicks than male signallers, suggesting that chemical cues are involved in sexual recognition. We did not find an inhibition between chemical and visual cues. On the contrary, we observed a dominance of the chemical modality, because when presented with visual stimuli, lizards also responded with more tongue flicks than headbob displays. The total response produced by multimodal stimuli was similar to that of the chemical stimuli alone, possibly suggesting non-redundancy. We discuss whether the visual component of a multimodal signal could attract attention at a distance, increasing the effectiveness of transmission and reception of the information in chemical cues. Copyright © 2017 Elsevier GmbH. All rights reserved.
Visual Attention in Flies-Dopamine in the Mushroom Bodies Mediates the After-Effect of Cueing.

Science.gov (United States)

Koenig, Sebastian; Wolf, Reinhard; Heisenberg, Martin

2016-01-01

Visual environments may simultaneously comprise stimuli of different significance. Often such stimuli require incompatible responses. Selective visual attention allows an animal to respond exclusively to the stimuli at a certain location in the visual field. In the process of establishing its focus of attention the animal can be influenced by external cues. Here we characterize the behavioral properties and neural mechanism of cueing in the fly Drosophila melanogaster. A cue can be attractive, repulsive or ineffective depending upon (e.g.) its visual properties and location in the visual field. Dopamine signaling in the brain is required to maintain the effect of cueing once the cue has disappeared. Raising or lowering dopamine at the synapse abolishes this after-effect. Specifically, dopamine is necessary and sufficient in the αβ-lobes of the mushroom bodies. Evidence is provided for an involvement of the αβposterior Kenyon cells.
Real-Time Lane Detection on Suburban Streets Using Visual Cue Integration

Directory of Open Access Journals (Sweden)

Shehan Fernando

2014-04-01

Full Text Available The detection of lane boundaries on suburban streets using images obtained from video constitutes a challenging task. This is mainly due to the difficulties associated with estimating the complex geometric structure of lane boundaries, the quality of lane markings as a result of wear, occlusions by traffic, and shadows caused by road-side trees and structures. Most of the existing techniques for lane boundary detection employ a single visual cue and will only work under certain conditions and where there are clear lane markings. Also, better results are achieved when there are no other on-road objects present. This paper extends our previous work and discusses a novel lane boundary detection algorithm specifically addressing the abovementioned issues through the integration of two visual cues. The first visual cue is based on stripe-like features found on lane lines extracted using a two-dimensional symmetric Gabor filter. The second visual cue is based on a texture characteristic determined using the entropy measure of the predefined neighbourhood around a lane boundary line. The visual cues are then integrated using a rule-based classifier which incorporates a modified sequential covering algorithm to improve robustness. To separate lane boundary lines from other similar features, a road mask is generated using road chromaticity values estimated from CIE L*a*b* colour transformation. Extraneous points around lane boundary lines are then removed by an outlier removal procedure based on studentized residuals. The lane boundary lines are then modelled with Bezier spline curves. To validate the algorithm, extensive experimental evaluation was carried out on suburban streets and the results are presented.
Subconscious visual cues during movement execution allow correct online choice reactions

DEFF Research Database (Denmark)

Leukel, Christian; Lundbye-Jensen, Jesper; Christensen, Mark Schram

2012-01-01

Part of the sensory information is processed by our central nervous system without conscious perception. Subconscious processing has been shown to be capable of triggering motor reactions. In the present study, we asked the question whether visual information, which is not consciously perceived......, could influence decision-making in a choice reaction task. Ten healthy subjects (28±5 years) executed two different experimental protocols. In the Motor reaction protocol, a visual target cue was shown on a computer screen. Depending on the displayed cue, subjects had to either complete a reaching....... This second protocol tested for conscious perception of the visual cue. The results of this study show that subjects achieved significantly more correct responses in the Motor reaction protocol than in the Verbalization protocol. This difference was only observed at the very short display durations...
Reinforcing Visual Grouping Cues to Communicate Complex Informational Structure.

Science.gov (United States)

Bae, Juhee; Watson, Benjamin

2014-12-01

In his book Multimedia Learning [7], Richard Mayer asserts that viewers learn best from imagery that provides them with cues to help them organize new information into the correct knowledge structures. Designers have long been exploiting the Gestalt laws of visual grouping to deliver viewers those cues using visual hierarchy, often communicating structures much more complex than the simple organizations studied in psychological research. Unfortunately, designers are largely practical in their work, and have not paused to build a complex theory of structural communication. If we are to build a tool to help novices create effective and well structured visuals, we need a better understanding of how to create them. Our work takes a first step toward addressing this lack, studying how five of the many grouping cues (proximity, color similarity, common region, connectivity, and alignment) can be effectively combined to communicate structured text and imagery from real world examples. To measure the effectiveness of this structural communication, we applied a digital version of card sorting, a method widely used in anthropology and cognitive science to extract cognitive structures. We then used tree edit distance to measure the difference between perceived and communicated structures. Our most significant findings are: 1) with careful design, complex structure can be communicated clearly; 2) communicating complex structure is best done with multiple reinforcing grouping cues; 3) common region (use of containers such as boxes) is particularly effective at communicating structure; and 4) alignment is a weak structural communicator.
Intensive treatment with ultrasound visual feedback for speech sound errors in childhood apraxia

Directory of Open Access Journals (Sweden)

Jonathan L Preston

2016-08-01

Full Text Available Ultrasound imaging is an adjunct to traditional speech therapy that has shown to be beneficial in the remediation of speech sound errors. Ultrasound biofeedback can be utilized during therapy to provide clients additional knowledge about their tongue shapes when attempting to produce sounds that are in error. The additional feedback may assist children with childhood apraxia of speech in stabilizing motor patterns, thereby facilitating more consistent and accurate productions of sounds and syllables. However, due to its specialized nature, ultrasound visual feedback is a technology that is not widely available to clients. Short-term intensive treatment programs are one option that can be utilized to expand access to ultrasound biofeedback. Schema-based motor learning theory suggests that short-term intensive treatment programs (massed practice may assist children in acquiring more accurate motor patterns. In this case series, three participants ages 10-14 diagnosed with childhood apraxia of speech attended 16 hours of speech therapy over a two-week period to address residual speech sound errors. Two participants had distortions on rhotic sounds, while the third participant demonstrated lateralization of sibilant sounds. During therapy, cues were provided to assist participants in obtaining a tongue shape that facilitated a correct production of the erred sound. Additional practice without ultrasound was also included. Results suggested that all participants showed signs of acquisition of sounds in error. Generalization and retention results were mixed. One participant showed generalization and retention of sounds that were treated; one showed generalization but limited retention; and the third showed no evidence of generalization or retention. Individual characteristics that may facilitate generalization are discussed. Short-term intensive treatment programs using ultrasound biofeedback may result in the acquisition of more accurate motor
Audio feature extraction using probability distribution function

Science.gov (United States)

Suhaib, A.; Wan, Khairunizam; Aziz, Azri A.; Hazry, D.; Razlan, Zuradzman M.; Shahriman A., B.

2015-05-01

Voice recognition has been one of the popular applications in robotic field. It is also known to be recently used for biometric and multimedia information retrieval system. This technology is attained from successive research on audio feature extraction analysis. Probability Distribution Function (PDF) is a statistical method which is usually used as one of the processes in complex feature extraction methods such as GMM and PCA. In this paper, a new method for audio feature extraction is proposed which is by using only PDF as a feature extraction method itself for speech analysis purpose. Certain pre-processing techniques are performed in prior to the proposed feature extraction method. Subsequently, the PDF result values for each frame of sampled voice signals obtained from certain numbers of individuals are plotted. From the experimental results obtained, it can be seen visually from the plotted data that each individuals' voice has comparable PDF values and shapes.
Attention to affective audio-visual information: Comparison between musicians and non-musicians

NARCIS (Netherlands)

Weijkamp, J.; Sadakata, M.

2017-01-01

Individuals with more musical training repeatedly demonstrate enhanced auditory perception abilities. The current study examined how these enhanced auditory skills interact with attention to affective audio-visual stimuli. A total of 16 participants with more than 5 years of musical training
Attitude of medical students towards the use of audio visual aids during didactic lectures in pharmacology in a medical college of central India

OpenAIRE

Mehul Agrawal; Rajanish Kumar Sankdia

2016-01-01

Background: Students favour teaching methods employing audio visual aids over didactic lectures not using these aids. However, the optimum use of audio visual aids is essential for deriving their benefits. During a lecture, both the visual and auditory senses are used to absorb information. Different methods of lecture are and ndash; chalk and board, power point presentations (PPT) and mix of aids. This study was done to know the students' preference regarding the various audio visual aids, ...
How is the McGurk effect modulated by Cued Speech in deaf and hearing adults?

OpenAIRE

Bayard, Clémence; Colin, Cécile; Leybaert, Jacqueline

2014-01-01

Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combi...

A chimpanzee recognizes synthetic speech with significantly reduced acoustic cues to phonetic content.

Science.gov (United States)

Heimbauer, Lisa A; Beran, Michael J; Owren, Michael J

2011-07-26

A long-standing debate concerns whether humans are specialized for speech perception, which some researchers argue is demonstrated by the ability to understand synthetic speech with significantly reduced acoustic cues to phonetic content. We tested a chimpanzee (Pan troglodytes) that recognizes 128 spoken words, asking whether she could understand such speech. Three experiments presented 48 individual words, with the animal selecting a corresponding visuographic symbol from among four alternatives. Experiment 1 tested spectrally reduced, noise-vocoded (NV) synthesis, originally developed to simulate input received by human cochlear-implant users. Experiment 2 tested "impossibly unspeechlike" sine-wave (SW) synthesis, which reduces speech to just three moving tones. Although receiving only intermittent and noncontingent reward, the chimpanzee performed well above chance level, including when hearing synthetic versions for the first time. Recognition of SW words was least accurate but improved in experiment 3 when natural words in the same session were rewarded. The chimpanzee was more accurate with NV than SW versions, as were 32 human participants hearing these items. The chimpanzee's ability to spontaneously recognize acoustically reduced synthetic words suggests that experience rather than specialization is critical for speech-perception capabilities that some have suggested are uniquely human. Copyright © 2011 Elsevier Ltd. All rights reserved.
Robust Speech/Non-Speech Classification in Heterogeneous Multimedia Content

NARCIS (Netherlands)

Huijbregts, M.A.H.; de Jong, Franciska M.G.

In this paper we present a speech/non-speech classification method that allows high quality classification without the need to know in advance what kinds of audible non-speech events are present in an audio recording and that does not require a single parameter to be tuned on in-domain data. Because
Non-fluent speech following stroke is caused by impaired efference copy.

Science.gov (United States)

Feenaughty, Lynda; Basilakos, Alexandra; Bonilha, Leonardo; den Ouden, Dirk-Bart; Rorden, Chris; Stark, Brielle; Fridriksson, Julius

2017-09-01

Efference copy is a cognitive mechanism argued to be critical for initiating and monitoring speech: however, the extent to which breakdown of efference copy mechanisms impact speech production is unclear. This study examined the best mechanistic predictors of non-fluent speech among 88 stroke survivors. Objective speech fluency measures were subjected to a principal component analysis (PCA). The primary PCA factor was then entered into a multiple stepwise linear regression analysis as the dependent variable, with a set of independent mechanistic variables. Participants' ability to mimic audio-visual speech ("speech entrainment response") was the best independent predictor of non-fluent speech. We suggest that this "speech entrainment" factor reflects integrity of internal monitoring (i.e., efference copy) of speech production, which affects speech initiation and maintenance. Results support models of normal speech production and suggest that therapy focused on speech initiation and maintenance may improve speech fluency for individuals with chronic non-fluent aphasia post stroke.
Generating physical symptoms from visual cues: An experimental study.

Science.gov (United States)

Ogden, Jane; Zoukas, Serafim

2009-12-01

This experimental study explored whether the physical symptoms of cold, pain and itchiness could be generated by visual cues, whether they varied in the ease with which they could be generated and whether they were related to negative affect. Participants were randomly allocated by group to watch one of three videos relating to cold (e.g. ice, snow, wind), pain (e.g. sporting injuries, tattoos) or itchiness (e.g. head lice, scratching). They then rated their self-reported symptoms of cold, pain and itchiness as well as their negative affect (depression and anxiety). The researcher recorded their observed behaviour relating to these symptoms. The results showed that the interventions were successful and that all three symptoms could be generated by the visual cues in terms of both self-report and observed behaviour. In addition, the pain video generated higher levels of anxiety and depression than the other two videos. Further, the degree of itchiness was related to the degree of anxiety. This symptom onset process also showed variability between symptoms with self-reported cold symptoms being greater than either pain or itchy symptoms. The results show that physical symptoms can be generated by visual cues indicating that psychological factors are not only involved in symptom perception but also in symptom onset.
Correspondence between audio and visual deep models for musical instrument detection in video recordings

OpenAIRE

Slizovskaia, Olga; Gómez, Emilia; Haro, Gloria

2017-01-01

This work aims at investigating cross-modal connections between audio and video sources in the task of musical instrument recognition. We also address in this work the understanding of the representations learned by convolutional neural networks (CNNs) and we study feature correspondence between audio and visual components of a multimodal CNN architecture. For each instrument category, we select the most activated neurons and investigate exist- ing cross-correlations between neurons from the ...
A centralized audio presentation manager

Energy Technology Data Exchange (ETDEWEB)

Papp, A.L. III; Blattner, M.M.

1994-05-16

The centralized audio presentation manager addresses the problems which occur when multiple programs running simultaneously attempt to use the audio output of a computer system. Time dependence of sound means that certain auditory messages must be scheduled simultaneously, which can lead to perceptual problems due to psychoacoustic phenomena. Furthermore, the combination of speech and nonspeech audio is examined; each presents its own problems of perceptibility in an acoustic environment composed of multiple auditory streams. The centralized audio presentation manager receives abstract parameterized message requests from the currently running programs, and attempts to create and present a sonic representation in the most perceptible manner through the use of a theoretically and empirically designed rule set.
Automaticity of phasic alertness: Evidence for a three-component model of visual cueing.

Science.gov (United States)

Lin, Zhicheng; Lu, Zhong-Lin

2016-10-01

The automaticity of phasic alertness is investigated using the attention network test. Results show that the cueing effect from the alerting cue-double cue-is strongly enhanced by the task relevance of visual cues, as determined by the informativeness of the orienting cue-single cue-that is being mixed (80 % vs. 50 % valid in predicting where the target will appear). Counterintuitively, the cueing effect from the alerting cue can be negatively affected by its visibility, such that masking the cue from awareness can reveal a cueing effect that is otherwise absent when the cue is visible. Evidently, then, top-down influences-in the form of contextual relevance and cue awareness-can have opposite influences on the cueing effect from the alerting cue. These findings lead us to the view that a visual cue can engage three components of attention-orienting, alerting, and inhibition-to determine the behavioral cueing effect. We propose that phasic alertness, particularly in the form of specific response readiness, is regulated by both internal, top-down expectation and external, bottom-up stimulus properties. In contrast to some existing views, we advance the perspective that phasic alertness is strongly tied to temporal orienting, attentional capture, and spatial orienting. Finally, we discuss how translating attention research to clinical applications would benefit from an improved ability to measure attention. To this end, controlling the degree of intraindividual variability in the attentional components and improving the precision of the measurement tools may prove vital.
Determining the Effectiveness of Visual Input Enhancement across Multiple Linguistic Cues

Science.gov (United States)

Comeaux, Ian; McDonald, Janet L.

2018-01-01

Visual input enhancement (VIE) increases the salience of grammatical forms, potentially facilitating acquisition through attention mechanisms. Native English speakers were exposed to an artificial language containing four linguistic cues (verb agreement, case marking, animacy, word order), with morphological cues either unmarked, marked in the…
Retrospective cues based on object features improve visual working memory performance in older adults.

Science.gov (United States)

Gilchrist, Amanda L; Duarte, Audrey; Verhaeghen, Paul

2016-01-01

Research with younger adults has shown that retrospective cues can be used to orient top-down attention toward relevant items in working memory. We examined whether older adults could take advantage of these cues to improve memory performance. Younger and older adults were presented with visual arrays of five colored shapes; during maintenance, participants were presented either with an informative cue based on an object feature (here, object shape or color) that would be probed, or with an uninformative, neutral cue. Although older adults were less accurate overall, both age groups benefited from the presentation of an informative, feature-based cue relative to a neutral cue. Surprisingly, we also observed differences in the effectiveness of shape versus color cues and their effects upon post-cue memory load. These results suggest that older adults can use top-down attention to remove irrelevant items from visual working memory, provided that task-relevant features function as cues.
The reliability of retro-cues determines the fate of noncued visual working memory representations

NARCIS (Netherlands)

Günseli, E.; van Moorselaar, D.; Meeter, M.; Olivers, C.N.L.

2015-01-01

Retrospectively cueing an item retained in visual working memory during maintenance is known to improve its retention. However, studies have provided conflicting results regarding the costs of such retro-cues for the noncued items, leading to different theories on the mechanisms behind visual
Concurrent audio-visual feedback for supporting drivers at intersections : a study using two linked driving simulators.

NARCIS (Netherlands)

Houtenbos, M. Winter, J.C.F. de Hale, A.R. Wieringa, P.A. & Hagenzieker, M.P.

2016-01-01

A large portion of road traffic crashes occur at intersections for the reason that drivers lack necessary visual information. This research examined the effects of an audio-visual display that provides real-time sonification and visualization of the speed and direction of another car approaching the
Applying Spatial Audio to Human Interfaces: 25 Years of NASA Experience

Science.gov (United States)

Begault, Durand R.; Wenzel, Elizabeth M.; Godfrey, Martine; Miller, Joel D.; Anderson, Mark R.

2010-01-01

From the perspective of human factors engineering, the inclusion of spatial audio within a human-machine interface is advantageous from several perspectives. Demonstrated benefits include the ability to monitor multiple streams of speech and non-speech warning tones using a cocktail party advantage, and for aurally-guided visual search. Other potential benefits include the spatial coordination and interaction of multimodal events, and evaluation of new communication technologies and alerting systems using virtual simulation. Many of these technologies were developed at NASA Ames Research Center, beginning in 1985. This paper reviews examples and describes the advantages of spatial sound in NASA-related technologies, including space operations, aeronautics, and search and rescue. The work has involved hardware and software development as well as basic and applied research.
Markers of Deception in Italian Speech

Directory of Open Access Journals (Sweden)

Katelyn eSpence

2012-10-01

Full Text Available Lying is a universal activity and the detection of lying a universal concern. Presently, there is great interest in determining objective measures of deception. The examination of speech, in particular, holds promise in this regard; yet, most of what we know about the relationship between speech and lying is based on the assessment of English-speaking participants. Few studies have examined indicators of deception in languages other than English. The world’s languages differ in significant ways, and cross-linguistic studies of deceptive communications are a research imperative. Here we review some of these differences amongst the world’s languages, and provide an overview of a number of recent studies demonstrating that cross-linguistic research is a worthwhile endeavour. In addition, we report the results of an empirical investigation of pitch, response latency, and speech rate as cues to deception in Italian speech. True and false opinions were elicited in an audio-taped interview. A within subjects analysis revealed no significant difference between the average pitch of the two conditions; however, speech rate was significantly slower, while response latency was longer, during deception compared with truth-telling. We explore the implications of these findings and propose directions for future research, with the aim of expanding the cross-linguistic branch of research on markers of deception.
Real-time speech-driven animation of expressive talking faces

Science.gov (United States)

Liu, Jia; You, Mingyu; Chen, Chun; Song, Mingli

2011-05-01

In this paper, we present a real-time facial animation system in which speech drives mouth movements and facial expressions synchronously. Considering five basic emotions, a hierarchical structure with an upper layer of emotion classification is established. Based on the recognized emotion label, the under-layer classification at sub-phonemic level has been modelled on the relationship between acoustic features of frames and audio labels in phonemes. Using certain constraint, the predicted emotion labels of speech are adjusted to gain the facial expression labels which are combined with sub-phonemic labels. The combinations are mapped into facial action units (FAUs), and audio-visual synchronized animation with mouth movements and facial expressions is generated by morphing between FAUs. The experimental results demonstrate that the two-layer structure succeeds in both emotion and sub-phonemic classifications, and the synthesized facial sequences reach a comparative convincing quality.
Effect of Nicotine on Audio and Visual Reaction Time in Dipping ...

African Journals Online (AJOL)

Nicotine through blood is harmful and as there are fewer studies in India with respect to nicotines influence on reaction time especially in the smokeless tobacco users we studied this. Reaction time is a measure of the sensorimotor integration in a person. We used a PC 1000 Hz reaction timer to record the audio and visual ...
Visual attention to food cues in obesity: an eye-tracking study.

Science.gov (United States)

Doolan, Katy J; Breslin, Gavin; Hanna, Donncha; Murphy, Kate; Gallagher, Alison M

2014-12-01

Based on the theory of incentive sensitization, the aim of this study was to investigate differences in attentional processing of food-related visual cues between normal-weight and overweight/obese males and females. Twenty-six normal-weight (14M, 12F) and 26 overweight/obese (14M, 12F) adults completed a visual probe task and an eye-tracking paradigm. Reaction times and eye movements to food and control images were collected during both a fasted and fed condition in a counterbalanced design. Participants had greater visual attention towards high-energy-density food images compared to low-energy-density food images regardless of hunger condition. This was most pronounced in overweight/obese males who had significantly greater maintained attention towards high-energy-density food images when compared with their normal-weight counterparts however no between weight group differences were observed for female participants. High-energy-density food images appear to capture visual attention more readily than low-energy-density food images. Results also suggest the possibility of an altered visual food cue-associated reward system in overweight/obese males. Attentional processing of food cues may play a role in eating behaviors thus should be taken into consideration as part of an integrated approach to curbing obesity. © 2014 The Obesity Society.
Food and conspecific chemical cues modify visual behavior of zebrafish, Danio rerio.

Science.gov (United States)

Stephenson, Jessica F; Partridge, Julian C; Whitlock, Kathleen E

2012-06-01

Animals use the different qualities of olfactory and visual sensory information to make decisions. Ethological and electrophysiological evidence suggests that there is cross-modal priming between these sensory systems in fish. We present the first experimental study showing that ecologically relevant chemical mixtures alter visual behavior, using adult male and female zebrafish, Danio rerio. Neutral-density filters were used to attenuate the light reaching the tank to an initial light intensity of 2.3×10(16) photons/s/m2. Fish were exposed to food cue and to alarm cue. The light intensity was then increased by the removal of one layer of filter (nominal absorbance 0.3) every minute until, after 10 minutes, the light level was 15.5×10(16) photons/s/m2. Adult male and female zebrafish responded to a moving visual stimulus at lower light levels if they had been first exposed to food cue, or to conspecific alarm cue. These results suggest the need for more integrative studies of sensory biology.
Interplay of Gravicentric, Egocentric, and Visual Cues About the Vertical in the Control of Arm Movement Direction.

Science.gov (United States)

Bock, Otmar; Bury, Nils

2018-03-01

Our perception of the vertical corresponds to the weighted sum of gravicentric, egocentric, and visual cues. Here we evaluate the interplay of those cues not for the perceived but rather for the motor vertical. Participants were asked to flip an omnidirectional switch down while their egocentric vertical was dissociated from their visual-gravicentric vertical. Responses were directed mid-between the two verticals; specifically, the data suggest that the relative weight of congruent visual-gravicentric cues averages 0.62, and correspondingly, the relative weight of egocentric cues averages 0.38. We conclude that the interplay of visual-gravicentric cues with egocentric cues is similar for the motor and for the perceived vertical. Unexpectedly, we observed a consistent dependence of the motor vertical on hand position, possibly mediated by hand orientation or by spatial selective attention.
The footprints of visual attention during search with 100% valid and 100% invalid cues.

Science.gov (United States)

Eckstein, Miguel P; Pham, Binh T; Shimozaki, Steven S

2004-06-01

Human performance during visual search typically improves when spatial cues indicate the possible target locations. In many instances, the performance improvement is quantitatively predicted by a Bayesian or quasi-Bayesian observer in which visual attention simply selects the information at the cued locations without changing the quality of processing or sensitivity and ignores the information at the uncued locations. Aside from the general good agreement between the effect of the cue on model and human performance, there has been little independent confirmation that humans are effectively selecting the relevant information. In this study, we used the classification image technique to assess the effectiveness of spatial cues in the attentional selection of relevant locations and suppression of irrelevant locations indicated by spatial cues. Observers searched for a bright target among dimmer distractors that might appear (with 50% probability) in one of eight locations in visual white noise. The possible target location was indicated using a 100% valid box cue or seven 100% invalid box cues in which the only potential target locations was uncued. For both conditions, we found statistically significant perceptual templates shaped as differences of Gaussians at the relevant locations with no perceptual templates at the irrelevant locations. We did not find statistical significant differences between the shapes of the inferred perceptual templates for the 100% valid and 100% invalid cues conditions. The results confirm the idea that during search visual attention allows the observer to effectively select relevant information and ignore irrelevant information. The results for the 100% invalid cues condition suggests that the selection process is not drawn automatically to the cue but can be under the observers' voluntary control.
Memory for location and visual cues in white-eared hummingbirds Hylocharis leucotis

Directory of Open Access Journals (Sweden)

Guillermo PÉREZ, Carlos LARA, José VICCON-PALE, Martha SIGNORET-POILLON

2011-08-01

Full Text Available In nature hummingbirds face floral resources whose availability, quality and quantity can vary spatially and temporally. Thus, they must constantly make foraging decisions about which patches, plants and flowers to visit, partly as a function of the nectar reward. The uncertainty of these decisions would possibly be reduced if an individual could remember locations or use visual cues to avoid revisiting recently depleted flowers. In the present study, we carried out field experiments with white-eared hummingbirds Hylocharis leucotis, to evaluate their use of locations or visual cues when foraging on natural flowers Penstemon roseus. We evaluated the use of spatial memory by observing birds while they were foraging between two plants and within a single plant. Our results showed that hummingbirds prefer to use location when foraging in two plants, but they also use visual cues to efficiently locate unvisited rewarded flowers when they feed on a single plant. However, in absence of visual cues, in both experiments birds mainly used the location of previously visited flowers to make subsequent visits. Our data suggest that hummingbirds are capable of learning and employing this flexibility depending on the faced environmental conditions and the information acquired in previous visits [Current Zoology 57 (4: 468–476, 2011].

Visual form predictions facilitate auditory processing at the N1.

Science.gov (United States)

Paris, Tim; Kim, Jeesun; Davis, Chris

2017-02-20

Auditory-visual (AV) events often involve a leading visual cue (e.g. auditory-visual speech) that allows the perceiver to generate predictions about the upcoming auditory event. Electrophysiological evidence suggests that when an auditory event is predicted, processing is sped up, i.e., the N1 component of the ERP occurs earlier (N1 facilitation). However, it is not clear (1) whether N1 facilitation is based specifically on predictive rather than multisensory integration and (2) which particular properties of the visual cue it is based on. The current experiment used artificial AV stimuli in which visual cues predicted but did not co-occur with auditory cues. Visual form cues (high and low salience) and the auditory-visual pairing were manipulated so that auditory predictions could be based on form and timing or on timing only. The results showed that N1 facilitation occurred only for combined form and temporal predictions. These results suggest that faster auditory processing (as indicated by N1 facilitation) is based on predictive processing generated by a visual cue that clearly predicts both what and when the auditory stimulus will occur. Copyright © 2016. Published by Elsevier Ltd.
Humans tend to walk in circles as directed by memorized visual locations at large distances

OpenAIRE

Consolo, Patricia; Holanda, Humberto C.; Fukusima, Sérgio S.

2014-01-01

Human veering while walking blindfolded or walking straight without any visual cues has been widely studied over the last 100 years, but the results are still controversial. The present study attempted to describe and understand the human ability to maintain the direction of a trajectory while walking without visual or audio cues with reference to a proposed mathematical model and using data collected by a global positioning system (GPS). Fifteen right-handed people of both genders, aged 18-3...
Deaf children's use of clear visual cues in mindreading.

Science.gov (United States)

Hao, Jian; Su, Yanjie

2014-11-01

Previous studies show that typically developing 4-year old children can understand other people's false beliefs but that deaf children of hearing families have difficulty in understanding false beliefs until the age of approximately 13. Because false beliefs are implicit mental states that are not expressed through clear visual cues in standard false belief tasks, the present study examines the hypothesis that the deaf children's developmental delay in understanding false beliefs may reflect their difficulty in understanding a spectrum of mental states that are not expressed through clear visual cues. Nine- to 13-year-old deaf children of hearing families and 4-6-year-old typically developing children completed false belief tasks and emotion recognition tasks under different cue conditions. The results indicated that after controlling for the effect of the children's language abilities, the deaf children inferred other people's false beliefs as accurately as the typically developing children when other people's false beliefs were clearly expressed through their eye-gaze direction. However, the deaf children performed worse than the typically developing children when asked to infer false beliefs with ambiguous or no eye-gaze cues. Moreover, the deaf children were capable of recognizing other people's emotions that were clearly conveyed by their facial or body expressions. The results suggest that although theory-based or simulation-based mental state understanding is typical of hearing children's theory of mind mechanism, for deaf children of hearing families, clear cue-based mental state understanding may be their specific theory of mind mechanism. Copyright © 2014 Elsevier Ltd. All rights reserved.
Exclusively Visual Analysis of Classroom Group Interactions

Science.gov (United States)

Tucker, Laura; Scherr, Rachel E.; Zickler, Todd; Mazur, Eric

2016-01-01

Large-scale audiovisual data that measure group learning are time consuming to collect and analyze. As an initial step towards scaling qualitative classroom observation, we qualitatively coded classroom video using an established coding scheme with and without its audio cues. We find that interrater reliability is as high when using visual data…
Audiovisual integration in children listening to spectrally degraded speech.

Science.gov (United States)

Maidment, David W; Kang, Hi Jee; Stewart, Hannah J; Amitay, Sygal

2015-02-01

The study explored whether visual information improves speech identification in typically developing children with normal hearing when the auditory signal is spectrally degraded. Children (n=69) and adults (n=15) were presented with noise-vocoded sentences from the Children's Co-ordinate Response Measure (Rosen, 2011) in auditory-only or audiovisual conditions. The number of bands was adaptively varied to modulate the degradation of the auditory signal, with the number of bands required for approximately 79% correct identification calculated as the threshold. The youngest children (4- to 5-year-olds) did not benefit from accompanying visual information, in comparison to 6- to 11-year-old children and adults. Audiovisual gain also increased with age in the child sample. The current data suggest that children younger than 6 years of age do not fully utilize visual speech cues to enhance speech perception when the auditory signal is degraded. This evidence not only has implications for understanding the development of speech perception skills in children with normal hearing but may also inform the development of new treatment and intervention strategies that aim to remediate speech perception difficulties in pediatric cochlear implant users.
Insects and the Kafkaesque: Insectuous Re-Writings in Visual and Audio-Visual Media

Directory of Open Access Journals (Sweden)

Damianos Grammatikopoulos

2017-09-01

Full Text Available In this article, I examine techniques at work in visual and audio-visual media that deal with the creative imitation of central Kafkan themes, particularly those related to hybrid insects and bodily deformity. In addition, the opening section of my study offers a detailed and thorough discussion of the concept of the “Kafkaesque”, and an attempt will be made to circumscribe its signifying limits. The main objective of the study is to explore the relationship between Kafka’s texts and the works of contemporary cartoonists, illustrators (Charles Burns, and filmmakers (David Cronenberg and identify themes and motifs that they have in common. My approach is informed by transtextual practices and source studies, and I draw systematically on Gerard Genette’s Palimpsests and Harold Bloom’s The Anxiety of Influence.
Degradation of labial information modifies audiovisual speech perception in cochlear-implanted children.

Science.gov (United States)

Huyse, Aurélie; Berthommier, Frédéric; Leybaert, Jacqueline

2013-01-01

The aim of the present study was to examine audiovisual speech integration in cochlear-implanted children and in normally hearing children exposed to degraded auditory stimuli. Previous studies have shown that speech perception in cochlear-implanted users is biased toward the visual modality when audition and vision provide conflicting information. Our main question was whether an experimentally designed degradation of the visual speech cue would increase the importance of audition in the response pattern. The impact of auditory proficiency was also investigated. A group of 31 children with cochlear implants and a group of 31 normally hearing children matched for chronological age were recruited. All children with cochlear implants had profound congenital deafness and had used their implants for at least 2 years. Participants had to perform an /aCa/ consonant-identification task in which stimuli were presented randomly in three conditions: auditory only, visual only, and audiovisual (congruent and incongruent McGurk stimuli). In half of the experiment, the visual speech cue was normal; in the other half (visual reduction) a degraded visual signal was presented, aimed at preventing lipreading of good quality. The normally hearing children received a spectrally reduced speech signal (simulating the input delivered by the cochlear implant). First, performance in visual-only and in congruent audiovisual modalities were decreased, showing that the visual reduction technique used here was efficient at degrading lipreading. Second, in the incongruent audiovisual trials, visual reduction led to a major increase in the number of auditory based responses in both groups. Differences between proficient and nonproficient children were found in both groups, with nonproficient children's responses being more visual and less auditory than those of proficient children. Further analysis revealed that differences between visually clear and visually reduced conditions and between
Primary School Pupils' Response to Audio-Visual Learning Process in Port-Harcourt

Science.gov (United States)

Olube, Friday K.

2015-01-01

The purpose of this study is to examine primary school children's response on the use of audio-visual learning processes--a case study of Chokhmah International Academy, Port-Harcourt (owned by Salvation Ministries). It looked at the elements that enhance pupils' response to educational television programmes and their hindrances to these…
Unimodal Learning Enhances Crossmodal Learning in Robotic Audio-Visual Tracking

DEFF Research Database (Denmark)

Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

2017-01-01

Crossmodal sensory integration is a fundamental feature of the brain that aids in forming an coherent and unified representation of observed events in the world. Spatiotemporally correlated sensory stimuli brought about by rich sensorimotor experiences drive the development of crossmodal integrat...... a non-holonomic robotic agent towards a moving audio-visual target. Simulation results demonstrate that unimodal learning enhances crossmodal learning and improves both the overall accuracy and precision of multisensory orientation response....
Unimodal Learning Enhances Crossmodal Learning in Robotic Audio-Visual Tracking

DEFF Research Database (Denmark)

Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

2018-01-01

Crossmodal sensory integration is a fundamental feature of the brain that aids in forming an coherent and unified representation of observed events in the world. Spatiotemporally correlated sensory stimuli brought about by rich sensorimotor experiences drive the development of crossmodal integrat...... a non-holonomic robotic agent towards a moving audio-visual target. Simulation results demonstrate that unimodal learning enhances crossmodal learning and improves both the overall accuracy and precision of multisensory orientation response....
Visual and cross-modal cues increase the identification of overlapping visual stimuli in Balint's syndrome.

Science.gov (United States)

D'Imperio, Daniela; Scandola, Michele; Gobbetto, Valeria; Bulgarelli, Cristina; Salgarello, Matteo; Avesani, Renato; Moro, Valentina

2017-10-01

Cross-modal interactions improve the processing of external stimuli, particularly when an isolated sensory modality is impaired. When information from different modalities is integrated, object recognition is facilitated probably as a result of bottom-up and top-down processes. The aim of this study was to investigate the potential effects of cross-modal stimulation in a case of simultanagnosia. We report a detailed analysis of clinical symptoms and an 18 F-fluorodeoxyglucose (FDG) brain positron emission tomography/computed tomography (PET/CT) study of a patient affected by Balint's syndrome, a rare and invasive visual-spatial disorder following bilateral parieto-occipital lesions. An experiment was conducted to investigate the effects of visual and nonvisual cues on performance in tasks involving the recognition of overlapping pictures. Four modalities of sensory cues were used: visual, tactile, olfactory, and auditory. Data from neuropsychological tests showed the presence of ocular apraxia, optic ataxia, and simultanagnosia. The results of the experiment indicate a positive effect of the cues on the recognition of overlapping pictures, not only in the identification of the congruent valid-cued stimulus (target) but also in the identification of the other, noncued stimuli. All the sensory modalities analyzed (except the auditory stimulus) were efficacious in terms of increasing visual recognition. Cross-modal integration improved the patient's ability to recognize overlapping figures. However, while in the visual unimodal modality both bottom-up (priming, familiarity effect, disengagement of attention) and top-down processes (mental representation and short-term memory, the endogenous orientation of attention) are involved, in the cross-modal integration it is semantic representations that mainly activate visual recognition processes. These results are potentially useful for the design of rehabilitation training for attentional and visual-perceptual deficits.
Audio visual information materials for risk communication

International Nuclear Information System (INIS)

Gunji, Ikuko; Tabata, Rimiko; Ohuchi, Naomi

2005-07-01

Japan Nuclear Cycle Development Institute (JNC), Tokai Works set up the Risk Communication Study Team in January, 2001 to promote mutual understanding between the local residents and JNC. The Team has studied risk communication from various viewpoints and developed new methods of public relations which are useful for the local residents' risk perception toward nuclear issues. We aim to develop more effective risk communication which promotes a better mutual understanding of the local residents, by providing the risk information of the nuclear fuel facilities such a Reprocessing Plant and other research and development facilities. We explain the development process of audio visual information materials which describe our actual activities and devices for the risk management in nuclear fuel facilities, and our discussion through the effectiveness measurement. (author)
Perceptual stimulus-A Bayesian-based integration of multi-visual-cue approach and its application

Institute of Scientific and Technical Information of China (English)

XUE JianRu; ZHENG NanNing; ZHONG XiaoPin; PING LinJiang

2008-01-01

With the view that visual cue could be taken as a kind of stimulus, the study of the mechanism in the visual perception process by using visual cues in their probabilistic representation eventually leads to a class of statistical integration of multiple visual cues (IMVC) methods which have been applied widely in perceptual grouping, video analysis, and other basic problems in computer vision. In this paper, a survey on the basic ideas and recent advances of IMVC methods is presented, and much focus is on the models and algorithms of IMVC for video analysis within the framework of Bayesian estimation. Furthermore, two typical problems in video analysis, robust visual tracking and "switching problem" in multi-target tracking (MTT) are taken as test beds to verify a series of Bayesian-based IMVC methods proposed by the authors. Furthermore, the relations between the statistical IMVC and the visual per-ception process, as well as potential future research work for IMVC, are discussed.
Slow changing postural cues cancel visual field dependence on self-tilt detection.

Science.gov (United States)

Scotto Di Cesare, C; Macaluso, T; Mestre, D R; Bringoux, L

2015-01-01

Interindividual differences influence the multisensory integration process involved in spatial perception. Here, we assessed the effect of visual field dependence on self-tilt detection relative to upright, as a function of static vs. slow changing visual or postural cues. To that aim, we manipulated slow rotations (i.e., 0.05° s(-1)) of the body and/or the visual scene in pitch. Participants had to indicate whether they felt being tilted forward at successive angles. Results show that thresholds for self-tilt detection substantially differed between visual field dependent/independent subjects, when only the visual scene was rotated. This difference was no longer present when the body was actually rotated, whatever the visual scene condition (i.e., absent, static or rotated relative to the observer). These results suggest that the cancellation of visual field dependence by dynamic postural cues may rely on a multisensory reweighting process, where slow changing vestibular/somatosensory inputs may prevail over visual inputs. Copyright © 2014 Elsevier B.V. All rights reserved.
APLIKASI MEDIA AUDIO-VISUAL DALAM PEMBELAJARAN SPEAKING SKILL DENGAN PENDEKATAN AUDIOLINGUAL: Studi Kasus di MAN Batang

Directory of Open Access Journals (Sweden)

Slamet Untung

2012-10-01

Full Text Available The research to study the application of audio and visual medium in order to learn speaking skill by audiolingual approach is a good contribution to educational world of senior high school and the Islamic one, particularly, in finding a way to improving the learning component relating directly to the medium and method of learning speaking skill. This research is to find out its significance and relevance. The main variable of this research includes the whole activities of the application of audio and visual medium in learning speaking skill by audio-lingual approach. The data were collected through observation, interview, questionnaire and documentation. This research took place in state Islamic senior high school of Batang in Central Java. The result shows that the application helps the students to speak English correctly and accurately and stresses the message of the speaking skill learning.
Knowledge-assisted cross-media analysis of audio-visual content in the news domain

NARCIS (Netherlands)

Mezaris, Vasileios; Gidaros, Spyros; Papadopoulos, Georgios Th.; Kasper, Walter; Ordelman, Roeland J.F.; de Jong, Franciska M.G.; Kompatsiaris, Ioannis

In this paper, a complete architecture for knowledge-assisted cross-media analysis of News-related multimedia content is presented, along with its constituent components. The proposed analysis architecture employs state-of-the-art methods for the analysis of each individual modality (visual, audio,
I can see what you are saying: Auditory labels reduce visual search times.

Science.gov (United States)

Cho, Kit W

2016-10-01

The present study explored the self-directed-speech effect, the finding that relative to silent reading of a label (e.g., DOG), saying it aloud reduces visual search reaction times (RTs) for locating a target picture among distractors. Experiment 1 examined whether this effect is due to a confound in the differences in the number of cues in self-directed speech (two) vs. silent reading (one) and tested whether self-articulation is required for the effect. The results showed that self-articulation is not required and that merely hearing the auditory label reduces visual search RTs relative to silent reading. This finding also rules out the number of cues confound. Experiment 2 examined whether hearing an auditory label activates more prototypical features of the label's referent and whether the auditory-label benefit is moderated by the target's imagery concordance (the degree to which the target picture matches the mental picture that is activated by a written label for the target). When the target imagery concordance was high, RTs following the presentation of a high prototypicality picture or auditory cue were comparable and shorter than RTs following a visual label or low prototypicality picture cue. However, when the target imagery concordance was low, RTs following an auditory cue were shorter than the comparable RTs following the picture cues and visual-label cue. The results suggest that an auditory label activates both prototypical and atypical features of a concept and can facilitate visual search RTs even when compared to picture primes. Copyright © 2016 Elsevier B.V. All rights reserved.
Audio Visual Media Components in Educational Game for Elementary Students

Directory of Open Access Journals (Sweden)

Meilani Hartono

2016-12-01

Full Text Available The purpose of this research was to review and implement interactive audio visual media used in an educational game to improve elementary students’ interest in learning mathematics. The game was developed for desktop platform. The art of the game was set as 2D cartoon art with animation and audio in order to make students more interest. There were four mini games developed based on the researches on mathematics study. Development method used was Multimedia Development Life Cycle (MDLC that consists of requirement, design, development, testing, and implementation phase. Data collection methods used are questionnaire, literature study, and interview. The conclusion is elementary students interest with educational game that has fun and active (moving objects, with fast tempo of music, and carefree color like blue. This educational game is hoped to be an alternative teaching tool combined with conventional teaching method.
Effects of Temporal Congruity Between Auditory and Visual Stimuli Using Rapid Audio-Visual Serial Presentation.

Science.gov (United States)

An, Xingwei; Tang, Jiabei; Liu, Shuang; He, Feng; Qi, Hongzhi; Wan, Baikun; Ming, Dong

2016-10-01

Combining visual and auditory stimuli in event-related potential (ERP)-based spellers gained more attention in recent years. Few of these studies notice the difference of ERP components and system efficiency caused by the shifting of visual and auditory onset. Here, we aim to study the effect of temporal congruity of auditory and visual stimuli onset on bimodal brain-computer interface (BCI) speller. We designed five visual and auditory combined paradigms with different visual-to-auditory delays (-33 to +100 ms). Eleven participants attended in this study. ERPs were acquired and aligned according to visual and auditory stimuli onset, respectively. ERPs of Fz, Cz, and PO7 channels were studied through the statistical analysis of different conditions both from visual-aligned ERPs and audio-aligned ERPs. Based on the visual-aligned ERPs, classification accuracy was also analyzed to seek the effects of visual-to-auditory delays. The latencies of ERP components depended mainly on the visual stimuli onset. Auditory stimuli onsets influenced mainly on early component accuracies, whereas visual stimuli onset determined later component accuracies. The latter, however, played a dominate role in overall classification. This study is important for further studies to achieve better explanations and ultimately determine the way to optimize the bimodal BCI application.
Cueing listeners to attend to a target talker progressively improves word report as the duration of the cue-target interval lengthens to 2,000 ms.

Science.gov (United States)

Holmes, Emma; Kitterick, Padraig T; Summerfield, A Quentin

2018-04-25

Endogenous attention is typically studied by presenting instructive cues in advance of a target stimulus array. For endogenous visual attention, task performance improves as the duration of the cue-target interval increases up to 800 ms. Less is known about how endogenous auditory attention unfolds over time or the mechanisms by which an instructive cue presented in advance of an auditory array improves performance. The current experiment used five cue-target intervals (0, 250, 500, 1,000, and 2,000 ms) to compare four hypotheses for how preparatory attention develops over time in a multi-talker listening task. Young adults were cued to attend to a target talker who spoke in a mixture of three talkers. Visual cues indicated the target talker's spatial location or their gender. Participants directed attention to location and gender simultaneously ("objects") at all cue-target intervals. Participants were consistently faster and more accurate at reporting words spoken by the target talker when the cue-target interval was 2,000 ms than 0 ms. In addition, the latency of correct responses progressively shortened as the duration of the cue-target interval increased from 0 to 2,000 ms. These findings suggest that the mechanisms involved in preparatory auditory attention develop gradually over time, taking at least 2,000 ms to reach optimal configuration, yet providing cumulative improvements in speech intelligibility as the duration of the cue-target interval increases from 0 to 2,000 ms. These results demonstrate an improvement in performance for cue-target intervals longer than those that have been reported previously in the visual or auditory modalities.

The benefit obtained from visually displayed text from an automatic speech recognizer during listening to speech presented in noise

NARCIS (Netherlands)

Zekveld, A.A.; Kramer, S.E.; Kessens, J.M.; Vlaming, M.S.M.G.; Houtgast, T.

2008-01-01

OBJECTIVES: The aim of this study was to evaluate the benefit that listeners obtain from visually presented output from an automatic speech recognition (ASR) system during listening to speech in noise. DESIGN: Auditory-alone and audiovisual speech reception thresholds (SRTs) were measured. The SRT
Segmentation, Diarization and Speech Transcription: Surprise Data Unraveled

NARCIS (Netherlands)

Huijbregts, M.A.H.

2008-01-01

In this thesis, research on large vocabulary continuous speech recognition for unknown audio conditions is presented. For automatic speech recognition systems based on statistical methods, it is important that the conditions of the audio used for training the statistical models match the conditions
Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

Science.gov (United States)

Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

2010-01-01

A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It can enjoy a fast rate of data/text entry, small overall size, and can be lightweight. In addition, this design will free the hands and eyes of a suited crewmember. The system components and steps include beam forming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaption, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMMbased ASR components were developed. They can help real-time ASR system designers select proper tasks when in the face of constraints in computational resources.
Unsupervised topic modelling on South African parliament audio data

CSIR Research Space (South Africa)

Kleynhans, N

2014-11-01

Full Text Available Using a speech recognition system to convert spoken audio to text can enable the structuring of large collections of spoken audio data. A convenient means to summarise or cluster spoken data is to identify the topic under discussion. There are many...
Impact of oral health education by audio aids, braille and tactile models on the oral health status of visually impaired children of Bhopal City.

Science.gov (United States)

Gautam, Anjali; Bhambal, Ajay; Moghe, Swapnil

2018-01-01

Children with special needs face unique challenges in day-to-day practice. They are dependent on their close ones for everything. To improve oral hygiene in such visually impaired children, undue training and education are required. Braille is an important language for reading and writing for the visually impaired. It helps them understand and visualize the world via touch. Audio aids are being used to impart health education to the visually impaired. Tactile models help them perceive things which they cannot visualize and hence are an important learning tool. This study aimed to assess the improvement in oral hygiene by audio aids and Braille and tactile models in visually impaired children aged 6-16 years of Bhopal city. This was a prospective study. Sixty visually impaired children aged 6-16 years were selected and randomly divided into three groups (20 children each). Group A: audio aids + Braille, Group B: audio aids + tactile models, and Group C: audio aids + Braille + tactile models. Instructions were given for maintaining good oral hygiene and brushing techniques were explained to all children. After 3 months' time, the oral hygiene status was recorded and compared using plaque and gingival index. ANNOVA test was used. The present study showed a decrease in the mean plaque and gingival scores at all time intervals in individual group as compared to that of the baseline that was statistically significant. The study depicts that the combination of audio aids, Braille and tactile models is an effective way to provide oral health education and improve oral health status of visually impaired children.
The Galker test of speech reception in noise

DEFF Research Database (Denmark)

Lauritsen, Maj-Britt Glenn; Söderström, Margareta; Kreiner, Svend

2016-01-01

PURPOSE: We tested "the Galker test", a speech reception in noise test developed for primary care for Danish preschool children, to explore if the children's ability to hear and understand speech was associated with gender, age, middle ear status, and the level of background noise. METHODS......: The Galker test is a 35-item audio-visual, computerized word discrimination test in background noise. Included were 370 normally developed children attending day care center. The children were examined with the Galker test, tympanometry, audiometry, and the Reynell test of verbal comprehension. Parents...... and daycare teachers completed questionnaires on the children's ability to hear and understand speech. As most of the variables were not assessed using interval scales, non-parametric statistics (Goodman-Kruskal's gamma) were used for analyzing associations with the Galker test score. For comparisons...
Cannabis cue-induced brain activation correlates with drug craving in limbic and visual salience regions: Preliminary results

Science.gov (United States)

Charboneau, Evonne J.; Dietrich, Mary S.; Park, Sohee; Cao, Aize; Watkins, Tristan J; Blackford, Jennifer U; Benningfield, Margaret M.; Martin, Peter R.; Buchowski, Maciej S.; Cowan, Ronald L.

2013-01-01

Craving is a major motivator underlying drug use and relapse but the neural correlates of cannabis craving are not well understood. This study sought to determine whether visual cannabis cues increase cannabis craving and whether cue-induced craving is associated with regional brain activation in cannabis-dependent individuals. Cannabis craving was assessed in 16 cannabis-dependent adult volunteers while they viewed cannabis cues during a functional MRI (fMRI) scan. The Marijuana Craving Questionnaire was administered immediately before and after each of three cannabis cue-exposure fMRI runs. FMRI blood-oxygenation-level-dependent (BOLD) signal intensity was determined in regions activated by cannabis cues to examine the relationship of regional brain activation to cannabis craving. Craving scores increased significantly following exposure to visual cannabis cues. Visual cues activated multiple brain regions, including inferior orbital frontal cortex, posterior cingulate gyrus, parahippocampal gyrus, hippocampus, amygdala, superior temporal pole, and occipital cortex. Craving scores at baseline and at the end of all three runs were significantly correlated with brain activation during the first fMRI run only, in the limbic system (including amygdala and hippocampus) and paralimbic system (superior temporal pole), and visual regions (occipital cortex). Cannabis cues increased craving in cannabis-dependent individuals and this increase was associated with activation in the limbic, paralimbic, and visual systems during the first fMRI run, but not subsequent fMRI runs. These results suggest that these regions may mediate visually cued aspects of drug craving. This study provides preliminary evidence for the neural basis of cue-induced cannabis craving and suggests possible neural targets for interventions targeted at treating cannabis dependence. PMID:24035535
A technology prototype system for rating therapist empathy from audio recordings in addiction counseling.

Science.gov (United States)

Xiao, Bo; Huang, Chewei; Imel, Zac E; Atkins, David C; Georgiou, Panayiotis; Narayanan, Shrikanth S

2016-04-01

Scaling up psychotherapy services such as for addiction counseling is a critical societal need. One challenge is ensuring quality of therapy, due to the heavy cost of manual observational assessment. This work proposes a speech technology-based system to automate the assessment of therapist empathy-a key therapy quality index-from audio recordings of the psychotherapy interactions. We designed a speech processing system that includes voice activity detection and diarization modules, and an automatic speech recognizer plus a speaker role matching module to extract the therapist's language cues. We employed Maximum Entropy models, Maximum Likelihood language models, and a Lattice Rescoring method to characterize high vs. low empathic language. We estimated therapy-session level empathy codes using utterance level evidence obtained from these models. Our experiments showed that the fully automated system achieved a correlation of 0.643 between expert annotated empathy codes and machine-derived estimations, and an accuracy of 81% in classifying high vs. low empathy, in comparison to a 0.721 correlation and 86% accuracy in the oracle setting using manual transcripts. The results show that the system provides useful information that can contribute to automatic quality insurance and therapist training.
A technology prototype system for rating therapist empathy from audio recordings in addiction counseling

Directory of Open Access Journals (Sweden)

Bo Xiao

2016-04-01

Full Text Available Scaling up psychotherapy services such as for addiction counseling is a critical societal need. One challenge is ensuring quality of therapy, due to the heavy cost of manual observational assessment. This work proposes a speech technology-based system to automate the assessment of therapist empathy—a key therapy quality index—from audio recordings of the psychotherapy interactions. We designed a speech processing system that includes voice activity detection and diarization modules, and an automatic speech recognizer plus a speaker role matching module to extract the therapist’s language cues. We employed Maximum Entropy models, Maximum Likelihood language models, and a Lattice Rescoring method to characterize high vs. low empathic language. We estimated therapy-session level empathy codes using utterance level evidence obtained from these models. Our experiments showed that the fully automated system achieved a correlation of 0.643 between expert annotated empathy codes and machine-derived estimations, and an accuracy of 81% in classifying high vs. low empathy, in comparison to a 0.721 correlation and 86% accuracy in the oracle setting using manual transcripts. The results show that the system provides useful information that can contribute to automatic quality insurance and therapist training.
Perceived Audio Quality Analysis in Digital Audio Broadcasting Plus System Based on PEAQ

Directory of Open Access Journals (Sweden)

K. Ulovec

2018-04-01

Full Text Available Broadcasters need to decide on bitrates of the services in the multiplex transmitted via Digital Audio Broadcasting Plus system. The bitrate should be set as low as possible for maximal number of services, but with high quality, not lower than in conventional analog systems. In this paper, the objective method Perceptual Evaluation of Audio Quality is used to analyze the perceived audio quality for appropriate codecs --- MP2 and AAC offering three profiles. The main aim is to determine dependencies on the type of signal --- music and speech, the number of channels --- stereo and mono, and the bitrate. Results indicate that only MP2 codec and AAC Low Complexity profile reach imperceptible quality loss. The MP2 codec needs higher bitrate than AAC Low Complexity profile for the same quality. For the both versions of AAC High-Efficiency profiles, the limit bitrates are determined above which less complex profiles outperform the more complex ones and higher bitrates above these limits are not worth using. It is shown that stereo music has worse quality than stereo speech generally, whereas for mono, the dependencies vary upon the codec/profile. Furthermore, numbers of services satisfying various quality criteria are presented.
Audiovisual integration for speech during mid-childhood: Electrophysiological evidence

Science.gov (United States)

Kaganovich, Natalya; Schumaker, Jennifer

2014-01-01

Previous studies have demonstrated that the presence of visual speech cues reduces the amplitude and latency of the N1 and P2 event-related potential (ERP) components elicited by speech stimuli. However, the developmental trajectory of this effect is not yet fully mapped. We examined ERP responses to auditory, visual, and audiovisual speech in two groups of school-age children (7–8-year-olds and 10–11-year-olds) and in adults. Audiovisual speech led to the attenuation of the N1 and P2 components in all groups of participants, suggesting that the neural mechanisms underlying these effects are functional by early school years. Additionally, while the reduction in N1 was largest over the right scalp, the P2 attenuation was largest over the left and midline scalp. The difference in the hemispheric distribution of the N1 and P2 attenuation supports the idea that these components index at least somewhat disparate neural processes within the context of audiovisual speech perception. PMID:25463815
The presentation of expert testimony via live audio-visual communication.

Science.gov (United States)

Miller, R D

1991-01-01

As part of a national effort to improve efficiency in court procedures, the American Bar Association has recommended, on the basis of a number of pilot studies, increased use of current audio-visual technology, such as telephone and live video communication, to eliminate delays caused by unavailability of participants in both civil and criminal procedures. Although these recommendations were made to facilitate court proceedings, and for the convenience of attorneys and judges, they also have the potential to save significant time for clinical expert witnesses as well. The author reviews the studies of telephone testimony that were done by the American Bar Association and other legal research groups, as well as the experience in one state forensic evaluation and treatment center. He also reviewed the case law on the issue of remote testimony. He then presents data from a national survey of state attorneys general concerning the admissibility of testimony via audio-visual means, including video depositions. Finally, he concludes that the option to testify by telephone provides a significant savings in precious clinical time for forensic clinicians in public facilities, and urges that such clinicians work actively to convince courts and/or legislatures in states that do not permit such testimony (currently the majority), to consider accepting it, to improve the effective use of scarce clinical resources in public facilities.
Are multiple visual short-term memory storages necessary to explain the retro-cue effect?

Science.gov (United States)

Makovski, Tal

2012-06-01

Recent research has shown that change detection performance is enhanced when, during the retention interval, attention is cued to the location of the upcoming test item. This retro-cue advantage has led some researchers to suggest that visual short-term memory (VSTM) is divided into a durable, limited-capacity storage and a more fragile, high-capacity storage. Consequently, performance is poor on the no-cue trials because fragile VSTM is overwritten by the test display and only durable VSTM is accessible under these conditions. In contrast, performance is improved in the retro-cue condition because attention keeps fragile VSTM accessible. The aim of the present study was to test the assumptions underlying this two-storage account. Participants were asked to encode an array of colors for a change detection task involving no-cue and retro-cue trials. A retro-cue advantage was found even when the cue was presented after a visual (Experiment 1) or a central (Experiment 2) interference. Furthermore, the magnitude of the interference was comparable between the no-cue and retro-cue trials. These data undermine the main empirical support for the two-storage account and suggest that the presence of a retro-cue benefit cannot be used to differentiate between different VSTM storages.
Effects of sensorineural hearing loss on visually guided attention in a multitalker environment.

Science.gov (United States)

Best, Virginia; Marrone, Nicole; Mason, Christine R; Kidd, Gerald; Shinn-Cunningham, Barbara G

2009-03-01

This study asked whether or not listeners with sensorineural hearing loss have an impaired ability to use top-down attention to enhance speech intelligibility in the presence of interfering talkers. Listeners were presented with a target string of spoken digits embedded in a mixture of five spatially separated speech streams. The benefit of providing simple visual cues indicating when and/or where the target would occur was measured in listeners with hearing loss, listeners with normal hearing, and a control group of listeners with normal hearing who were tested at a lower target-to-masker ratio to equate their baseline (no cue) performance with the hearing-loss group. All groups received robust benefits from the visual cues. The magnitude of the spatial-cue benefit, however, was significantly smaller in listeners with hearing loss. Results suggest that reduced utility of selective attention for resolving competition between simultaneous sounds contributes to the communication difficulties experienced by listeners with hearing loss in everyday listening situations.
Treating speech subsystems in childhood apraxia of speech with tactual input: the PROMPT approach.

Science.gov (United States)

Dale, Philip S; Hayden, Deborah A

2013-11-01

Prompts for Restructuring Oral Muscular Phonetic Targets (PROMPT; Hayden, 2004; Hayden, Eigen, Walker, & Olsen, 2010)-a treatment approach for the improvement of speech sound disorders in children-uses tactile-kinesthetic- proprioceptive (TKP) cues to support and shape movements of the oral articulators. No research to date has systematically examined the efficacy of PROMPT for children with childhood apraxia of speech (CAS). Four children (ages 3;6 [years;months] to 4;8), all meeting the American Speech-Language-Hearing Association (2007) criteria for CAS, were treated using PROMPT. All children received 8 weeks of 2 × per week treatment, including at least 4 weeks of full PROMPT treatment that included TKP cues. During the first 4 weeks, 2 of the 4 children received treatment that included all PROMPT components except TKP cues. This design permitted both between-subjects and within-subjects comparisons to evaluate the effect of TKP cues. Gains in treatment were measured by standardized tests and by criterion-referenced measures based on the production of untreated probe words, reflecting change in speech movements and auditory perceptual accuracy. All 4 children made significant gains during treatment, but measures of motor speech control and untreated word probes provided evidence for more gain when TKP cues were included. PROMPT as a whole appears to be effective for treating children with CAS, and the inclusion of TKP cues appears to facilitate greater effect.
Audio-Visual Feedback for Self-monitoring Posture in Ballet Training

DEFF Research Database (Denmark)

Knudsen, Esben Winther; Hølledig, Malte Lindholm; Bach-Nielsen, Sebastian Siem

2017-01-01

An application for ballet training is presented that monitors the posture position (straightness of the spine and rotation of the pelvis) deviation from the ideal position in real-time. The human skeletal data is acquired through a Microsoft Kinect v2. The movement of the student is mirrored......-coded. In an experiment with 9-12 year-old dance students from a ballet school, comparing the audio-visual feedback modality with no feedback leads to an increase in posture accuracy (p
A Causal Inference Model Explains Perception of the McGurk Effect and Other Incongruent Audiovisual Speech.

Directory of Open Access Journals (Sweden)

John F Magnotti

2017-02-01

Full Text Available Audiovisual speech integration combines information from auditory speech (talker's voice and visual speech (talker's mouth movements to improve perceptual accuracy. However, if the auditory and visual speech emanate from different talkers, integration decreases accuracy. Therefore, a key step in audiovisual speech perception is deciding whether auditory and visual speech have the same source, a process known as causal inference. A well-known illusion, the McGurk Effect, consists of incongruent audiovisual syllables, such as auditory "ba" + visual "ga" (AbaVga, that are integrated to produce a fused percept ("da". This illusion raises two fundamental questions: first, given the incongruence between the auditory and visual syllables in the McGurk stimulus, why are they integrated; and second, why does the McGurk effect not occur for other, very similar syllables (e.g., AgaVba. We describe a simplified model of causal inference in multisensory speech perception (CIMS that predicts the perception of arbitrary combinations of auditory and visual speech. We applied this model to behavioral data collected from 60 subjects perceiving both McGurk and non-McGurk incongruent speech stimuli. The CIMS model successfully predicted both the audiovisual integration observed for McGurk stimuli and the lack of integration observed for non-McGurk stimuli. An identical model without causal inference failed to accurately predict perception for either form of incongruent speech. The CIMS model uses causal inference to provide a computational framework for studying how the brain performs one of its most important tasks, integrating auditory and visual speech cues to allow us to communicate with others.
Numerosity estimation in visual stimuli in the absence of luminance-based cues.

Directory of Open Access Journals (Sweden)

Peter Kramer

2011-02-01

Full Text Available Numerosity estimation is a basic preverbal ability that humans share with many animal species and that is believed to be foundational of numeracy skills. It is notoriously difficult, however, to establish whether numerosity estimation is based on numerosity itself, or on one or more non-numerical cues like-in visual stimuli-spatial extent and density. Frequently, different non-numerical cues are held constant on different trials. This strategy, however, still allows numerosity estimation to be based on a combination of non-numerical cues rather than on any particular one by itself.Here we introduce a novel method, based on second-order (contrast-based visual motion, to create stimuli that exclude all first-order (luminance-based cues to numerosity. We show that numerosities can be estimated almost as well in second-order motion as in first-order motion.The results show that numerosity estimation need not be based on first-order spatial filtering, first-order density perception, or any other processing of luminance-based cues to numerosity. Our method can be used as an effective tool to control non-numerical variables in studies of numerosity estimation.
Visual cues in low-level flight - Implications for pilotage, training, simulation, and enhanced/synthetic vision systems

Science.gov (United States)

Foyle, David C.; Kaiser, Mary K.; Johnson, Walter W.

1992-01-01

This paper reviews some of the sources of visual information that are available in the out-the-window scene and describes how these visual cues are important for routine pilotage and training, as well as the development of simulator visual systems and enhanced or synthetic vision systems for aircraft cockpits. It is shown how these visual cues may change or disappear under environmental or sensor conditions, and how the visual scene can be augmented by advanced displays to capitalize on the pilot's excellent ability to extract visual information from the visual scene.
Impact of oral health education by audio aids, braille and tactile models on the oral health status of visually impaired children of Bhopal City

Directory of Open Access Journals (Sweden)

Anjali Gautam

2018-01-01

Full Text Available Context: Children with special needs face unique challenges in day-to-day practice. They are dependent on their close ones for everything. To improve oral hygiene in such visually impaired children, undue training and education are required. Braille is an important language for reading and writing for the visually impaired. It helps them understand and visualize the world via touch. Audio aids are being used to impart health education to the visually impaired. Tactile models help them perceive things which they cannot visualize and hence are an important learning tool. Aim: This study aimed to assess the improvement in oral hygiene by audio aids and Braille and tactile models in visually impaired children aged 6–16 years of Bhopal city. Settings and Design: This was a prospective study. Materials and Methods: Sixty visually impaired children aged 6–16 years were selected and randomly divided into three groups (20 children each. Group A: audio aids + Braille, Group B: audio aids + tactile models, and Group C: audio aids + Braille + tactile models. Instructions were given for maintaining good oral hygiene and brushing techniques were explained to all children. After 3 months' time, the oral hygiene status was recorded and compared using plaque and gingival index. Statistical Analysis Used: ANNOVA test was used. Results: The present study showed a decrease in the mean plaque and gingival scores at all time intervals in individual group as compared to that of the baseline that was statistically significant. Conclusions: The study depicts that the combination of audio aids, Braille and tactile models is an effective way to provide oral health education and improve oral health status of visually impaired children.

Task-specific visual cues for improving process model understanding

NARCIS (Netherlands)

Petrusel, Razvan; Mendling, Jan; Reijers, Hajo A.

2016-01-01

Context Business process models support various stakeholders in managing business processes and designing process-aware information systems. In order to make effective use of these models, they have to be readily understandable. Objective Prior research has emphasized the potential of visual cues to
Designing between Pedagogies and Cultures: Audio-Visual Chinese Language Resources for Australian Schools

Science.gov (United States)

Yuan, Yifeng; Shen, Huizhong

2016-01-01

This design-based study examines the creation and development of audio-visual Chinese language teaching and learning materials for Australian schools by incorporating users' feedback and content writers' input that emerged in the designing process. Data were collected from workshop feedback of two groups of Chinese-language teachers from primary…
The role of temporal synchrony as a binding cue for visual persistence in early visual areas: an fMRI study.

Science.gov (United States)

Wong, Yvonne J; Aldcroft, Adrian J; Large, Mary-Ellen; Culham, Jody C; Vilis, Tutis

2009-12-01

We examined the role of temporal synchrony-the simultaneous appearance of visual features-in the perceptual and neural processes underlying object persistence. When a binding cue (such as color or motion) momentarily exposes an object from a background of similar elements, viewers remain aware of the object for several seconds before it perceptually fades into the background, a phenomenon known as object persistence. We showed that persistence from temporal stimulus synchrony, like that arising from motion and color, is associated with activation in the lateral occipital (LO) area, as measured by functional magnetic resonance imaging. We also compared the distribution of occipital cortex activity related to persistence to that of iconic visual memory. Although activation related to iconic memory was largely confined to LO, activation related to object persistence was present across V1 to LO, peaking in V3 and V4, regardless of the binding cue (temporal synchrony, motion, or color). Although persistence from motion cues was not associated with higher activation in the MT+ motion complex, persistence from color cues was associated with increased activation in V4. Taken together, these results demonstrate that although persistence is a form of visual memory, it relies on neural mechanisms different from those of iconic memory. That is, persistence not only activates LO in a cue-independent manner, it also recruits visual areas that may be necessary to maintain binding between object elements.
Spatial Hearing with Incongruent Visual or Auditory Room Cues

Science.gov (United States)

Gil-Carvajal, Juan C.; Cubick, Jens; Santurette, Sébastien; Dau, Torsten

2016-11-01

In day-to-day life, humans usually perceive the location of sound sources as outside their heads. This externalized auditory spatial perception can be reproduced through headphones by recreating the sound pressure generated by the source at the listener’s eardrums. This requires the acoustical features of the recording environment and listener’s anatomy to be recorded at the listener’s ear canals. Although the resulting auditory images can be indistinguishable from real-world sources, their externalization may be less robust when the playback and recording environments differ. Here we tested whether a mismatch between playback and recording room reduces perceived distance, azimuthal direction, and compactness of the auditory image, and whether this is mostly due to incongruent auditory cues or to expectations generated from the visual impression of the room. Perceived distance ratings decreased significantly when collected in a more reverberant environment than the recording room, whereas azimuthal direction and compactness remained room independent. Moreover, modifying visual room-related cues had no effect on these three attributes, while incongruent auditory room-related cues between the recording and playback room did affect distance perception. Consequently, the external perception of virtual sounds depends on the degree of congruency between the acoustical features of the environment and the stimuli.
Auditory and visual cueing modulate cycling speed of older adults and persons with Parkinson's disease in a Virtual Cycling (V-Cycle) system.

Science.gov (United States)

Gallagher, Rosemary; Damodaran, Harish; Werner, William G; Powell, Wendy; Deutsch, Judith E

2016-08-19

Evidence based virtual environments (VEs) that incorporate compensatory strategies such as cueing may change motor behavior and increase exercise intensity while also being engaging and motivating. The purpose of this study was to determine if persons with Parkinson's disease and aged matched healthy adults responded to auditory and visual cueing embedded in a bicycling VE as a method to increase exercise intensity. We tested two groups of participants, persons with Parkinson's disease (PD) (n = 15) and age-matched healthy adults (n = 13) as they cycled on a stationary bicycle while interacting with a VE. Participants cycled under two conditions: auditory cueing (provided by a metronome) and visual cueing (represented as central road markers in the VE). The auditory condition had four trials in which auditory cues or the VE were presented alone or in combination. The visual condition had five trials in which the VE and visual cue rate presentation was manipulated. Data were analyzed by condition using factorial RMANOVAs with planned t-tests corrected for multiple comparisons. There were no differences in pedaling rates between groups for both the auditory and visual cueing conditions. Persons with PD increased their pedaling rate in the auditory (F 4.78, p = 0.029) and visual cueing (F 26.48, p auditory (F = 24.72, p visual cueing (F = 40.69, p visual condition in age-matched healthy adults showed a step-wise increase in pedaling rate (p = 0.003 to p visual cues (p visual cues in order to obtain an increase in cycling intensity. The combination of the VE and auditory cues was neither additive nor interfering. These data serve as preliminary evidence that embedding auditory and visual cues to alter cycling speed in a VE as method to increase exercise intensity that may promote fitness.
Semantic Context Detection Using Audio Event Fusion

Directory of Open Access Journals (Sweden)

Cheng Wen-Huang

2006-01-01

Full Text Available Semantic-level content analysis is a crucial issue in achieving efficient content retrieval and management. We propose a hierarchical approach that models audio events over a time series in order to accomplish semantic context detection. Two levels of modeling, audio event and semantic context modeling, are devised to bridge the gap between physical audio features and semantic concepts. In this work, hidden Markov models (HMMs are used to model four representative audio events, that is, gunshot, explosion, engine, and car braking, in action movies. At the semantic context level, generative (ergodic hidden Markov model and discriminative (support vector machine (SVM approaches are investigated to fuse the characteristics and correlations among audio events, which provide cues for detecting gunplay and car-chasing scenes. The experimental results demonstrate the effectiveness of the proposed approaches and provide a preliminary framework for information mining by using audio characteristics.
Experience-Dependency of Reliance on Local Visual and Idiothetic Cues for Spatial Representations Created in the Absence of Distal Information

Directory of Open Access Journals (Sweden)

Fabian Draht

2017-06-01

Full Text Available Spatial encoding in the hippocampus is based on a range of different input sources. To generate spatial representations, reliable sensory cues from the external environment are integrated with idiothetic cues, derived from self-movement, that enable path integration and directional perception. In this study, we examined to what extent idiothetic cues significantly contribute to spatial representations and navigation: we recorded place cells while rodents navigated towards two visually identical chambers in 180° orientation via two different paths in darkness and in the absence of reliable auditory or olfactory cues. Our goal was to generate a conflict between local visual and direction-specific information, and then to assess which strategy was prioritized in different learning phases. We observed that, in the absence of distal cues, place fields are initially controlled by local visual cues that override idiothetic cues, but that with multiple exposures to the paradigm, spaced at intervals of days, idiothetic cues become increasingly implemented in generating an accurate spatial representation. Taken together, these data support that, in the absence of distal cues, local visual cues are prioritized in the generation of context-specific spatial representations through place cells, whereby idiothetic cues are deemed unreliable. With cumulative exposures to the environments, the animal learns to attend to subtle idiothetic cues to resolve the conflict between visual and direction-specific information.
Experience-Dependency of Reliance on Local Visual and Idiothetic Cues for Spatial Representations Created in the Absence of Distal Information.

Science.gov (United States)

Draht, Fabian; Zhang, Sijie; Rayan, Abdelrahman; Schönfeld, Fabian; Wiskott, Laurenz; Manahan-Vaughan, Denise

2017-01-01

Spatial encoding in the hippocampus is based on a range of different input sources. To generate spatial representations, reliable sensory cues from the external environment are integrated with idiothetic cues, derived from self-movement, that enable path integration and directional perception. In this study, we examined to what extent idiothetic cues significantly contribute to spatial representations and navigation: we recorded place cells while rodents navigated towards two visually identical chambers in 180° orientation via two different paths in darkness and in the absence of reliable auditory or olfactory cues. Our goal was to generate a conflict between local visual and direction-specific information, and then to assess which strategy was prioritized in different learning phases. We observed that, in the absence of distal cues, place fields are initially controlled by local visual cues that override idiothetic cues, but that with multiple exposures to the paradigm, spaced at intervals of days, idiothetic cues become increasingly implemented in generating an accurate spatial representation. Taken together, these data support that, in the absence of distal cues, local visual cues are prioritized in the generation of context-specific spatial representations through place cells, whereby idiothetic cues are deemed unreliable. With cumulative exposures to the environments, the animal learns to attend to subtle idiothetic cues to resolve the conflict between visual and direction-specific information.
Deficits in audiovisual speech perception in normal aging emerge at the level of whole-word recognition.

Science.gov (United States)

Stevenson, Ryan A; Nelms, Caitlin E; Baum, Sarah H; Zurkovsky, Lilia; Barense, Morgan D; Newhouse, Paul A; Wallace, Mark T

2015-01-01

Over the next 2 decades, a dramatic shift in the demographics of society will take place, with a rapid growth in the population of older adults. One of the most common complaints with healthy aging is a decreased ability to successfully perceive speech, particularly in noisy environments. In such noisy environments, the presence of visual speech cues (i.e., lip movements) provide striking benefits for speech perception and comprehension, but previous research suggests that older adults gain less from such audiovisual integration than their younger peers. To determine at what processing level these behavioral differences arise in healthy-aging populations, we administered a speech-in-noise task to younger and older adults. We compared the perceptual benefits of having speech information available in both the auditory and visual modalities and examined both phoneme and whole-word recognition across varying levels of signal-to-noise ratio. For whole-word recognition, older adults relative to younger adults showed greater multisensory gains at intermediate SNRs but reduced benefit at low SNRs. By contrast, at the phoneme level both younger and older adults showed approximately equivalent increases in multisensory gain as signal-to-noise ratio decreased. Collectively, the results provide important insights into both the similarities and differences in how older and younger adults integrate auditory and visual speech cues in noisy environments and help explain some of the conflicting findings in previous studies of multisensory speech perception in healthy aging. These novel findings suggest that audiovisual processing is intact at more elementary levels of speech perception in healthy-aging populations and that deficits begin to emerge only at the more complex word-recognition level of speech signals. Copyright © 2015 Elsevier Inc. All rights reserved.
Tuning Neural Phase Entrainment to Speech.

Science.gov (United States)

Falk, Simone; Lanzilotti, Cosima; Schön, Daniele

2017-08-01

Musical rhythm positively impacts on subsequent speech processing. However, the neural mechanisms underlying this phenomenon are so far unclear. We investigated whether carryover effects from a preceding musical cue to a speech stimulus result from a continuation of neural phase entrainment to periodicities that are present in both music and speech. Participants listened and memorized French metrical sentences that contained (quasi-)periodic recurrences of accents and syllables. Speech stimuli were preceded by a rhythmically regular or irregular musical cue. Our results show that the presence of a regular cue modulates neural response as estimated by EEG power spectral density, intertrial coherence, and source analyses at critical frequencies during speech processing compared with the irregular condition. Importantly, intertrial coherences for regular cues were indicative of the participants' success in memorizing the subsequent speech stimuli. These findings underscore the highly adaptive nature of neural phase entrainment across fundamentally different auditory stimuli. They also support current models of neural phase entrainment as a tool of predictive timing and attentional selection across cognitive domains.
Multimodal indexing of digital audio-visual documents: A case study for cultural heritage data

NARCIS (Netherlands)

Carmichael, J.; Larson, M.; Marlow, J.; Newman, E.; Clough, P.; Oomen, J.; Sav, S.

2008-01-01

This paper describes a multimedia multimodal information access sub-system (MIAS) for digital audio-visual documents, typically presented in streaming media format. The system is designed to provide both professional and general users with entry points into video documents that are relevant to their
[Visual cues as a therapeutic tool in Parkinson's disease. A systematic review].

Science.gov (United States)

Muñoz-Hellín, Elena; Cano-de-la-Cuerda, Roberto; Miangolarra-Page, Juan Carlos

2013-01-01

Sensory stimuli or sensory cues are being used as a therapeutic tool for improving gait disorders in Parkinson's disease patients, but most studies seem to focus on auditory stimuli. The aim of this study was to conduct a systematic review regarding the use of visual cues over gait disorders, dual tasks during gait, freezing and the incidence of falls in patients with Parkinson to obtain therapeutic implications. We conducted a systematic review in main databases such as Cochrane Database of Systematic Reviews, TripDataBase, PubMed, Ovid MEDLINE, Ovid EMBASE and Physiotherapy Evidence Database, during 2005 to 2012, according to the recommendations of the Consolidated Standards of Reporting Trials, evaluating the quality of the papers included with the Downs & Black Quality Index. 21 articles were finally included in this systematic review (with a total of 892 participants) with variable methodological quality, achieving an average of 17.27 points in the Downs and Black Quality Index (range: 11-21). Visual cues produce improvements over temporal-spatial parameters in gait, turning execution, reducing the appearance of freezing and falls in Parkinson's disease patients. Visual cues appear to benefit dual tasks during gait, reducing the interference of the second task. Further studies are needed to determine the preferred type of stimuli for each stage of the disease. Copyright © 2012 SEGG. Published by Elsevier Espana. All rights reserved.
Focus on Hinduism: Audio-Visual Resources for Teaching Religion. Occasional Publication No. 23.

Science.gov (United States)

Dell, David; And Others

The guide presents annotated lists of audio and visual materials about the Hindu religion. The authors point out that Hinduism cannot be comprehended totally by reading books; thus the resources identified in this guide will enhance understanding based on reading. The guide is intended for use by high school and college students, teachers,…
Visual and Proprioceptive Cue Weighting in Children with Developmental Coordination Disorder, Autism Spectrum Disorder and Typical Development

Directory of Open Access Journals (Sweden)

L Miller

2013-10-01

Full Text Available Accurate movement of the body and the perception of the body's position in space usually rely on both visual and proprioceptive cues. These cues are weighted differently depending on task, visual conditions and neurological factors. Children with Developmental Coordination Disorder (DCD and often also children with Autism Spectrum Disorder (ASD have movement deficits, and there is evidence that cue weightings may differ between these groups. It is often reported that ASD is linked to an increased reliance on proprioceptive information at the expense of visual information (Haswell et al, 2009; Gepner et al, 1995. The inverse appears to be true for DCD (Wann et al, 1998; Biancotto et al, 2011. I will report experiments comparing, for the first time, relative weightings of visual and proprioceptive information in children aged 8-14 with ASD, DCD and typical development. Children completed the Movement Assessment Battery for Children (MABC-II to assess motor ability and a visual-proprioceptive matching task to assess relative cue weighting. Results from the movement battery provided evidence for movement deficits in ASD similar to those in DCD. Cue weightings in the matching task did not differentiate the clinical groups, however those children with ASD with relatively spared movement skills tended to weight visual cues less heavily than those with DCD-like movement deficits. These findings will be discussed with reference to previous DSM-IV diagnostic criteria and also relevant revisions in the DSM-V.
Use of audio-visual methods in radiology and physics courses

Energy Technology Data Exchange (ETDEWEB)

Holmberg, P

1987-03-15

Today's medicine utilizes sophisticated equipment for radiological, biochemical and microbiological investigation procedures and analyses. Hence it is necessary that physicans have adequate scientific and technical knowledge of the apparatus they are using so that the equipment can be used in the most effective way. Partly this knowledge is obtained from science-orientated courses in the preclinical stage of the study program for medical students. To increase the motivation to study science-courses (medical physics) audio-visual methods are used to describe diagnostic and therapeutic procedures in the clinical routines.
Integration of visual and inertial cues in perceived heading of self-motion

NARCIS (Netherlands)

Winkel, K.N. de; Weesie, H.M.; Werkhoven, P.J.; Groen, E.L.

2010-01-01

In the present study, we investigated whether the perception of heading of linear self-motion can be explained by Maximum Likelihood Integration (MLI) of visual and non-visual sensory cues. MLI predicts smaller variance for multisensory judgments compared to unisensory judgments. Nine participants
Gait parameter control timing with dynamic manual contact or visual cues

Science.gov (United States)

Shi, Peter; Werner, William

2016-01-01

We investigated the timing of gait parameter changes (stride length, peak toe velocity, and double-, single-support, and complete step duration) to control gait speed. Eleven healthy participants adjusted their gait speed on a treadmill to maintain a constant distance between them and a fore-aft oscillating cue (a place on a conveyor belt surface). The experimental design balanced conditions of cue modality (vision: eyes-open; manual contact: eyes-closed while touching the cue); treadmill speed (0.2, 0.4, 0.85, and 1.3 m/s); and cue motion (none, ±10 cm at 0.09, 0.11, and 0.18 Hz). Correlation analyses revealed a number of temporal relationships between gait parameters and cue speed. The results suggest that neural control ranged from feedforward to feedback. Specifically, step length preceded cue velocity during double-support duration suggesting anticipatory control. Peak toe velocity nearly coincided with its most-correlated cue velocity during single-support duration. The toe-off concluding step and double-support durations followed their most-correlated cue velocity, suggesting feedback control. Cue-tracking accuracy and cue velocity correlations with timing parameters were higher with the manual contact cue than visual cue. The cue/gait timing relationships generalized across cue modalities, albeit with greater delays of step-cycle events relative to manual contact cue velocity. We conclude that individual kinematic parameters of gait are controlled to achieve a desired velocity at different specific times during the gait cycle. The overall timing pattern of instantaneous cue velocities associated with different gait parameters is conserved across cues that afford different performance accuracies. This timing pattern may be temporally shifted to optimize control. Different cue/gait parameter latencies in our nonadaptation paradigm provide general-case evidence of the independent control of gait parameters previously demonstrated in gait adaptation paradigms
Gait parameter control timing with dynamic manual contact or visual cues.

Science.gov (United States)

Rabin, Ely; Shi, Peter; Werner, William

2016-06-01

We investigated the timing of gait parameter changes (stride length, peak toe velocity, and double-, single-support, and complete step duration) to control gait speed. Eleven healthy participants adjusted their gait speed on a treadmill to maintain a constant distance between them and a fore-aft oscillating cue (a place on a conveyor belt surface). The experimental design balanced conditions of cue modality (vision: eyes-open; manual contact: eyes-closed while touching the cue); treadmill speed (0.2, 0.4, 0.85, and 1.3 m/s); and cue motion (none, ±10 cm at 0.09, 0.11, and 0.18 Hz). Correlation analyses revealed a number of temporal relationships between gait parameters and cue speed. The results suggest that neural control ranged from feedforward to feedback. Specifically, step length preceded cue velocity during double-support duration suggesting anticipatory control. Peak toe velocity nearly coincided with its most-correlated cue velocity during single-support duration. The toe-off concluding step and double-support durations followed their most-correlated cue velocity, suggesting feedback control. Cue-tracking accuracy and cue velocity correlations with timing parameters were higher with the manual contact cue than visual cue. The cue/gait timing relationships generalized across cue modalities, albeit with greater delays of step-cycle events relative to manual contact cue velocity. We conclude that individual kinematic parameters of gait are controlled to achieve a desired velocity at different specific times during the gait cycle. The overall timing pattern of instantaneous cue velocities associated with different gait parameters is conserved across cues that afford different performance accuracies. This timing pattern may be temporally shifted to optimize control. Different cue/gait parameter latencies in our nonadaptation paradigm provide general-case evidence of the independent control of gait parameters previously demonstrated in gait adaptation paradigms
Peripheral Visual Cues: Their Fate in Processing and Effects on Attention and Temporal-Order Perception.

Science.gov (United States)

Tünnermann, Jan; Scharlau, Ingrid

2016-01-01

Peripheral visual cues lead to large shifts in psychometric distributions of temporal-order judgments. In one view, such shifts are attributed to attention speeding up processing of the cued stimulus, so-called prior entry. However, sometimes these shifts are so large that it is unlikely that they are caused by attention alone. Here we tested the prevalent alternative explanation that the cue is sometimes confused with the target on a perceptual level, bolstering the shift of the psychometric function. We applied a novel model of cued temporal-order judgments, derived from Bundesen's Theory of Visual Attention. We found that cue-target confusions indeed contribute to shifting psychometric functions. However, cue-induced changes in the processing rates of the target stimuli play an important role, too. At smaller cueing intervals, the cue increased the processing speed of the target. At larger intervals, inhibition of return was predominant. Earlier studies of cued TOJs were insensitive to these effects because in psychometric distributions they are concealed by the conjoint effects of cue-target confusions and processing rate changes.
End-to-end visual speech recognition with LSTMS

NARCIS (Netherlands)

Petridis, Stavros; Li, Zuwei; Pantic, Maja

2017-01-01

Traditional visual speech recognition systems consist of two stages, feature extraction and classification. Recently, several deep learning approaches have been presented which automatically extract features from the mouth images and aim to replace the feature extraction stage. However, research on

Interactive Football-Training Based on Rebounders with Hit Position Sensing and Audio-Visual Feedback

DEFF Research Database (Denmark)

Jensen, Mads Møller; Grønbæk, Kaj; Thomassen, Nikolaj

2014-01-01

. However, most of these tools are created with a single goal, either to measure or train, and are often used and tested in very controlled settings. In this paper, we present an interactive football-training platform, called Football Lab, featuring sensor- mounted rebounders as well as audio-visual...
Visual unimodal grouping mediates auditory attentional bias in visuo-spatial working memory.

Science.gov (United States)

Botta, Fabiano; Lupiáñez, Juan; Sanabria, Daniel

2013-09-01

Audiovisual links in spatial attention have been reported in many previous studies. However, the effectiveness of auditory spatial cues in biasing the information encoding into visuo-spatial working memory (VSWM) is still relatively unknown. In this study, we addressed this issue by combining a cuing paradigm with a change detection task in VSWM. Moreover, we manipulated the perceptual organization of the to-be-remembered visual stimuli. We hypothesized that the auditory effect on VSWM would depend on the perceptual association between the auditory cue and the visual probe. Results showed, for the first time, a significant auditory attentional bias in VSWM. However, the effect was observed only when the to-be-remembered visual stimuli were organized in two distinctive visual objects. We propose that these results shed new light on audio-visual crossmodal links in spatial attention suggesting that, apart from the spatio-temporal contingency, the likelihood of perceptual association between the auditory cue and the visual target can have a large impact on crossmodal attentional biases. Copyright © 2013 Elsevier B.V. All rights reserved.
Forgotten but not gone: Retro-cue costs and benefits in a double-cueing paradigm suggest multiple states in visual short-term memory

NARCIS (Netherlands)

van Moorselaar, D.; Olivers, C.N.L.; Theeuwes, J.; Lamme, V.A.F.; Sligte, I.G.

2015-01-01

Visual short-term memory (VSTM) performance is enhanced when the to-be-tested item is cued after encoding. This so-called retro-cue benefit is typically accompanied by a cost for the noncued items, suggesting that information is lost from VSTM upon presentation of a retrospective cue. Here we
Forgotten but not gone: retro-cue cost and benefits in a double-cueing paradigm suggest multiple states in visual short-term memory

NARCIS (Netherlands)

van Moorselaar, D.; Olivers, C.N.L.; Theeuwes, J.; Lamme, V.A.F.; Sligte, I.G.

2015-01-01

Visual short-term memory (VSTM) performance is enhanced when the to-be-tested item is cued after encoding. This so-called retro-cue benefit is typically accompanied by a cost for the noncued items, suggesting that information is lost from VSTM upon presentation of a retrospective cue. Here we
Non-conscious visual cues related to affect and action alter perception of effort and endurance performance

Directory of Open Access Journals (Sweden)

Anthony William Blanchfield

2014-12-01

Full Text Available The psychobiological model of endurance performance proposes that endurance performance is determined by a decision-making process based on perception of effort and potential motivation. Recent research has reported that effort-based decision-making during cognitive tasks can be altered by non-conscious visual cues relating to affect and action. The effect of these non-conscious visual cues on effort and performance during physical tasks is however unknown. We report two experiments investigating the effect of subliminal priming with visual cues related to affect and action on perception of effort and endurance performance. In Experiment 1 thirteen individuals were subliminally primed with happy or sad faces as they cycled to exhaustion in a counterbalanced and randomized crossover design. A paired t-test (happy vs. sad faces revealed that individuals cycled for significantly longer (178 s, p = .04 when subliminally primed with happy faces. A 2 x 5 (condition x iso-time ANOVA also revealed a significant main effect of condition on rating of perceived exertion (RPE during the time to exhaustion (TTE test with lower RPE when subjects were subliminally primed with happy faces (p = .04. In Experiment 2, a single-subject randomization tests design found that subliminal priming with action words facilitated a significantly longer (399 s, p = .04 TTE in comparison to inaction words (p = .04. Like Experiment 1, this greater TTE was accompanied by a significantly lower RPE (p = .03. These experiments are the first to show that subliminal visual cues relating to affect and action can alter perception of effort and endurance performance. Non-conscious visual cues may therefore influence the effort-based decision-making process that is proposed to determine endurance performance. Accordingly, the findings raise notable implications for individuals who may encounter such visual cues during endurance competitions, training, or health related exercise.
Cueing and Anxiety in a Visual Concept Learning Task.

Science.gov (United States)

Turner, Philip M.

This study investigated the relationship of two anxiety measures (the State-Trait Anxiety Inventory-Trait Form and the S-R Inventory of Anxiousness-Exam Form) to performance on a visual concept-learning task with embedded criterial information. The effect on anxiety reduction of cueing criterial information was also examined, and two levels of…
Acceptance of online audio-visual cultural heritage archive services: a study of the general public

NARCIS (Netherlands)

Ongena, G.; van de Wijngaert, Lidwien; Huizer, E.

2013-01-01

Introduction. This study examines the antecedents of user acceptance of an audio-visual heritage archive for a wider audience (i.e., the general public) by extending the technology acceptance model with the concepts of perceived enjoyment, nostalgia proneness and personal innovativeness. Method. A
Audio scene segmentation for video with generic content

Science.gov (United States)

Niu, Feng; Goela, Naveen; Divakaran, Ajay; Abdel-Mottaleb, Mohamed

2008-01-01

In this paper, we present a content-adaptive audio texture based method to segment video into audio scenes. The audio scene is modeled as a semantically consistent chunk of audio data. Our algorithm is based on "semantic audio texture analysis." At first, we train GMM models for basic audio classes such as speech, music, etc. Then we define the semantic audio texture based on those classes. We study and present two types of scene changes, those corresponding to an overall audio texture change and those corresponding to a special "transition marker" used by the content creator, such as a short stretch of music in a sitcom or silence in dramatic content. Unlike prior work using genre specific heuristics, such as some methods presented for detecting commercials, we adaptively find out if such special transition markers are being used and if so, which of the base classes are being used as markers without any prior knowledge about the content. Our experimental results show that our proposed audio scene segmentation works well across a wide variety of broadcast content genres.
Prosody production networks are modulated by sensory cues and social context.

Science.gov (United States)

Klasen, Martin; von Marschall, Clara; Isman, Güldehen; Zvyagintsev, Mikhail; Gur, Ruben C; Mathiak, Klaus

2018-03-05

The neurobiology of emotional prosody production is not well investigated. In particular, the effects of cues and social context are not known. The present study sought to differentiate cued from free emotion generation and the effect of social feedback from a human listener. Online speech filtering enabled fMRI during prosodic communication in 30 participants. Emotional vocalizations were a) free, b) auditorily cued, c) visually cued, or d) with interactive feedback. In addition to distributed language networks, cued emotions increased activity in auditory and - in case of visual stimuli - visual cortex. Responses were larger in pSTG at the right hemisphere and the ventral striatum when participants were listened to and received feedback from the experimenter. Sensory, language, and reward networks contributed to prosody production and were modulated by cues and social context. The right pSTG is a central hub for communication in social interactions - in particular for interpersonal evaluation of vocal emotions.
Effect of visual cues on the resolution of perceptual ambiguity in Parkinson's disease and normal aging.

Science.gov (United States)

Díaz-Santos, Mirella; Cao, Bo; Mauro, Samantha A; Yazdanbakhsh, Arash; Neargarder, Sandy; Cronin-Golomb, Alice

2015-02-01

Parkinson's disease (PD) and normal aging have been associated with changes in visual perception, including reliance on external cues to guide behavior. This raises the question of the extent to which these groups use visual cues when disambiguating information. Twenty-seven individuals with PD, 23 normal control adults (NC), and 20 younger adults (YA) were presented a Necker cube in which one face was highlighted by thickening the lines defining the face. The hypothesis was that the visual cues would help PD and NC to exert better control over bistable perception. There were three conditions, including passive viewing and two volitional-control conditions (hold one percept in front; and switch: speed up the alternation between the two). In the Hold condition, the cue was either consistent or inconsistent with task instructions. Mean dominance durations (time spent on each percept) under passive viewing were comparable in PD and NC, and shorter in YA. PD and YA increased dominance durations in the Hold cue-consistent condition relative to NC, meaning that appropriate cues helped PD but not NC hold one perceptual interpretation. By contrast, in the Switch condition, NC and YA decreased dominance durations relative to PD, meaning that the use of cues helped NC but not PD in expediting the switch between percepts. Provision of low-level cues has effects on volitional control in PD that are different from in normal aging, and only under task-specific conditions does the use of such cues facilitate the resolution of perceptual ambiguity.
Acute and Chronic Effect of Acoustic and Visual Cues on Gait Training in Parkinson’s Disease: A Randomized, Controlled Study

Directory of Open Access Journals (Sweden)

Roberto De Icco

2015-01-01

Full Text Available In this randomized controlled study we analyse and compare the acute and chronic effects of visual and acoustic cues on gait performance in Parkinson’s Disease (PD. We enrolled 46 patients with idiopathic PD who were assigned to 3 different modalities of gait training: (1 use of acoustic cues, (2 use of visual cues, or (3 overground training without cues. All patients were tested with kinematic analysis of gait at baseline (T0, at the end of the 4-week rehabilitation programme (T1, and 3 months later (T2. Regarding the acute effect, acoustic cues increased stride length and stride duration, while visual cues reduced the number of strides and normalized the stride/stance distribution but also reduced gait speed. As regards the chronic effect of cues, we recorded an improvement in some gait parameters in all 3 groups of patients: all 3 types of training improved gait speed; visual cues also normalized the stance/swing ratio, acoustic cues reduced the number of strides and increased stride length, and overground training improved stride length. The changes were not retained at T2 in any of the experimental groups. Our findings support and characterize the usefulness of cueing strategies in the rehabilitation of gait in PD.
Multiengine Speech Processing Using SNR Estimator in Variable Noisy Environments

Directory of Open Access Journals (Sweden)

Ahmad R. Abu-El-Quran

2012-01-01

Full Text Available We introduce a multiengine speech processing system that can detect the location and the type of audio signal in variable noisy environments. This system detects the location of the audio source using a microphone array; the system examines the audio first, determines if it is speech/nonspeech, then estimates the value of the signal to noise (SNR using a Discrete-Valued SNR Estimator. Using this SNR value, instead of trying to adapt the speech signal to the speech processing system, we adapt the speech processing system to the surrounding environment of the captured speech signal. In this paper, we introduced the Discrete-Valued SNR Estimator and a multiengine classifier, using Multiengine Selection or Multiengine Weighted Fusion. Also we use the SI as example of the speech processing. The Discrete-Valued SNR Estimator achieves an accuracy of 98.4% in characterizing the environment's SNR. Compared to a conventional single engine SI system, the improvement in accuracy was as high as 9.0% and 10.0% for the Multiengine Selection and Multiengine Weighted Fusion, respectively.
Superwideband Bandwidth Extension Using Normalized MDCT Coefficients for Scalable Speech and Audio Coding

Directory of Open Access Journals (Sweden)

Young Han Lee

2013-01-01

Full Text Available A bandwidth extension (BWE algorithm from wideband to superwideband (SWB is proposed for a scalable speech/audio codec that uses modified discrete cosine transform (MDCT coefficients as spectral parameters. The superwideband is first split into several subbands that are represented as gain parameters and normalized MDCT coefficients in the proposed BWE algorithm. We then estimate normalized MDCT coefficients of the wideband to be fetched for the superwideband and quantize the fetch indices. After that, we quantize gain parameters by using relative ratios between adjacent subbands. The proposed BWE algorithm is embedded into a standard superwideband codec, the SWB extension of G.729.1 Annex E, and its bitrate and quality are compared with those of the BWE algorithm already employed in the standard superwideband codec. It is shown from the comparison that the proposed BWE algorithm relatively reduces the bitrate by around 19% with better quality, compared to the BWE algorithm in the SWB extension of G.729.1 Annex E.
UNDERSTANDING PROSE THROUGH TASK ORIENTED AUDIO-VISUAL ACTIVITY: AN AMERICAN MODERN PROSE COURSE AT THE FACULTY OF LETTERS, PETRA CHRISTIAN UNIVERSITY

Directory of Open Access Journals (Sweden)

Sarah Prasasti

2001-01-01

Full Text Available The method presented here provides the basis for a course in American prose for EFL students. Understanding and appreciation of American prose is a difficult task for the students because they come into contact with works that are full of cultural baggage and far apart from their own world. The audio visual aid is one of the alternatives to sensitize the students to the topic and the cultural background. Instead of proving the ready-made audio visual aids, teachers can involve students to actively engage in a more task oriented audiovisual project. Here, the teachers encourage their students to create their own audio visual aids using colors, pictures, sound and gestures as a point of initiation for further discussion. The students can use color that has become a strong element of fiction to help them calling up a forceful visual representation. Pictures can also stimulate the students to build their mental image. Sound and silence, which are a part of the fabric of literature, may also help them to increase the emotional impact.
Memory under pressure: secondary-task effects on contextual cueing of visual search.

Science.gov (United States)

Annac, Efsun; Manginelli, Angela A; Pollmann, Stefan; Shi, Zhuanghua; Müller, Hermann J; Geyer, Thomas

2013-11-04

Repeated display configurations improve visual search. Recently, the question has arisen whether this contextual cueing effect (Chun & Jiang, 1998) is itself mediated by attention, both in terms of selectivity and processing resources deployed. While it is accepted that selective attention modulates contextual cueing (Jiang & Leung, 2005), there is an ongoing debate whether the cueing effect is affected by a secondary working memory (WM) task, specifically at which stage WM influences the cueing effect: the acquisition of configural associations (e.g., Travis, Mattingley, & Dux, 2013) versus the expression of learned associations (e.g., Manginelli, Langer, Klose, & Pollmann, 2013). The present study re-investigated this issue. Observers performed a visual search in combination with a spatial WM task. The latter was applied on either early or late search trials--so as to examine whether WM load hampers the acquisition of or retrieval from contextual memory. Additionally, the WM and search tasks were performed either temporally in parallel or in succession--so as to permit the effects of spatial WM load to be dissociated from those of executive load. The secondary WM task was found to affect cueing in late, but not early, experimental trials--though only when the search and WM tasks were performed in parallel. This pattern suggests that contextual cueing involves a spatial WM resource, with spatial WM providing a workspace linking the current search array with configural long-term memory; as a result, occupying this workspace by a secondary WM task hampers the expression of learned configural associations.
Cue competition affects temporal dynamics of edge-assignment in human visual cortex.

Science.gov (United States)

Brooks, Joseph L; Palmer, Stephen E

2011-03-01

Edge-assignment determines the perception of relative depth across an edge and the shape of the closer side. Many cues determine edge-assignment, but relatively little is known about the neural mechanisms involved in combining these cues. Here, we manipulated extremal edge and attention cues to bias edge-assignment such that these two cues either cooperated or competed. To index their neural representations, we flickered figure and ground regions at different frequencies and measured the corresponding steady-state visual-evoked potentials (SSVEPs). Figural regions had stronger SSVEP responses than ground regions, independent of whether they were attended or unattended. In addition, competition and cooperation between the two edge-assignment cues significantly affected the temporal dynamics of edge-assignment processes. The figural SSVEP response peaked earlier when the cues causing it cooperated than when they competed, but sustained edge-assignment effects were equivalent for cooperating and competing cues, consistent with a winner-take-all outcome. These results provide physiological evidence that figure-ground organization involves competitive processes that can affect the latency of figural assignment.
Open-Loop Audio-Visual Stimulation (AVS): A Useful Tool for Management of Insomnia?

Science.gov (United States)

Tang, Hsin-Yi Jean; Riegel, Barbara; McCurry, Susan M; Vitiello, Michael V

2016-03-01

Audio Visual Stimulation (AVS), a form of neurofeedback, is a non-pharmacological intervention that has been used for both performance enhancement and symptom management. We review the history of AVS, its two sub-types (close- and open-loop), and discuss its clinical implications. We also describe a promising new application of AVS to improve sleep, and potentially decrease pain. AVS research can be traced back to the late 1800s. AVS's efficacy has been demonstrated for both performance enhancement and symptom management. Although AVS is commonly used in clinical settings, there is limited literature evaluating clinical outcomes and mechanisms of action. One of the challenges to AVS research is the lack of standardized terms, which makes systematic review and literature consolidation difficult. Future studies using AVS as an intervention should; (1) use operational definitions that are consistent with the existing literature, such as AVS, Audio-visual Entrainment, or Light and Sound Stimulation, (2) provide a clear rationale for the chosen training frequency modality, (3) use a randomized controlled design, and (4) follow the Consolidated Standards of Reporting Trials and/or related guidelines when disseminating results.
Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals

Science.gov (United States)

Lidestam, Björn; Rönnberg, Jerker

2016-01-01

The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context. PMID:27317667
Design and realisation of an audiovisual speech activity detector

NARCIS (Netherlands)

Van Bree, K.C.

2006-01-01

For many speech telecommunication technologies a robust speech activity detector is important. An audio-only speech detector will givefalse positives when the interfering signal is speech or has speech characteristics. The modality video is suitable to solve this problem. In this report the approach
Audiovisual sentence recognition not predicted by susceptibility to the McGurk effect.

Science.gov (United States)

Van Engen, Kristin J; Xie, Zilong; Chandrasekaran, Bharath

2017-02-01

In noisy situations, visual information plays a critical role in the success of speech communication: listeners are better able to understand speech when they can see the speaker. Visual influence on auditory speech perception is also observed in the McGurk effect, in which discrepant visual information alters listeners' auditory perception of a spoken syllable. When hearing /ba/ while seeing a person saying /ga/, for example, listeners may report hearing /da/. Because these two phenomena have been assumed to arise from a common integration mechanism, the McGurk effect has often been used as a measure of audiovisual integration in speech perception. In this study, we test whether this assumed relationship exists within individual listeners. We measured participants' susceptibility to the McGurk illusion as well as their ability to identify sentences in noise across a range of signal-to-noise ratios in audio-only and audiovisual modalities. Our results do not show a relationship between listeners' McGurk susceptibility and their ability to use visual cues to understand spoken sentences in noise, suggesting that McGurk susceptibility may not be a valid measure of audiovisual integration in everyday speech processing.

Word segmentation with universal prosodic cues.

Science.gov (United States)

Endress, Ansgar D; Hauser, Marc D

2010-09-01

When listening to speech from one's native language, words seem to be well separated from one another, like beads on a string. When listening to a foreign language, in contrast, words seem almost impossible to extract, as if there was only one bead on the same string. This contrast reveals that there are language-specific cues to segmentation. The puzzle, however, is that infants must be endowed with a language-independent mechanism for segmentation, as they ultimately solve the segmentation problem for any native language. Here, we approach the acquisition problem by asking whether there are language-independent cues to segmentation that might be available to even adult learners who have already acquired a native language. We show that adult learners recognize words in connected speech when only prosodic cues to word-boundaries are given from languages unfamiliar to the participants. In both artificial and natural speech, adult English speakers, with no prior exposure to the test languages, readily recognized words in natural languages with critically different prosodic patterns, including French, Turkish and Hungarian. We suggest that, even though languages differ in their sound structures, they carry universal prosodic characteristics. Further, these language-invariant prosodic cues provide a universally accessible mechanism for finding words in connected speech. These cues may enable infants to start acquiring words in any language even before they are fine-tuned to the sound structure of their native language. Copyright © 2010. Published by Elsevier Inc.
Inner Sound: Altered States of Consciousness in Electronic Music and Audio-Visual Media

DEFF Research Database (Denmark)

Weinel, Jonathan

Over the last century, developments in electronic music and art have enabled new possibilities for creating audio and audio-visual artworks. With this new potential has come the possibility for representing subjective internal conscious states, such as the experience of hallucinations, using...... the creative influence of ASCs, from Amazonian chicha festivals to the synaesthetic assaults of neon raves; and from an immersive outdoor electroacoustic performance on an Athenian hilltop to a mushroom trip on a tropical island in virtual reality. Beginning with a discussion of consciousness, the book...... explores how our subjective realities may change during states of dream, psychedelic experience, meditation, and trance. Taking a broad view across a wide range of genres, Inner Sound draws connections between shamanic art and music, and the modern technoshamanism of psychedelic rock, electronic dance...
Relationship between age at menarche and exposure to sexual content in audio-visual media and other factors in Islamic junior high school girls

Directory of Open Access Journals (Sweden)

Tity Wulandari

2018-01-01

Full Text Available Background In recent decades, girls have experienced menarche at earlier ages, which may have negative effects on health. Exposure to audio-visual media and other factors may influence the age at menarche, although past studies have produced inconsistent results. Objective To assess for relationships between the age at menarche and audio-visual media exposure, socio-economic status, nutritional status, physical activity, and psychosocial dysfunction in adolescent girls. Methods This cross-sectional study was conducted from August to October 2015 in students from two integrated Islamic junior high schools in Medan, North Sumatera. There were 216 students who met the inclusion criteria: aged 10-16 years and experienced menarche. They were asked to fill out questionnaires that had been previously validated, regarding their history of exposure to audio-visual media, physical activity, and psychosocial dysfunction. The data were analyzed by Chi-square and Fisher’s exact tests in order to assess for relationships between audio-visual media exposure and other potential factors with the age at menarche. Results Of 261 female students at the two schools, 216 had undergone menarche, with a mean age at menarche of 11.6 (SD 1.13 years. There was no significant relationship between age at menarche and audio-visual media exposure (P=0.68. Also, there were no significant relationships between factors such as socio-economic and psychosocial status with age at menarche (P=0.64 and P=0.28, respectively. However, there were significant relationships between earlier age at menarche and overweight/obese nutritional status (P=0.02 as well as low physical activity (P=0.01. Multivariate logistic regression analysis showed that low physical activity had the strongest influence on early menarche (RP=2.40; 95%CI 0.92 to 6.24. Conclusion Age at menarche is not significantly associated with sexual content of audio-visual media exposure. However, there were significant
The language used in describing autobiographical memories prompted by life period visually presented verbal cues, event-specific visually presented verbal cues and short musical clips of popular music.

Science.gov (United States)

Zator, Krysten; Katz, Albert N

2017-07-01

Here, we examined linguistic differences in the reports of memories produced by three cueing methods. Two groups of young adults were cued visually either by words representing events or popular cultural phenomena that took place when they were 5, 10, or 16 years of age, or by words referencing a general lifetime period word cue directing them to that period in their life. A third group heard 30-second long musical clips of songs popular during the same three time periods. In each condition, participants typed a specific event memory evoked by the cue and these typed memories were subjected to analysis by the Linguistic Inquiry and Word Count (LIWC) program. Differences in the reports produced indicated that listening to music evoked memories embodied in motor-perceptual systems more so than memories evoked by our word-cueing conditions. Additionally, relative to music cues, lifetime period word cues produced memories with reliably more uses of personal pronouns, past tense terms, and negative emotions. The findings provide evidence for the embodiment of autobiographical memories, and how those differ when the cues emphasise different aspects of the encoded events.
Paper-Based Textbooks with Audio Support for Print-Disabled Students.

Science.gov (United States)

Fujiyoshi, Akio; Ohsawa, Akiko; Takaira, Takuya; Tani, Yoshiaki; Fujiyoshi, Mamoru; Ota, Yuko

2015-01-01

Utilizing invisible 2-dimensional codes and digital audio players with a 2-dimensional code scanner, we developed paper-based textbooks with audio support for students with print disabilities, called "multimodal textbooks." Multimodal textbooks can be read with the combination of the two modes: "reading printed text" and "listening to the speech of the text from a digital audio player with a 2-dimensional code scanner." Since multimodal textbooks look the same as regular textbooks and the price of a digital audio player is reasonable (about 30 euro), we think multimodal textbooks are suitable for students with print disabilities in ordinary classrooms.
Forgotten but Not Gone: Retro-Cue Costs and Benefits in a Double-Cueing Paradigm Suggest Multiple States in Visual Short-Term Memory

Science.gov (United States)

van Moorselaar, Dirk; Olivers, Christian N. L.; Theeuwes, Jan; Lamme, Victor A. F.; Sligte, Ilja G.

2015-01-01

Visual short-term memory (VSTM) performance is enhanced when the to-be-tested item is cued after encoding. This so-called retro-cue benefit is typically accompanied by a cost for the noncued items, suggesting that information is lost from VSTM upon presentation of a retrospective cue. Here we assessed whether noncued items can be restored to VSTM…
Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus

Directory of Open Access Journals (Sweden)

Patterson Eric K

2002-01-01

Full Text Available Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties, due to background noise and multiple speakers in an application environment, are significantly reduced by the additional information provided by visual features. This paper presents information on a new audio-visual database, a feature study on moving speakers, and on baseline results for the whole speaker group. Although a few databases have been collected in this area, none has emerged as a standard for comparison. Also, efforts to date have often been limited, focusing on cropped video or stationary speakers. This paper seeks to introduce a challenging audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The Clemson University Audio-Visual Experiments (CUAVE database is a speaker-independent corpus of both connected and continuous digit strings totaling over 7000 utterances. It contains a wide variety of speakers and is designed to meet several goals discussed in this paper. One of these goals is to allow testing of adverse conditions such as moving talkers and speaker pairs. A feature study of connected digit strings is also discussed. It compares stationary and moving talkers in a speaker-independent grouping. An image-processing-based contour technique, an image transform method, and a deformable template scheme are used in this comparison to obtain visual features. This paper also presents methods and results in an attempt to make these techniques more robust to speaker movement. Finally, initial baseline speaker-independent results are included using all speakers, and conclusions as well as suggested areas of research are
Audiovisual Perception of Noise Vocoded Speech in Dyslexic and Non-Dyslexic Adults: The Role of Low-Frequency Visual Modulations

Science.gov (United States)

Megnin-Viggars, Odette; Goswami, Usha

2013-01-01

Visual speech inputs can enhance auditory speech information, particularly in noisy or degraded conditions. The natural statistics of audiovisual speech highlight the temporal correspondence between visual and auditory prosody, with lip, jaw, cheek and head movements conveying information about the speech envelope. Low-frequency spatial and…
The use of audio-visual methods in radiology and physics courses

International Nuclear Information System (INIS)

Holmberg, P.

1987-01-01

Today's medicine utilizes sophisticated equipment for radiological, biochemical and microbiological investigation procedures and analyses. Hence it is necessary that physicans have adequate scientific and technical knowledge of the apparatus they are using so that the equipment can be used in the most effective way. Partly this knowledge is obtained from science-orientated courses in the preclinical stage of the study program for medical students. To increase the motivation to study science-courses (medical physics) audio-visual methods are used to describe diagnostic and therapeutic procedures in the clinical routines. (orig.)
Emotionally conditioning the target-speech voice enhances recognition of the target speech under "cocktail-party" listening conditions.

Science.gov (United States)

Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang

2018-05-01

Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, (skin conductance) electrodermal responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening efforts when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.
Seeing the talker’s face supports executive processing of speech in steady state noise

OpenAIRE

Sushmit eMishra; Thomas eLunner; Thomas eLunner; Thomas eLunner; Stefan eStenfelt; Stefan eStenfelt; Jerker eRönnberg; Mary eRudner

2013-01-01

Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources or cognitive spare capacity (CSC) can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT, Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-st...
The identification and modeling of visual cue usage in manual control task experiments

Science.gov (United States)

Sweet, Barbara Townsend

Many fields of endeavor require humans to conduct manual control tasks while viewing a perspective scene. Manual control refers to tasks in which continuous, or nearly continuous, control adjustments are required. Examples include flying an aircraft, driving a car, and riding a bicycle. Perspective scenes can arise through natural viewing of the world, simulation of a scene (as in flight simulators), or through imaging devices (such as the cameras on an unmanned aerospace vehicle). Designers frequently have some degree of control over the content and characteristics of a perspective scene; airport designers can choose runway markings, vehicle designers can influence the size and shape of windows, as well as the location of the pilot, and simulator database designers can choose scene complexity and content. Little theoretical framework exists to help designers determine the answers to questions related to perspective scene content. An empirical approach is most commonly used to determine optimum perspective scene configurations. The goal of the research effort described in this dissertation has been to provide a tool for modeling the characteristics of human operators conducting manual control tasks with perspective-scene viewing. This is done for the purpose of providing an algorithmic, as opposed to empirical, method for analyzing the effects of changing perspective scene content for closed-loop manual control tasks. The dissertation contains the development of a model of manual control using a perspective scene, called the Visual Cue Control (VCC) Model. Two forms of model were developed: one model presumed that the operator obtained both position and velocity information from one visual cue, and the other model presumed that the operator used one visual cue for position, and another for velocity. The models were compared and validated in two experiments. The results show that the two-cue VCC model accurately characterizes the output of the human operator with a
Effect of Performing a Boundary-Avoidance Tracking Task on the Perception of Coherence Between Visual and Inertial Cues

NARCIS (Netherlands)

Valente Pais, A.R.; Van Paassen, M.M.; Mulder, M.; Wentink, M.

2011-01-01

During flight simulation, the inertial and visual stimuli provided to the pilot differ considerably. For successful design of motion cueing algorithms it is necessary to gather knowledge on how pilots perceive the difference between visual and inertial cues. Some of the work done on this topic has
Audio Description as a Pedagogical Tool

Directory of Open Access Journals (Sweden)

Georgina Kleege

2015-05-01

Full Text Available Audio description is the process of translating visual information into words for people who are blind or have low vision. Typically such description has focused on films, museum exhibitions, images and video on the internet, and live theater. Because it allows people with visual impairments to experience a variety of cultural and educational texts that would otherwise be inaccessible, audio description is a mandated aspect of disability inclusion, although it remains markedly underdeveloped and underutilized in our classrooms and in society in general. Along with increasing awareness of disability, audio description pushes students to practice close reading of visual material, deepen their analysis, and engage in critical discussions around the methodology, standards and values, language, and role of interpretation in a variety of academic disciplines. We outline a few pedagogical interventions that can be customized to different contexts to develop students' writing and critical thinking skills through guided description of visual material.
Do cattle (Bos taurus) retain an association of a visual cue with a food reward for a year?

Science.gov (United States)

Hirata, Masahiko; Takeno, Nozomi

2014-06-01

Use of visual cues to locate specific food resources from a distance is a critical ability of animals foraging in a spatially heterogeneous environment. However, relatively little is known about how long animals can retain the learned cue-reward association without reinforcement. We compared feeding behavior of experienced and naive Japanese Black cows (Bos taurus) in discovering food locations in a pasture. Experienced animals had been trained to respond to a visual cue (plastic washtub) for a preferred food (grain-based concentrate) 1 year prior to the experiment, while naive animals had no exposure to the cue. Cows were tested individually in a test arena including tubs filled with the concentrate on three successive days (Days 1-3). Experienced cows located the first tub more quickly and visited more tubs than naive cows on Day 1 (usually P visual cue with a food reward within a day and retain the association for 1 year despite a slight decay. © 2014 Japanese Society of Animal Science.
Retrospective Cues Based on Object Features Improve Visual Working Memory Performance in Older Adults

OpenAIRE

Gilchrist, Amanda L.; Duarte, Audrey; Verhaeghen, Paul

2015-01-01

Research with younger adults has shown that retrospective cues can be used to orient top-down attention toward relevant items in working memory. We examined whether older adults could take advantage of these cues to improve memory performance. Younger and older adults were presented with visual arrays of five colored shapes; during maintenance, participants were either presented with an informative cue based on an object feature (here, object shape or color) that would be probed, or with an u...
An integrated audio-visual impact tool for wind turbine installations

International Nuclear Information System (INIS)

Lymberopoulos, N.; Belessis, M.; Wood, M.; Voutsinas, S.

1996-01-01

An integrated software tool was developed for the design of wind parks that takes into account their visual and audio impact. The application is built on a powerful hardware platform and is fully operated through a graphic user interface. The topography, the wind turbines and the daylight conditions are realised digitally. The wind park can be animated in real time and the user can take virtual walks in it while the set-up of the park can be altered interactively. In parallel, the wind speed levels on the terrain, the emitted noise intensity, the annual energy output and the cash flow can be estimated at any stage of the session and prompt the user for rearrangements. The tool has been used to visually simulate existing wind parks in St. Breok, UK and Andros Island, Greece. The results lead to the conclusion that such a tool can assist to the public acceptance and licensing procedures of wind parks. (author)
"Slight" of hand: the processing of visually degraded gestures with speech.

Science.gov (United States)

Kelly, Spencer D; Hansen, Bruce C; Clark, David T

2012-01-01

Co-speech hand gestures influence language comprehension. The present experiment explored what part of the visual processing system is optimized for processing these gestures. Participants viewed short video clips of speech and gestures (e.g., a person saying "chop" or "twist" while making a chopping gesture) and had to determine whether the two modalities were congruent or incongruent. Gesture videos were designed to stimulate the parvocellular or magnocellular visual pathways by filtering out low or high spatial frequencies (HSF versus LSF) at two levels of degradation severity (moderate and severe). Participants were less accurate and slower at processing gesture and speech at severe versus moderate levels of degradation. In addition, they were slower for LSF versus HSF stimuli, and this difference was most pronounced in the severely degraded condition. However, exploratory item analyses showed that the HSF advantage was modulated by the range of motion and amount of motion energy in each video. The results suggest that hand gestures exploit a wide range of spatial frequencies, and depending on what frequencies carry the most motion energy, parvocellular or magnocellular visual pathways are maximized to quickly and optimally extract meaning.
Feasibility study to assess clinical applications of 3-T cine MRI coupled with synchronous audio recording during speech in evaluation of velopharyngeal insufficiency in children.

Science.gov (United States)

Sagar, Pallavi; Nimkin, Katherine

2015-02-01

In the past decade, there has been increased utilization of magnetic resonance imaging (MRI) in evaluating and understanding velopharyngeal insufficiency (VPI). To our knowledge, none of the prior studies with MRI has simultaneously linked the audio recordings of speech during cine MRI acquisition with the corresponding images and created a video for evaluating VPI. To develop an MRI protocol with static and cine sequences during phonation to evaluate for VPI in children and compare the findings to nasopharyngoscopy and videofluoroscopy. Five children, ages 8-16 years, with known VPI, who had previously undergone nasopharyngoscopy and videofluoroscopy, were included. MRI examination was performed on a 3-T Siemens scanner. Anatomical data was obtained using an isotropic T2-weighted 3-D SPACE sequence with multiplanar reformation capability. Dynamic data was obtained using 2-D FLASH cine sequences of the airway in three imaging planes during phonation. Audio recordings were captured by a MRI compatible optical microphone. All five cases had MRI and nasopharyngoscopy and four had videofluoroscopy performed. VPI was identified by MRI in all five patients. The location and severity of the velopharyngeal gap, closure pattern, velar size and shape and levator veli palatini (LVP) muscle were identified in all patients. MRI was superior in visualizing the integrity of the LVP muscle. MRI was unable to identify hemipalatal weakness in one case. In a case of stress-induced VPI, occurring only during clarinet playing, cine MRI demonstrated discordant findings of a velopharyngeal gap during phonatory tasks but not with instrument playing. Overall, there was satisfactory correlation among MRI, nasopharyngoscopy and videofluoroscopy findings. Cine MRI of the airway during speech is a noninvasive, well-tolerated diagnostic imaging tool that has the potential to serve as a guide prior to and after surgical correction of VPI. MRI provided superior anatomical detail of the levator
Feasibility study to assess clinical applications of 3-T cine MRI coupled with synchronous audio recording during speech in evaluation of velopharyngeal insufficiency in children

International Nuclear Information System (INIS)

Sagar, Pallavi; Nimkin, Katherine

2015-01-01

In the past decade, there has been increased utilization of magnetic resonance imaging (MRI) in evaluating and understanding velopharyngeal insufficiency (VPI). To our knowledge, none of the prior studies with MRI has simultaneously linked the audio recordings of speech during cine MRI acquisition with the corresponding images and created a video for evaluating VPI. To develop an MRI protocol with static and cine sequences during phonation to evaluate for VPI in children and compare the findings to nasopharyngoscopy and videofluoroscopy. Five children, ages 8-16 years, with known VPI, who had previously undergone nasopharyngoscopy and videofluoroscopy, were included. MRI examination was performed on a 3-T Siemens scanner. Anatomical data was obtained using an isotropic T2-weighted 3-D SPACE sequence with multiplanar reformation capability. Dynamic data was obtained using 2-D FLASH cine sequences of the airway in three imaging planes during phonation. Audio recordings were captured by a MRI compatible optical microphone. All five cases had MRI and nasopharyngoscopy and four had videofluoroscopy performed. VPI was identified by MRI in all five patients. The location and severity of the velopharyngeal gap, closure pattern, velar size and shape and levator veli palatini (LVP) muscle were identified in all patients. MRI was superior in visualizing the integrity of the LVP muscle. MRI was unable to identify hemipalatal weakness in one case. In a case of stress-induced VPI, occurring only during clarinet playing, cine MRI demonstrated discordant findings of a velopharyngeal gap during phonatory tasks but not with instrument playing. Overall, there was satisfactory correlation among MRI, nasopharyngoscopy and videofluoroscopy findings. Cine MRI of the airway during speech is a noninvasive, well-tolerated diagnostic imaging tool that has the potential to serve as a guide prior to and after surgical correction of VPI. MRI provided superior anatomical detail of the levator

Feasibility study to assess clinical applications of 3-T cine MRI coupled with synchronous audio recording during speech in evaluation of velopharyngeal insufficiency in children

Energy Technology Data Exchange (ETDEWEB)

Sagar, Pallavi; Nimkin, Katherine [Massachusetts General Hospital, Department of Radiology, Division of Pediatric Radiology, Boston, MA (United States)

2014-08-16

In the past decade, there has been increased utilization of magnetic resonance imaging (MRI) in evaluating and understanding velopharyngeal insufficiency (VPI). To our knowledge, none of the prior studies with MRI has simultaneously linked the audio recordings of speech during cine MRI acquisition with the corresponding images and created a video for evaluating VPI. To develop an MRI protocol with static and cine sequences during phonation to evaluate for VPI in children and compare the findings to nasopharyngoscopy and videofluoroscopy. Five children, ages 8-16 years, with known VPI, who had previously undergone nasopharyngoscopy and videofluoroscopy, were included. MRI examination was performed on a 3-T Siemens scanner. Anatomical data was obtained using an isotropic T2-weighted 3-D SPACE sequence with multiplanar reformation capability. Dynamic data was obtained using 2-D FLASH cine sequences of the airway in three imaging planes during phonation. Audio recordings were captured by a MRI compatible optical microphone. All five cases had MRI and nasopharyngoscopy and four had videofluoroscopy performed. VPI was identified by MRI in all five patients. The location and severity of the velopharyngeal gap, closure pattern, velar size and shape and levator veli palatini (LVP) muscle were identified in all patients. MRI was superior in visualizing the integrity of the LVP muscle. MRI was unable to identify hemipalatal weakness in one case. In a case of stress-induced VPI, occurring only during clarinet playing, cine MRI demonstrated discordant findings of a velopharyngeal gap during phonatory tasks but not with instrument playing. Overall, there was satisfactory correlation among MRI, nasopharyngoscopy and videofluoroscopy findings. Cine MRI of the airway during speech is a noninvasive, well-tolerated diagnostic imaging tool that has the potential to serve as a guide prior to and after surgical correction of VPI. MRI provided superior anatomical detail of the levator
Audiovisual Discrimination between Laughter and Speech

NARCIS (Netherlands)

Petridis, Stavros; Pantic, Maja

Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audiovisual approach to distinguishing laughter from speech and we show that integrating the information from audio and video leads to an improved reliability of audiovisual approach in
All I saw was the cake. Hunger effects on attentional capture by visual food cues.

Science.gov (United States)

Piech, Richard M; Pastorino, Michael T; Zald, David H

2010-06-01

While effects of hunger on motivation and food reward value are well-established, far less is known about the effects of hunger on cognitive processes. Here, we deployed the emotional blink of attention paradigm to investigate the impact of visual food cues on attentional capture under conditions of hunger and satiety. Participants were asked to detect targets which appeared in a rapid visual stream after different types of task irrelevant distractors. We observed that food stimuli acquired increased power to capture attention and prevent target detection when participants were hungry. This occurred despite monetary incentives to perform well. Our findings suggest an attentional mechanism through which hunger heightens perception of food cues. As an objective behavioral marker of the attentional sensitivity to food cues, the emotional attentional blink paradigm may provide a useful technique for studying individual differences, and state manipulations in the sensitivity to food cues. Published by Elsevier Ltd.
Atypical Visual Orienting to Gaze- and Arrow-Cues in Adults with High Functioning Autism

Science.gov (United States)

Vlamings, Petra H. J. M.; Stauder, Johannes E. A.; van Son, Ilona A. M.; Mottron, Laurent

2005-01-01

The present study investigates visual orienting to directional cues (arrow or eyes) in adults with high functioning autism (n = 19) and age matched controls (n = 19). A choice reaction time paradigm is used in which eye-or arrow direction correctly (congruent) or incorrectly (incongruent) cues target location. In typically developing participants,…
Infants' Selective Attention to Reliable Visual Cues in the Presence of Salient Distractors

Science.gov (United States)

Tummeltshammer, Kristen Swan; Mareschal, Denis; Kirkham, Natasha Z.

2014-01-01

With many features competing for attention in their visual environment, infants must learn to deploy attention toward informative cues while ignoring distractions. Three eye tracking experiments were conducted to investigate whether 6- and 8-month-olds (total N = 102) would shift attention away from a distractor stimulus to learn a cue-reward…
Audio-Visual Integration Modifies Emotional Judgment in Music

Directory of Open Access Journals (Sweden)

Shen-Yuan Su

2011-10-01

Full Text Available The conventional view that perceived emotion in music is derived mainly from auditory signals has led to neglect of the contribution of visual image. In this study, we manipulated mode (major vs. minor and examined the influence of a video image on emotional judgment in music. Melodies in either major or minor mode were controlled for tempo and rhythm and played to the participants. We found that Taiwanese participants, like Westerners, judged major melodies as expressing positive, and minor melodies negative, emotions. The major or minor melodies were then paired with video images of the singers, which were either emotionally congruent or incongruent with their modes. Results showed that participants perceived stronger positive or negative emotions with congruent audio-visual stimuli. Compared to listening to music alone, stronger emotions were perceived when an emotionally congruent video image was added and weaker emotions were perceived when an incongruent image was added. We therefore demonstrate that mode is important to perceive the emotional valence in music and that treating musical art as a purely auditory event might lose the enhanced emotional strength perceived in music, since going to a concert may lead to stronger perceived emotion than listening to the CD at home.
Forgotten but not gone: Retro-cue costs and benefits in a double-cueing paradigm suggest multiple states in visual short-term memory.

Science.gov (United States)

van Moorselaar, Dirk; Olivers, Christian N L; Theeuwes, Jan; Lamme, Victor A F; Sligte, Ilja G

2015-11-01

Visual short-term memory (VSTM) performance is enhanced when the to-be-tested item is cued after encoding. This so-called retro-cue benefit is typically accompanied by a cost for the noncued items, suggesting that information is lost from VSTM upon presentation of a retrospective cue. Here we assessed whether noncued items can be restored to VSTM when made relevant again by a subsequent second cue. We presented either 1 or 2 consecutive retro-cues (80% valid) during the retention interval of a change-detection task. Relative to no cue, a valid cue increased VSTM capacity by 2 items, while an invalid cue decreased capacity by 2. Importantly, when a second, valid cue followed an invalid cue, capacity regained 2 items, so that performance was back on par. In addition, when the second cue was also invalid, there was no extra loss of information from VSTM, suggesting that those items that survived a first invalid cue, automatically also survived a second. We conclude that these results are in support of a very versatile VSTM system, in which memoranda adopt different representational states depending on whether they are deemed relevant now, in the future, or not at all. We discuss a neural model that is consistent with this conclusion. (c) 2015 APA, all rights reserved).
A Bit Stream Scalable Speech/Audio Coder Combining Enhanced Regular Pulse Excitation and Parametric Coding

Directory of Open Access Journals (Sweden)

Albertus C. den Brinker

2007-01-01

Full Text Available This paper introduces a new audio and speech broadband coding technique based on the combination of a pulse excitation coder and a standardized parametric coder, namely, MPEG-4 high-quality parametric coder. After presenting a series of enhancements to regular pulse excitation (RPE to make it suitable for the modeling of broadband signals, it is shown how pulse and parametric codings complement each other and how they can be merged to yield a layered bit stream scalable coder able to operate at different points in the quality bit rate plane. The performance of the proposed coder is evaluated in a listening test. The major result is that the extra functionality of the bit stream scalability does not come at the price of a reduced performance since the coder is competitive with standardized coders (MP3, AAC, SSC.
A Bit Stream Scalable Speech/Audio Coder Combining Enhanced Regular Pulse Excitation and Parametric Coding

Science.gov (United States)

Riera-Palou, Felip; den Brinker, Albertus C.

2007-12-01

This paper introduces a new audio and speech broadband coding technique based on the combination of a pulse excitation coder and a standardized parametric coder, namely, MPEG-4 high-quality parametric coder. After presenting a series of enhancements to regular pulse excitation (RPE) to make it suitable for the modeling of broadband signals, it is shown how pulse and parametric codings complement each other and how they can be merged to yield a layered bit stream scalable coder able to operate at different points in the quality bit rate plane. The performance of the proposed coder is evaluated in a listening test. The major result is that the extra functionality of the bit stream scalability does not come at the price of a reduced performance since the coder is competitive with standardized coders (MP3, AAC, SSC).
Awareness in contextual cueing of visual search as measured with concurrent access- and phenomenal-consciousness tasks.

Science.gov (United States)

Schlagbauer, Bernhard; Müller, Hermann J; Zehetleitner, Michael; Geyer, Thomas

2012-10-25

In visual search, context information can serve as a cue to guide attention to the target location. When observers repeatedly encounter displays with identical target-distractor arrangements, reaction times (RTs) are faster for repeated relative to nonrepeated displays, the latter containing novel configurations. This effect has been termed "contextual cueing." The present study asked whether information about the target location in repeated displays is "explicit" (or "conscious") in nature. To examine this issue, observers performed a test session (after an initial training phase in which RTs to repeated and nonrepeated displays were measured) in which the search stimuli were presented briefly and terminated by visual masks; following this, observers had to make a target localization response (with accuracy as the dependent measure) and indicate their visual experience and confidence associated with the localization response. The data were examined at the level of individual displays, i.e., in terms of whether or not a repeated display actually produced contextual cueing. The results were that (a) contextual cueing was driven by only a very small number of about four actually learned configurations; (b) localization accuracy was increased for learned relative to nonrepeated displays; and (c) both consciousness measures were enhanced for learned compared to nonrepeated displays. It is concluded that contextual cueing is driven by only a few repeated displays and the ability to locate the target in these displays is associated with increased visual experience.
Mouth and Voice: A Relationship between Visual and Auditory Preference in the Human Superior Temporal Sulcus.

Science.gov (United States)

Zhu, Lin L; Beauchamp, Michael S

2017-03-08

Cortex in and around the human posterior superior temporal sulcus (pSTS) is known to be critical for speech perception. The pSTS responds to both the visual modality (especially biological motion) and the auditory modality (especially human voices). Using fMRI in single subjects with no spatial smoothing, we show that visual and auditory selectivity are linked. Regions of the pSTS were identified that preferred visually presented moving mouths (presented in isolation or as part of a whole face) or moving eyes. Mouth-preferring regions responded strongly to voices and showed a significant preference for vocal compared with nonvocal sounds. In contrast, eye-preferring regions did not respond to either vocal or nonvocal sounds. The converse was also true: regions of the pSTS that showed a significant response to speech or preferred vocal to nonvocal sounds responded more strongly to visually presented mouths than eyes. These findings can be explained by environmental statistics. In natural environments, humans see visual mouth movements at the same time as they hear voices, while there is no auditory accompaniment to visual eye movements. The strength of a voxel's preference for visual mouth movements was strongly correlated with the magnitude of its auditory speech response and its preference for vocal sounds, suggesting that visual and auditory speech features are coded together in small populations of neurons within the pSTS. SIGNIFICANCE STATEMENT Humans interacting face to face make use of auditory cues from the talker's voice and visual cues from the talker's mouth to understand speech. The human posterior superior temporal sulcus (pSTS), a brain region known to be important for speech perception, is complex, with some regions responding to specific visual stimuli and others to specific auditory stimuli. Using BOLD fMRI, we show that the natural statistics of human speech, in which voices co-occur with mouth movements, are reflected in the neural architecture of
Perception and the temporal properties of speech

Science.gov (United States)

Gordon, Peter C.

1991-11-01

Four experiments addressing the role of attention in phonetic perception are reported. The first experiment shows that the relative importance of two cues to the voicing distinction changes when subjects must perform an arithmetic distractor task at the same time as identifying a speech stimulus. The voice onset time cue loses phonetic significance when subjects are distracted, while the F0 onset frequency cue does not. The second experiment shows a similar pattern for two cues to the distinction between the vowels /i/ (as in 'beat') and /I/ (as in 'bit'). Together these experiments indicate that careful attention to speech perception is necessary for strong acoustic cues to achieve their full phonetic impact, while weaker acoustic cues achieve their full phonetic impact without close attention. Experiment 3 shows that this pattern is obtained when the distractor task places little demand on verbal short term memory. Experiment 4 provides a large data set for testing formal models of the role of attention in speech perception. Attention is shown to influence the signal to noise ratio in phonetic encoding. This principle is instantiated in a network model in which the role of attention is to reduce noise in the phonetic encoding of acoustic cues. Implications of this work for understanding speech perception and general theories of the role of attention in perception are discussed.
Online Dissection Audio-Visual Resources for Human Anatomy: Undergraduate Medical Students' Usage and Learning Outcomes

Science.gov (United States)

Choi-Lundberg, Derek L.; Cuellar, William A.; Williams, Anne-Marie M.

2016-01-01

In an attempt to improve undergraduate medical student preparation for and learning from dissection sessions, dissection audio-visual resources (DAVR) were developed. Data from e-learning management systems indicated DAVR were accessed by 28% ± 10 (mean ± SD for nine DAVR across three years) of students prior to the corresponding dissection…
Usability of Three-dimensional Augmented Visual Cues Delivered by Smart Glasses on (Freezing of Gait in Parkinson’s Disease

Directory of Open Access Journals (Sweden)

Sabine Janssen

2017-06-01

Full Text Available External cueing is a potentially effective strategy to reduce freezing of gait (FOG in persons with Parkinson’s disease (PD. Case reports suggest that three-dimensional (3D cues might be more effective in reducing FOG than two-dimensional cues. We investigate the usability of 3D augmented reality visual cues delivered by smart glasses in comparison to conventional 3D transverse bars on the floor and auditory cueing via a metronome in reducing FOG and improving gait parameters. In laboratory experiments, 25 persons with PD and FOG performed walking tasks while wearing custom-made smart glasses under five conditions, at the end-of-dose. For two conditions, augmented visual cues (bars/staircase were displayed via the smart glasses. The control conditions involved conventional 3D transverse bars on the floor, auditory cueing via a metronome, and no cueing. The number of FOG episodes and percentage of time spent on FOG were rated from video recordings. The stride length and its variability, cycle time and its variability, cadence, and speed were calculated from motion data collected with a motion capture suit equipped with 17 inertial measurement units. A total of 300 FOG episodes occurred in 19 out of 25 participants. There were no statistically significant differences in number of FOG episodes and percentage of time spent on FOG across the five conditions. The conventional bars increased stride length, cycle time, and stride length variability, while decreasing cadence and speed. No effects for the other conditions were found. Participants preferred the metronome most, and the augmented staircase least. They suggested to improve the comfort, esthetics, usability, field of view, and stability of the smart glasses on the head and to reduce their weight and size. In their current form, augmented visual cues delivered by smart glasses are not beneficial for persons with PD and FOG. This could be attributable to distraction, blockage of visual
The Influence of Visual and Auditory Information on the Perception of Speech and Non-Speech Oral Movements in Patients with Left Hemisphere Lesions

Science.gov (United States)

Schmid, Gabriele; Thielmann, Anke; Ziegler, Wolfram

2009-01-01

Patients with lesions of the left hemisphere often suffer from oral-facial apraxia, apraxia of speech, and aphasia. In these patients, visual features often play a critical role in speech and language therapy, when pictured lip shapes or the therapist's visible mouth movements are used to facilitate speech production and articulation. This demands…
A Comparison of Television and Audio Presentations of the MLA French Listening Examination

Science.gov (United States)

Stallings, William M.

1972-01-01

Although nonverbal cues are often available in real-life communication, listening is usually tested by aural stimuli broadcast from an audio-tape. It would seem that testing listening comprehension might be improved by using television to offer nonverbal cues in addition to aural stimuli. (Author)
Spatial Frequency Requirements and Gaze Strategy in Visual-Only and Audiovisual Speech Perception

Science.gov (United States)

Wilson, Amanda H.; Alsius, Agnès; Parè, Martin; Munhall, Kevin G.

2016-01-01

Purpose: The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. Method: We presented vowel-consonant-vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent…
Neural response to visual sexual cues in dopamine treatment-linked hypersexuality in Parkinson's disease.

Science.gov (United States)

Politis, Marios; Loane, Clare; Wu, Kit; O'Sullivan, Sean S; Woodhead, Zoe; Kiferle, Lorenzo; Lawrence, Andrew D; Lees, Andrew J; Piccini, Paola

2013-02-01

Hypersexuality with compulsive sexual behaviour is a significant source of morbidity for patients with Parkinson's disease receiving dopamine replacement therapies. We know relatively little about the pathophysiology of hypersexuality in Parkinson's disease, and it is unknown how visual sexual stimuli, similar to the portrayals of sexuality in the mainstream mass media may affect the brain and behaviour in such susceptible individuals. Here, we have studied a group of 12 patients with Parkinson's disease with hypersexuality using a functional magnetic resonance imaging block design exposing participants to both sexual, other reward-related and neutral visual cues. We hypothesized that exposure to visual sexual cues would trigger increased sexual desire in patients with Parkinson's disease with hypersexuality that would correspond to changes in brain activity in regions linked to dopaminergically stimulated sexual motivation. Patients with Parkinson's disease with hypersexuality were scanned ON and OFF dopamine drugs, and their results were compared with a group of 12 Parkinson's disease control patients without hypersexuality or other impulse control disorders. Exposure to sexual cues significantly increased sexual desire and hedonic responses in the Parkinson's disease hypersexuality group compared with the Parkinson's disease control patients. These behavioural changes corresponded to significant blood oxygen level-dependent signal changes in regions within limbic, paralimbic, temporal, occipital, somatosensory and prefrontal cortices that correspond to emotional, cognitive, autonomic, visual and motivational processes. The functional imaging data showed that the hypersexuality patients' increased sexual desire correlated with enhanced activations in the ventral striatum, and cingulate and orbitofrontal cortices. When the patients with Parkinson's disease with hypersexuality were OFF medication, the functional imaging data showed decreases in activation during
PHYSIOLOGICAL MONITORING OPERATORS ACS IN AUDIO-VISUAL SIMULATION OF AN EMERGENCY

Directory of Open Access Journals (Sweden)

S. S. Aleksanin

2010-01-01

Full Text Available In terms of ship simulator automated control systems we have investigated the information content of physiological monitoring cardiac rhythm to assess the reliability and noise immunity of operators of various specializations with audio-visual simulation of an emergency. In parallel, studied the effectiveness of protection against the adverse effects of electromagnetic fields. Monitoring of cardiac rhythm in a virtual crash it is possible to differentiate the degree of voltage regulation systems of body functions of operators on specialization and note the positive effect of the use of means of protection from exposure of electromagnetic fields.
Multimodal warnings to enhance risk communication and safety

NARCIS (Netherlands)

Haas, E.C.; Erp, J.B.F. van

2014-01-01

Multimodal warnings incorporate audio and/or skin-based (tactile) cues to supplement or replace visual cues in environments where the user’s visual perception is busy, impaired, or nonexistent. This paper describes characteristics of audio, tactile, and multimodal warning displays and their role in

The Improvement of Students’ Leadership Ethic in Studying History by Using Baratayuda Audio Visual Media

Directory of Open Access Journals (Sweden)

Wendhy Rachmadhany

2018-04-01

Full Text Available The purpose of this research is to know the improvement of students’ leadership ethic in studying History after the implementation of Baratayuda Audio Visual Media. The population of this research is XI-Social Science-1 Class of SMAN 1 Pare, Kediri Regency, in academic year 2016/2017, consisted of 39 students. This Classroom Action Research (CAR is arranged by Pre-test, Cycle-1 and Cycle-2 which consisted by some steps, such like; planning, implementation, observation, and reflection. Collecting the data is by using questionnaire of leadership ethic, interview, and documentation. The method of data analysis in this research is descriptive analysis by comparing the improvement from one cycle to another. The result of the research is showing that: There is an improvement of leadership ethic in studying History after the implementation of Baratayuda Audio Visual media. It is shown by the results as follows; Pre-test indicates that the passing score is about 17, 95%. On Cycle-1 indicates 46, 1% and on Cycle-2 indicates a significant improvement about 71, 83%.
Head-body ratio as a visual cue for stature in people and sculptural art

OpenAIRE

Mather, George

2010-01-01

Body size is crucial for determining the outcome of competition for resources and mates. Many species use acoustic cues to measure caller body size. Vision is the pre-eminent sense for humans, but visual depth cues are of limited utility in judgments of absolute body size. The reliability of internal body proportion as a potential cue to stature was assessed with a large sample of anthropometric data, and the ratio of head height to body height (HBR) was found to be highly correlated with sta...
The effect of viewing speech on auditory speech processing is different in the left and right hemispheres.

Science.gov (United States)

Davis, Chris; Kislyuk, Daniel; Kim, Jeesun; Sams, Mikko

2008-11-25

We used whole-head magnetoencephalograpy (MEG) to record changes in neuromagnetic N100m responses generated in the left and right auditory cortex as a function of the match between visual and auditory speech signals. Stimuli were auditory-only (AO) and auditory-visual (AV) presentations of /pi/, /ti/ and /vi/. Three types of intensity matched auditory stimuli were used: intact speech (Normal), frequency band filtered speech (Band) and speech-shaped white noise (Noise). The behavioural task was to detect the /vi/ syllables which comprised 12% of stimuli. N100m responses were measured to averaged /pi/ and /ti/ stimuli. Behavioural data showed that identification of the stimuli was faster and more accurate for Normal than for Band stimuli, and for Band than for Noise stimuli. Reaction times were faster for AV than AO stimuli. MEG data showed that in the left hemisphere, N100m to both AO and AV stimuli was largest for the Normal, smaller for Band and smallest for Noise stimuli. In the right hemisphere, Normal and Band AO stimuli elicited N100m responses of quite similar amplitudes, but N100m amplitude to Noise was about half of that. There was a reduction in N100m for the AV compared to the AO conditions. The size of this reduction for each stimulus type was same in the left hemisphere but graded in the right (being largest to the Normal, smaller to the Band and smallest to the Noise stimuli). The N100m decrease for the Normal stimuli was significantly larger in the right than in the left hemisphere. We suggest that the effect of processing visual speech seen in the right hemisphere likely reflects suppression of the auditory response based on AV cues for place of articulation.
Using EEG and stimulus context to probe the modelling of auditory-visual speech.

Science.gov (United States)

Paris, Tim; Kim, Jeesun; Davis, Chris

2016-02-01

We investigated whether internal models of the relationship between lip movements and corresponding speech sounds [Auditory-Visual (AV) speech] could be updated via experience. AV associations were indexed by early and late event related potentials (ERPs) and by oscillatory power and phase locking. Different AV experience was produced via a context manipulation. Participants were presented with valid (the conventional pairing) and invalid AV speech items in either a 'reliable' context (80% AVvalid items) or an 'unreliable' context (80% AVinvalid items). The results showed that for the reliable context, there was N1 facilitation for AV compared to auditory only speech. This N1 facilitation was not affected by AV validity. Later ERPs showed a difference in amplitude between valid and invalid AV speech and there was significant enhancement of power for valid versus invalid AV speech. These response patterns did not change over the context manipulation, suggesting that the internal models of AV speech were not updated by experience. The results also showed that the facilitation of N1 responses did not vary as a function of the salience of visual speech (as previously reported); in post-hoc analyses, it appeared instead that N1 facilitation varied according to the relative time of the acoustic onset, suggesting for AV events N1 may be more sensitive to the relationship of AV timing than form. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.
Improving user-friendliness by using visually supported speech recognition

NARCIS (Netherlands)

Waals, J.A.J.S.; Kooi, F.L.; Kriekaard, J.J.

2002-01-01

While speech recognition in principle may be one of the most natural interfaces, in practice it is not due to the lack of user-friendliness. Words are regularly interpreted wrong, and subjects tend to articulate in an exaggerated manner. We explored the potential of visually supported error
Concurrent Unimodal Learning Enhances Multisensory Responses of Bi-Directional Crossmodal Learning in Robotic Audio-Visual Tracking

DEFF Research Database (Denmark)

Shaikh, Danish; Bodenhagen, Leon; Manoonpong, Poramate

2018-01-01

modalities to independently update modality-specific neural weights on a moment-by-moment basis, in response to dynamic changes in noisy sensory stimuli. The circuit is embodied as a non-holonomic robotic agent that must orient a towards a moving audio-visual target. The circuit continuously learns the best...
Spatially valid proprioceptive cues improve the detection of a visual stimulus

DEFF Research Database (Denmark)

Jackson, Carl P T; Miall, R Chris; Balslev, Daniela

2010-01-01

, which has been demonstrated for other modality pairings. The aim of this study was to test whether proprioceptive signals can spatially cue a visual target to improve its detection. Participants were instructed to use a planar manipulandum in a forward reaching action and determine during this movement...
A Decision-Tree-Based Algorithm for Speech/Music Classification and Segmentation

Directory of Open Access Journals (Sweden)

Lavner Yizhar

2009-01-01

Full Text Available We present an efficient algorithm for segmentation of audio signals into speech or music. The central motivation to our study is consumer audio applications, where various real-time enhancements are often applied. The algorithm consists of a learning phase and a classification phase. In the learning phase, predefined training data is used for computing various time-domain and frequency-domain features, for speech and music signals separately, and estimating the optimal speech/music thresholds, based on the probability density functions of the features. An automatic procedure is employed to select the best features for separation. In the test phase, initial classification is performed for each segment of the audio signal, using a three-stage sieve-like approach, applying both Bayesian and rule-based methods. To avoid erroneous rapid alternations in the classification, a smoothing technique is applied, averaging the decision on each segment with past segment decisions. Extensive evaluation of the algorithm, on a database of more than 12 hours of speech and more than 22 hours of music showed correct identification rates of 99.4% and 97.8%, respectively, and quick adjustment to alternating speech/music sections. In addition to its accuracy and robustness, the algorithm can be easily adapted to different audio types, and is suitable for real-time operation.
Spectro-temporal cues enhance modulation sensitivity in cochlear implant users

Science.gov (United States)

Zheng, Yi; Escabí, Monty; Litovsky, Ruth Y.

2018-01-01

Although speech understanding is highly variable amongst cochlear implants (CIs) subjects, the remarkably high speech recognition performance of many CI users is unexpected and not well understood. Numerous factors, including neural health and degradation of the spectral information in the speech signal of CIs, likely contribute to speech understanding. We studied the ability to use spectro-temporal modulations, which may be critical for speech understanding and discrimination, and hypothesize that CI users adopt a different perceptual strategy than normal-hearing (NH) individuals, whereby they rely more heavily on joint spectro-temporal cues to enhance detection of auditory cues. Modulation detection sensitivity was studied in CI users and NH subjects using broadband “ripple” stimuli that were modulated spectrally, temporally, or jointly, i.e., spectro-temporally. The spectro-temporal modulation transfer functions of CI users and NH subjects was decomposed into spectral and temporal dimensions and compared to those subjects’ spectral-only and temporal-only modulation transfer functions. In CI users, the joint spectro-temporal sensitivity was better than that predicted by spectral-only and temporal-only sensitivity, indicating a heightened spectro-temporal sensitivity. Such an enhancement through the combined integration of spectral and temporal cues was not observed in NH subjects. The unique use of spectro-temporal cues by CI patients can yield benefits for use of cues that are important for speech understanding. This finding has implications for developing sound processing strategies that may rely on joint spectro-temporal modulations to improve speech comprehension of CI users, and the findings of this study may be valuable for developing clinical assessment tools to optimize CI processor performance. PMID:28601530
Spectro-temporal cues enhance modulation sensitivity in cochlear implant users.

Science.gov (United States)

Zheng, Yi; Escabí, Monty; Litovsky, Ruth Y

2017-08-01

Although speech understanding is highly variable amongst cochlear implants (CIs) subjects, the remarkably high speech recognition performance of many CI users is unexpected and not well understood. Numerous factors, including neural health and degradation of the spectral information in the speech signal of CIs, likely contribute to speech understanding. We studied the ability to use spectro-temporal modulations, which may be critical for speech understanding and discrimination, and hypothesize that CI users adopt a different perceptual strategy than normal-hearing (NH) individuals, whereby they rely more heavily on joint spectro-temporal cues to enhance detection of auditory cues. Modulation detection sensitivity was studied in CI users and NH subjects using broadband "ripple" stimuli that were modulated spectrally, temporally, or jointly, i.e., spectro-temporally. The spectro-temporal modulation transfer functions of CI users and NH subjects was decomposed into spectral and temporal dimensions and compared to those subjects' spectral-only and temporal-only modulation transfer functions. In CI users, the joint spectro-temporal sensitivity was better than that predicted by spectral-only and temporal-only sensitivity, indicating a heightened spectro-temporal sensitivity. Such an enhancement through the combined integration of spectral and temporal cues was not observed in NH subjects. The unique use of spectro-temporal cues by CI patients can yield benefits for use of cues that are important for speech understanding. This finding has implications for developing sound processing strategies that may rely on joint spectro-temporal modulations to improve speech comprehension of CI users, and the findings of this study may be valuable for developing clinical assessment tools to optimize CI processor performance. Copyright © 2017 Elsevier B.V. All rights reserved.
How task demands shape brain responses to visual food cues.

Science.gov (United States)

Pohl, Tanja Maria; Tempelmann, Claus; Noesselt, Toemme

2017-06-01

Several previous imaging studies have aimed at identifying the neural basis of visual food cue processing in humans. However, there is little consistency of the functional magnetic resonance imaging (fMRI) results across studies. Here, we tested the hypothesis that this variability across studies might - at least in part - be caused by the different tasks employed. In particular, we assessed directly the influence of task set on brain responses to food stimuli with fMRI using two tasks (colour vs. edibility judgement, between-subjects design). When participants judged colour, the left insula, the left inferior parietal lobule, occipital areas, the left orbitofrontal cortex and other frontal areas expressed enhanced fMRI responses to food relative to non-food pictures. However, when judging edibility, enhanced fMRI responses to food pictures were observed in the superior and middle frontal gyrus and in medial frontal areas including the pregenual anterior cingulate cortex and ventromedial prefrontal cortex. This pattern of results indicates that task sets can significantly alter the neural underpinnings of food cue processing. We propose that judging low-level visual stimulus characteristics - such as colour - triggers stimulus-related representations in the visual and even in gustatory cortex (insula), whereas discriminating abstract stimulus categories activates higher order representations in both the anterior cingulate and prefrontal cortex. Hum Brain Mapp 38:2897-2912, 2017. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
The Impact of Visual Cues and Service Behavior on the Consumer Retail Experience

OpenAIRE

Bjerk, Taylor

2015-01-01

With product differentiation low in the retail industry, businesses need to create strong brand images and increase customer loyalty in order to remain competitive. Visual merchandising is one tool that businesses have to communicate their message in a compelling and strategic manner. Within the scope of visual merchandising there are a number of atmospherics, or cues, which include visual, tactile, and auditory, that can be used in conjunction with one another to influence consumer behavior....
Precision Scaling of Neural Networks for Efficient Audio Processing

OpenAIRE

Ko, Jong Hwan; Fromm, Josh; Philipose, Matthai; Tashev, Ivan; Zarar, Shuayb

2017-01-01

While deep neural networks have shown powerful performance in many audio applications, their large computation and memory demand has been a challenge for real-time processing. In this paper, we study the impact of scaling the precision of neural networks on the performance of two common audio processing tasks, namely, voice-activity detection and single-channel speech enhancement. We determine the optimal pair of weight/neuron bit precision by exploring its impact on both the performance and ...
Coherence of structural visual cues and pictorial gravity paves the way for interceptive actions.

Science.gov (United States)

Zago, Myrka; La Scaleia, Barbara; Miller, William L; Lacquaniti, Francesco

2011-09-20

Dealing with upside-down objects is difficult and takes time. Among the cues that are critical for defining object orientation, the visible influence of gravity on the object's motion has received limited attention. Here, we manipulated the alignment of visible gravity and structural visual cues between each other and relative to the orientation of the observer and physical gravity. Participants pressed a button triggering a hitter to intercept a target accelerated by a virtual gravity. A factorial design assessed the effects of scene orientation (normal or inverted) and target gravity (normal or inverted). We found that interception was significantly more successful when scene direction was concordant with target gravity direction, irrespective of whether both were upright or inverted. This was so independent of the hitter type and when performance feedback to the participants was either available (Experiment 1) or unavailable (Experiment 2). These results show that the combined influence of visible gravity and structural visual cues can outweigh both physical gravity and viewer-centered cues, leading to rely instead on the congruence of the apparent physical forces acting on people and objects in the scene.
Temporal predictive mechanisms modulate motor reaction time during initiation and inhibition of speech and hand movement.

Science.gov (United States)

Johari, Karim; Behroozmand, Roozbeh

2017-08-01

Skilled movement is mediated by motor commands executed with extremely fine temporal precision. The question of how the brain incorporates temporal information to perform motor actions has remained unanswered. This study investigated the effect of stimulus temporal predictability on response timing of speech and hand movement. Subjects performed a randomized vowel vocalization or button press task in two counterbalanced blocks in response to temporally-predictable and unpredictable visual cues. Results indicated that speech and hand reaction time was decreased for predictable compared with unpredictable stimuli. This finding suggests that a temporal predictive code is established to capture temporal dynamics of sensory cues in order to produce faster movements in responses to predictable stimuli. In addition, results revealed a main effect of modality, indicating faster hand movement compared with speech. We suggest that this effect is accounted for by the inherent complexity of speech production compared with hand movement. Lastly, we found that movement inhibition was faster than initiation for both hand and speech, suggesting that movement initiation requires a longer processing time to coordinate activities across multiple regions in the brain. These findings provide new insights into the mechanisms of temporal information processing during initiation and inhibition of speech and hand movement. Copyright © 2017 Elsevier B.V. All rights reserved.
A review of visual cues associated with food on food acceptance and consumption.

Science.gov (United States)

Wadhera, Devina; Capaldi-Phillips, Elizabeth D

2014-01-01

Several sensory cues affect food intake including appearance, taste, odor, texture, temperature, and flavor. Although taste is an important factor regulating food intake, in most cases, the first sensory contact with food is through the eyes. Few studies have examined the effects of the appearance of a food portion on food acceptance and consumption. The purpose of this review is to identify the various visual factors associated with food such as proximity, visibility, color, variety, portion size, height, shape, number, volume, and the surface area and their effects on food acceptance and consumption. We suggest some ways that visual cues can be used to increase fruit and vegetable intake in children and decrease excessive food intake in adults. In addition, we discuss the need for future studies that can further establish the relationship between several unexplored visual dimensions of food (specifically shape, number, size, and surface area) and food intake. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.
Exploring determinants of early user acceptance for an audio-visual heritage archive service using the vignette method

NARCIS (Netherlands)

Ongena, G.; van de Wijngaert, Lidwien; Huizer, E.

2013-01-01

The purpose of this study is to investigate factors, which explain the behavioural intention of the use of a new audio-visual cultural heritage archive service. An online survey in combination with a factorial survey is utilised to investigate the predictable strength of technological, individual
How musical expertise shapes speech perception: evidence from auditory classification images.

Science.gov (United States)

Varnet, Léo; Wang, Tianyun; Peter, Chloe; Meunier, Fanny; Hoen, Michel

2015-09-24

It is now well established that extensive musical training percolates to higher levels of cognition, such as speech processing. However, the lack of a precise technique to investigate the specific listening strategy involved in speech comprehension has made it difficult to determine how musicians' higher performance in non-speech tasks contributes to their enhanced speech comprehension. The recently developed Auditory Classification Image approach reveals the precise time-frequency regions used by participants when performing phonemic categorizations in noise. Here we used this technique on 19 non-musicians and 19 professional musicians. We found that both groups used very similar listening strategies, but the musicians relied more heavily on the two main acoustic cues, at the first formant onset and at the onsets of the second and third formants onsets. Additionally, they responded more consistently to stimuli. These observations provide a direct visualization of auditory plasticity resulting from extensive musical training and shed light on the level of functional transfer between auditory processing and speech perception.
P2-13: Location word Cues' Effect on Location Discrimination Task: Cross-Modal Study

Directory of Open Access Journals (Sweden)

Satoko Ohtsuka

2012-10-01

Full Text Available As is well known, participants are slower and make more errors in responding to the display color of an incongruent color word than a congruent one. This traditional stroop effect is often accounted for with relatively automatic and dominant word processing. Although the word dominance account has been widely supported, it is not clear in what extent of perceptual tasks it is valid. Here we aimed to examine whether the word dominance effect is observed in location stroop tasks and in audio-visual situations. The participants were required to press a key according to the location of visual (Experiment 1 and audio (Experiment 2 targets, left or right, as soon as possible. A cue of written (Experiments 1a and 2a or spoken (Experiments 1b and 2b location words, “left” or “right”, was presented on the left or right side of the fixation with cue lead times (CLT of 200 ms and 1200 ms. Reaction time from target presentation to key press was recorded as a dependent variable. The results were that the location validity effect was marked in within-modality but less so in cross-modality trials. The word validity effect was strong in within- but not in cross-modality trials. The CLT gave some effect of inhibition of return. So the word dominance could be less effective in location tasks and in cross-modal situations. The spatial correspondence seems to overcome the word effect.
Audiovisual alignment of co-speech gestures to speech supports word learning in 2-year-olds.

Science.gov (United States)

Jesse, Alexandra; Johnson, Elizabeth K

2016-05-01

Analyses of caregiver-child communication suggest that an adult tends to highlight objects in a child's visual scene by moving them in a manner that is temporally aligned with the adult's speech productions. Here, we used the looking-while-listening paradigm to examine whether 25-month-olds use audiovisual temporal alignment to disambiguate and learn novel word-referent mappings in a difficult word-learning task. Videos of two equally interesting and animated novel objects were simultaneously presented to children, but the movement of only one of the objects was aligned with an accompanying object-labeling audio track. No social cues (e.g., pointing, eye gaze, touch) were available to the children because the speaker was edited out of the videos. Immediately afterward, toddlers were presented with still images of the two objects and asked to look at one or the other. Toddlers looked reliably longer to the labeled object, demonstrating their acquisition of the novel word-referent mapping. A control condition showed that children's performance was not solely due to the single unambiguous labeling that had occurred at experiment onset. We conclude that the temporal link between a speaker's utterances and the motion they imposed on the referent object helps toddlers to deduce a speaker's intended reference in a difficult word-learning scenario. In combination with our previous work, these findings suggest that intersensory redundancy is a source of information used by language users of all ages. That is, intersensory redundancy is not just a word-learning tool used by young infants. Copyright © 2015 Elsevier Inc. All rights reserved.

Steganalysis of recorded speech

Science.gov (United States)

Johnson, Micah K.; Lyu, Siwei; Farid, Hany

2005-03-01

Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.
Visual search and contextual cueing: differential effects in 10-year-old children and adults.

Science.gov (United States)

Couperus, Jane W; Hunt, Ruskin H; Nelson, Charles A; Thomas, Kathleen M

2011-02-01

The development of contextual cueing specifically in relation to attention was examined in two experiments. Adult and 10-year-old participants completed a context cueing visual search task (Jiang & Chun, The Quarterly Journal of Experimental Psychology, 54A(4), 1105-1124, 2001) containing stimuli presented in an attended (e.g., red) and unattended (e.g., green) color. When the spatial configuration of stimuli in the attended and unattended color was invariant and consistently paired with the target location, adult reaction times improved, demonstrating learning. Learning also occurred if only the attended stimuli's configuration remained fixed. In contrast, while 10 year olds, like adults, showed incrementally slower reaction times as the number of attended stimuli increased, they did not show learning in the standard paradigm. However, they did show learning when the ratio of attended to unattended stimuli was high, irrespective of the total number of attended stimuli. Findings suggest children show efficient attentional guidance by color in visual search but differences in contextual cueing.
Technology-Assisted Rehabilitation of Writing Skills in Parkinson’s Disease: Visual Cueing versus Intelligent Feedback

Directory of Open Access Journals (Sweden)

Evelien Nackaerts

2017-01-01

Full Text Available Recent research showed that visual cueing can have both beneficial and detrimental effects on handwriting of patients with Parkinson’s disease (PD and healthy controls depending on the circumstances. Hence, using other sensory modalities to deliver cueing or feedback may be a valuable alternative. Therefore, the current study compared the effects of short-term training with either continuous visual cues or intermittent intelligent verbal feedback. Ten PD patients and nine healthy controls were randomly assigned to one of these training modes. To assess transfer of learning, writing performance was assessed in the absence of cueing and feedback on both trained and untrained writing sequences. The feedback pen and a touch-sensitive writing tablet were used for testing. Both training types resulted in improved writing amplitudes for the trained and untrained sequences. In conclusion, these results suggest that the feedback pen is a valuable tool to implement writing training in a tailor-made fashion for people with PD. Future studies should include larger sample sizes and different subgroups of PD for long-term training with the feedback pen.
Development of a Low-Cost, Noninvasive, Portable Visual Speech Recognition Program.

Science.gov (United States)

Kohlberg, Gavriel D; Gal, Ya'akov Kobi; Lalwani, Anil K

2016-09-01

Loss of speech following tracheostomy and laryngectomy severely limits communication to simple gestures and facial expressions that are largely ineffective. To facilitate communication in these patients, we seek to develop a low-cost, noninvasive, portable, and simple visual speech recognition program (VSRP) to convert articulatory facial movements into speech. A Microsoft Kinect-based VSRP was developed to capture spatial coordinates of lip movements and translate them into speech. The articulatory speech movements associated with 12 sentences were used to train an artificial neural network classifier. The accuracy of the classifier was then evaluated on a separate, previously unseen set of articulatory speech movements. The VSRP was successfully implemented and tested in 5 subjects. It achieved an accuracy rate of 77.2% (65.0%-87.6% for the 5 speakers) on a 12-sentence data set. The mean time to classify an individual sentence was 2.03 milliseconds (1.91-2.16). We have demonstrated the feasibility of a low-cost, noninvasive, portable VSRP based on Kinect to accurately predict speech from articulation movements in clinically trivial time. This VSRP could be used as a novel communication device for aphonic patients. © The Author(s) 2016.
Integration of visual and non-visual self-motion cues during voluntary head movements in the human brain.

Science.gov (United States)

Schindler, Andreas; Bartels, Andreas

2018-05-15

Our phenomenological experience of the stable world is maintained by continuous integration of visual self-motion with extra-retinal signals. However, due to conventional constraints of fMRI acquisition in humans, neural responses to visuo-vestibular integration have only been studied using artificial stimuli, in the absence of voluntary head-motion. We here circumvented these limitations and let participants to move their heads during scanning. The slow dynamics of the BOLD signal allowed us to acquire neural signal related to head motion after the observer's head was stabilized by inflatable aircushions. Visual stimuli were presented on head-fixed display goggles and updated in real time as a function of head-motion that was tracked using an external camera. Two conditions simulated forward translation of the participant. During physical head rotation, the congruent condition simulated a stable world, whereas the incongruent condition added arbitrary lateral motion. Importantly, both conditions were precisely matched in visual properties and head-rotation. By comparing congruent with incongruent conditions we found evidence consistent with the multi-modal integration of visual cues with head motion into a coherent "stable world" percept in the parietal operculum and in an anterior part of parieto-insular cortex (aPIC). In the visual motion network, human regions MST, a dorsal part of VIP, the cingulate sulcus visual area (CSv) and a region in precuneus (Pc) showed differential responses to the same contrast. The results demonstrate for the first time neural multimodal interactions between precisely matched congruent versus incongruent visual and non-visual cues during physical head-movement in the human brain. The methodological approach opens the path to a new class of fMRI studies with unprecedented temporal and spatial control over visuo-vestibular stimulation. Copyright © 2018 Elsevier Inc. All rights reserved.
Automatic Speech Acquisition and Recognition for Spacesuit Audio Systems

Science.gov (United States)

Ye, Sherry

2015-01-01

NASA has a widely recognized but unmet need for novel human-machine interface technologies that can facilitate communication during astronaut extravehicular activities (EVAs), when loud noises and strong reverberations inside spacesuits make communication challenging. WeVoice, Inc., has developed a multichannel signal-processing method for speech acquisition in noisy and reverberant environments that enables automatic speech recognition (ASR) technology inside spacesuits. The technology reduces noise by exploiting differences between the statistical nature of signals (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, ASR accuracy can be improved to the level at which crewmembers will find the speech interface useful. System components and features include beam forming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, and ASR decoding. Arithmetic complexity models were developed and will help designers of real-time ASR systems select proper tasks when confronted with constraints in computational resources. In Phase I of the project, WeVoice validated the technology. The company further refined the technology in Phase II and developed a prototype for testing and use by suited astronauts.
Linking attentional processes and conceptual problem solving: visual cues facilitate the automaticity of extracting relevant information from diagrams.

Science.gov (United States)

Rouinfar, Amy; Agra, Elise; Larson, Adam M; Rebello, N Sanjay; Loschky, Lester C

2014-01-01

This study investigated links between visual attention processes and conceptual problem solving. This was done by overlaying visual cues on conceptual physics problem diagrams to direct participants' attention to relevant areas to facilitate problem solving. Participants (N = 80) individually worked through four problem sets, each containing a diagram, while their eye movements were recorded. Each diagram contained regions that were relevant to solving the problem correctly and separate regions related to common incorrect responses. Problem sets contained an initial problem, six isomorphic training problems, and a transfer problem. The cued condition saw visual cues overlaid on the training problems. Participants' verbal responses were used to determine their accuracy. This study produced two major findings. First, short duration visual cues which draw attention to solution-relevant information and aid in the organizing and integrating of it, facilitate both immediate problem solving and generalization of that ability to new problems. Thus, visual cues can facilitate re-representing a problem and overcoming impasse, enabling a correct solution. Importantly, these cueing effects on problem solving did not involve the solvers' attention necessarily embodying the solution to the problem, but were instead caused by solvers attending to and integrating relevant information in the problems into a solution path. Second, this study demonstrates that when such cues are used across multiple problems, solvers can automatize the extraction of problem-relevant information extraction. These results suggest that low-level attentional selection processes provide a necessary gateway for relevant information to be used in problem solving, but are generally not sufficient for correct problem solving. Instead, factors that lead a solver to an impasse and to organize and integrate problem information also greatly facilitate arriving at correct solutions.
Man-systems evaluation of moving base vehicle simulation motion cues. [human acceleration perception involving visual feedback

Science.gov (United States)

Kirkpatrick, M.; Brye, R. G.

1974-01-01

A motion cue investigation program is reported that deals with human factor aspects of high fidelity vehicle simulation. General data on non-visual motion thresholds and specific threshold values are established for use as washout parameters in vehicle simulation. A general purpose similator is used to test the contradictory cue hypothesis that acceleration sensitivity is reduced during a vehicle control task involving visual feedback. The simulator provides varying acceleration levels. The method of forced choice is based on the theory of signal detect ability.
Gender differences in identifying emotions from auditory and visual stimuli.

Science.gov (United States)

Waaramaa, Teija

2017-12-01

The present study focused on gender differences in emotion identification from auditory and visual stimuli produced by two male and two female actors. Differences in emotion identification from nonsense samples, language samples and prolonged vowels were investigated. It was also studied whether auditory stimuli can convey the emotional content of speech without visual stimuli, and whether visual stimuli can convey the emotional content of speech without auditory stimuli. The aim was to get a better knowledge of vocal attributes and a more holistic understanding of the nonverbal communication of emotion. Females tended to be more accurate in emotion identification than males. Voice quality parameters played a role in emotion identification in both genders. The emotional content of the samples was best conveyed by nonsense sentences, better than by prolonged vowels or shared native language of the speakers and participants. Thus, vocal non-verbal communication tends to affect the interpretation of emotion even in the absence of language. The emotional stimuli were better recognized from visual stimuli than auditory stimuli by both genders. Visual information about speech may not be connected to the language; instead, it may be based on the human ability to understand the kinetic movements in speech production more readily than the characteristics of the acoustic cues.
The Acceptability of Speech with Radio Interference

DEFF Research Database (Denmark)

Baykaner, K.; Hummersone, H.; Mason, R.

2014-01-01

A listening test was conducted to investigate the acceptability of audio-on-audio interference for radio programs featuring speech as the target. Twenty-one subjects, including naïve and expert listeners, were presented with 200 randomly assigned pairs of stimuli and asked to report, for each trial...
The Auditory-Visual Speech Benefit on Working Memory in Older Adults with Hearing Impairment

OpenAIRE

Frtusova, Jana B.; Phillips, Natalie A.

2016-01-01

This study examined the effect of auditory-visual (AV) speech stimuli on working memory in older adults with poorer-hearing (PH) in comparison to age- and education-matched older adults with better hearing (BH). Participants completed a working memory n-back task (0- to 2-back) in which sequences of digits were presented in visual-only (i.e., speech-reading), auditory-only (A-only), and AV conditions. Auditory event-related potentials (ERP) were collected to assess the relationship between pe...
Response of hatchling Komodo Dragons (Varanus komodoensis) at Denver Zoo to visual and chemical cues arising from prey.

Science.gov (United States)

Chiszar, David; Krauss, Susan; Shipley, Bryon; Trout, Tim; Smith, Hobart M

2009-01-01

Five hatchling Komodo Dragons (Varanus komodoensis) at Denver Zoo were observed in two experiments that studied the effects of visual and chemical cues arising from prey. Rate of tongue flicking was recorded in Experiment 1, and amount of time the lizards spent interacting with stimuli was recorded in Experiment 2. Our hypothesis was that young V. komodoensis would be more dependent upon vision than chemoreception, especially when dealing with live, moving, prey. Although visual cues, including prey motion, had a significant effect, chemical cues had a far stronger effect. Implications of this falsification of our initial hypothesis are discussed.
Visual Cues Contribute Differentially to Audiovisual Perception of Consonants and Vowels in Improving Recognition and Reducing Cognitive Demands in Listeners With Hearing Impairment Using Hearing Aids.

Science.gov (United States)

Moradi, Shahram; Lidestam, Björn; Danielsson, Henrik; Ng, Elaine Hoi Ning; Rönnberg, Jerker

2017-09-18

We sought to examine the contribution of visual cues in audiovisual identification of consonants and vowels-in terms of isolation points (the shortest time required for correct identification of a speech stimulus), accuracy, and cognitive demands-in listeners with hearing impairment using hearing aids. The study comprised 199 participants with hearing impairment (mean age = 61.1 years) with bilateral, symmetrical, mild-to-severe sensorineural hearing loss. Gated Swedish consonants and vowels were presented aurally and audiovisually to participants. Linear amplification was adjusted for each participant to assure audibility. The reading span test was used to measure participants' working memory capacity. Audiovisual presentation resulted in shortened isolation points and improved accuracy for consonants and vowels relative to auditory-only presentation. This benefit was more evident for consonants than vowels. In addition, correlations and subsequent analyses revealed that listeners with higher scores on the reading span test identified both consonants and vowels earlier in auditory-only presentation, but only vowels (not consonants) in audiovisual presentation. Consonants and vowels differed in terms of the benefits afforded from their associative visual cues, as indicated by the degree of audiovisual benefit and reduction in cognitive demands linked to the identification of consonants and vowels presented audiovisually.
Cross-modal cueing in audiovisual spatial attention

DEFF Research Database (Denmark)

Blurton, Steven Paul; Greenlee, Mark W.; Gondan, Matthias

2015-01-01

effects have been reported for endogenous visual cues while exogenous cues seem to be mostly ineffective. In three experiments, we investigated cueing effects on the processing of audiovisual signals. In Experiment 1 we used endogenous cues to investigate their effect on the detection of auditory, visual......, and audiovisual targets presented with onset asynchrony. Consistent cueing effects were found in all target conditions. In Experiment 2 we used exogenous cues and found cueing effects only for visual target detection, but not auditory target detection. In Experiment 3 we used predictive exogenous cues to examine...
The Effect of Onset Asynchrony in Audio Visual Speech and the Uncanny Valley in Virtual Characters

DEFF Research Database (Denmark)

Tinwell, Angela; Grimshaw, Mark; Abdel Nabi, Deborah

2015-01-01

This study investigates if the Uncanny Valley phenomenon is increased for realistic, human-like characters with an asynchrony of lip movement during speech. An experiment was conducted in which 113 participants rated, a human and a realistic, talking-head, human-like, virtual character over a ran...
Familiar units prevail over statistical cues in word segmentation.

Science.gov (United States)

Poulin-Charronnat, Bénédicte; Perruchet, Pierre; Tillmann, Barbara; Peereman, Ronald

2017-09-01

In language acquisition research, the prevailing position is that listeners exploit statistical cues, in particular transitional probabilities between syllables, to discover words of a language. However, other cues are also involved in word discovery. Assessing the weight learners give to these different cues leads to a better understanding of the processes underlying speech segmentation. The present study evaluated whether adult learners preferentially used known units or statistical cues for segmenting continuous speech. Before the exposure phase, participants were familiarized with part-words of a three-word artificial language. This design allowed the dissociation of the influence of statistical cues and familiar units, with statistical cues favoring word segmentation and familiar units favoring (nonoptimal) part-word segmentation. In Experiment 1, performance in a two-alternative forced choice (2AFC) task between words and part-words revealed part-word segmentation (even though part-words were less cohesive in terms of transitional probabilities and less frequent than words). By contrast, an unfamiliarized group exhibited word segmentation, as usually observed in standard conditions. Experiment 2 used a syllable-detection task to remove the likely contamination of performance by memory and strategy effects in the 2AFC task. Overall, the results suggest that familiar units overrode statistical cues, ultimately questioning the need for computation mechanisms of transitional probabilities (TPs) in natural language speech segmentation.
Term clouds as surrogates for user generated speech

NARCIS (Netherlands)

Tsagkias, M.; Larson, M.; de Rijke, M.; Myaeng, S.-H.; Oard, D.W.; Sebastiani, F.; Chua, T.-S.; Leong, M.-K.

2008-01-01

User generated spoken audio remains a challenge for Automatic Speech Recognition (ASR) technology and content-based audio surrogates derived from ASR-transcripts must be error robust. An investigation of the use of term clouds as surrogates for podcasts demonstrates that ASR term clouds closely
Visual cues given by humans are not sufficient for Asian elephants (Elephas maximus to find hidden food.

Directory of Open Access Journals (Sweden)

Joshua M Plotnik

Full Text Available Recent research suggests that domesticated species--due to artificial selection by humans for specific, preferred behavioral traits--are better than wild animals at responding to visual cues given by humans about the location of hidden food. \\Although this seems to be supported by studies on a range of domesticated (including dogs, goats and horses and wild (including wolves and chimpanzees animals, there is also evidence that exposure to humans positively influences the ability of both wild and domesticated animals to follow these same cues. Here, we test the performance of Asian elephants (Elephas maximus on an object choice task that provides them with visual-only cues given by humans about the location of hidden food. Captive elephants are interesting candidates for investigating how both domestication and human exposure may impact cue-following as they represent a non-domesticated species with almost constant human interaction. As a group, the elephants (n = 7 in our study were unable to follow pointing, body orientation or a combination of both as honest signals of food location. They were, however, able to follow vocal commands with which they were already familiar in a novel context, suggesting the elephants are able to follow cues if they are sufficiently salient. Although the elephants' inability to follow the visual cues provides partial support for the domestication hypothesis, an alternative explanation is that elephants may rely more heavily on other sensory modalities, specifically olfaction and audition. Further research will be needed to rule out this alternative explanation.
Visual cues given by humans are not sufficient for Asian elephants (Elephas maximus) to find hidden food.

Science.gov (United States)

Plotnik, Joshua M; Pokorny, Jennifer J; Keratimanochaya, Titiporn; Webb, Christine; Beronja, Hana F; Hennessy, Alice; Hill, James; Hill, Virginia J; Kiss, Rebecca; Maguire, Caitlin; Melville, Beckett L; Morrison, Violet M B; Seecoomar, Dannah; Singer, Benjamin; Ukehaxhaj, Jehona; Vlahakis, Sophia K; Ylli, Dora; Clayton, Nicola S; Roberts, John; Fure, Emilie L; Duchatelier, Alicia P; Getz, David

2013-01-01

Recent research suggests that domesticated species--due to artificial selection by humans for specific, preferred behavioral traits--are better than wild animals at responding to visual cues given by humans about the location of hidden food. \\Although this seems to be supported by studies on a range of domesticated (including dogs, goats and horses) and wild (including wolves and chimpanzees) animals, there is also evidence that exposure to humans positively influences the ability of both wild and domesticated animals to follow these same cues. Here, we test the performance of Asian elephants (Elephas maximus) on an object choice task that provides them with visual-only cues given by humans about the location of hidden food. Captive elephants are interesting candidates for investigating how both domestication and human exposure may impact cue-following as they represent a non-domesticated species with almost constant human interaction. As a group, the elephants (n = 7) in our study were unable to follow pointing, body orientation or a combination of both as honest signals of food location. They were, however, able to follow vocal commands with which they were already familiar in a novel context, suggesting the elephants are able to follow cues if they are sufficiently salient. Although the elephants' inability to follow the visual cues provides partial support for the domestication hypothesis, an alternative explanation is that elephants may rely more heavily on other sensory modalities, specifically olfaction and audition. Further research will be needed to rule out this alternative explanation.
Generating physical symptoms from visual cues: An experimental study

OpenAIRE

Ogden, J; Zoukas, S

2010-01-01

This experimental study explored whether the physical symptoms of cold, pain and itchiness could be generated by visual cues, whether they varied in the ease with which they could be generated and whether they were related to negative affect. Participants were randomly allocated by group to watch one of three videos relating to cold (e.g. ice, snow, wind), pain (e.g. sporting injuries, tattoos) or itchiness (e.g. head lice, scratching). They then rated their self-reported symptoms of cold, pa...

Audio-visual synchronization in reading while listening to texts: Effects on visual behavior and verbal learning

OpenAIRE

Gerbier , Emilie; Bailly , Gérard; Bosse , Marie-Line

2018-01-01

International audience; Reading while listening to texts (RWL) is a promising way to improve the learning benefits provided by a reading experience. In an exploratory study, we investigated the effect of synchronizing the highlighting of words (visual) with their auditory (speech) counterpart during a RWL task. Forty French children from 3rd to 5th grade read short stories in their native language while hearing the story spoken by a narrator. In the non-synchronized (S-) condition the text wa...
Sensory modality of smoking cues modulates neural cue reactivity.

Science.gov (United States)

Yalachkov, Yavor; Kaiser, Jochen; Görres, Andreas; Seehaus, Arne; Naumer, Marcus J

2013-01-01

Behavioral experiments have demonstrated that the sensory modality of presentation modulates drug cue reactivity. The present study on nicotine addiction tested whether neural responses to smoking cues are modulated by the sensory modality of stimulus presentation. We measured brain activation using functional magnetic resonance imaging (fMRI) in 15 smokers and 15 nonsmokers while they viewed images of smoking paraphernalia and control objects and while they touched the same objects without seeing them. Haptically presented, smoking-related stimuli induced more pronounced neural cue reactivity than visual cues in the left dorsal striatum in smokers compared to nonsmokers. The severity of nicotine dependence correlated positively with the preference for haptically explored smoking cues in the left inferior parietal lobule/somatosensory cortex, right fusiform gyrus/inferior temporal cortex/cerebellum, hippocampus/parahippocampal gyrus, posterior cingulate cortex, and supplementary motor area. These observations are in line with the hypothesized role of the dorsal striatum for the expression of drug habits and the well-established concept of drug-related automatized schemata, since haptic perception is more closely linked to the corresponding object-specific action pattern than visual perception. Moreover, our findings demonstrate that with the growing severity of nicotine dependence, brain regions involved in object perception, memory, self-processing, and motor control exhibit an increasing preference for haptic over visual smoking cues. This difference was not found for control stimuli. Considering the sensory modality of the presented cues could serve to develop more reliable fMRI-specific biomarkers, more ecologically valid experimental designs, and more effective cue-exposure therapies of addiction.
Heads First: Visual Aftereffects Reveal Hierarchical Integration of Cues to Social Attention.

Directory of Open Access Journals (Sweden)

Sarah Cooney

Full Text Available Determining where another person is attending is an important skill for social interaction that relies on various visual cues, including the turning direction of the head and body. This study reports a novel high-level visual aftereffect that addresses the important question of how these sources of information are combined in gauging social attention. We show that adapting to images of heads turned 25° to the right or left produces a perceptual bias in judging the turning direction of subsequently presented bodies. In contrast, little to no change in the judgment of head orientation occurs after adapting to extremely oriented bodies. The unidirectional nature of the aftereffect suggests that cues from the human body signaling social attention are combined in a hierarchical fashion and is consistent with evidence from single-cell recording studies in nonhuman primates showing that information about head orientation can override information about body posture when both are visible.
Literary Genres in Social Life: A Narrative, Audio-visual and Poetic Approach

Directory of Open Access Journals (Sweden)

Luis Felipe González Gutiérrez

2008-05-01

Full Text Available The proposal, "Literary Genres in Social Life: a Narrative, Audio-visual and Poetic Approach", attempts, by objective, to present/display to the academic psychology community and compatible social science disciplines the main contributions of literary genre theory through a social constructionist understanding of narrations and daily stories, and by means of an interactive construction of narrative collage. This work, sustained by an investigation financed by the University Santo Tomás in Bogota, Colombia, "Understanding of structuralist literary theories in the development of the narrative 'I' within the social constructionist approach", tries to propose alternative spaces for the presentation of its investigative results through the expression of metaphors, visual narrative sequences and interactive artistic forms, which invite the spectator to share in and to include/understand important concepts in the consolidation of social forms of construction of the quotidian. URN: urn:nbn:de:0114-fqs0802373
Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion

Science.gov (United States)

Schutz, Michael

2017-01-01

Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor), a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally “happy”) pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015). Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers “trade off” cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music—widely recognized for its artistic significance—complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech. PMID:29249997
Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion.

Science.gov (United States)

Schutz, Michael

2017-01-01

Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor), a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally "happy") pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015). Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers "trade off" cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music-widely recognized for its artistic significance-complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech.
Acoustic Constraints and Musical Consequences: Exploring Composers' Use of Cues for Musical Emotion

Directory of Open Access Journals (Sweden)

Michael Schutz

2017-11-01

Full Text Available Emotional communication in music is based in part on the use of pitch and timing, two cues effective in emotional speech. Corpus analyses of natural speech illustrate that happy utterances tend to be higher and faster than sad. Although manipulations altering melodies show that passages changed to be higher and faster sound happier, corpus analyses of unaltered music paralleling those of natural speech have proven challenging. This partly reflects the importance of modality (i.e., major/minor, a powerful musical cue whose use is decidedly imbalanced in Western music. This imbalance poses challenges for creating musical corpora analogous to existing speech corpora for purposes of analyzing emotion. However, a novel examination of music by Bach and Chopin balanced in modality illustrates that, consistent with predictions from speech, their major key (nominally “happy” pieces are approximately a major second higher and 29% faster than their minor key pieces (Poon and Schutz, 2015. Although this provides useful evidence for parallels in use of emotional cues between these domains, it raises questions about how composers “trade off” cue differentiation in music, suggesting interesting new potential research directions. This Focused Review places those results in a broader context, highlighting their connections with previous work on the natural use of cues for musical emotion. Together, these observational findings based on unaltered music—widely recognized for its artistic significance—complement previous experimental work systematically manipulating specific parameters. In doing so, they also provide a useful musical counterpart to fruitful studies of the acoustic cues for emotion found in natural speech.
Exploring the role of brain oscillations in speech perception in noise: Intelligibility of isochronously retimed speech

Directory of Open Access Journals (Sweden)

Vincent Aubanel

2016-08-01

Full Text Available A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
Visual Search and Target Cueing: A Comparison of Head-Mounted Versus Hand-Held Displays on the Allocation of Visual Attention

National Research Council Canada - National Science Library

Yeh, Michelle; Wickens, Christopher D

1998-01-01

We conducted a study to examine the effects of target cueing and conformality with a hand-held or head-mounted display to determine their effects on visual search tasks requiring focused and divided attention...
Integration of Audio Visual Multimedia for Special Education Pre-Service Teachers' Self Reflections in Developing Teaching Competencies

Science.gov (United States)

Sediyani, Tri; Yufiarti; Hadi, Eko

2017-01-01

This study aims to develop a model of learning by integrating multimedia and audio-visual self-reflective learners. This multimedia was developed as a tool for prospective teachers as learners in the education of children with special needs to reflect on their teaching competencies before entering the world of education. Research methods to…
Digitisation of the CERN Audio Archives

CERN Multimedia

Maximilien Brice

2006-01-01

Since the creation of CERN in 1954 until mid 1980s, the audiovisual service has recorded hundreds of hours of moments of life at CERN on audio tapes. These moments range from inaugurations of new facilities to VIP speeches and general interest cultural seminars The preservation process started in June 2005 On these pictures, we see Waltraud Hug working on an open-reel tape.
Object-based implicit learning in visual search: perceptual segmentation constrains contextual cueing.

Science.gov (United States)

Conci, Markus; Müller, Hermann J; von Mühlenen, Adrian

2013-07-09

In visual search, detection of a target is faster when it is presented within a spatial layout of repeatedly encountered nontarget items, indicating that contextual invariances can guide selective attention (contextual cueing; Chun & Jiang, 1998). However, perceptual regularities may interfere with contextual learning; for instance, no contextual facilitation occurs when four nontargets form a square-shaped grouping, even though the square location predicts the target location (Conci & von Mühlenen, 2009). Here, we further investigated potential causes for this interference-effect: We show that contextual cueing can reliably occur for targets located within the region of a segmented object, but not for targets presented outside of the object's boundaries. Four experiments demonstrate an object-based facilitation in contextual cueing, with a modulation of context-based learning by relatively subtle grouping cues including closure, symmetry, and spatial regularity. Moreover, the lack of contextual cueing for targets located outside the segmented region was due to an absence of (latent) learning of contextual layouts, rather than due to an attentional bias towards the grouped region. Taken together, these results indicate that perceptual segmentation provides a basic structure within which contextual scene regularities are acquired. This in turn argues that contextual learning is constrained by object-based selection.
Designing Promotion Strategy of Malang Raya’s Tourism Destination Branding through Audio Visual Media

Directory of Open Access Journals (Sweden)

Chanira Nuansa

2014-04-01

Full Text Available This study examines the suitability concept of destination branding with existing models of Malang tourism promotion. This research is qualitative by taking the data directly in the form of existing promotional models of Malang, namely: information portal sites, blogs, social networking, and video via the Internet. This study used SWOT analysis to find strengths, weaknesses, opportunities, and threats on existing models of the tourism promotion. The data is analyzed based on destination branding’s concept indicators. Results of analysis are used as a basis in designing solutions for Malang tourism promotion through a new integrated tourism advertising model. Through the analysis we found that video is the most suitable media that used to promote Malang tourism in the form of advertisements. Videos are able to show the objectivity of the fact that intact better through audio-visual form, making it easier to associate the viewer thoughts on the phenomenon of destination. Moreover, video creation of Malang tourism as well as conceptualized ad is still rare. This is an opportunity, because later models of audio-visual advertisements made of this study is expected to be an example for concerned parties to conceptualize the next Malang tourism advertising.Keywords: Advertise, SWOT Analysis, Malang City, tourism promotion
Visual stimuli in intervention approaches for pre-schoolers diagnosed with phonological delay.

Science.gov (United States)

Pedro, Cassandra Ferreira; Lousada, Marisa; Hall, Andreia; Jesus, Luis M T

2018-04-01

The aim of this study was to develop and content validate specific speech and language intervention picture cards: The Letter-Sound (L&S) cards. The present study was also focused on assessing the influence of these cards on letter-sound correspondences and speech sound production. An expert panel of six speech and language therapists analysed and discussed the L&S cards based on several criteria previously established. A Speech and Language Therapist carried out a 6-week therapeutic intervention with a group of seven Portuguese phonologically delayed pre-schoolers aged 5;3 to 6;5. The modified Bland-Altman method revealed good agreement among evaluators, that is the majority of the values was between the agreement limits. Additional outcome measures were collected before and after the therapeutic intervention process. Results indicate that the L&S cards facilitate the acquisition of letter-sound correspondences. Regarding speech sound production, some improvements were also observed at word level. The L&S cards are therefore likely to give phonetic cues, which are crucial for the correct production of therapeutic targets. These visual cues seemed to have helped children with phonological delay develop the above-mentioned skills.
Central and Peripheral Vision Loss Differentially Affects Contextual Cueing in Visual Search

Science.gov (United States)

Geringswald, Franziska; Pollmann, Stefan

2015-01-01

Visual search for targets in repeated displays is more efficient than search for the same targets in random distractor layouts. Previous work has shown that this contextual cueing is severely impaired under central vision loss. Here, we investigated whether central vision loss, simulated with gaze-contingent displays, prevents the incidental…
Visual perception of dynamic properties: cue heuristics versus direct-perceptual competence.

Science.gov (United States)

Runeson, S; Juslin, P; Olsson, H

2000-07-01

Constructivist and Gibsonian approaches disagree over the possibility of direct perceptual use of advanced information. A trenchant instance concerns visual perception of underlying dynamic properties as specified by kinematic patterns of events. For the paradigmatic task of discrimination of relative mass in observed collisions, 2 mathematical models are developed, 1 model representing a direct, invariant-based approach, and 1 representing a cue-heuristic approach. The models enable a critical experimental design with distinct predictions concerning performance data and confidence ratings. Although pretraining results were mixed, the invariant-based model was empirically confirmed after a minimal amount of training: Competence entails the use of advanced kinematic information in a direct-perceptual ("sensory") mode of apprehension, in contrast to beginners' use of simpler cues in an inferential ("cognitive") mode.
Congruent and Incongruent Cues in Highly Familiar Audiovisual Action Sequences: An ERP Study

Directory of Open Access Journals (Sweden)

SM Wuerger

2012-07-01

Full Text Available In a previous fMRI study we found significant differences in BOLD responses for congruent and incongruent semantic audio-visual action sequences (whole-body actions and speech actions in bilateral pSTS, left SMA, left IFG, and IPL (Meyer, Greenlee, & Wuerger, JOCN, 2011. Here, we present results from a 128-channel ERP study that examined the time-course of these interactions using a one-back task. ERPs in response to congruent and incongruent audio-visual actions were compared to identify regions and latencies of differences. Responses to congruent and incongruent stimuli differed between 240–280 ms, 340–420 ms, and 460–660 ms after stimulus onset. A dipole analysis revealed that the difference around 250 ms can be partly explained by a modulation of sources in the vicinity of the superior temporal area, while the responses after 400 ms are consistent with sources in inferior frontal areas. Our results are in line with a model that postulates early recognition of congruent audiovisual actions in the pSTS, perhaps as a sensory memory buffer, and a later role of the IFG, perhaps in a generative capacity, in reconciling incongruent signals.
Ageing diminishes the modulation of human brain responses to visual food cues by meal ingestion.

Science.gov (United States)

Cheah, Y S; Lee, S; Ashoor, G; Nathan, Y; Reed, L J; Zelaya, F O; Brammer, M J; Amiel, S A

2014-09-01

Rates of obesity are greatest in middle age. Obesity is associated with altered activity of brain networks sensing food-related stimuli and internal signals of energy balance, which modulate eating behaviour. The impact of healthy mid-life ageing on these processes has not been characterised. We therefore aimed to investigate changes in brain responses to food cues, and the modulatory effect of meal ingestion on such evoked neural activity, from young adulthood to middle age. Twenty-four healthy, right-handed subjects, aged 19.5-52.6 years, were studied on separate days after an overnight fast, randomly receiving 50 ml water or 554 kcal mixed meal before functional brain magnetic resonance imaging while viewing visual food cues. Across the group, meal ingestion reduced food cue-evoked activity of amygdala, putamen, insula and thalamus, and increased activity in precuneus and bilateral parietal cortex. Corrected for body mass index, ageing was associated with decreasing food cue-evoked activation of right dorsolateral prefrontal cortex (DLPFC) and precuneus, and increasing activation of left ventrolateral prefrontal cortex (VLPFC), bilateral temporal lobe and posterior cingulate in the fasted state. Ageing was also positively associated with the difference in food cue-evoked activation between fed and fasted states in the right DLPFC, bilateral amygdala and striatum, and negatively associated with that of the left orbitofrontal cortex and VLPFC, superior frontal gyrus, left middle and temporal gyri, posterior cingulate and precuneus. There was an overall tendency towards decreasing modulatory effects of prior meal ingestion on food cue-evoked regional brain activity with increasing age. Healthy ageing to middle age is associated with diminishing sensitivity to meal ingestion of visual food cue-evoked activity in brain regions that represent the salience of food and direct food-associated behaviour. Reduced satiety sensing may have a role in the greater risk of
Visual-gustatory interaction: orbitofrontal and insular cortices mediate the effect of high-calorie visual food cues on taste pleasantness.

Science.gov (United States)

Ohla, Kathrin; Toepel, Ulrike; le Coutre, Johannes; Hudry, Julie

2012-01-01

Vision provides a primary sensory input for food perception. It raises expectations on taste and nutritional value and drives acceptance or rejection. So far, the impact of visual food cues varying in energy content on subsequent taste integration remains unexplored. Using electrical neuroimaging, we assessed whether high- and low-calorie food cues differentially influence the brain processing and perception of a subsequent neutral electric taste. When viewing high-calorie food images, participants reported the subsequent taste to be more pleasant than when low-calorie food images preceded the identical taste. Moreover, the taste-evoked neural activity was stronger in the bilateral insula and the adjacent frontal operculum (FOP) within 100 ms after taste onset when preceded by high- versus low-calorie cues. A similar pattern evolved in the anterior cingulate (ACC) and medial orbitofrontal cortex (OFC) around 180 ms, as well as, in the right insula, around 360 ms. The activation differences in the OFC correlated positively with changes in taste pleasantness, a finding that is an accord with the role of the OFC in the hedonic evaluation of taste. Later activation differences in the right insula likely indicate revaluation of interoceptive taste awareness. Our findings reveal previously unknown mechanisms of cross-modal, visual-gustatory, sensory interactions underlying food evaluation.
A speech reception in noise test for preschool children (the Galker-test)

DEFF Research Database (Denmark)

Lauritsen, Maj-Britt Glenn; Kreiner, Svend; Söderström, Margareta

2015-01-01

Purpose: This study evaluates initial validity and reliability of the “Galker test of speech reception in noise” developed for Danish preschool children suspected to have problems with hearing or understanding speech against strict psychometric standards and assesses acceptance by the children....... Methods:The Galker test is an audio-visual, computerised, word discrimination test in background noise, originally comprised of 50 word pairs. Three hundred and eighty eight children attending ordinary day care centres and aged 3–5 years were included. With multiple regression and the Rasch item response...... model it was examined whether the total score of the Galker test validly reflected item responses across subgroups defined by sex, age, bilingualism, tympanometry, audiometry and verbal comprehension. Results: A total of 370 children (95%) accepted testing and 339 (87%) completed all 50 items...

Some links on this page may take you to non-federal websites. Their policies may differ from this site.