WorldWideScience

Sample records for audio-visual speech cue

  1. Dynamic Bayesian Networks for Audio-Visual Speech Recognition

    Directory of Open Access Journals (Sweden)

    Liang Luhong

    2002-01-01

Full Text Available The use of visual features in audio-visual speech recognition (AVSR) is justified by both the speech generation mechanism, which is essentially bimodal in audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare the performance of these models with the existing models used in speaker-dependent audio-visual isolated word recognition. The statistical properties of both the CHMM and FHMM allow them to model the state asynchrony of the audio and visual observation sequences while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming all the existing models and the FHMM.
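As a rough illustration of the coupled-HMM idea described in this record (not the authors' implementation), the sketch below runs a forward pass over the product of a small audio chain and a small visual chain, where each chain's next state depends on both chains' previous states. All dimensions, parameters, and features are invented for the example.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy coupled-HMM forward pass over the product state space (audio x visual).
# Each chain's next state depends on the previous states of BOTH chains, which
# lets the streams stay loosely asynchronous while remaining correlated over time.
rng = np.random.default_rng(0)
Na, Nv = 3, 3            # states in the audio and visual chains (illustrative)
Da, Dv = 2, 2            # feature dimensions (e.g. MFCC-like, lip-shape-like)
T = 20                   # number of frames

# Transition tensors: P(next audio state | prev audio, prev visual), likewise for visual.
A_a = rng.random((Na, Nv, Na)); A_a /= A_a.sum(axis=2, keepdims=True)
A_v = rng.random((Na, Nv, Nv)); A_v /= A_v.sum(axis=2, keepdims=True)
pi_a = np.full(Na, 1.0 / Na)
pi_v = np.full(Nv, 1.0 / Nv)

# Gaussian emission models per state and per modality, plus random observations.
mu_a, mu_v = rng.normal(size=(Na, Da)), rng.normal(size=(Nv, Dv))
obs_a, obs_v = rng.normal(size=(T, Da)), rng.normal(size=(T, Dv))

def emission(t, i, j):
    """Joint emission likelihood of frame t given audio state i and visual state j."""
    return (multivariate_normal.pdf(obs_a[t], mean=mu_a[i], cov=np.eye(Da)) *
            multivariate_normal.pdf(obs_v[t], mean=mu_v[j], cov=np.eye(Dv)))

# Forward recursion over the Na*Nv product states.
alpha = np.zeros((T, Na, Nv))
for i in range(Na):
    for j in range(Nv):
        alpha[0, i, j] = pi_a[i] * pi_v[j] * emission(0, i, j)
for t in range(1, T):
    for k in range(Na):
        for l in range(Nv):
            trans = (alpha[t - 1] * A_a[:, :, k] * A_v[:, :, l]).sum()
            alpha[t, k, l] = trans * emission(t, k, l)

print("sequence likelihood:", alpha[-1].sum())
```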

  2. Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?

    Directory of Open Access Journals (Sweden)

Magnus Alm

    2015-07-01

Full Text Available Gender and age have been found to affect adults’ audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood of cognitive and sensory decline, which may confound positive effects of age-related AV experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently, both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20-30 years) and middle-aged (50-60 years) adults, with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. In contrast, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females’ general AV perceptual strategy. Although young females’ speech-reading proficiency may not readily contribute to greater visual influence, between young and middle adulthood the recurrent confirmation of the contribution of visual cues induced by speech-reading proficiency may gradually shift females’ AV perceptual strategy towards more visually dominated responses.

  3. Speech and non-speech audio-visual illusions: a developmental study.

    Directory of Open Access Journals (Sweden)

    Corinne Tremblay

Full Text Available It is well known that simultaneous presentation of incongruent audio and visual stimuli can lead to illusory percepts. Recent data suggest that distinct processes underlie non-specific intersensory speech as opposed to non-speech perception. However, the development of both speech and non-speech intersensory perception across childhood and adolescence remains poorly defined. Thirty-eight observers aged 5 to 19 were tested on the McGurk effect (an audio-visual illusion involving speech), the Illusory Flash effect and the Fusion effect (two audio-visual illusions not involving speech) to investigate the development of audio-visual interactions and contrast speech vs. non-speech developmental patterns. Whereas the strength of audio-visual speech illusions varied as a direct function of maturational level, performance on non-speech illusory tasks appeared to be homogeneous across all ages. These data support the existence of independent maturational processes underlying speech and non-speech audio-visual illusory effects.

  4. Audio-Visual Speech Perception: A Developmental ERP Investigation

    Science.gov (United States)

    Knowland, Victoria C. P.; Mercure, Evelyne; Karmiloff-Smith, Annette; Dick, Fred; Thomas, Michael S. C.

    2014-01-01

    Being able to see a talking face confers a considerable advantage for speech perception in adulthood. However, behavioural data currently suggest that children fail to make full use of these available visual speech cues until age 8 or 9. This is particularly surprising given the potential utility of multiple informational cues during language…

  5. Intermodal timing relations and audio-visual speech recognition by normal-hearing adults.

    Science.gov (United States)

    McGrath, M; Summerfield, Q

    1985-02-01

Audio-visual identification of sentences was measured as a function of audio delay in untrained observers with normal hearing; the soundtrack was replaced by rectangular pulses originally synchronized to the closing of the talker's vocal folds and then subjected to delay. When the soundtrack was delayed by 160 ms, identification scores were no better than when no acoustical information at all was provided. Delays of up to 80 ms had little effect on group-mean performance, but a separate analysis of a subgroup of better lipreaders showed a significant trend of reduced scores with increased delay in the range 0-80 ms. A second experiment tested the interpretation that, although the main disruptive effect of the delay occurred on a syllabic time scale, better lipreaders might be attempting to use intermodal timing cues at a phonemic level. Normal-hearing observers determined whether a 120-Hz complex tone started before or after the opening of a pair of lip-like Lissajous figures. Group-mean difference limens (70.7% correct DLs) were -79 ms (sound leading) and +138 ms (sound lagging), with no significant correlation between DLs and sentence lipreading scores. It was concluded that most observers, whether good lipreaders or not, possess insufficient sensitivity to intermodal timing cues in audio-visual speech for them to be used analogously to voice onset time in auditory speech perception. The results of both experiments imply that delays of up to about 40 ms introduced by signal-processing algorithms in aids to lipreading should not materially affect audio-visual speech understanding.

  6. APPLICATION OF PARTIAL LEAST SQUARES REGRESSION FOR AUDIO-VISUAL SPEECH PROCESSING AND MODELING

    Directory of Open Access Journals (Sweden)

    A. L. Oleinik

    2015-09-01

Full Text Available Subject of Research. The paper deals with the problem of reconstructing lip region images from the speech signal by means of Partial Least Squares regression. Such problems arise in connection with the development of audio-visual speech processing methods. Audio-visual speech consists of acoustic and visual components (called modalities). Applications of audio-visual speech processing methods include joint modeling of voice and lip movement dynamics, synchronization of audio and video streams, emotion recognition, and liveness detection. Method. Partial Least Squares regression was applied to solve the posed problem. This method extracts components of the initial data with high covariance. These components are used to build a regression model. The advantage of this approach lies in the possibility of achieving two goals: identification of latent interrelations between components of the initial data (e.g. the speech signal and the lip region image) and approximation of one data component as a function of another. Main Results. Experimental research on the reconstruction of lip region images from the speech signal was carried out on the VidTIMIT audio-visual speech database. Results of the experiment showed that Partial Least Squares regression is capable of solving the reconstruction problem. Practical Significance. The obtained findings indicate that Partial Least Squares regression is applicable to a wide variety of audio-visual speech processing problems, from synchronization of audio and video streams to liveness detection.
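A minimal sketch of the technique named in this record, assuming scikit-learn's PLSRegression as a stand-in implementation; the array shapes, feature choices, and random data are placeholders rather than the VidTIMIT setup.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Illustrative stand-in data: rows are synchronized frames.
# X: acoustic features per frame (e.g. MFCC-like vectors), Y: flattened lip-region images.
rng = np.random.default_rng(0)
n_frames, n_audio_feats, img_h, img_w = 500, 13, 16, 24
X = rng.normal(size=(n_frames, n_audio_feats))
Y = rng.normal(size=(n_frames, img_h * img_w))

# PLS extracts paired latent components with high covariance between X and Y,
# then builds a linear regression in that latent space.
pls = PLSRegression(n_components=8)
pls.fit(X, Y)

# Reconstruct lip-region images for new audio frames.
X_new = rng.normal(size=(10, n_audio_feats))
Y_pred = pls.predict(X_new)                  # shape (10, img_h * img_w)
lip_images = Y_pred.reshape(-1, img_h, img_w)
print(lip_images.shape)
```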

  7. An audio-visual corpus for multimodal speech recognition in Dutch language

    NARCIS (Netherlands)

    Wojdel, J.; Wiggers, P.; Rothkrantz, L.J.M.

    2002-01-01

This paper describes the gathering and availability of an audio-visual speech corpus for the Dutch language. The corpus was prepared with multi-modal speech recognition in mind and is currently used in our research on lip-reading and bimodal speech recognition. It contains the prompts used also i

  8. Superior Temporal Activation in Response to Dynamic Audio-Visual Emotional Cues

    Science.gov (United States)

    Robins, Diana L.; Hunyadi, Elinora; Schultz, Robert T.

    2009-01-01

    Perception of emotion is critical for successful social interaction, yet the neural mechanisms underlying the perception of dynamic, audio-visual emotional cues are poorly understood. Evidence from language and sensory paradigms suggests that the superior temporal sulcus and gyrus (STS/STG) play a key role in the integration of auditory and visual…

  9. Effects of audio-visual information and mode of speech on listener perceptions of alaryngeal speakers.

    Science.gov (United States)

    Evitts, Paul M; Van Dine, Ami; Holler, Aline

    2009-01-01

There is minimal research on listener perceptions of an individual with a laryngectomy (IWL) based on audio-visual information. The aim of this research was to provide preliminary insight into whether listeners have different perceptions of an individual with a laryngectomy based on mode of presentation (audio-only vs. audio-visual) and mode of speech (tracheoesophageal, oesophageal, electrolaryngeal, normal). Thirty-four naïve listeners were randomly presented with a standard reading passage produced by one typical speaker from each mode of speech in both audio-only and audio-visual presentation modes. Listeners used a visual analogue scale (10 cm line) to indicate their perceptions of each speaker's personality. A significant effect of mode of speech was present. There was no significant difference in listener perceptions between modes of presentation using individual ratings. However, principal component analysis showed ratings were more favourable in the audio-visual mode. Results of this study suggest that visual information may only have a minor impact on listener perceptions of a speaker's personality and that mode of speech and degree of speech proficiency may only play a small role in listener perceptions. However, results should be interpreted with caution as they are based on only one speaker per mode of speech.

  10. ANALYSIS OF MULTIMODAL FUSION TECHNIQUES FOR AUDIO-VISUAL SPEECH RECOGNITION

    Directory of Open Access Journals (Sweden)

    D.V. Ivanko

    2016-05-01

Full Text Available The paper presents an analytical review covering the latest achievements in the field of audio-visual (AV) fusion (integration) of multimodal information. We discuss the main challenges and report on approaches to address them. One of the most important tasks of AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review we set out the basic principles of AV speech recognition and give a classification of audio and visual speech features. Special attention is paid to the systematization of existing techniques and AV data fusion methods. In the second part, based on our analysis of the research area, we provide a consolidated list of tasks and applications that use AV fusion, and indicate the methods, techniques, and audio and video features used. We propose a classification of AV integration and discuss the advantages and disadvantages of the different approaches. We draw conclusions and offer our assessment of the future of the field of AV fusion. In further research we plan to implement a system for audio-visual Russian continuous speech recognition using advanced methods of multimodal fusion.

  11. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Koji Iwano

    2007-03-01

Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.

  12. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    Directory of Open Access Journals (Sweden)

    Iwano Koji

    2007-01-01

Full Text Available This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.

  13. The effect of combined sensory and semantic components on audio-visual speech perception in older adults

    Directory of Open Access Journals (Sweden)

Corrina Maguinness

    2011-12-01

Full Text Available Previous studies have found that perception in older people benefits from multisensory over uni-sensory information. As normal speech recognition is affected by both the auditory input and the visual lip movements of the speaker, we investigated the efficiency of audio and visual integration in an older population by manipulating the relative reliability of the auditory and visual information in speech. We also investigated the role of the semantic context of the sentence to assess whether audio-visual integration is affected by top-down semantic processing. We presented participants with audio-visual sentences in which the visual component was either blurred or not blurred. We found that there was a greater cost in recall performance for semantically meaningless speech in the audio-visual blurred condition compared to the audio-visual non-blurred condition, and this effect was specific to the older group. Our findings have implications for understanding how aging affects efficient multisensory integration for the perception of speech and suggest that multisensory inputs may benefit speech perception in older adults when the semantic content of the speech is unpredictable.

  14. Audio-visual speech timing sensitivity is enhanced in cluttered conditions.

    Directory of Open Access Journals (Sweden)

    Warrick Roseboom

    Full Text Available Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.

  15. Audio-visual perception of compressed speech by profoundly hearing-impaired subjects.

    Science.gov (United States)

    Drullman, R; Smoorenburg, G F

    1997-01-01

For many people with profound hearing loss conventional hearing aids give only limited support for speechreading. This study aims at optimizing the presentation of speech signals in the severely reduced dynamic range of the profoundly hearing impaired by means of multichannel compression and multichannel amplification. The speech signal in each of six 1-octave channels (125-4000 Hz) was compressed instantaneously, using compression ratios of 1, 2, 3, or 5, and a compression threshold of 35 dB below peak level. A total of eight conditions were composed in which the compression ratio varied per channel. Sentences were presented audio-visually to 16 profoundly hearing-impaired subjects and syllable intelligibility was measured. Results show that all auditory signals are valuable supplements to speechreading. No clear overall preference is found for any of the compression conditions, but relatively high compression ratios (> 3-5) have a significantly detrimental effect. Inspection of the individual results reveals that compression may be beneficial for one subject.

  16. Neural entrainment to rhythmically-presented auditory, visual and audio-visual speech in children

    Directory of Open Access Journals (Sweden)

    Alan James Power

    2012-07-01

Full Text Available Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal ‘samples’ of information from the speech stream at different rates, phase-resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (‘phase locking’). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase-locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically-developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable ‘ba’, presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a talking head). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the ‘ba’ stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a ‘ba’ in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling
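A simplified sketch of one common way to quantify the kind of entrainment described here: band-pass EEG epochs around the 2 Hz stimulation rate and compute inter-trial phase coherence. The filter settings and simulated data are illustrative assumptions, not the study's pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

# Simulated single-channel EEG epochs (trials x samples), illustrative only.
fs = 250.0                      # sampling rate in Hz
n_trials, n_samples = 60, int(4 * fs)
rng = np.random.default_rng(1)
t = np.arange(n_samples) / fs
# Weak 2 Hz (delta-rate) component phase-locked to the 'ba' stream, plus noise.
eeg = 0.3 * np.sin(2 * np.pi * 2.0 * t) + rng.normal(size=(n_trials, n_samples))

# Band-pass around the stimulation rate and extract instantaneous phase.
b, a = butter(4, [1.0, 3.0], btype="bandpass", fs=fs)
phase = np.angle(hilbert(filtfilt(b, a, eeg, axis=1), axis=1))

# Inter-trial phase coherence (phase-locking value) per time point:
# length of the mean unit phase vector across trials; 0 = no locking, 1 = perfect locking.
plv = np.abs(np.mean(np.exp(1j * phase), axis=0))
print("mean phase-locking value:", plv.mean())
```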

  17. Large Scale Functional Brain Networks Underlying Temporal Integration of Audio-Visual Speech Perception: An EEG Study.

    Science.gov (United States)

    Kumar, G Vinodh; Halder, Tamesh; Jaiswal, Amit K; Mukherjee, Abhishek; Roy, Dipanjan; Banerjee, Arpan

    2016-01-01

Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, if and how the network across the whole brain participates in multisensory perceptual processing remains an open question. We posit that large-scale functional connectivity among neural populations situated in distributed brain sites may provide valuable insights into the processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent audio-visual (AV) speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs was computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception around a temporal window of 300-600 ms following onset of stimuli. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times along with pre-stimulus decreases of lower frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus, our
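A simplified stand-in for the sensor-pair coherence analysis described above: ordinary magnitude-squared coherence (scipy.signal.coherence) is computed for every sensor pair and averaged within a band, which only approximates the time-frequency global coherence measure used in the study; data and band limits are illustrative.

```python
import itertools
import numpy as np
from scipy.signal import coherence

# Illustrative multichannel EEG segment: channels x samples.
fs = 250.0
n_channels, n_samples = 8, int(2 * fs)
rng = np.random.default_rng(2)
common = rng.normal(size=n_samples)                      # shared broadband drive
eeg = 0.5 * common + rng.normal(size=(n_channels, n_samples))

# Pairwise magnitude-squared coherence for all sensor pairs, averaged within a
# frequency band (here ~30-45 Hz, gamma) and across pairs as a crude global summary.
band = (30.0, 45.0)
pair_vals = []
for i, j in itertools.combinations(range(n_channels), 2):
    f, Cxy = coherence(eeg[i], eeg[j], fs=fs, nperseg=128)
    mask = (f >= band[0]) & (f <= band[1])
    pair_vals.append(Cxy[mask].mean())

print("global gamma-band coherence (pair average):", np.mean(pair_vals))
```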

  18. Large Scale Functional Brain Networks Underlying Temporal Integration of Audio-Visual Speech Perception: An EEG Study

    Science.gov (United States)

    Kumar, G. Vinodh; Halder, Tamesh; Jaiswal, Amit K.; Mukherjee, Abhishek; Roy, Dipanjan; Banerjee, Arpan

    2016-01-01

Observable lip movements of the speaker influence perception of auditory speech. A classical example of this influence is reported by listeners who perceive an illusory (cross-modal) speech sound (McGurk effect) when presented with incongruent audio-visual (AV) speech stimuli. Recent neuroimaging studies of AV speech perception accentuate the role of frontal, parietal, and the integrative brain sites in the vicinity of the superior temporal sulcus (STS) for multisensory speech perception. However, if and how the network across the whole brain participates in multisensory perceptual processing remains an open question. We posit that large-scale functional connectivity among neural populations situated in distributed brain sites may provide valuable insights into the processing and fusing of AV speech. Varying the psychophysical parameters in tandem with electroencephalogram (EEG) recordings, we exploited the trial-by-trial perceptual variability of incongruent audio-visual (AV) speech stimuli to identify the characteristics of the large-scale cortical network that facilitates multisensory perception during synchronous and asynchronous AV speech. We evaluated the spectral landscape of EEG signals during multisensory speech perception at varying AV lags. Functional connectivity dynamics for all sensor pairs was computed using the time-frequency global coherence, the vector sum of pairwise coherence changes over time. During synchronous AV speech, we observed enhanced global gamma-band coherence and decreased alpha and beta-band coherence underlying cross-modal (illusory) perception compared to unisensory perception around a temporal window of 300–600 ms following onset of stimuli. During asynchronous speech stimuli, a global broadband coherence was observed during cross-modal perception at earlier times along with pre-stimulus decreases of lower frequency power, e.g., alpha rhythms for positive AV lags and theta rhythms for negative AV lags. Thus

  19. A Novel Algorithm for Acoustic and Visual Classifiers Decision Fusion in Audio-Visual Speech Recognition System

    Directory of Open Access Journals (Sweden)

    P.S. Sathidevi

    2010-03-01

Full Text Available Audio-visual speech recognition (AVSR) using acoustic and visual signals of speech has received attention recently because of its robustness in noisy environments. Perceptual studies also support this approach by emphasizing the importance of visual information for speech recognition in humans. An important issue in decision-fusion-based AVSR systems is how to obtain the appropriate integration weight for the speech modalities to integrate and to ensure that the combined AVSR system performs better than the audio-only and visual-only systems under various noise conditions. To solve this issue, we present a genetic algorithm (GA) based optimization scheme to obtain the appropriate integration weight from the relative reliability of each modality. The performance of the proposed GA-optimized reliability-ratio based weight estimation scheme is demonstrated via single-speaker, mobile-function isolated word recognition experiments. The results show that the proposed scheme improves robust recognition accuracy over the conventional unimodal systems and the baseline reliability-ratio based AVSR system under various signal-to-noise ratio conditions.
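The GA-optimized weights themselves are beyond a short example, but the reliability-ratio idea the scheme builds on can be sketched as follows: the stream whose class log-likelihoods are more dispersed (more discriminative) gets the larger integration weight. The dispersion measure and the log-likelihood values are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def reliability(log_likelihoods):
    """Simple reliability measure for one modality: margin between the best class
    log-likelihood and the mean of the remaining classes (illustrative choice)."""
    ll = np.sort(log_likelihoods)[::-1]
    return ll[0] - ll[1:].mean()

# Per-class log-likelihoods for one test utterance from each single-modality classifier
# (made-up values; in an AVSR system these come from audio-only and visual-only HMMs).
ll_audio  = np.array([-210.0, -225.0, -240.0, -238.0])
ll_visual = np.array([-118.0, -120.0, -121.0, -122.0])

r_a, r_v = reliability(ll_audio), reliability(ll_visual)
w_audio = r_a / (r_a + r_v)          # reliability-ratio integration weight
w_visual = 1.0 - w_audio

# Decision-level fusion: weighted sum of the per-class log-likelihoods.
combined = w_audio * ll_audio + w_visual * ll_visual
print("weights:", round(w_audio, 3), round(w_visual, 3))
print("recognized class:", int(np.argmax(combined)))
```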

  20. The Effect of Onset Asynchrony in Audio Visual Speech and the Uncanny Valley in Virtual Characters

    DEFF Research Database (Denmark)

    Tinwell, Angela; Grimshaw, Mark; Abdel Nabi, Deborah

    2015-01-01

This study investigates whether the Uncanny Valley phenomenon is increased for realistic, human-like characters with an asynchrony of lip movement during speech. An experiment was conducted in which 113 participants rated a human and a realistic, talking-head, human-like virtual character over a range of onset asynchronies for both perceived familiarity and human-likeness. The results show that virtual characters were regarded as more uncanny (less familiar and human-like) than humans and that increasing levels of asynchrony increased the perception of uncanniness. Interestingly, participants were more sensitive to the uncanny in characters when the audio stream preceded the visual stream than with asynchronous footage where the video stream preceded the audio stream. This paper considers possible psychological explanations as to why the magnitude and direction of an asynchrony of speech dictates…

  1. Audio-Visual Perception of Gender by Infants Emerges Earlier for Adult-Directed Speech

    Science.gov (United States)

    Richoz, Anne-Raphaëlle; Quinn, Paul C.; Hillairet de Boisferon, Anne; Berger, Carole; Loevenbruck, Hélène; Lewkowicz, David J.; Lee, Kang; Dole, Marjorie; Caldara, Roberto; Pascalis, Olivier

    2017-01-01

    Early multisensory perceptual experiences shape the abilities of infants to perform socially-relevant visual categorization, such as the extraction of gender, age, and emotion from faces. Here, we investigated whether multisensory perception of gender is influenced by infant-directed (IDS) or adult-directed (ADS) speech. Six-, 9-, and 12-month-old infants saw side-by-side silent video-clips of talking faces (a male and a female) and heard either a soundtrack of a female or a male voice telling a story in IDS or ADS. Infants participated in only one condition, either IDS or ADS. Consistent with earlier work, infants displayed advantages in matching female relative to male faces and voices. Moreover, the new finding that emerged in the current study was that extraction of gender from face and voice was stronger at 6 months with ADS than with IDS, whereas at 9 and 12 months, matching did not differ for IDS versus ADS. The results indicate that the ability to perceive gender in audiovisual speech is influenced by speech manner. Our data suggest that infants may extract multisensory gender information developmentally earlier when looking at adults engaged in conversation with other adults (i.e., ADS) than when adults are directly talking to them (i.e., IDS). Overall, our findings imply that the circumstances of social interaction may shape early multisensory abilities to perceive gender. PMID:28060872

  2. Audio-Visual and Meaningful Semantic Context Enhancements in Older and Younger Adults.

    Science.gov (United States)

    Smayda, Kirsten E; Van Engen, Kristin J; Maddox, W Todd; Chandrasekaran, Bharath

    2016-01-01

Speech perception is critical to everyday life. Oftentimes noise can degrade a speech signal; however, because of the cues available to the listener, such as visual and semantic cues, noise rarely prevents conversations from continuing. The interaction of visual and semantic cues in aiding speech perception has been studied in young adults, but the extent to which these two cues interact for older adults has not been studied. To investigate the effect of visual and semantic cues on speech perception in older and younger adults, we recruited forty-five young adults (ages 18-35) and thirty-three older adults (ages 60-90) to participate in a speech perception task. Participants were presented with semantically meaningful and anomalous sentences in audio-only and audio-visual conditions. We hypothesized that young adults would outperform older adults across signal-to-noise ratios (SNRs), modalities, and semantic contexts. In addition, we hypothesized that both young and older adults would receive a greater benefit from a semantically meaningful context in the audio-visual relative to audio-only modality. We predicted that young adults would receive greater visual benefit in semantically meaningful contexts relative to anomalous contexts. However, we predicted that older adults could receive a greater visual benefit in either semantically meaningful or anomalous contexts. Results suggested that in the most supportive context, that is, semantically meaningful sentences presented in the audiovisual modality, older adults performed similarly to young adults. In addition, both groups received the same amount of visual and meaningful benefit. Lastly, across groups, a semantically meaningful context provided more benefit in the audio-visual modality relative to the audio-only modality, and the presence of visual cues provided more benefit in semantically meaningful contexts relative to anomalous contexts. These results suggest that older adults can perceive speech as well as younger adults when both

  3. Perception of Audio-Visual Speech Synchrony in Spanish-Speaking Children with and without Specific Language Impairment

    Science.gov (United States)

    Pons, Ferran; Andreu, Llorenc; Sanz-Torrent, Monica; Buil-Legaz, Lucia; Lewkowicz, David J.

    2013-01-01

    Speech perception involves the integration of auditory and visual articulatory information, and thus requires the perception of temporal synchrony between this information. There is evidence that children with specific language impairment (SLI) have difficulty with auditory speech perception but it is not known if this is also true for the…

  4. Audio-Visual Tibetan Speech Recognition Based on a Deep Dynamic Bayesian Network for Natural Human Robot Interaction

    Directory of Open Access Journals (Sweden)

    Yue Zhao

    2012-12-01

Full Text Available Audio-visual speech recognition is a natural and robust approach to improving human-robot interaction in noisy environments. Although multi-stream Dynamic Bayesian Networks and coupled HMMs are widely used for audio-visual speech recognition, they fail to learn the shared features between modalities and ignore the dependency of features among the frames within each discrete state. In this paper, we propose a Deep Dynamic Bayesian Network (DDBN) to perform unsupervised extraction of spatial-temporal multimodal features from Tibetan audio-visual speech data and build an accurate audio-visual speech recognition model without a frame-independence assumption. The experimental results on Tibetan speech data from some real-world environments showed that the proposed DDBN outperforms state-of-the-art methods in word recognition accuracy.

  5. Basic to Applied Research: The Benefits of Audio-Visual Speech Perception Research in Teaching Foreign Languages

    Science.gov (United States)

    Erdener, Dogu

    2016-01-01

    Traditionally, second language (L2) instruction has emphasised auditory-based instruction methods. However, this approach is restrictive in the sense that speech perception by humans is not just an auditory phenomenon but a multimodal one, and specifically, a visual one as well. In the past decade, experimental studies have shown that the…

  6. Audio-Visual Aids: Historians in Blunderland.

    Science.gov (United States)

    Decarie, Graeme

    1988-01-01

    A history professor relates his experiences producing and using audio-visual material and warns teachers not to rely on audio-visual aids for classroom presentations. Includes examples of popular audio-visual aids on Canada that communicate unintended, inaccurate, or unclear ideas. Urges teachers to exercise caution in the selection and use of…

  7. [Audio-visual aids and tropical medicine].

    Science.gov (United States)

    Morand, J J

    1989-01-01

    The author presents a list of the audio-visual productions about Tropical Medicine, as well as of their main characteristics. He thinks that the audio-visual educational productions are often dissociated from their promotion; therefore, he invites the future creator to forward his work to the Audio-Visual Health Committee.

  8. Sequential literacy: the teaching of cinema in the age of audio-visual speech [Alfasecuencialización: la enseñanza del cine en la era del audiovisual]

    Directory of Open Access Journals (Sweden)

    José Antonio Palao Errando

    2007-10-01

Full Text Available In the so-called «information society», film studies have been diluted into the pragmatic and technological treatment of audio-visual discourse, just as the enjoyment of cinema itself has been caught in the net of the DVD and hypertext. Cinema itself reacts to this through complex narrative structures that distance it from standard audio-visual discourse. The function of film studies, and of their teaching at university, should be the reintroduction of the subject rejected by informative knowledge, by means of the interpretation of the film text.

  9. Audio-Visual Aids in Universities

    Science.gov (United States)

    Douglas, Jackie

    1970-01-01

    A report on the proceedings and ideas expressed at a one day seminar on "Audio-Visual Equipment--Its Uses and Applications for Teaching and Research in Universities." The seminar was organized by England's National Committee for Audio-Visual Aids in Education in conjunction with the British Universities Film Council. (LS)

  10. Temporal structure and complexity affect audio-visual correspondence detection

    Directory of Open Access Journals (Sweden)

    Rachel N Denison

    2013-01-01

Full Text Available Synchrony between events in different senses has long been considered the critical temporal cue for multisensory integration. Here, using rapid streams of auditory and visual events, we demonstrate how humans can use temporal structure (rather than mere temporal coincidence) to detect multisensory relatedness. We find psychophysically that participants can detect matching auditory and visual streams via shared temporal structure for crossmodal lags of up to 200 ms. Performance on this task reproduced features of past findings based on explicit timing judgments but did not show any special advantage for perfectly synchronous streams. Importantly, the complexity of temporal patterns influences sensitivity to correspondence. Stochastic, irregular streams – with richer temporal pattern information – led to higher audio-visual matching sensitivity than predictable, rhythmic streams. Our results reveal that temporal structure and its complexity are key determinants for human detection of audio-visual correspondence. The distinctive emphasis of our new paradigms on temporal patterning could be useful for studying special populations with suspected abnormalities in audio-visual temporal perception and multisensory integration.

  11. Joint evaluation of communication quality and user experience in an audio-visual virtual reality meeting

    DEFF Research Database (Denmark)

    Møller, Anders Kalsgaard; Hoffmann, Pablo F.; Carrozzino, Marcello

    2013-01-01

The state-of-the-art speech intelligibility tests are created with the purpose of evaluating acoustic communication devices and not for evaluating audio-visual virtual reality systems. This paper presents a novel method to evaluate a communication situation based on both the speech intelligibility and the indexical characteristics of the speaker. The results will be available in the final paper. Index Terms: speech intelligibility, virtual reality, body language, telecommunication.

  12. Speech identification in noise: Contribution of temporal, spectral, and visual speech cues.

    Science.gov (United States)

    Kim, Jeesun; Davis, Chris; Groot, Christopher

    2009-12-01

    This study investigated the degree to which two types of reduced auditory signals (cochlear implant simulations) and visual speech cues combined for speech identification. The auditory speech stimuli were filtered to have only amplitude envelope cues or both amplitude envelope and spectral cues and were presented with/without visual speech. In Experiment 1, IEEE sentences were presented in quiet and noise. For in-quiet presentation, speech identification was enhanced by the addition of both spectral and visual speech cues. Due to a ceiling effect, the degree to which these effects combined could not be determined. In noise, these facilitation effects were more marked and were additive. Experiment 2 examined consonant and vowel identification in the context of CVC or VCV syllables presented in noise. For consonants, both spectral and visual speech cues facilitated identification and these effects were additive. For vowels, the effect of combined cues was underadditive, with the effect of spectral cues reduced when presented with visual speech cues. Analysis indicated that without visual speech, spectral cues facilitated the transmission of place information and vowel height, whereas with visual speech, they facilitated lip rounding, with little impact on the transmission of place information.
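A rough sketch of how an amplitude-envelope-only (cochlear-implant-simulation-like) stimulus of the kind described can be produced: band-pass the speech, keep each band's envelope, and reimpose it on band-limited noise. Channel edges, filter orders, and the stand-in signal are assumptions for illustration, not the study's exact processing.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def noise_vocode(speech, fs, band_edges):
    """Return an envelope-only version of `speech`: per band, the Hilbert envelope
    modulates a noise carrier filtered to the same band (illustrative parameters)."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(speech)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
        band = filtfilt(b, a, speech)
        env = np.abs(hilbert(band))                   # amplitude envelope cue only
        carrier = filtfilt(b, a, rng.normal(size=speech.size))
        out += env * carrier
    return out

fs = 16000
t = np.arange(int(0.5 * fs)) / fs
# Stand-in "speech": a tone with slow amplitude modulation, just to run the pipeline.
speech = np.sin(2 * np.pi * 150 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
edges = [100, 300, 700, 1500, 3000, 6000]             # illustrative channel edges in Hz
vocoded = noise_vocode(speech, fs, edges)
print(vocoded.shape)
```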

  13. Stream Weight Training Based on MCE for Audio-Visual LVCSR

    Institute of Scientific and Technical Information of China (English)

    LIU Peng; WANG Zuoying

    2005-01-01

In this paper we address the problem of audio-visual speech recognition in the framework of the multi-stream hidden Markov model. Stream weight training based on the minimum classification error criterion is discussed for use in large vocabulary continuous speech recognition (LVCSR). We present lattice re-scoring and Viterbi approaches for calculating the loss function of continuous speech. The experimental results show that, in the case of clean audio, a 36.1% relative word error rate reduction can be achieved using state-based stream weights trained by the Viterbi approach, compared to an audio-only speech recognition system. Further experimental results demonstrate that our audio-visual LVCSR system provides significant enhancement of robustness in noisy environments.

  14. Audio-visual affective expression recognition

    Science.gov (United States)

    Huang, Thomas S.; Zeng, Zhihong

    2007-11-01

Automatic affective expression recognition has attracted more and more attention of researchers from different disciplines, which will significantly contribute to a new paradigm for human computer interaction (affect-sensitive interfaces, socially intelligent environments) and advance the research in the affect-related fields including psychology, psychiatry, and education. Multimodal information integration is a process that enables humans to assess affective states robustly and flexibly. In order to understand the richness and subtleness of human emotion behavior, the computer should be able to integrate information from multiple sensors. We introduce in this paper our efforts toward machine understanding of audio-visual affective behavior, based on both deliberate and spontaneous displays. Some promising methods are presented to integrate information from both audio and visual modalities. Our experiments show the advantage of audio-visual fusion in affective expression recognition over audio-only or visual-only approaches.

  15. Segmentation cues in conversational speech: Robust semantics and fragile phonotactics

    Directory of Open Access Journals (Sweden)

Laurence White

    2012-10-01

Full Text Available Multiple cues influence listeners’ segmentation of connected speech into words, but most previous studies have used stimuli elicited in careful readings rather than natural conversation. Discerning word boundaries in conversational speech may differ from the laboratory setting. In particular, a speaker’s articulatory effort – hyperarticulation vs. hypoarticulation (H&H) – may vary according to communicative demands, suggesting a compensatory relationship whereby acoustic-phonetic cues are attenuated when other information sources strongly guide segmentation. We examined how listeners’ interpretation of segmentation cues is affected by speech style (spontaneous conversation vs. read), using cross-modal identity priming. To elicit spontaneous stimuli, we used a map task in which speakers discussed routes around stylised landmarks. These landmarks were two-word phrases in which the strength of potential segmentation cues – semantic likelihood and cross-boundary diphone phonotactics – was systematically varied. Landmark-carrying utterances were transcribed and later re-recorded as read speech. Independent of speech style, we found an interaction between cue valence (favourable/unfavourable) and cue type (phonotactics/semantics). Thus, there was an effect of semantic plausibility, but no effect of cross-boundary phonotactics, indicating that the importance of phonotactic segmentation may have been overstated in studies where lexical information was artificially suppressed. These patterns were unaffected by whether the stimuli were elicited in a spontaneous or read context, even though the difference in speech styles was evident in a main effect. Durational analyses suggested speaker-driven cue trade-offs congruent with an H&H account, but these modulations did not impact on listener behaviour. We conclude that previous research exploiting read speech is reliable in indicating the primacy of lexically-based cues in the segmentation of natural

  16. Segmentation cues in conversational speech: robust semantics and fragile phonotactics.

    Science.gov (United States)

    White, Laurence; Mattys, Sven L; Wiget, Lukas

    2012-01-01

    Multiple cues influence listeners' segmentation of connected speech into words, but most previous studies have used stimuli elicited in careful readings rather than natural conversation. Discerning word boundaries in conversational speech may differ from the laboratory setting. In particular, a speaker's articulatory effort - hyperarticulation vs. hypoarticulation (H&H) - may vary according to communicative demands, suggesting a compensatory relationship whereby acoustic-phonetic cues are attenuated when other information sources strongly guide segmentation. We examined how listeners' interpretation of segmentation cues is affected by speech style (spontaneous conversation vs. read), using cross-modal identity priming. To elicit spontaneous stimuli, we used a map task in which speakers discussed routes around stylized landmarks. These landmarks were two-word phrases in which the strength of potential segmentation cues - semantic likelihood and cross-boundary diphone phonotactics - was systematically varied. Landmark-carrying utterances were transcribed and later re-recorded as read speech. Independent of speech style, we found an interaction between cue valence (favorable/unfavorable) and cue type (phonotactics/semantics). Thus, there was an effect of semantic plausibility, but no effect of cross-boundary phonotactics, indicating that the importance of phonotactic segmentation may have been overstated in studies where lexical information was artificially suppressed. These patterns were unaffected by whether the stimuli were elicited in a spontaneous or read context, even though the difference in speech styles was evident in a main effect. Durational analyses suggested speaker-driven cue trade-offs congruent with an H&H account, but these modulations did not impact on listener behavior. We conclude that previous research exploiting read speech is reliable in indicating the primacy of lexically based cues in the segmentation of natural conversational speech.

  17. Modeling the Development of Audiovisual Cue Integration in Speech Perception

    Science.gov (United States)

    Getz, Laura M.; Nordeen, Elke R.; Vrabic, Sarah C.; Toscano, Joseph C.

    2017-01-01

Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues. PMID:28335558
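In the spirit of the GMM simulations described (not the authors' code), the sketch below fits a two-component Gaussian mixture to unlabeled tokens carrying one auditory and one visual cue, then queries it with a mismatched token; cue dimensions and data are synthetic placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic tokens of two phonological categories, each described by one auditory cue
# (e.g. a voice-onset-time-like value) and one visual cue (e.g. degree of lip aperture).
rng = np.random.default_rng(4)
cat_a = rng.normal(loc=[10.0, 0.2], scale=[5.0, 0.05], size=(300, 2))
cat_b = rng.normal(loc=[60.0, 0.6], scale=[8.0, 0.08], size=(300, 2))
tokens = np.vstack([cat_a, cat_b])

# Unsupervised statistical learning: fit a 2-component GMM to the unlabeled AV tokens.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(tokens)

# Classify a new mismatched token (auditory cue near one category, visual cue near the
# other), loosely analogous to probing the model with incongruent McGurk-style input.
probe = np.array([[12.0, 0.55]])
print("posterior over learned categories:", gmm.predict_proba(probe).round(3))
```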

  18. The contribution of dynamic visual cues to audiovisual speech perception.

    Science.gov (United States)

    Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador

    2015-08-01

Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues; two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this aim, we measured word identification performance in noise using unimodal auditory stimuli, and with audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point-light displays achieved via motion capture of the original talker. Point-light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating for the first time the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech.

  19. Cues That Language Users Exploit to Segment Speech

    Institute of Scientific and Technical Information of China (English)

    陈冰茹

    2015-01-01

The capability to segment words from fluent speech is an important step in learning and acquiring a language (Jusczyk, 1999). Therefore, a number of studies have focused on the various cues that language learners exploit to locate word boundaries. Over the past half century, it has been argued that there are mainly four crucial cues that listeners can use to segment words in speech. In particular, they are: (1) prosody (Echols et al. 1997; Jusczyk et al. 1996); (2) statistical and distributional regularities (Brent et al. 1996; Saffran et al. 1996); (3) phonotactics (Brent et al. 1996; Myers et al. 1996);

  20. Decision-level fusion for audio-visual laughter detection

    NARCIS (Netherlands)

    Reuderink, B.; Poel, M.; Truong, K.; Poppe, R.; Pantic, M.

    2008-01-01

    Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is per

  1. Decision-Level Fusion for Audio-Visual Laughter Detection

    NARCIS (Netherlands)

    Reuderink, Boris; Poel, Mannes; Truong, Khiet; Poppe, Ronald; Pantic, Maja; Popescu-Belis, Andrei; Stiefelhagen, Rainer

    2008-01-01

Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is

  2. Cross-language differences in cue use for speech segmentation.

    Science.gov (United States)

    Tyler, Michael D; Cutler, Anne

    2009-07-01

    Two artificial-language learning experiments directly compared English, French, and Dutch listeners' use of suprasegmental cues for continuous-speech segmentation. In both experiments, listeners heard unbroken sequences of consonant-vowel syllables, composed of recurring three- and four-syllable "words." These words were demarcated by (a) no cue other than transitional probabilities induced by their recurrence, (b) a consistent left-edge cue, or (c) a consistent right-edge cue. Experiment 1 examined a vowel lengthening cue. All three listener groups benefited from this cue in right-edge position; none benefited from it in left-edge position. Experiment 2 examined a pitch-movement cue. English listeners used this cue in left-edge position, French listeners used it in right-edge position, and Dutch listeners used it in both positions. These findings are interpreted as evidence of both language-universal and language-specific effects. Final lengthening is a language-universal effect expressing a more general (non-linguistic) mechanism. Pitch movement expresses prominence which has characteristically different placements across languages: typically at right edges in French, but at left edges in English and Dutch. Finally, stress realization in English versus Dutch encourages greater attention to suprasegmental variation by Dutch than by English listeners, allowing Dutch listeners to benefit from an informative pitch-movement cue even in an uncharacteristic position.
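The transitional-probability cue in condition (a) can be illustrated with a short sketch: estimate P(next syllable | current syllable) from a continuous stream built from a made-up mini-lexicon and posit boundaries at probability dips. The lexicon and threshold are invented for the example.

```python
from collections import Counter
import random

# Build a continuous stream from a tiny artificial lexicon (illustrative 'words').
lexicon = [("ba", "di", "ku"), ("to", "ga", "lu"), ("pi", "mo", "se", "ra")]
random.seed(0)
stream = [syl for _ in range(300) for syl in random.choice(lexicon)]

# Estimate forward transitional probabilities P(next | current) from the stream.
pair_counts, syl_counts = Counter(zip(stream, stream[1:])), Counter(stream[:-1])
tp = {pair: n / syl_counts[pair[0]] for pair, n in pair_counts.items()}

# Posit a boundary wherever the transitional probability dips below a threshold:
# within-word transitions are consistent (high TP), across-word ones are variable (low TP).
threshold = 0.5
boundaries = [i + 1 for i, pair in enumerate(zip(stream, stream[1:])) if tp[pair] < threshold]
print("first few posited boundary positions:", boundaries[:5])
```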

  3. Cross-language differences in cue use for speech segmentation

    NARCIS (Netherlands)

    Tyler, M.D.; Cutler, A.

    2009-01-01

    Two artificial-language learning experiments directly compared English, French, and Dutch listeners' use of suprasegmental cues for continuous-speech segmentation. In both experiments, listeners heard unbroken sequences of consonant-vowel syllables, composed of recurring three- and four-syllable "wo

  4. A Review on Audio-visual Translation Studies

    Institute of Scientific and Technical Information of China (English)

    李瑶

    2008-01-01

This paper is dedicated to a thorough review of audio-visual translation studies from both home and abroad. Reviewing foreign achievements in this specific field of translation studies can shed some light on our national audio-visual practice and research. The review of Chinese scholars' audio-visual translation studies is intended to indicate potential directions for development as well as neglected aspects of study. Based on the summary of relevant studies, possible topics for further study are proposed.

  5. Audio-Visual Perception System for a Humanoid Robotic Head

    Directory of Open Access Journals (Sweden)

    Raquel Viciana-Abad

    2014-05-01

Full Text Available One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, there is a lack of evaluation of the benefits of audio-visual attention mechanisms, compared to audio-only or visual-only approaches, in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with Bayes inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared by considering the technical limitations of unimodal systems. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.
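A minimal sketch of the Bayesian fusion idea mentioned here, assuming conditionally independent audio and visual sensor models over a grid of candidate azimuths; the likelihood shapes and numbers are invented for illustration, not the paper's system.

```python
import numpy as np

# Candidate speaker azimuths (degrees) on a coarse grid.
azimuths = np.arange(-90, 91, 5)

def gaussian_likelihood(center, width):
    """Unnormalized likelihood over the azimuth grid (illustrative sensor model)."""
    return np.exp(-0.5 * ((azimuths - center) / width) ** 2)

# Audio localization is typically broader (e.g. from interaural cues);
# vision is sharper but limited to the camera's field of view.
like_audio = gaussian_likelihood(center=22.0, width=20.0)
like_vision = gaussian_likelihood(center=15.0, width=6.0)
prior = np.ones_like(azimuths, dtype=float)           # flat prior over directions

# Bayes fusion assuming conditionally independent modalities.
posterior = prior * like_audio * like_vision
posterior /= posterior.sum()
print("fused azimuth estimate (deg):", azimuths[np.argmax(posterior)])
```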

  6. Audio-visual perception system for a humanoid robotic head.

    Science.gov (United States)

    Viciana-Abad, Raquel; Marfil, Rebeca; Perez-Lorenzo, Jose M; Bandera, Juan P; Romero-Garces, Adrian; Reche-Lopez, Pedro

    2014-01-01

    One of the main issues within the field of social robotics is to endow robots with the ability to direct attention to people with whom they are interacting. Different approaches follow bio-inspired mechanisms, merging audio and visual cues to localize a person using multiple sensors. However, most of these fusion mechanisms have been used in fixed systems, such as those used in video-conference rooms, and thus, they may incur difficulties when constrained to the sensors with which a robot can be equipped. Besides, within the scope of interactive autonomous robots, there has been little evaluation of the benefits of audio-visual attention mechanisms, compared to audio-only or visual-only approaches, in real scenarios. Most of the tests conducted have been within controlled environments, at short distances and/or with off-line performance measurements. With the goal of demonstrating the benefit of fusing sensory information with Bayesian inference for interactive robotics, this paper presents a system for localizing a person by processing visual and audio data. Moreover, the performance of this system is evaluated and compared against unimodal alternatives, taking into account their technical limitations. The experiments show the promise of the proposed approach for the proactive detection and tracking of speakers in a human-robot interactive framework.
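
    As an illustration of the Bayes-style audio-visual fusion that the two records above describe, here is a minimal sketch (not the authors' model; the directions, noise levels, and flat prior are hypothetical): each modality contributes a likelihood over azimuth, and the fused estimate is the peak of their normalized product.

    ```python
    import numpy as np

    azimuth = np.linspace(-90, 90, 181)          # candidate directions (degrees)

    def gaussian_likelihood(measured, sigma):
        """Likelihood of each candidate azimuth given one noisy direction estimate."""
        return np.exp(-0.5 * ((azimuth - measured) / sigma) ** 2)

    # Hypothetical unimodal estimates: audio is noisy, vision is more precise.
    audio_lik  = gaussian_likelihood(measured=20.0, sigma=15.0)
    visual_lik = gaussian_likelihood(measured=12.0, sigma=4.0)
    prior      = np.ones_like(azimuth)            # flat prior over azimuth

    # Bayes fusion: posterior is proportional to prior x audio x visual likelihoods.
    posterior = prior * audio_lik * visual_lik
    posterior /= posterior.sum()

    print("fused direction estimate: %.1f deg" % azimuth[np.argmax(posterior)])
    ```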

  7. Audio-Visual Integration of Emotional Information

    Directory of Open Access Journals (Sweden)

    Penny Bergman

    2011-10-01

    Full Text Available Emotions are central to our perception of the environment surrounding us (Berlyne, 1971). An important aspect of the emotional response to a sound depends on the meaning of the sound, i.e., it is not the physical parameter per se that determines our emotional response to the sound but rather the source of the sound (Genell, 2008) and the relevance it has to the self (Tajadura-Jiménez et al., 2010). When exposed to sound together with visual information, the information from both modalities is integrated, altering the perception of each modality, in order to generate a coherent experience. For emotional information this integration is rapid and does not require attentional processes (De Gelder, 1999). The present experiment investigates perception of pink noise in two visual settings in a within-subjects design. Nineteen participants rated the same sound twice in terms of pleasantness and arousal in either a pleasant or an unpleasant visual setting. The results showed that pleasantness of the sound decreased in the negative visual setting, thus suggesting an audio-visual integration in which the affective information in the visual modality is transferred to the auditory modality when information markers are lacking in the latter. The results are discussed in relation to theories of emotion perception.

  8. Auditory and audio-visual processing in patients with cochlear, auditory brainstem, and auditory midbrain implants: An EEG study.

    Science.gov (United States)

    Schierholz, Irina; Finke, Mareike; Kral, Andrej; Büchner, Andreas; Rach, Stefan; Lenarz, Thomas; Dengler, Reinhard; Sandmann, Pascale

    2017-04-01

    There is substantial variability in speech recognition ability across patients with cochlear implants (CIs), auditory brainstem implants (ABIs), and auditory midbrain implants (AMIs). To better understand how this variability is related to central processing differences, the current electroencephalography (EEG) study compared hearing abilities and auditory-cortex activation in patients with electrical stimulation at different sites of the auditory pathway. Three different groups of patients with auditory implants (Hannover Medical School; ABI: n = 6, CI: n = 6; AMI: n = 2) performed a speeded response task and a speech recognition test with auditory, visual, and audio-visual stimuli. Behavioral performance and cortical processing of auditory and audio-visual stimuli were compared between groups. ABI and AMI patients showed prolonged response times on auditory and audio-visual stimuli compared with normal-hearing (NH) listeners and CI patients. This was confirmed by prolonged N1 latencies and reduced N1 amplitudes in ABI and AMI patients. However, patients with central auditory implants showed a remarkable gain in performance when visual and auditory input was combined, in both speech and non-speech conditions, which was reflected by a strong visual modulation of auditory-cortex activation in these individuals. In sum, the results suggest that the behavioral improvement for audio-visual conditions in central auditory implant patients is based on enhanced audio-visual interactions in the auditory cortex. These findings may have important implications for the optimization of electrical stimulation and rehabilitation strategies in patients with central auditory prostheses. Hum Brain Mapp 38:2206-2225, 2017. © 2017 Wiley Periodicals, Inc.

  9. Proper Use of Audio-Visual Aids: Essential for Educators.

    Science.gov (United States)

    Dejardin, Conrad

    1989-01-01

    Criticizes educators as the worst users of audio-visual aids and among the worst public speakers. Offers guidelines for the proper use of an overhead projector and the development of transparencies. (DMM)

  10. Harmonic cues for speech segmentation: a cross-linguistic corpus study on child-directed speech.

    Science.gov (United States)

    Ketrez, F Nihan

    2014-03-01

    Previous studies on the role of vowel harmony in word segmentation are based on artificial languages where harmonic cues reliably signal word boundaries. In this corpus study run on the data available at CHILDES, we investigated whether natural languages provide a learner with reliable segmentation cues similar to the ones created artificially. We observed that in harmonic languages (child-directed speech to thirty-five Turkish and three Hungarian children), but not in non-harmonic ones (child-directed speech to one Farsi and four Polish children), harmonic vowel sequences are more likely to appear within words, and non-harmonic ones mostly appear across word boundaries, suggesting that natural harmonic languages provide a learner with regular cues that could potentially be used for word segmentation along with other cues.
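
    The within-word versus across-word harmony comparison described above can be sketched in a few lines of code. The front/back vowel split and the toy utterances below are hypothetical simplifications of the CHILDES data, shown only to make the counting procedure concrete.

    ```python
    # Minimal sketch of the within-word vs. across-word harmony count, assuming a
    # simplified Turkish-like front/back vowel split (data here are toy examples).
    FRONT, BACK = set("eiöü"), set("aıou")

    def vowels(word):
        return [c for c in word if c in FRONT | BACK]

    def harmonic(v1, v2):
        return (v1 in FRONT and v2 in FRONT) or (v1 in BACK and v2 in BACK)

    utterances = [["evler", "geldi"], ["kapı", "bende"], ["kedi", "bak"]]  # toy data

    within = {"harmonic": 0, "disharmonic": 0}
    across = {"harmonic": 0, "disharmonic": 0}

    for utt in utterances:
        for word in utt:                               # vowel pairs inside one word
            vs = vowels(word)
            for v1, v2 in zip(vs, vs[1:]):
                within["harmonic" if harmonic(v1, v2) else "disharmonic"] += 1
        for w1, w2 in zip(utt, utt[1:]):               # pair spanning a word boundary
            v1, v2 = vowels(w1)[-1], vowels(w2)[0]
            across["harmonic" if harmonic(v1, v2) else "disharmonic"] += 1

    print("within-word :", within)
    print("across-word :", across)
    ```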

  11. Acoustic cues in the perception of second language speech sounds

    Science.gov (United States)

    Bogacka, Anna A.

    2001-05-01

    The experiment examined which acoustic cues Polish learners of English attend to when distinguishing between English high vowels. Predictions concerned the influence of the Polish vowel system (no duration differences and only one vowel in the high back vowel region), the salience of duration cues, and L1 orthography. Thirty-seven Polish subjects and a control group of English native speakers identified stimuli from heed-hid and who'd-hood continua varying in spectral and duration steps. Identification scores by spectral and duration steps and F1/F2 plots of identifications are reported, along with comments on fundamental frequency variation. English subjects relied strongly on spectral cues (typical categorical perception) and barely reacted to temporal cues. Polish subjects relied strongly on temporal cues for both continua, but showed a reversed pattern of identification for the who'd-hood contrast. Their reliance on spectral cues was weak and showed a reversed pattern for the heed-hid contrast. The results were interpreted with reference to the speech learning model [Flege (1995)], the perceptual assimilation model [Best (1995)], and the ontogeny phylogeny model [Major (2001)].

  12. Acoustic cues to lexical segmentation: a study of resynthesized speech.

    Science.gov (United States)

    Spitzer, Stephanie M; Liss, Julie M; Mattys, Sven L

    2007-12-01

    It has been posited that the role of prosody in lexical segmentation is elevated when the speech signal is degraded or unreliable. Using predictions from Cutler and Norris' [J. Exp. Psychol. Hum. Percept. Perform. 14, 113-121 (1988)] metrical segmentation strategy hypothesis as a framework, this investigation examined how individual suprasegmental and segmental cues to syllabic stress contribute differentially to the recognition of strong and weak syllables for the purpose of lexical segmentation. Syllabic contrastivity was reduced in resynthesized phrases by systematically (i) flattening the fundamental frequency (F0) contours, (ii) equalizing vowel durations, (iii) weakening strong vowels, (iv) combining the two suprasegmental cues, i.e., F0 and duration, and (v) combining the manipulation of all cues. Results indicated that, despite similar decrements in overall intelligibility, F0 flattening and the weakening of strong vowels had a greater impact on lexical segmentation than did equalizing vowel duration. Both combined-cue conditions resulted in greater decrements in intelligibility, but with no additional negative impact on lexical segmentation. The results support the notion of F0 variation and vowel quality as primary conduits for stress-based segmentation and suggest that the effectiveness of stress-based segmentation with degraded speech must be investigated relative to the suprasegmental and segmental impoverishments occasioned by each particular degradation.

  13. Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition

    NARCIS (Netherlands)

    Jesse, A.; McQueen, J.M.

    2014-01-01

    Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes i

  14. The Fungible Audio-Visual Mapping and its Experience

    Directory of Open Access Journals (Sweden)

    Adriana Sa

    2014-12-01

    Full Text Available This article takes a perceptual approach to audio-visual mapping. Clearly perceivable cause and effect relationships can be problematic if one desires the audience to experience the music. Indeed, perception would bias those sonic qualities that fit previous concepts of causation, subordinating other sonic qualities, which may form the relations between the sounds themselves. The question is, how can an audio-visual mapping produce a sense of causation, and simultaneously confound the actual cause-effect relationships? We call this a fungible audio-visual mapping. Our aim here is to glean its constitution and aspect. We report a study, which draws upon methods from experimental psychology to inform audio-visual instrument design and composition. The participants are shown several audio-visual mapping prototypes, after which we pose quantitative and qualitative questions regarding their sense of causation, and their sense of understanding the cause-effect relationships. The study shows that a fungible mapping requires both synchronized and seemingly non-related components, with sufficient complexity to be confusing. As the specific cause-effect concepts remain inconclusive, the sense of causation embraces the whole.

  15. Kernel-Based Sensor Fusion With Application to Audio-Visual Voice Activity Detection

    Science.gov (United States)

    Dov, David; Talmon, Ronen; Cohen, Israel

    2016-12-01

    In this paper, we address the problem of multiple view data fusion in the presence of noise and interferences. Recent studies have approached this problem using kernel methods, by relying particularly on a product of kernels constructed separately for each view. From a graph theory point of view, we analyze this fusion approach in a discrete setting. More specifically, based on a statistical model for the connectivity between data points, we propose an algorithm for the selection of the kernel bandwidth, a parameter, which, as we show, has important implications on the robustness of this fusion approach to interferences. Then, we consider the fusion of audio-visual speech signals measured by a single microphone and by a video camera pointed to the face of the speaker. Specifically, we address the task of voice activity detection, i.e., the detection of speech and non-speech segments, in the presence of structured interferences such as keyboard taps and office noise. We propose an algorithm for voice activity detection based on the audio-visual signal. Simulation results show that the proposed algorithm outperforms competing fusion and voice activity detection approaches. In addition, we demonstrate that a proper selection of the kernel bandwidth indeed leads to improved performance.
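
    A minimal sketch of the kernel-product fusion idea discussed above (this is not the paper's algorithm; the features, sample size, and bandwidths are hypothetical): each view gets its own Gaussian affinity kernel, and the fused kernel is their element-wise product, so two samples count as similar only if they are similar in both views. The paper's contribution concerns how to choose the bandwidth; here it is simply fixed.

    ```python
    import numpy as np

    def gaussian_kernel(X, bandwidth):
        """Pairwise Gaussian affinities for one view; bandwidth controls connectivity."""
        sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

    rng = np.random.default_rng(0)
    n = 50
    audio_feats = rng.normal(size=(n, 13))   # e.g. MFCC-like audio features (toy data)
    video_feats = rng.normal(size=(n, 8))    # e.g. mouth-region visual features (toy data)

    K_audio = gaussian_kernel(audio_feats, bandwidth=3.0)   # bandwidths illustrative;
    K_video = gaussian_kernel(video_feats, bandwidth=2.0)   # the paper studies their selection

    # Fused kernel: element-wise product keeps only affinities supported by both views,
    # which is what makes the fusion robust to single-view interferences.
    K_fused = K_audio * K_video
    print(K_fused.shape, K_fused[0, :3])
    ```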

  16. The unity assumption facilitates cross-modal binding of musical, non-speech stimuli: The role of spectral and amplitude envelope cues.

    Science.gov (United States)

    Chuen, Lorraine; Schutz, Michael

    2016-07-01

    An observer's inference that multimodal signals originate from a common underlying source facilitates cross-modal binding. This 'unity assumption' causes asynchronous auditory and visual speech streams to seem simultaneous (Vatakis & Spence, Perception & Psychophysics, 69(5), 744-756, 2007). Subsequent tests of non-speech stimuli such as musical and impact events found no evidence for the unity assumption, suggesting that the effect is speech-specific (Vatakis & Spence, Acta Psychologica, 127(1), 12-23, 2008). However, the role of amplitude envelope (the changes in energy of a sound over time) was not previously appreciated within this paradigm. Here, we explore whether previous findings suggesting speech-specificity of the unity assumption were confounded by similarities in the amplitude envelopes of the contrasted auditory stimuli. Experiment 1 used natural events with clearly differentiated envelopes: single notes played on either a cello (bowing motion) or marimba (striking motion). Participants performed an unspeeded temporal order judgment task, viewing audio-visually matched (e.g., marimba auditory with marimba video) and mismatched (e.g., cello auditory with marimba video) versions of stimuli at various stimulus onset asynchronies, and indicating which modality was presented first. As predicted, participants were less sensitive to temporal order in matched conditions, demonstrating that the unity assumption can facilitate the perception of synchrony outside of speech stimuli. Results from Experiments 2 and 3 revealed that when spectral information was removed from the original auditory stimuli, amplitude envelope alone could not facilitate the influence of audiovisual unity. We propose that both amplitude envelope and spectral acoustic cues affect the percept of audiovisual unity, working in concert to help an observer determine when to integrate across modalities.

  17. An Audio-Visual Lecture Course in Russian Culture

    Science.gov (United States)

    Leighton, Lauren G.

    1977-01-01

    An audio-visual course in Russian culture is given at Northern Illinois University. A collection of 4-5,000 color slides is the basis for the course, with lectures focussed on literature, philosophy, religion, politics, art and crafts. Acquisition, classification, storage and presentation of slides, and organization of lectures are discussed. (CHK)

  18. Voice activity detection using audio-visual information

    DEFF Research Database (Denmark)

    Petsatodis, Theodore; Pnevmatikakis, Aristodemos; Boukis, Christos

    2009-01-01

    An audio-visual voice activity detector that uses sensors positioned distantly from the speaker is presented. Its constituting unimodal detectors are based on the modeling of the temporal variation of audio and visual features using Hidden Markov Models; their outcomes are fused using a post-deci...

  19. Audio-Visual Aid in Teaching "Fatty Liver"

    Science.gov (United States)

    Dash, Sambit; Kamath, Ullas; Rao, Guruprasad; Prakash, Jay; Mishra, Snigdha

    2016-01-01

    Use of audio visual tools to aid in medical education is ever on a rise. Our study intends to find the efficacy of a video prepared on "fatty liver," a topic that is often a challenge for pre-clinical teachers, in enhancing cognitive processing and ultimately learning. We prepared a video presentation of 11:36 min, incorporating various…

  20. Market potential for interactive audio-visual media

    NARCIS (Netherlands)

    Leurdijk, A.; Limonard, S.

    2005-01-01

    NM2 (New Media for a New Millennium) develops tools for interactive, personalised and non-linear audio-visual content that will be tested in seven pilot productions. This paper looks at the market potential for these productions from a technological, a business and a users' perspective. It shows tha

  1. Audio-visual voice activity detection

    Institute of Scientific and Technical Information of China (English)

    LIU Peng; WANG Zuo-ying

    2006-01-01

    In speech signal processing systems, frame-energy based voice activity detection (VAD) may be degraded by background noise and by the non-stationary character of the frame energy within voice segments. The purpose of this paper is to improve the performance and robustness of VAD by introducing visual information. Meanwhile, a data-driven linear transformation is adopted in visual feature extraction, and a general statistical VAD model is designed. Using the general model and a two-stage fusion strategy presented in this paper, a concrete multimodal VAD system is built. Experiments show that a 55.0% relative reduction in frame error rate and a 98.5% relative reduction in sentence-breaking error rate are obtained when using multimodal VAD, compared to frame-energy based audio VAD. The results show that with the multimodal method sentence-breaking errors are almost entirely avoided and frame-detection performance is clearly improved, which demonstrates the effectiveness of the visual modality in VAD.
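
    For reference, a minimal sketch of the frame-energy VAD that the abstract uses as its audio-only baseline (frame length, hop, and threshold are illustrative; the paper's statistical model and two-stage audio-visual fusion are not reproduced here):

    ```python
    import numpy as np

    def frame_energy_vad(signal, sample_rate, frame_ms=25, hop_ms=10, threshold_db=-35.0):
        """Label each frame as speech (True) or non-speech (False) by log frame energy."""
        frame_len = int(sample_rate * frame_ms / 1000)
        hop_len = int(sample_rate * hop_ms / 1000)
        decisions = []
        for start in range(0, len(signal) - frame_len, hop_len):
            frame = signal[start:start + frame_len]
            energy_db = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
            decisions.append(energy_db > threshold_db)
        return np.array(decisions)

    # Toy signal: silence, then a louder "speech-like" burst, then silence again.
    sr = 16000
    sig = np.concatenate([0.001 * np.random.randn(sr),
                          0.1 * np.random.randn(sr),
                          0.001 * np.random.randn(sr)])
    print(frame_energy_vad(sig, sr).astype(int))
    ```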

  2. Asynchrony adaptation reveals neural population code for audio-visual timing.

    Science.gov (United States)

    Roach, Neil W; Heron, James; Whitaker, David; McGraw, Paul V

    2011-05-01

    The relative timing of auditory and visual stimuli is a critical cue for determining whether sensory signals relate to a common source and for making inferences about causality. However, the way in which the brain represents temporal relationships remains poorly understood. Recent studies indicate that our perception of multisensory timing is flexible--adaptation to a regular inter-modal delay alters the point at which subsequent stimuli are judged to be simultaneous. Here, we measure the effect of audio-visual asynchrony adaptation on the perception of a wide range of sub-second temporal relationships. We find distinctive patterns of induced biases that are inconsistent with the previous explanations based on changes in perceptual latency. Instead, our results can be well accounted for by a neural population coding model in which: (i) relative audio-visual timing is represented by the distributed activity across a relatively small number of neurons tuned to different delays; (ii) the algorithm for reading out this population code is efficient, but subject to biases owing to under-sampling; and (iii) the effect of adaptation is to modify neuronal response gain. These results suggest that multisensory timing information is represented by a dedicated population code and that shifts in perceived simultaneity following asynchrony adaptation arise from analogous neural processes to well-known perceptual after-effects.
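
    A minimal sketch of the population-code idea described above, under strong simplifying assumptions (five hypothetical delay-tuned channels, Gaussian tuning curves, a centroid read-out, and adaptation modeled purely as a gain reduction on the channel nearest the adapting delay):

    ```python
    import numpy as np

    preferred_delays = np.array([-200.0, -100.0, 0.0, 100.0, 200.0])  # ms, hypothetical
    tuning_width = 120.0                                               # ms, hypothetical

    def population_response(true_delay, gains):
        """Responses of delay-tuned channels, each scaled by its (adaptable) gain."""
        return gains * np.exp(-0.5 * ((true_delay - preferred_delays) / tuning_width) ** 2)

    def read_out(responses):
        """Simple centroid read-out of perceived delay from the population activity."""
        return np.sum(responses * preferred_delays) / np.sum(responses)

    gains = np.ones_like(preferred_delays)
    print("perceived delay before adaptation:", read_out(population_response(0.0, gains)))

    # Adaptation to an audio-leading delay is modeled as a reduced gain of the channel
    # tuned near the adapting delay, which biases subsequent read-outs of physical
    # synchrony: an analogue of the shift in perceived simultaneity after adaptation.
    gains_adapted = gains.copy()
    gains_adapted[preferred_delays == -100.0] *= 0.6
    print("perceived delay after adaptation :", read_out(population_response(0.0, gains_adapted)))
    ```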

  3. Audio-visual feedback improves the BCI performance in the navigational control of a humanoid robot.

    Science.gov (United States)

    Tidoni, Emmanuele; Gergondet, Pierre; Kheddar, Abderrahmane; Aglioti, Salvatore M

    2014-01-01

    Advancement in brain computer interface (BCI) technology allows people to actively interact with the world through surrogates. Controlling real humanoid robots using a BCI as intuitively as we control our own body represents a challenge for current research in robotics and neuroscience. In order to interact successfully with the environment, the brain integrates multiple sensory cues to form a coherent representation of the world. Cognitive neuroscience studies demonstrate that multisensory integration may imply a gain with respect to a single modality and ultimately improve the overall sensorimotor performance. For example, reactivity to simultaneous visual and auditory stimuli may be higher than to the sum of the same stimuli delivered in isolation or in temporal sequence. Yet, knowledge about whether audio-visual integration may improve the control of a surrogate is meager. To explore this issue, we provided human footstep sounds as audio feedback to BCI users while they controlled a humanoid robot. Participants were asked to steer their robot surrogate and perform a pick-and-place task through BCI-SSVEPs. We found that audio-visual synchrony between the footstep sounds and the humanoid's actual walk reduces the time required for steering the robot. Thus, auditory feedback congruent with the humanoid's actions may improve the motor decisions of the BCI user and strengthen the feeling of control over the robot. Our results shed light on the possibility of increasing control of a robot by providing multisensory feedback to the BCI user.

  4. Effects of audio-visual presentation of target words in word translation training

    Science.gov (United States)

    Akahane-Yamada, Reiko; Komaki, Ryo; Kubo, Rieko

    2004-05-01

    Komaki and Akahane-Yamada (Proc. ICA2004) used a 2AFC translation task in vocabulary training, in which the target word is presented visually in the orthographic form of one language, and the appropriate meaning in another language has to be chosen between two choices. The present paper examined the effect of audio-visual presentation of the target word when native speakers of Japanese learn to translate English words into Japanese. Pairs of English words contrasted in several phonemic distinctions (e.g., /r/-/l/, /b/-/v/, etc.) were used as word materials, and presented in three conditions: visual-only (V), audio-only (A), and audio-visual (AV) presentation. Identification accuracy for those words produced by two talkers was also assessed. During the pretest, the accuracy for A stimuli was lowest, implying that insufficient translation ability and listening ability interact with each other when an aurally presented word has to be translated. However, there was no difference in accuracy between V and AV stimuli, suggesting that participants translated the words on the basis of visual information only. The effect of translation training using AV stimuli did not transfer to identification ability, showing that additional audio information during translation does not help improve speech perception. Further examination is necessary to determine an effective L2 training method. [Work supported by TAO, Japan.]

  5. Effects of audio-visual presentation of target words in word translation training

    Science.gov (United States)

    Akahane-Yamada, Reiko; Komaki, Ryo; Kubo, Rieko

    2001-05-01

    Komaki and Akahane-Yamada (Proc. ICA2004) used a 2AFC translation task in vocabulary training, in which the target word is presented visually in the orthographic form of one language, and the appropriate meaning in another language has to be chosen between two choices. The present paper examined the effect of audio-visual presentation of the target word when native speakers of Japanese learn to translate English words into Japanese. Pairs of English words contrasted in several phonemic distinctions (e.g., /r/-/l/, /b/-/v/, etc.) were used as word materials, and presented in three conditions: visual-only (V), audio-only (A), and audio-visual (AV) presentation. Identification accuracy for those words produced by two talkers was also assessed. During the pretest, the accuracy for A stimuli was lowest, implying that insufficient translation ability and listening ability interact with each other when an aurally presented word has to be translated. However, there was no difference in accuracy between V and AV stimuli, suggesting that participants translated the words on the basis of visual information only. The effect of translation training using AV stimuli did not transfer to identification ability, showing that additional audio information during translation does not help improve speech perception. Further examination is necessary to determine an effective L2 training method. [Work supported by TAO, Japan.]

  6. Audio-visual interactions in product sound design

    Science.gov (United States)

    Özcan, Elif; van Egmond, René

    2010-02-01

    Consistent product experience requires congruity between product properties such as visual appearance and sound. Therefore, for designing appropriate product sounds by manipulating their spectral-temporal structure, product sounds should preferably not be considered in isolation but as an integral part of the main product concept. Because visual aspects of a product are considered to dominate the communication of the desired product concept, sound is usually expected to fit the visual character of a product. We argue that this can be accomplished successfully only on basis of a thorough understanding of the impact of audio-visual interactions on product sounds. Two experimental studies are reviewed to show audio-visual interactions on both perceptual and cognitive levels influencing the way people encode, recall, and attribute meaning to product sounds. Implications for sound design are discussed defying the natural tendency of product designers to analyze the "sound problem" in isolation from the other product properties.

  7. Acoustic Features and Perceptive Cues of Songs and Dialogues in Whistled Speech: Convergences with Sung Speech

    CERN Document Server

    Meyer, Julien

    2007-01-01

    Whistled speech is a little studied local use of language shaped by several cultures of the world either for distant dialogues or for rendering traditional songs. This practice consists of an emulation of the voice thanks to a simple modulated pitch. It is therefore the result of a transformation of the vocal signal that implies simplifications in the frequency domain. The whistlers adapt their productions to the way each language combines the qualities of height perceived simultaneously by the human ear in the complex frequency spectrum of the spoken or sung voice (pitch, timbre). As a consequence, this practice underlines key acoustic cues for the intelligibility of the concerned languages. The present study provides an analysis of the acoustic and phonetic features selected by whistled speech in several traditions either in purely oral whistles (Spanish, Turkish, Mazatec) or in whistles produced with an instrument like a leaf (Akha, Hmong). It underlines the convergences with the strategies of the singing ...

  8. Neural systems underlying British Sign Language and audio-visual English processing in native users.

    Science.gov (United States)

    MacSweeney, Mairéad; Woll, Bencie; Campbell, Ruth; McGuire, Philip K; David, Anthony S; Williams, Steven C R; Suckling, John; Calvert, Gemma A; Brammer, Michael J

    2002-07-01

    In order to understand the evolution of human language, it is necessary to explore the neural systems that support language processing in its many forms. In particular, it is informative to separate those mechanisms that may have evolved for sensory processing (hearing) from those that have evolved to represent events and actions symbolically (language). To what extent are the brain systems that support language processing shaped by auditory experience and to what extent by exposure to language, which may not necessarily be acoustically structured? In this first neuroimaging study of the perception of British Sign Language (BSL), we explored these questions by measuring brain activation using functional MRI in nine hearing and nine congenitally deaf native users of BSL while they performed a BSL sentence-acceptability task. Eight hearing, non-signing subjects performed an analogous task that involved audio-visual English sentences. The data support the argument that there are both modality-independent and modality-dependent language localization patterns in native users. In relation to modality-independent patterns, regions activated by both BSL in deaf signers and by spoken English in hearing non-signers included inferior prefrontal regions bilaterally (including Broca's area) and superior temporal regions bilaterally (including Wernicke's area). Lateralization patterns were similar for the two languages. There was no evidence of enhanced right-hemisphere recruitment for BSL processing in comparison with audio-visual English. In relation to modality-specific patterns, audio-visual speech in hearing subjects generated greater activation in the primary and secondary auditory cortices than BSL in deaf signers, whereas BSL generated enhanced activation in the posterior occipito-temporal regions (V5), reflecting the greater movement component of BSL. The influence of hearing status on the recruitment of sign language processing systems was explored by comparing deaf

  9. Effect of Cues to Increase Sound Pressure Level on Respiratory Kinematic Patterns during Connected Speech

    Science.gov (United States)

    Huber, Jessica E.

    2007-01-01

    Purpose: This study examined the response of the respiratory system to 3 cues used to elicit increased vocal loudness to determine whether the effects of cueing, shown previously in sentence tasks, were present in connected speech tasks and to describe differences among tasks. Method: Fifteen young men and 15 young women produced a 2-paragraph…

  10. Phonetic Category Cues in Adult-Directed Speech: Evidence from Three Languages with Distinct Vowel Characteristics

    Science.gov (United States)

    Pons, Ferran; Biesanz, Jeremy C.; Kajikawa, Sachiyo; Fais, Laurel; Narayan, Chandan R.; Amano, Shigeaki; Werker, Janet F.

    2012-01-01

    Using an artificial language learning manipulation, Maye, Werker, and Gerken (2002) demonstrated that infants' speech sound categories change as a function of the distributional properties of the input. In a recent study, Werker et al. (2007) showed that Infant-directed Speech (IDS) input contains reliable acoustic cues that support distributional…

  11. Training the Brain to Weight Speech Cues Differently: A Study of Finnish Second-language Users of English

    Science.gov (United States)

    Ylinen, Sari; Uther, Maria; Latvala, Antti; Vepsalainen, Sara; Iverson, Paul; Akahane-Yamada, Reiko; Naatanen, Risto

    2010-01-01

    Foreign-language learning is a prime example of a task that entails perceptual learning. The correct comprehension of foreign-language speech requires the correct recognition of speech sounds. The most difficult speech-sound contrasts for foreign-language learners often are the ones that have multiple phonetic cues, especially if the cues are…

  12. Effects of virtual speaker density and room reverberation on spatiotemporal thresholds of audio-visual motion coherence.

    Directory of Open Access Journals (Sweden)

    Narayan Sankaran

    Full Text Available The present study examined the effects of spatial sound-source density and reverberation on the spatiotemporal window for audio-visual motion coherence. Three different acoustic stimuli were generated in Virtual Auditory Space: two acoustically "dry" stimuli via the measurement of anechoic head-related impulse responses recorded at either 1° or 5° spatial intervals (Experiment 1), and a reverberant stimulus rendered from binaural room impulse responses recorded at 5° intervals in situ in order to capture reverberant acoustics in addition to head-related cues (Experiment 2). A moving visual stimulus with invariant localization cues was generated by sequentially activating LEDs along the same radial path as the virtual auditory motion. Stimuli were presented at 25°/s, 50°/s and 100°/s with a random spatial offset between audition and vision. In a 2AFC task, subjects made a judgment of the leading modality (auditory or visual). No significant differences were observed in the spatial threshold based on the point of subjective equivalence (PSE) or the slope of psychometric functions (β) across all three acoustic conditions. Additionally, both the PSE and β did not significantly differ across velocity, suggesting a fixed spatial window of audio-visual separation. Findings suggest that there was no loss in spatial information accompanying the reduction in spatial cues and reverberation levels tested, and establish a perceptual measure for assessing the veracity of motion generated from discrete locations and in echoic environments.
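
    The PSE and slope (β) reported above come from fitting a psychometric function to the 2AFC responses. A minimal sketch of such a fit, assuming a logistic form and using made-up response proportions:

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(offset, pse, beta):
        """Probability of a 'visual leading' response as a function of A-V offset (ms)."""
        return 1.0 / (1.0 + np.exp(-(offset - pse) / beta))

    # Hypothetical audio-visual offsets (ms, negative = audio leads) and the
    # proportion of 'visual first' responses observed at each offset.
    offsets = np.array([-200, -100, -50, 0, 50, 100, 200], dtype=float)
    p_visual_first = np.array([0.05, 0.15, 0.35, 0.55, 0.75, 0.90, 0.98])

    (pse, beta), _ = curve_fit(logistic, offsets, p_visual_first, p0=[0.0, 50.0])
    print("PSE  = %.1f ms (point of subjective equivalence)" % pse)
    print("beta = %.1f ms (inverse of psychometric slope)" % beta)
    ```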

  13. Information-Driven Active Audio-Visual Source Localization.

    Directory of Open Access Journals (Sweden)

    Niclas Schult

    Full Text Available We present a system for sensorimotor audio-visual source localization on a mobile robot. We utilize a particle filter for the combination of audio-visual information and for the temporal integration of consecutive measurements. Although the system only measures the current direction of the source, the position of the source can be estimated because the robot is able to move and can therefore obtain measurements from different directions. These actions by the robot successively reduce uncertainty about the source's position. An information gain mechanism is used for selecting the most informative actions in order to minimize the number of actions required to achieve accurate and precise position estimates in azimuth and distance. We show that this mechanism is an efficient solution to the action selection problem for source localization, and that it is able to produce precise position estimates despite simplified unisensory preprocessing. Because of the robot's mobility, this approach is suitable for use in complex and cluttered environments. We present qualitative and quantitative results of the system's performance and discuss possible areas of application.
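
    A minimal sketch of the bearing-only particle-filter idea described above (the robot motion model, the information-gain action selection, and the audio-visual measurement model are simplified away; positions and noise levels are hypothetical): particles represent candidate source positions, and each direction measurement taken from a new robot pose re-weights and resamples them.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    true_source = np.array([3.0, 2.0])                 # hypothetical source position (m)
    particles = rng.uniform(-5, 5, size=(2000, 2))     # candidate source positions
    weights = np.ones(len(particles)) / len(particles)
    sigma = np.radians(10.0)                           # bearing measurement noise

    robot_poses = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([1.0, 1.0])]

    for pose in robot_poses:                           # robot moves, measures, re-weights
        measured = np.arctan2(*(true_source - pose)[::-1]) + rng.normal(0, sigma)
        predicted = np.arctan2(particles[:, 1] - pose[1], particles[:, 0] - pose[0])
        err = np.angle(np.exp(1j * (measured - predicted)))          # wrapped angle error
        weights *= np.exp(-0.5 * (err / sigma) ** 2)
        weights /= weights.sum()
        idx = rng.choice(len(particles), size=len(particles), p=weights)   # resample
        particles = particles[idx] + rng.normal(0, 0.05, particles.shape)  # jitter
        weights[:] = 1.0 / len(particles)

    # Measurements from different poses triangulate the source in azimuth and distance.
    print("estimated source:", particles.mean(axis=0), "true:", true_source)
    ```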

  14. Head Tracking of Auditory, Visual, and Audio-Visual Targets.

    Science.gov (United States)

    Leung, Johahn; Wei, Vincent; Burgess, Martin; Carlile, Simon

    2015-01-01

    The ability to actively follow a moving auditory target with our heads remains unexplored even though it is a common behavioral response. Previous studies of auditory motion perception have focused on the condition where the subjects are passive. The current study examined head tracking behavior to a moving auditory target along a horizontal 100° arc in the frontal hemisphere, with velocities ranging from 20 to 110°/s. By integrating high fidelity virtual auditory space with a high-speed visual presentation we compared tracking responses of auditory targets against visual-only and audio-visual "bisensory" stimuli. Three metrics were measured: onset, RMS, and gain error. The results showed that tracking accuracy (RMS error) varied linearly with target velocity, with a significantly higher rate in audition. Also, when the target moved faster than 80°/s, onset and RMS errors were significantly worse in audition than in the other modalities, while responses in the visual and bisensory conditions were statistically identical for all metrics measured. Lastly, audio-visual facilitation was not observed when tracking bisensory targets.

  15. Head Tracking of Auditory, Visual and Audio-Visual Targets

    Directory of Open Access Journals (Sweden)

    Johahn eLeung

    2016-01-01

    Full Text Available The ability to actively follow a moving auditory target with our heads remains unexplored even though it is a common behavioral response. Previous studies of auditory motion perception have focused on the condition where the subjects are passive. The current study examined head tracking behavior to a moving auditory target along a horizontal 100° arc in the frontal hemisphere, with velocities ranging from 20°/s to 110°/s. By integrating high fidelity virtual auditory space with a high-speed visual presentation we compared tracking responses of auditory targets against visual-only and audio-visual bisensory stimuli. Three metrics were measured: onset, RMS, and gain error. The results showed that tracking accuracy (RMS error) varied linearly with target velocity, with a significantly higher rate in audition. Also, when the target moved faster than 80°/s, onset and RMS errors were significantly worse in audition than in the other modalities, while responses in the visual and bisensory conditions were statistically identical for all metrics measured. Lastly, audio-visual facilitation was not observed when tracking bisensory targets.

  16. Video genre categorization and representation using audio-visual information

    Science.gov (United States)

    Ionescu, Bogdan; Seyerlehner, Klaus; Rasche, Christoph; Vertan, Constantin; Lambert, Patrick

    2012-04-01

    We propose an audio-visual approach to video genre classification using content descriptors that exploit audio, color, temporal, and contour information. Audio information is extracted at block-level, which has the advantage of capturing local temporal information. At the temporal structure level, we consider action content in relation to human perception. Color perception is quantified using statistics of color distribution, elementary hues, color properties, and relationships between colors. Further, we compute statistics of contour geometry and relationships. The main contribution of our work lies in harnessing the descriptive power of the combination of these descriptors in genre classification. Validation was carried out on over 91 h of video footage encompassing 7 common video genres, yielding average precision and recall ratios of 87% to 100% and 77% to 100%, respectively, and an overall average correct classification of up to 97%. Also, experimental comparison as part of the MediaEval 2011 benchmarking campaign demonstrated the efficiency of the proposed audio-visual descriptors over other existing approaches. Finally, we discuss a 3-D video browsing platform that displays movies using feature-based coordinates and thus regroups them according to genre.

  17. Emotional speech processing: disentangling the effects of prosody and semantic cues.

    Science.gov (United States)

    Pell, Marc D; Jaywant, Abhishek; Monetta, Laura; Kotz, Sonja A

    2011-08-01

    To inform how emotions in speech are implicitly processed and registered in memory, we compared how emotional prosody, emotional semantics, and both cues in tandem prime decisions about conjoined emotional faces. Fifty-two participants rendered facial affect decisions (Pell, 2005a), indicating whether a target face represented an emotion (happiness or sadness) or not (a facial grimace), after passively listening to happy, sad, or neutral prime utterances. Emotional information from primes was conveyed by: (1) prosody only; (2) semantic cues only; or (3) combined prosody and semantic cues. Results indicated that prosody, semantics, and combined prosody-semantic cues facilitate emotional decisions about target faces in an emotion-congruent manner. However, the magnitude of priming did not vary across tasks. Our findings highlight that emotional meanings of prosody and semantic cues are systematically registered during speech processing, but with similar effects on associative knowledge about emotions, which is presumably shared by prosody, semantics, and faces.

  18. Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation.

    Science.gov (United States)

    Banks, Briony; Gowen, Emma; Munro, Kevin J; Adank, Patti

    2015-01-01

    Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants' eye gaze was recorded to verify that they looked at the speaker's face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation.

  19. Changes of the Prefrontal EEG (Electroencephalogram) Activities According to the Repetition of Audio-Visual Learning.

    Science.gov (United States)

    Kim, Yong-Jin; Chang, Nam-Kee

    2001-01-01

    Investigates the changes in neuronal response across four repetitions of audio-visual learning. Obtains EEG data from the prefrontal (Fp1, Fp2) lobe from 20 subjects at the 8th grade level. Concludes that the habituation of neuronal response shows up in repetitive audio-visual learning and brain hemisphericity can be changed by…

  20. A Management Review and Analysis of Purdue University Libraries and Audio-Visual Center.

    Science.gov (United States)

    Baaske, Jan; And Others

    A management review and analysis was conducted by the staff of the libraries and audio-visual center of Purdue University. Not only were the study team and the eight task forces drawn from all levels of the libraries and audio-visual center staff, but a systematic effort was sustained through inquiries, draft reports and open meetings to involve…

  1. The audio-visual revolution: do we really need it?

    Science.gov (United States)

    Townsend, I

    1979-03-01

    In the United Kingdom, the audio-visual revolution has steadily gained converts in the nursing profession. Nurse tutor courses now contain information on the techniques of educational technology, and schools of nursing increasingly own (or wish to own) many of the sophisticated electronic aids to teaching that abound. This is taking place at a time of unprecedented crisis and change. Funds have been or are being made available to buy audio-visual equipment. But its purchase and use rely on satisfying personal whim, prejudice or educational fashion, not on considerations of educational efficiency. In the rush of enthusiasm, the overwhelmed teacher (everywhere; the phenomenon is not confined to nursing) forgets to ask the searching, critical questions: 'Why should we use this aid?', 'How effective is it?' and 'At what?'. Influential writers in this profession have repeatedly called for a more responsible attitude towards published research work in other fields. In an attempt to discover what is known about the answers to this group of questions, an eclectic look at media research is taken and the widespread dissatisfaction existing amongst international educational technologists is noted. The paper isolates from the literature several causative factors responsible for the present state of affairs. Findings from the field of educational television are cited as representative of an aid which has had a considerable amount of time and research directed at it. The concluding part of the paper shows that the decisions to be taken in using or not using educational media are more complicated than might at first appear.

  2. Effects of pitch, level, and tactile cues on speech segregation

    Science.gov (United States)

    Drullman, Rob; Bronkhorst, Adelbert W.

    2003-04-01

    Sentence intelligibility for interfering speech was investigated as a function of level difference, pitch difference, and presence of tactile support. A previous study by the present authors [J. Acoust. Soc. Am. 111, 2432-2433 (2002)] had shown a small benefit of tactile support in the speech-reception threshold measured against a background of one to eight competing talkers. The present experiment focused on the effects of informational and energetic masking for one competing talker. Competing speech was obtained by manipulating the speech of the male target talker (different sentences). The PSOLA technique was used to increase the average pitch of competing speech by 2, 4, 8, or 12 semitones. Level differences between target and competing speech ranged from -16 to +4 dB. Tactile support (B&K 4810 shaker) was given to the index finger by presenting the temporal envelope of the low-pass-filtered speech (0-200 Hz). Sentences were presented diotically and the percentage of correctly perceived words was measured. Results show a significant overall increase in intelligibility score from 71% to 77% due to tactile support. Performance improves monotonically with increasing pitch difference. Louder target speech generally helps perception, but results for level differences are considerably dependent on pitch differences.

  3. Psychoacoustic cues to emotion in speech prosody and music.

    Science.gov (United States)

    Coutinho, Eduardo; Dibben, Nicola

    2013-01-01

    There is strong evidence of shared acoustic profiles common to the expression of emotions in music and speech, yet relatively limited understanding of the specific psychoacoustic features involved. This study combined a controlled experiment and computational modelling to investigate the perceptual codes associated with the expression of emotion in the acoustic domain. The empirical stage of the study provided continuous human ratings of emotions perceived in excerpts of film music and natural speech samples. The computational stage created a computer model that retrieves the relevant information from the acoustic stimuli and makes predictions about the emotional expressiveness of speech and music close to the responses of human subjects. We show that a significant part of the listeners' second-by-second reported emotions to music and speech prosody can be predicted from a set of seven psychoacoustic features: loudness, tempo/speech rate, melody/prosody contour, spectral centroid, spectral flux, sharpness, and roughness. The implications of these results are discussed in the context of cross-modal similarities in the communication of emotion in the acoustic domain.
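
    A minimal sketch of the feature-to-emotion mapping described above (the study's actual second-by-second model is not reproduced; the feature values and ratings below are random placeholders): an ordinary least-squares regression from the seven psychoacoustic features to a continuous emotion rating.

    ```python
    import numpy as np

    feature_names = ["loudness", "tempo", "contour", "spectral_centroid",
                     "spectral_flux", "sharpness", "roughness"]

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, len(feature_names)))       # placeholder feature time series
    true_w = np.array([0.8, 0.5, 0.1, 0.3, 0.4, 0.2, 0.6])
    y = X @ true_w + rng.normal(scale=0.3, size=300)     # placeholder arousal ratings

    # Ordinary least squares: each weight indicates how strongly that psychoacoustic
    # feature contributes to the predicted emotion rating (last column is the intercept).
    X1 = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    for name, coef in zip(feature_names, w):
        print(f"{name:17s} {coef:+.2f}")
    ```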

  4. Audio-Visual Integration Modifies Emotional Judgment in Music

    Directory of Open Access Journals (Sweden)

    Shen-Yuan Su

    2011-10-01

    Full Text Available The conventional view that perceived emotion in music is derived mainly from auditory signals has led to neglect of the contribution of the visual image. In this study, we manipulated mode (major vs. minor) and examined the influence of a video image on emotional judgment in music. Melodies in either major or minor mode were controlled for tempo and rhythm and played to the participants. We found that Taiwanese participants, like Westerners, judged major melodies as expressing positive, and minor melodies negative, emotions. The major or minor melodies were then paired with video images of the singers, which were either emotionally congruent or incongruent with their modes. Results showed that participants perceived stronger positive or negative emotions with congruent audio-visual stimuli. Compared to listening to music alone, stronger emotions were perceived when an emotionally congruent video image was added and weaker emotions were perceived when an incongruent image was added. We therefore demonstrate that mode is important for perceiving the emotional valence of music and that treating musical art as a purely auditory event might lose the enhanced emotional strength perceived in music, since going to a concert may lead to stronger perceived emotion than listening to the CD at home.

  5. Human performance measures for interactive haptic-audio-visual interfaces.

    Science.gov (United States)

    Jia, Dawei; Bhatti, Asim; Nahavandi, Saeid; Horan, Ben

    2013-01-01

    Virtual reality and simulation are becoming increasingly important in modern society and it is essential to improve our understanding of system usability and efficacy from the users' perspective. This paper introduces a novel evaluation method designed to assess human user capability when undertaking technical and procedural training using virtual training systems. The evaluation method falls under the user-centered design and evaluation paradigm and draws on theories of cognitive, skill-based and affective learning outcomes. The method focuses on user interaction with haptic-audio-visual interfaces and the complexities related to variability in users' performance, and the adoption and acceptance of the technologies. A large scale user study focusing on object assembly training tasks involving selecting, rotating, releasing, inserting, and manipulating three-dimensional objects was performed. The study demonstrated the advantages of the method in obtaining valuable multimodal information for accurate and comprehensive evaluation of virtual training system efficacy. The study investigated how well users learn, perform, adapt to, and perceive the virtual training. The results of the study revealed valuable aspects of the design and evaluation of virtual training systems contributing to an improved understanding of more usable virtual training systems.

  6. Listener deficits in hypokinetic dysarthria: Which cues are most important in speech segmentation?

    Science.gov (United States)

    Wade, Carolyn Ann

    Listeners use prosodic cues to help them quickly process running speech. In English, listeners effortlessly use strong syllables to help them to find words in the continuous stream of speech produced by neurologically-intact individuals. However, listeners are not always presented with speech under such ideal circumstances. This thesis explores the question of word segmentation of English speech under one of these less ideal conditions; specifically, when the speaker may be impaired in his/her production of strong syllables, as in the case of hypokinetic dysarthria. Further, we attempt to discern which acoustic cue(s) are most degraded in hypokinetic dysarthria and the effect that this degradation has on listeners' segmentation when no additional semantic or pragmatic cues are present. Two individuals with Parkinson's disease, one with a rate disturbance and one with articulatory disruption, along with a typically aging control, were recorded repeating a series of nonsense syllables. Young adult listeners were then presented with recordings from one of these three speakers producing non-words (imprecise consonant articulation, rate disturbance, and control). After familiarization, the listeners were asked to rate the familiarity of the non-words produced by a second typically aging speaker. Results indicated speakers with hypokinetic dysarthria were able to modulate their intensity and duration for stressed and unstressed syllables in a way similar to that of control speakers. In addition, their mean and peak fundamental frequency for both stressed and unstressed syllables were significantly higher than that of the normally aging controls. ANOVA results revealed a marginal main effect of frequency in normal and consonant conditions for word versus nonwords listener ratings.

  7. Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition.

    Science.gov (United States)

    Jesse, Alexandra; McQueen, James M

    2014-01-01

    Visual cues to the individual segments of speech and to sentence prosody guide speech recognition. The present study tested whether visual suprasegmental cues to the stress patterns of words can also constrain recognition. Dutch listeners use acoustic suprasegmental cues to lexical stress (changes in duration, amplitude, and pitch) in spoken-word recognition. We asked here whether they can also use visual suprasegmental cues. In two categorization experiments, Dutch participants saw a speaker say fragments of word pairs that were segmentally identical but differed in their stress realization (e.g., 'ca-vi from cavia "guinea pig" vs. 'ka-vi from kaviaar "caviar"). Participants were able to distinguish between these pairs from seeing a speaker alone. Only the presence of primary stress in the fragment, not its absence, was informative. Participants were able to distinguish visually primary from secondary stress on first syllables, but only when the fragment-bearing target word carried phrase-level emphasis. Furthermore, participants distinguished fragments with primary stress on their second syllable from those with secondary stress on their first syllable (e.g., pro-'jec from projector "projector" vs. 'pro-jec from projectiel "projectile"), independently of phrase-level emphasis. Seeing a speaker thus contributes to spoken-word recognition by providing suprasegmental information about the presence of primary lexical stress.

  8. Effects of syntactic cueing therapy on picture naming and connected speech in acquired aphasia.

    Science.gov (United States)

    Herbert, Ruth; Webster, Dianne; Dyson, Lucy

    2012-01-01

    Language therapy for word-finding difficulties in aphasia usually involves picture naming of single words with the support of cues. Most studies have addressed nouns in isolation, even though in connected speech nouns are more frequently produced with determiners. We hypothesised that improved word finding in connected speech would be most likely if intervention treated nouns in usual syntactic contexts. Six speakers with aphasia underwent language therapy using a software program developed for the purpose, which provided lexical and syntactic (determiner) cues. Exposure to determiners with nouns would potentially lead to improved picture naming of both treated and untreated nouns, and increased production of determiner plus noun combinations in connected speech. After intervention, picture naming of treated words improved for five of the six speakers, but naming of untreated words was unchanged. The number of determiner plus noun combinations in connected speech increased for four speakers. These findings attest to the close relationship between frequently co-occurring content and function words, and indicate that intervention for word-finding deficits can profitably proceed beyond single word naming, to retrieval in appropriate syntactic contexts. We also examined the relationship between effects of therapy, and amount and intensity of therapy. We found no relationship between immediate effects and amount or intensity of therapy. However, those participants whose naming maintained at follow-up completed the therapy regime in fewer sessions, of relatively longer duration. We explore the relationship between therapy regime and outcomes, and propose future considerations for research.

  9. Listeners' expectation of room acoustical parameters based on visual cues

    Science.gov (United States)

    Valente, Daniel L.

    Despite many studies investigating auditory spatial impressions in rooms, few have addressed the impact of simultaneous visual cues on localization and the perception of spaciousness. The current research presents an immersive audio-visual study, in which participants are instructed to make spatial congruency and quantity judgments in dynamic cross-modal environments. The results of these psychophysical tests suggest the importance of consilient audio-visual presentation to the legibility of an auditory scene. Several studies have looked into audio-visual interaction in room perception in recent years, but these studies rely on static images, speech signals, or photographs alone to represent the visual scene. Building on these studies, the aim is to propose a testing method that uses monochromatic compositing (blue-screen technique) to position a studio recording of a musical performance in a number of virtual acoustical environments and ask subjects to assess these environments. In the first experiment of the study, video footage was taken from five rooms varying in physical size from a small studio to a small performance hall. Participants were asked to perceptually align two distinct acoustical parameters, early-to-late reverberant energy ratio and reverberation time, of two solo musical performances in five contrasting visual environments according to their expectations of how the room should sound given its visual appearance. In the second experiment in the study, video footage shot from four different listening positions within a general-purpose space was coupled with sounds derived from measured binaural impulse responses (IRs). The relationship between the presented image, sound, and virtual receiver position was examined. It was found that varying the visual cues changed how the acoustic environment was perceived. This included the visual attributes of the space in which the performance was located as well as the visual attributes of the performer

  10. Relative Contributions of Spectral and Temporal Cues for Speech Recognition in Patients with Sensorineural Hearing Loss

    Institute of Scientific and Technical Information of China (English)

    XU Li; ZHOU Ning; Rebecca Brashears; Katherine Rife

    2008-01-01

    The present study was designed to examine speech recognition in patients with sensorineural hearing loss when the temporal and spectral information in the speech signals were co-varied. Four subjects with mild to moderate sensorineural hearing loss were recruited to participate in consonant and vowel recognition tests that used speech stimuli processed through a noise-excited vocoder. The number of channels was varied between 2 and 32, which defined spectral information. The lowpass cutoff frequency of the temporal envelope extractor was varied from 1 to 512 Hz, which defined temporal information. Results indicate that performance of subjects with sensorineural hearing loss varied tremendously across subjects. For consonant recognition, patterns of relative contributions of spectral and temporal information were similar to those in normal-hearing subjects. The utility of temporal envelope information appeared to be normal in the hearing-impaired listeners. For vowel recognition, which depended predominantly on spectral information, the performance plateau was achieved with numbers of channels as high as 16-24, much higher than expected, given that the frequency selectivity in patients with sensorineural hearing loss might be compromised. In order to understand the mechanisms by which hearing-impaired listeners utilize spectral and temporal cues for speech recognition, future studies that involve a large sample of patients with sensorineural hearing loss will be necessary to elucidate the relationship between frequency selectivity as well as central processing capability and speech recognition performance using vocoded signals.
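
    The vocoder manipulation described here (spectral detail set by the number of channels, temporal detail set by the envelope cutoff) can be illustrated with a minimal noise-excited vocoder sketch. This is a generic illustration under stated assumptions (logarithmically spaced bands, Butterworth filters, a sampling rate comfortably above 12 kHz), not the processing chain used in the study.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def noise_vocoder(x, fs, n_channels=8, env_cutoff=50.0, fmin=80.0, fmax=6000.0):
            """Noise-excited vocoder: spectral detail from n_channels, temporal detail from env_cutoff (Hz)."""
            edges = np.logspace(np.log10(fmin), np.log10(fmax), n_channels + 1)
            env_sos = butter(4, env_cutoff / (fs / 2), btype="lowpass", output="sos")
            out = np.zeros_like(x, dtype=float)
            noise = np.random.randn(len(x))
            for lo, hi in zip(edges[:-1], edges[1:]):
                band_sos = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="bandpass", output="sos")
                band = sosfiltfilt(band_sos, x)
                env = sosfiltfilt(env_sos, np.abs(hilbert(band)))   # lowpass-filtered temporal envelope
                carrier = sosfiltfilt(band_sos, noise)              # noise limited to the same band
                out += np.clip(env, 0, None) * carrier
            return out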

  11. Rehabilitation of balance-impaired stroke patients through audio-visual biofeedback

    DEFF Research Database (Denmark)

    Gheorghe, Cristina; Nissen, Thomas; Juul Rosengreen Christensen, Daniel;

    2015-01-01

    This study explored how audio-visual biofeedback influences the physical balance of seven balance-impaired stroke patients, between 33 and 70 years of age. The setup included a bespoke balance board and a music rhythm game. The procedure was designed as follows: (1) a control group who performed a balance training exercise without any technological input, (2) a visual biofeedback group, performing via visual input, and (3) an audio-visual biofeedback group, performing via audio and visual input. Results retrieved from comparisons between the data sets (2) and (3) suggested superior postural stability…

  12. An Audio-Visual Resource Notebook for Adult Consumer Education. An Annotated Bibliography of Selected Audio-Visual Aids for Adult Consumer Education, with Special Emphasis on Materials for Elderly, Low-Income and Handicapped Consumers.

    Science.gov (United States)

    Virginia State Dept. of Agriculture and Consumer Services, Richmond, VA.

    This document is an annotated bibliography of audio-visual aids in the field of consumer education, intended especially for use among low-income, elderly, and handicapped consumers. It was developed to aid consumer education program planners in finding audio-visual resources to enhance their presentations. Materials listed include 293 resources…

  13. Evaluation of Modular EFL Educational Program (Audio-Visual Materials Translation & Translation of Deeds & Documents)

    Science.gov (United States)

    Imani, Sahar Sadat Afshar

    2013-01-01

    Modular EFL Educational Program has managed to offer specialized language education in two specific fields: Audio-visual Materials Translation and Translation of Deeds and Documents. However, no explicit empirical studies can be traced on both internal and external validity measures as well as the extent of compatibility of both courses with the…

  14. Acceptance of online audio-visual cultural heritage archive services: a study of the general public

    NARCIS (Netherlands)

    Ongena, G.; Wijngaert, van de L.A.L.; Huizer, E.

    2013-01-01

    Introduction. This study examines the antecedents of user acceptance of an audio-visual heritage archive for a wider audience (i.e., the general public) by extending the technology acceptance model with the concepts of perceived enjoyment, nostalgia proneness and personal innovativeness. Method. A W

  15. Audio-visual synchrony and feature-selective attention co-amplify early visual processing.

    Science.gov (United States)

    Keitel, Christian; Müller, Matthias M

    2016-05-01

    Our brain relies on neural mechanisms of selective attention and converging sensory processing to efficiently cope with rich and unceasing multisensory inputs. One prominent assumption holds that audio-visual synchrony can act as a strong attractor for spatial attention. Here, we tested for a similar effect of audio-visual synchrony on feature-selective attention. We presented two superimposed Gabor patches that differed in colour and orientation. On each trial, participants were cued to selectively attend to one of the two patches. Over time, spatial frequencies of both patches varied sinusoidally at distinct rates (3.14 and 3.63 Hz), giving rise to pulse-like percepts. A simultaneously presented pure tone carried a frequency modulation at the pulse rate of one of the two visual stimuli to introduce audio-visual synchrony. Pulsed stimulation elicited distinct time-locked oscillatory electrophysiological brain responses. These steady-state responses were quantified in the spectral domain to examine individual stimulus processing under conditions of synchronous versus asynchronous tone presentation and when respective stimuli were attended versus unattended. We found that both attending to the colour of a stimulus and its synchrony with the tone enhanced its processing. Moreover, both gain effects combined linearly for attended in-sync stimuli. Our results suggest that audio-visual synchrony can attract attention to specific stimulus features when stimuli overlap in space.
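
    Quantifying steady-state responses "in the spectral domain", as described above, typically amounts to Fourier-transforming the stimulus-locked EEG and reading out the amplitude at each stimulation frequency (here 3.14 and 3.63 Hz). The snippet below is a generic sketch of that read-out, assuming an epochs-by-time array and a known sampling rate; it is not the authors' analysis pipeline.

        import numpy as np

        def ssr_amplitude(epochs, fs, freqs=(3.14, 3.63)):
            """Amplitude of the epoch-averaged signal at each stimulation frequency."""
            evoked = epochs.mean(axis=0)                       # average across trials first
            spectrum = np.abs(np.fft.rfft(evoked)) / len(evoked)
            freq_axis = np.fft.rfftfreq(len(evoked), d=1.0 / fs)
            return {f: spectrum[np.argmin(np.abs(freq_axis - f))] for f in freqs}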

  16. Multimodal indexing of digital audio-visual documents: A case study for cultural heritage data

    NARCIS (Netherlands)

    Carmichael, J.; Larson, M.; Marlow, J.; Newman, E.; Clough, P.; Oomen, J.; Sav, S.

    2008-01-01

    This paper describes a multimedia multimodal information access sub-system (MIAS) for digital audio-visual documents, typically presented in streaming media format. The system is designed to provide both professional and general users with entry points into video documents that are relevant to their

  17. Challenges of Using Audio-Visual Aids as Warm-Up Activity in Teaching Aviation English

    Science.gov (United States)

    Sahin, Mehmet; Sule, St.; Seçer, Y. E.

    2016-01-01

    This study aims to find out the challenges encountered in the use of video as audio-visual material as a warm-up activity in aviation English course at high school level. This study is based on a qualitative study in which focus group interview is used as the data collection procedure. The participants of focus group are four instructors teaching…

  18. Training the brain to weight speech cues differently: a study of Finnish second-language users of English.

    Science.gov (United States)

    Ylinen, Sari; Uther, Maria; Latvala, Antti; Vepsäläinen, Sara; Iverson, Paul; Akahane-Yamada, Reiko; Näätänen, Risto

    2010-06-01

    Foreign-language learning is a prime example of a task that entails perceptual learning. The correct comprehension of foreign-language speech requires the correct recognition of speech sounds. The most difficult speech-sound contrasts for foreign-language learners often are the ones that have multiple phonetic cues, especially if the cues are weighted differently in the foreign and native languages. The present study aimed to determine whether non-native-like cue weighting could be changed by using phonetic training. Before the training, we compared the use of spectral and duration cues of English /i/ and /I/ vowels (e.g., beat vs. bit) between native Finnish and English speakers. In Finnish, duration is used phonologically to separate short and long phonemes, and therefore Finns were expected to weight duration cues more than native English speakers. The cross-linguistic differences and training effects were investigated with behavioral and electrophysiological methods, in particular by measuring the MMN brain response that has been used to probe long-term memory representations for speech sounds. The behavioral results suggested that before the training, the Finns indeed relied more on duration in vowel recognition than the native English speakers did. After the training, however, the Finns were able to use the spectral cues of the vowels more reliably than before. Accordingly, the MMN brain responses revealed that the training had enhanced the Finns' ability to preattentively process the spectral cues of the English vowels. This suggests that as a result of training, plastic changes had occurred in the weighting of phonetic cues at early processing stages in the cortex.

  19. Audio-visual training-aid for speechreading

    DEFF Research Database (Denmark)

    Bothe, Hans-Heinrich; Gebert, H.

    2011-01-01

    ‐recorded video material; it also allows the teacher to produce and combine a large number of individual lessons without the need of expensive recording equipment. Our system uses a scene manager to enhance teaching. It allows the creation of different scenarios that are composed of appropriate background images...... of classroom teaching, but the system may also be used as a new e‐learning or, in general, distance learning tool for hearing impaired people. It presents a facial animation on the computer screen with synchronized speech output and is driven by input text sequences in orthographic transcription. The input may...... modular structure of the software package and the centralized event manager, it is possible to add or replace specific modules when needed. The present version of our teacher‐student module uses a hierarchically structured composition of important single words and short phrases, supplemented by easy...

  20. Brain networks engaged in audiovisual integration during speech perception revealed by persistent homology-based network filtration.

    Science.gov (United States)

    Kim, Heejung; Hahm, Jarang; Lee, Hyekyoung; Kang, Eunjoo; Kang, Hyejin; Lee, Dong Soo

    2015-05-01

    The human brain naturally integrates audiovisual information to improve speech perception. However, in noisy environments, understanding speech is difficult and may require much effort. Although the brain network is supposed to be engaged in speech perception, it is unclear how speech-related brain regions are connected during natural bimodal audiovisual or unimodal speech perception with counterpart irrelevant noise. To investigate the topological changes of speech-related brain networks at all possible thresholds, we used a persistent homological framework through hierarchical clustering, such as single linkage distance, to analyze the connected components of the functional network during speech perception using functional magnetic resonance imaging. For speech perception, bimodal (audio-visual speech cue) or unimodal speech cues with counterpart irrelevant noise (auditory white-noise or visual gum-chewing) were delivered to 15 subjects. In terms of positive relationships, similar connected components were observed in the bimodal and unimodal speech conditions during filtration. However, during speech perception with congruent audiovisual stimuli, tighter coupling of the left anterior temporal gyrus-anterior insula component and of the right premotor-visual component was observed than in the auditory or visual speech cue conditions, respectively. Interestingly, visual speech perceived under white noise was characterized by tight negative coupling among the left inferior frontal region, right anterior cingulate, left anterior insula, and bilateral visual regions, including right middle temporal gyrus and right fusiform components. In conclusion, the speech brain network is tightly positively or negatively connected, and can reflect efficient or effortful processes during natural audiovisual integration or lip-reading, respectively, in speech perception.
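
    The "persistent homological framework through hierarchical clustering" invoked above tracks, at the level of zero-dimensional features, how the connected components of a correlation-based brain network merge as the distance threshold grows; single-linkage clustering produces exactly this filtration. The sketch below illustrates the idea on a correlation matrix, using 1 - r as the distance. It is an illustrative reconstruction of the technique, not the authors' code.

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import squareform

        def component_filtration(corr, thresholds):
            """Number of connected components at each distance threshold (single-linkage filtration)."""
            dist = 1.0 - corr                       # distance derived from positive correlations
            np.fill_diagonal(dist, 0.0)
            condensed = squareform(dist, checks=False)
            tree = linkage(condensed, method="single")
            return {t: len(np.unique(fcluster(tree, t=t, criterion="distance")))
                    for t in thresholds}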

  1. Person identification for mobile robot using audio-visual modality

    Science.gov (United States)

    Kim, Young-Ouk; Chin, Sehoon; Lee, Jihoon; Paik, Joonki

    2005-10-01

    Recently, we have seen significant advancement in intelligent service robots. The remarkable features of an intelligent robot include tracking and identification of a person using biometric features. Human-robot interaction is very important because it is one of the final goals of an intelligent service robot. Much research concentrates on two fields: one is self-navigation of a mobile robot, and the other is human-robot interaction in natural environments. In this paper we present an effective person identification method for HRI (Human Robot Interaction) using two different types of expert systems. However, most mobile robots run in uncontrolled and complicated environments, which means that face and speech information cannot be guaranteed under varying conditions such as lighting, background noise, and robot orientation. According to the illumination level and signal-to-noise ratio around the mobile robot, our proposed fuzzy rule produces a reasonable person identification result. Two embedded HMMs (Hidden Markov Models) are used, one for the visual and one for the audio modality, to identify the person. The performance of our proposed system and experimental results are compared with single-modality identification and with a simple mixture of the two modalities.

  2. Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals: Effects of Adding Visual Cues to Auditory Speech Stimuli.

    Science.gov (United States)

    Moradi, Shahram; Lidestam, Björn; Rönnberg, Jerker

    2016-06-17

    The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for the EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context.

  3. Mandarin Visual Speech Information

    Science.gov (United States)

    Chen, Trevor H.

    2010-01-01

    While the auditory-only aspects of Mandarin speech are heavily-researched and well-known in the field, this dissertation addresses its lesser-known aspects: The visual and audio-visual perception of Mandarin segmental information and lexical-tone information. Chapter II of this dissertation focuses on the audiovisual perception of Mandarin…

  4. Suprasegmental speech cues are automatically processed by the human brain: a mismatch negativity study.

    Science.gov (United States)

    Honbolygó, Ferenc; Csépe, Valéria; Ragó, Anett

    2004-06-03

    This study investigates the electrical brain activity correlates of the automatic detection of suprasegmental and local speech cues by using a passive oddball paradigm, in which the standard Hungarian word 'banán' ('banana' in English) was contrasted with two deviants: a voiceless phoneme deviant ('panán'), and a stress deviant, where the stress was on the second syllable, instead of the obligatory first one. As a result, we obtained the mismatch negativity component (MMN) of event-related brain potentials in each condition. The stress deviant elicited two MMNs: one as a response to the lack of stress as compared to the standard stimulus, and another to the additional stress. Our results suggest that the MMN is as valuable for investigating the processing characteristics of suprasegmental features as it is for phonemic features. MMN data may provide further insight into pre-attentive processes contributing to spoken word recognition.

  5. Prioritized MPEG-4 Audio-Visual Objects Streaming over the DiffServ

    Institute of Scientific and Technical Information of China (English)

    HUANG Tian-yun; ZHENG Chan

    2005-01-01

    The object-based scalable coding in MPEG-4 is investigated, and a prioritized transmission scheme for MPEG-4 audio-visual objects (AVOs) over the DiffServ network with QoS guarantees is proposed. MPEG-4 AVOs are extracted and classified into different groups according to their priority values and scalable layers (visual importance). These priority values are mapped to the IP DiffServ per-hop behaviors (PHBs). This scheme can selectively discard packets of low importance in order to avoid network congestion. Simulation results show that the quality of the received video adapts gracefully to the network state, compared with 'best-effort' delivery. Also, by allowing the content provider to define the prioritization of each audio-visual object, the adaptive transmission of object-based scalable video can be customized based on the content.
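
    The mapping from object priority to DiffServ per-hop behaviours described above can be pictured as a small lookup from (object importance, scalable layer) to a DSCP codepoint, with the least important packets becoming the first candidates for discard under congestion. The codepoint values below are the standard EF/AF/best-effort DSCPs, but the specific assignment of layers to classes is an illustrative assumption, not the scheme's published mapping table.

        # Standard DiffServ codepoints; the layer-to-class assignment below is illustrative only.
        DSCP = {"EF": 46, "AF41": 34, "AF21": 18, "BE": 0}

        def dscp_for_packet(object_priority, layer):
            """Map an MPEG-4 audio-visual object's priority and scalable layer to a DSCP value."""
            if object_priority == "high" and layer == "base":
                return DSCP["EF"]          # protect base layers of important objects
            if layer == "base":
                return DSCP["AF41"]        # base layers of less important objects
            if object_priority == "high":
                return DSCP["AF21"]        # enhancement layers of important objects
            return DSCP["BE"]              # everything else: first to be dropped under congestion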

  6. El tratamiento documental del mensaje audiovisual Documentary treatment of the audio-visual message

    Directory of Open Access Journals (Sweden)

    Blanca Rodríguez Bravo

    2005-06-01

    Full Text Available Peculiarities of the audio-visual document and the treatment it undergoes in TV broadcasting stations are analyzed. The particular features of images condition their analysis and recovery; this paper establishes stages and proceedings for the representation of audio-visual messages with a view to their re-usability. Also, some considerations are made about the automatic processing of video and the changes introduced by digital TV.

  7. The ventriloquist in periphery: impact of eccentricity-related reliability on audio-visual localization.

    Science.gov (United States)

    Charbonneau, Geneviève; Véronneau, Marie; Boudrias-Fournier, Colin; Lepore, Franco; Collignon, Olivier

    2013-10-28

    The relative reliability of separate sensory estimates influences the way they are merged into a unified percept. We investigated how eccentricity-related changes in the reliability of auditory and visual stimuli influence their integration across the entire frontal space. First, we surprisingly found that despite a strong decrease in auditory and visual unisensory localization abilities in the periphery, the redundancy gain resulting from the congruent presentation of audio-visual targets was not affected by stimulus eccentricity. This result therefore contrasts with the common prediction that a reduction in sensory reliability necessarily induces an enhanced integrative gain. Second, we demonstrate that the visual capture of sounds observed with spatially incongruent audio-visual targets (ventriloquist effect) steadily decreases with eccentricity, paralleling a lowering of the relative reliability of unimodal visual over unimodal auditory stimuli in the periphery. Moreover, at all eccentricities, the ventriloquist effect positively correlated with a weighted combination of the spatial resolution obtained in unisensory conditions. These findings support and extend the view that the localization of audio-visual stimuli relies on an optimal combination of auditory and visual information according to their respective spatial reliability. Altogether, these results provide evidence that the external spatial coordinates of multisensory events relative to an observer's body (e.g., eyes' or head's position) influence how this information is merged, and therefore determine the perceptual outcome.
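
    The "optimal combination of auditory and visual information according to their respective spatial reliability" referred to above is usually formalized as maximum-likelihood cue integration, in which each modality's location estimate is weighted by its inverse variance. The equations below state that textbook model (a standard formulation assumed here for illustration, not an expression quoted from the paper); it predicts that visual capture of sound should weaken as visual reliability drops in the periphery.

        \hat{s}_{AV} = w_V \hat{s}_V + w_A \hat{s}_A, \qquad
        w_V = \frac{1/\sigma_V^2}{1/\sigma_V^2 + 1/\sigma_A^2}, \qquad w_A = 1 - w_V,
        \qquad \sigma_{AV}^2 = \frac{\sigma_V^2\,\sigma_A^2}{\sigma_V^2 + \sigma_A^2}

    Here \hat{s}_V and \hat{s}_A are the unimodal location estimates and \sigma_V^2, \sigma_A^2 their variances; the combined estimate is never less reliable than either cue alone.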

  8. Expressing the Needs of Digital Audio-Visual Applications in Different Communities of Practice for Long Term Preservation

    OpenAIRE

    Kumar, Naresh

    2014-01-01

    Digital audio-visual preservation is a central research concern in today's digital world, where the use of audio-visual material in the creation and storage of research data has increased rapidly. This growth has created many new problems regarding maintenance, preservation and future accessibility. Lack of awareness of preservation tools and applications is a major issue today. To address such issues, a European Commission research project, Presto4U, aimed to enable semi-automa...

  9. THE IMPROVEMENT OF AUDIO-VISUAL BASED DANCE APPRECIATION LEARNING AMONG PRIMARY TEACHER EDUCATION STUDENTS OF MAKASSAR STATE UNIVERSITY

    OpenAIRE

    Wahira

    2014-01-01

    This research aimed to improve the dance-appreciation skills of the students of Primary Teacher Education of Makassar State University, to improve their perception of audio-visual based art appreciation, to increase the students' interest in the audio-visual based art education subject, and to increase the students' responses to the subject. This research was classroom action research using the research design created by Kemmis & MC. Taggart, which was conducted with 42 students of Prim...

  10. Investigating the impact of audio instruction and audio-visual biofeedback for lung cancer radiation therapy

    Science.gov (United States)

    George, Rohini

    Lung cancer accounts for 13% of all cancers in the United States and is the leading cause of cancer deaths among both men and women. The five-year survival for lung cancer patients is approximately 15% (ACS facts & figures). Respiratory motion decreases the accuracy of thoracic radiotherapy during imaging and delivery. To account for respiration, margins are generally added during radiation treatment planning, which may cause a substantial dose delivery to normal tissues and increase normal tissue toxicity. To alleviate the above-mentioned effects of respiratory motion, several motion management techniques are available which can reduce the doses to normal tissues, thereby reducing treatment toxicity and allowing dose escalation to the tumor. This may increase the survival probability of patients who have lung cancer and are receiving radiation therapy. However, the accuracy of these motion management techniques is limited by respiratory irregularity. The rationale of this thesis was to study the improvement in regularity of respiratory motion achieved by breathing coaching for lung cancer patients using audio instructions and audio-visual biofeedback. A total of 331 patient respiratory motion traces, each four minutes in length, were collected from 24 lung cancer patients enrolled in an IRB-approved breathing-training protocol. It was determined that audio-visual biofeedback significantly improved the regularity of respiratory motion compared to free breathing and audio instruction, thus improving the accuracy of respiratory-gated radiotherapy. It was also observed that duty cycles below 30% showed insignificant reduction in residual motion, while above 50% there was a sharp increase in residual motion. The reproducibility of exhale-based gating was higher than that of inhale-based gating. When the respiratory cycles were modelled, cosine and cosine-to-the-fourth-power models had the best correlation with individual respiratory cycles. The overall respiratory motion probability distribution
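
    The "cosine and cosine-to-the-fourth-power" models mentioned above are commonly written as instances of a Lujan-style breathing model, in which position follows a raised cosine of even power. The expression below states that family (n = 1 for the plain cosine, n = 2 for the fourth-power variant) as a hedged reconstruction of the kind of model being fit, not a formula quoted from the thesis.

        z(t) = z_0 - b\,\cos^{2n}\!\left(\frac{\pi t}{\tau} - \varphi\right), \qquad n \in \{1, 2\}

    Here z_0 is the baseline (exhale) position, b the peak-to-trough amplitude, \tau the breathing period and \varphi a phase offset; larger n flattens the exhale plateau, which is one reason exhale-based gating tends to be more reproducible.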

  11. Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech

    Directory of Open Access Journals (Sweden)

    Matthew ePoon

    2015-11-01

    Full Text Available Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound happier than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here we describe a novel, score-based exploration of the use of pitch height and timing in a set of balanced major and minor key compositions. Our corpus contained all 24 Preludes and 24 Fugues from Bach's Well-Tempered Clavier (book 1), as well as all 24 of Chopin's Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma (A, B, C, etc.). Consistent with predictions derived from speech, we found major-key (nominally happy) pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally sad) pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post-hoc analyses illustrate interesting trade-offs, with

  12. Cueing musical emotions: An empirical analysis of 24-piece sets by Bach and Chopin documents parallels with emotional speech.

    Science.gov (United States)

    Poon, Matthew; Schutz, Michael

    2015-01-01

    Acoustic cues such as pitch height and timing are effective at communicating emotion in both music and speech. Numerous experiments altering musical passages have shown that higher and faster melodies generally sound "happier" than lower and slower melodies, findings consistent with corpus analyses of emotional speech. However, equivalent corpus analyses of complex time-varying cues in music are less common, due in part to the challenges of assembling an appropriate corpus. Here, we describe a novel, score-based exploration of the use of pitch height and timing in a set of "balanced" major and minor key compositions. Our analysis included all 24 Preludes and 24 Fugues from Bach's Well-Tempered Clavier (book 1), as well as all 24 of Chopin's Preludes for piano. These three sets are balanced with respect to both modality (major/minor) and key chroma ("A," "B," "C," etc.). Consistent with predictions derived from speech, we found major-key (nominally "happy") pieces to be two semitones higher in pitch height and 29% faster than minor-key (nominally "sad") pieces. This demonstrates that our balanced corpus of major and minor key pieces uses low-level acoustic cues for emotion in a manner consistent with speech. A series of post hoc analyses illustrate interesting trade-offs, with sets featuring greater emphasis on timing distinctions between modalities exhibiting the least pitch distinction, and vice-versa. We discuss these findings in the broader context of speech-music research, as well as recent scholarship exploring the historical evolution of cue use in Western music.

  13. Internet video telephony allows speech reading by deaf individuals and improves speech perception by cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Georgios Mantokoudis

    Full Text Available OBJECTIVE: To analyze speech reading through Internet video calls by profoundly hearing-impaired individuals and cochlear implant (CI) users. METHODS: Speech reading skills of 14 deaf adults and 21 CI users were assessed using the Hochmair Schulz Moser (HSM) sentence test. We presented video simulations using different video resolutions (1280 × 720, 640 × 480, 320 × 240, 160 × 120 px), frame rates (30, 20, 10, 7, 5 frames per second (fps)), speech velocities (three different speakers), webcameras (Logitech Pro9000, C600 and C500) and image/sound delays (0-500 ms). All video simulations were presented with and without sound and in two screen sizes. Additionally, scores for live Skype™ video connection and live face-to-face communication were assessed. RESULTS: Higher frame rate (>7 fps), higher camera resolution (>640 × 480 px) and shorter picture/sound delay (<100 ms) were associated with increased speech perception scores. Scores were strongly dependent on the speaker but were not influenced by physical properties of the camera optics or the full screen mode. There is a significant median gain of +8.5%pts (p = 0.009) in speech perception for all 21 CI-users if visual cues are additionally shown. CI users with poor open set speech perception scores (n = 11) showed the greatest benefit under combined audio-visual presentation (median speech perception +11.8%pts, p = 0.032). CONCLUSION: Webcameras have the potential to improve telecommunication of hearing-impaired individuals.

  14. Modern Foreign Language Audio Visual Education and Computer Technology

    Institute of Scientific and Technical Information of China (English)

    赵飒

    2012-01-01

    Computer-assisted foreign language teaching is an important and effective means of modern foreign language audio-visual education. To deliver text, images, audio, video, animation, hypertext links and database output in teaching, it relies on computer-based tools for language testing and analysis, corpus construction, electronic dictionaries, machine translation, speech recognition and visual speech synthesis, and these tools have been continuously improving with the development of computer software. This paper describes the basic content and purpose of modern foreign language audio-visual education as well as the main computer teaching software components involved.

  15. An interactive audio-visual installation using ubiquitous hardware and web-based software deployment

    Directory of Open Access Journals (Sweden)

    Tiago Fernandes Tavares

    2015-05-01

    Full Text Available This paper describes an interactive audio-visual musical installation, namely MOTUS, that aims at being deployed using low-cost hardware and software. This was achieved by writing the software as a web application and using only hardware pieces that are built into most modern personal computers. This scenario imposes specific technical restrictions, which lead to solutions combining both the technical and artistic aspects of the installation. The resulting system is versatile and can be freely used from any computer with Internet access. Spontaneous feedback from the audience has shown that the provided experience is interesting and engaging, despite the minimal hardware.

  16. PHYSIOLOGICAL MONITORING OF ACS OPERATORS DURING AUDIO-VISUAL SIMULATION OF AN EMERGENCY

    Directory of Open Access Journals (Sweden)

    S. S. Aleksanin

    2010-01-01

    Full Text Available Using a ship-simulator automated control system, we investigated the information content of physiological monitoring of cardiac rhythm to assess the reliability and noise immunity of operators of various specializations during audio-visual simulation of an emergency. In parallel, we studied the effectiveness of protection against the adverse effects of electromagnetic fields. Monitoring cardiac rhythm during a virtual emergency makes it possible to differentiate the degree of strain on the regulatory systems of operators' body functions by specialization, and to note the positive effect of using protection against exposure to electromagnetic fields.

  17. Using Play Activities and Audio-Visual Aids to Develop Speaking Skills

    Directory of Open Access Journals (Sweden)

    Casallas Mutis Nidia

    2000-08-01

    Full Text Available A project was conducted in order to improve oral proficiency in English through the use of play activities and audio-visual aids, with first-grade students in a bilingual school in La Calera. They were between 6 and 7 years old. As the sample for this study, the five students who had the lowest oral language proficiency were selected. According to the results, it is clear that the sample improved their English oral proficiency a great deal. However, the process has to be continued because this skill needs constant practice in order to be developed.

  18. The perception of speech modulation cues in lexical tones is guided by early language-specific experience

    Directory of Open Access Journals (Sweden)

    Laurianne eCabrera

    2015-08-01

    Full Text Available A number of studies showed that infants reorganize their perception of speech sounds according to their native language categories during their first year of life. Still, information is lacking about the contribution of basic auditory mechanisms to this process. This study aimed to evaluate when native language experience starts to noticeably affect the perceptual processing of basic acoustic cues (i.e., frequency-modulation (FM) and amplitude-modulation (AM) information) known to be crucial for speech perception in adults. The discrimination of a lexical-tone contrast (rising versus low) was assessed in 6- and 10-month-old infants learning either French or Mandarin using a visual habituation paradigm. The lexical tones were presented in two conditions designed to either keep intact or to severely degrade the FM and fine spectral cues needed to accurately perceive voice-pitch trajectory. A third condition was designed to assess the discrimination of the same voice-pitch trajectories using click trains containing only the FM cues related to the fundamental frequency (F0) in French- and Mandarin-learning 10-month-old infants. Results showed that the younger infants of both language groups and the Mandarin-learning 10-month-olds discriminated the intact lexical-tone contrast while French-learning 10-month-olds failed. However, only the French 10-month-olds discriminated degraded lexical tones when FM, and thus voice-pitch cues, were reduced. Moreover, Mandarin-learning 10-month-olds were found to discriminate the pitch trajectories as presented in click trains better than French infants. Altogether, these results reveal that the perceptual reorganization occurring during the first year of life for lexical tones is coupled with changes in the auditory ability to use speech modulation cues.

  19. Effects of audio-visual aids on foreign language test anxiety, reading and listening comprehension, and retention in EFL learners.

    Science.gov (United States)

    Lee, Shu-Ping; Lee, Shin-Da; Liao, Yuan-Lin; Wang, An-Chi

    2015-04-01

    This study examined the effects of audio-visual aids on anxiety, comprehension test scores, and retention in reading and listening to short stories in English as a Foreign Language (EFL) classrooms. Reading and listening tests, general and test anxiety, and retention were measured in English-major college students in an experimental group with audio-visual aids (n=83) and a control group without audio-visual aids (n=94) with similar general English proficiency. Lower reading test anxiety, unchanged reading comprehension scores, and better reading short-term and long-term retention after four weeks were evident in the audiovisual group relative to the control group. In addition, lower listening test anxiety, higher listening comprehension scores, and unchanged short-term and long-term retention were found in the audiovisual group relative to the control group after the intervention. Audio-visual aids may help to reduce EFL learners' listening test anxiety and enhance their listening comprehension scores without facilitating retention of such materials. Although audio-visual aids did not increase reading comprehension scores, they helped reduce EFL learners' reading test anxiety and facilitated retention of reading materials.

  20. The presentation of expert testimony via live audio-visual communication.

    Science.gov (United States)

    Miller, R D

    1991-01-01

    As part of a national effort to improve efficiency in court procedures, the American Bar Association has recommended, on the basis of a number of pilot studies, increased use of current audio-visual technology, such as telephone and live video communication, to eliminate delays caused by unavailability of participants in both civil and criminal procedures. Although these recommendations were made to facilitate court proceedings, and for the convenience of attorneys and judges, they also have the potential to save significant time for clinical expert witnesses as well. The author reviews the studies of telephone testimony that were done by the American Bar Association and other legal research groups, as well as the experience in one state forensic evaluation and treatment center. He also reviewed the case law on the issue of remote testimony. He then presents data from a national survey of state attorneys general concerning the admissibility of testimony via audio-visual means, including video depositions. Finally, he concludes that the option to testify by telephone provides a significant savings in precious clinical time for forensic clinicians in public facilities, and urges that such clinicians work actively to convince courts and/or legislatures in states that do not permit such testimony (currently the majority), to consider accepting it, to improve the effective use of scarce clinical resources in public facilities.

  1. A new alley in Opinion Mining using Senti Audio Visual Algorithm

    Directory of Open Access Journals (Sweden)

    Mukesh Rawat,

    2016-02-01

    Full Text Available People share their views about products and services on social media, blogs, forums, etc. Anyone willing to spend resources and money on these products and services can learn about them from the past experiences of their peers. Opinion mining plays a vital role in tracking the growing interests of a particular community, social and political events, business strategies, marketing campaigns, etc. These data exist on the Internet in unstructured form, but when analyzed properly they can be of great use. Sentiment analysis focuses on polarity detection of emotions such as happy, sad or neutral. In this paper we propose an algorithm, Senti Audio Visual, for examining video as well as audio sentiment. A review in the form of video/audio may contain several opinions/emotions; the algorithm classifies such reviews with the help of Bayes classifiers into three different classes: positive, negative or neutral. The algorithm uses smiles, cries, gazes, pauses, pitch, and intensity as the relevant audio-visual features.
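
    Since the record describes classifying reviews into positive, negative or neutral with Bayes classifiers over audio-visual features, a minimal sketch of that classification step is shown below. The upstream feature extraction (smile score, pause rate, pitch, intensity) is assumed to have been done already, and the feature names and training data here are hypothetical, not those of the published system.

        import numpy as np
        from sklearn.naive_bayes import GaussianNB

        # Hypothetical per-review feature vectors: [smile_score, pause_rate, mean_pitch_hz, mean_intensity_db]
        X_train = np.array([[0.8, 0.10, 220.0, 65.0],
                            [0.1, 0.40, 150.0, 55.0],
                            [0.4, 0.20, 180.0, 60.0]])
        y_train = ["positive", "negative", "neutral"]

        clf = GaussianNB().fit(X_train, y_train)

        # Classify a new audio-visual review described by the same features.
        new_review = np.array([[0.7, 0.15, 210.0, 63.0]])
        print(clf.predict(new_review))   # e.g. ['positive']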

  2. Designing Promotion Strategy of Malang Raya’s Tourism Destination Branding through Audio Visual Media

    Directory of Open Access Journals (Sweden)

    Chanira Nuansa

    2014-04-01

    Full Text Available This study examines the fit between the destination branding concept and the existing models of Malang tourism promotion. The research is qualitative, taking data directly from the existing promotional channels of Malang, namely: information portal sites, blogs, social networking, and video via the Internet. SWOT analysis was used to find strengths, weaknesses, opportunities, and threats in the existing models of tourism promotion, and the data were analyzed against the indicators of the destination branding concept. The results of the analysis were used as a basis for designing solutions for Malang tourism promotion through a new integrated tourism advertising model. Through the analysis we found that video is the most suitable medium for promoting Malang tourism in the form of advertisements. Video conveys facts more completely and objectively through its audio-visual form, making it easier for viewers to form an impression of the destination. Moreover, video productions and well-conceptualized advertisements for Malang tourism are still rare. This is an opportunity, because the audio-visual advertising model produced in this study is expected to serve as an example for the parties concerned when conceptualizing future Malang tourism advertising. Keywords: Advertise, SWOT Analysis, Malang City, tourism promotion

  3. Finding the Correspondence of Audio-Visual Events by Object Manipulation

    Science.gov (United States)

    Nishibori, Kento; Takeuchi, Yoshinori; Matsumoto, Tetsuya; Kudo, Hiroaki; Ohnishi, Noboru

    A human being understands the objects in the environment by integrating information obtained by the senses of sight, hearing and touch. In this integration, active manipulation of objects plays an important role. We propose a method for finding the correspondence of audio-visual events by manipulating an object. The method uses the general grouping rules of Gestalt psychology, i.e. “simultaneity” and “similarity” among the motion command, sound onsets and the motion of the object in images. In experiments, we used a microphone, a camera, and a robot which has a hand manipulator. The robot grasps an object like a bell and shakes it, or grasps an object like a stick and beats a drum, in a periodic or non-periodic motion. The object then emits periodic/non-periodic events. To create a more realistic scenario, we placed another event source (a metronome) in the environment. As a result, we obtained a success rate of 73.8 percent in finding the correspondence between audio-visual events (afferent signals) relating to robot motion (efferent signals).
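
    The “simultaneity” grouping rule used above can be reduced to a simple onset-matching test: a sound event is attributed to the manipulated object if its onset falls within a small tolerance of a visually detected motion onset of that object. The sketch below illustrates such a matching score under that assumption; it is not the authors' implementation, and the onset times are made up for the example.

        def simultaneity_score(sound_onsets, motion_onsets, tolerance=0.05):
            """Fraction of sound onsets that coincide (within `tolerance` seconds) with a motion onset."""
            if not sound_onsets:
                return 0.0
            matched = sum(any(abs(s - m) <= tolerance for m in motion_onsets) for s in sound_onsets)
            return matched / len(sound_onsets)

        # Onsets in seconds: a shaken bell versus an unrelated metronome in the background.
        bell_motion  = [0.50, 1.02, 1.51, 2.03]
        sound_events = [0.52, 1.00, 1.53, 2.05]
        metronome    = [0.30, 0.90, 1.50, 2.10]
        print(simultaneity_score(sound_events, bell_motion))   # high -> sounds attributed to the bell
        print(simultaneity_score(sound_events, metronome))     # lower -> metronome rejected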

  4. Spectacular Attractions: Museums, Audio-Visuals and the Ghosts of Memory

    Directory of Open Access Journals (Sweden)

    Mandelli Elisa

    2015-12-01

    Full Text Available In the last decades, moving images have become a common feature not only in art museums, but also in a wide range of institutions devoted to the conservation and transmission of memory. This paper focuses on the role of audio-visuals in the exhibition design of history and memory museums, arguing that they are privileged means to achieve the spectacular effects and the visitors’ emotional and “experiential” engagement that constitute the main objective of contemporary museums. I will discuss this topic through the concept of “cinematic attraction,” claiming that when embedded in displays, films and moving images often produce spectacular mises en scène with immersive effects, creating wonder and astonishment, and involving visitors on an emotional, visceral and physical level. Moreover, I will consider the diffusion of audio-visual witnesses of real or imaginary historical characters, presented in Phantasmagoria-like displays that simulate ghostly and uncanny apparitions, creating an ambiguous and often problematic coexistence of truth and illusion, subjectivity and objectivity, facts and imagination.

  5. Modulation of visual responses in the superior temporal sulcus by audio-visual congruency.

    Science.gov (United States)

    Dahl, Christoph D; Logothetis, Nikos K; Kayser, Christoph

    2010-01-01

    Our ability to identify or recognize visual objects is often enhanced by evidence provided by other sensory modalities. Yet, where and how visual object processing benefits from the information received by the other senses remains unclear. One candidate region is the temporal lobe, which features neural representations of visual objects, and in which previous studies have provided evidence for multisensory influences on neural responses. In the present study we directly tested whether visual representations in the lower bank of the superior temporal sulcus (STS) benefit from acoustic information. To this end, we recorded neural responses in alert monkeys passively watching audio-visual scenes, and quantified the impact of simultaneously presented sounds on responses elicited by the presentation of naturalistic visual scenes. Using methods of stimulus decoding and information theory, we then asked whether the responses of STS neurons become more reliable and informative in multisensory contexts. Our results demonstrate that STS neurons are indeed sensitive to the modality composition of the sensory stimulus. Importantly, information provided by STS neurons' responses about the particular visual stimulus being presented was highest during congruent audio-visual and unimodal visual stimulation, but was reduced during incongruent bimodal stimulation. Together, these findings demonstrate that higher visual representations in the STS not only convey information about the visual input but also depend on the acoustic context of a visual scene.

  6. Modulation of visual responses in the superior temporal sulcus by audio-visual congruency

    Directory of Open Access Journals (Sweden)

    Christoph Dahl

    2010-04-01

    Full Text Available Our ability to identify or recognize visual objects is often enhanced by evidence provided by other sensory modalities. Yet, where and how visual object processing benefits from the information received by the other senses remains unclear. One candidate region is the temporal lobe, which features neural representations of visual objects, and in which previous studies have provided evidence for multisensory influences on neural responses. In the present study we directly tested whether visual representations in the lower bank of the superior temporal sulcus (STS) benefit from acoustic information. To this end, we recorded neural responses in alert monkeys passively watching audio-visual scenes, and quantified the impact of simultaneously presented sounds on responses elicited by the presentation of naturalistic visual scenes. Using methods of stimulus decoding and information theory, we then asked whether the responses of STS neurons become more reliable and informative in multisensory contexts. Our results demonstrate that STS neurons are indeed sensitive to the modality composition of the sensory stimulus. Importantly, information provided by STS neurons’ responses about the particular visual stimulus being presented was highest during congruent audio-visual and unimodal visual stimulation, but was reduced during incongruent bimodal stimulation. Together, these findings demonstrate that higher visual representations in the STS not only convey information about the visual input but also depend on the acoustic context of a visual scene.

  7. THE IMPROVEMENT OF AUDIO-VISUAL BASED DANCE APPRECIATION LEARNING AMONG PRIMARY TEACHER EDUCATION STUDENTS OF MAKASSAR STATE UNIVERSITY

    Directory of Open Access Journals (Sweden)

    Wahira

    2014-06-01

    Full Text Available This research aimed to improve the dance-appreciation skills of the students of Primary Teacher Education of Makassar State University, to improve their perception of audio-visual based art appreciation, to increase the students' interest in the audio-visual based art education subject, and to increase the students' responses to the subject. This research was classroom action research using the research design created by Kemmis & MC. Taggart, which was conducted with 42 students of Primary Teacher Education of Makassar State University. The data collection was conducted using observation, questionnaire, and interview. The techniques of data analysis applied in this research were descriptive qualitative and quantitative. The results of this research were: (1) the students' achievement in audio-visual based dance appreciation improved: precycle 33,33%, cycle I 42,85% and cycle II 83,33%; (2) the students' perception towards the audio-visual based dance appreciation improved: cycle I 59,52% and cycle II 71,42%, and the students' perception towards the subject obtained through structured interview in cycles I and II was 69,83%, in a high category; (3) the interest of the students in the art education subject, especially audio-visual based dance appreciation, increased: cycle I 52,38% and cycle II 64,28%, and the students' interest in the subject obtained through structured interview was 69,50%, in a high category; (4) the students' response to audio-visual based dance appreciation increased: cycle I 54,76% and cycle II 69,04%, in a good category.

  8. Integrating Audio-Visual Features and Text Information for Story Segmentation of News Video

    Institute of Scientific and Technical Information of China (English)

    LiuHua-yong; ZhouDong-ru

    2003-01-01

    Video data are composed of multimodal information streams including visual, auditory and textual streams, so an approach to story segmentation for news video using multimodal analysis is described in this paper. The proposed approach detects the topic-caption frames and integrates them with silence clip detection results, as well as shot segmentation results, to locate the news story boundaries. The integration of audio-visual features and text information overcomes the weakness of approaches using only image analysis techniques. On test data with 135 400 frames, an accuracy rate of 85.8% and a recall rate of 97.5% are obtained when the boundaries between news stories are detected. The experimental results show the approach is valid and robust.

  9. Integrating Audio-Visual Features and Text Information for Story Segmentation of News Video

    Institute of Scientific and Technical Information of China (English)

    Liu Hua-yong; Zhou Dong-ru

    2003-01-01

    Video data are composed of multimodal information streams including visual, auditory and textual streams, so an approach to story segmentation for news video using multimodal analysis is described in this paper. The proposed approach detects the topic-caption frames and integrates them with silence clip detection results, as well as shot segmentation results, to locate the news story boundaries. The integration of audio-visual features and text information overcomes the weakness of approaches using only image analysis techniques. On test data with 135 400 frames, an accuracy rate of 85.8% and a recall rate of 97.5% are obtained when the boundaries between news stories are detected. The experimental results show the approach is valid and robust.
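
    The integration step described in these two records (combining topic-caption frames, silence clips and shot boundaries to locate story boundaries) can be sketched as a simple voting rule over time-aligned cues. The sketch below is a schematic reconstruction under the assumption that each detector has already produced candidate time points; it is not the paper's actual fusion algorithm, and the detector outputs shown are hypothetical.

        def story_boundaries(shot_cuts, silence_clips, caption_frames, window=1.0, min_votes=2):
            """Return shot-cut times supported by at least `min_votes` of the three cue types."""
            boundaries = []
            for t in shot_cuts:
                votes = 1  # the shot cut itself
                votes += any(abs(t - s) <= window for s in silence_clips)
                votes += any(abs(t - c) <= window for c in caption_frames)
                if votes >= min_votes:
                    boundaries.append(t)
            return boundaries

        # Times in seconds from hypothetical detectors.
        print(story_boundaries(shot_cuts=[12.0, 47.5, 83.2, 120.4],
                               silence_clips=[47.0, 120.0],
                               caption_frames=[48.0, 84.0]))   # -> [47.5, 83.2, 120.4]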

  10. Literary Genres in Social Life: A Narrative, Audio-visual and Poetic Approach

    Directory of Open Access Journals (Sweden)

    Luis Felipe González Gutiérrez

    2008-05-01

    Full Text Available The proposal "Literary Genres in Social Life: a Narrative, Audio-visual and Poetic Approach" aims to present to the academic psychology community and related social science disciplines the main contributions of literary genre theory, through a social constructionist understanding of narrations and everyday stories and by means of an interactive construction of narrative collage. This work, supported by a research project financed by the University Santo Tomás in Bogotá, Colombia, "Understanding of structuralist literary theories in the development of the narrative 'I' within the social constructionist approach", proposes alternative spaces for presenting its research results through metaphors, visual narrative sequences and interactive artistic forms, which invite the spectator to share in and understand important concepts in the consolidation of social forms of constructing the everyday. URN: urn:nbn:de:0114-fqs0802373

  11. Exploring determinants of early user acceptance for an audio-visual heritage archive service using the vignette method

    NARCIS (Netherlands)

    Ongena, Guido; Wijngaert, van de Lidwien; Huizer, E.

    2013-01-01

    The purpose of this study is to investigate factors, which explain the behavioural intention of the use of a new audio-visual cultural heritage archive service. An online survey in combination with a factorial survey is utilised to investigate the predictable strength of technological, individual an

  12. Undifferentiated Facial Electromyography Responses to Dynamic, Audio-Visual Emotion Displays in Individuals with Autism Spectrum Disorders

    Science.gov (United States)

    Rozga, Agata; King, Tricia Z.; Vuduc, Richard W.; Robins, Diana L.

    2013-01-01

    We examined facial electromyography (fEMG) activity to dynamic, audio-visual emotional displays in individuals with autism spectrum disorders (ASD) and typically developing (TD) individuals. Participants viewed clips of happy, angry, and fearful displays that contained both facial expression and affective prosody while surface electrodes measured…

  13. The role of reverberation-related binaural cues in the externalization of speech

    DEFF Research Database (Denmark)

    Catic, Jasmina; Santurette, Sébastien; Dau, Torsten

    2015-01-01

    for the externalization of a lateral sound source. In contrast, for a frontal source, an increased amount of binaural cues from reflections was required in order to obtain well externalized sound images. It was demonstrated that the interaction between the interaural cues of the direct sound and the reverberation...

  14. How is the McGurk effect modulated by Cued Speech in deaf and hearing adults?

    Directory of Open Access Journals (Sweden)

    Clémence eBayard

    2014-05-01

    Full Text Available Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combine both types of information in order to get one coherent percept. In this study, we examined how audio-visual (AV) integration is affected by the presence of manual cues and on which form of information (auditory, labial or manual) the CS receptors primarily rely. To address this issue, we designed a unique experiment that implemented the use of AV McGurk stimuli (audio /pa/ and lip-reading /ka/) which were produced with or without manual cues. The manual cue was congruent with either auditory information, lip information or the expected fusion. Participants were asked to repeat the perceived syllable aloud. Their responses were then classified into four categories: audio (when the response was /pa/), lip-reading (when the response was /ka/), fusion (when the response was /ta/) and other (when the response was something other than /pa/, /ka/ or /ta/). Data were collected from hearing impaired individuals who were experts in CS (all of which had either cochlear implants or binaural hearing aids; N=8), hearing-individuals who were experts in CS (N = 14) and hearing-individuals who were completely naïve of CS (N = 15). Results confirmed that, like hearing-people, deaf people can merge auditory and lip-reading information into a single unified percept. Without manual cues, McGurk stimuli induced the same percentage of fusion responses in both groups. Results also suggest that manual cues can modify the AV integration and that their impact differs between hearing and deaf

  15. How is the McGurk effect modulated by Cued Speech in deaf and hearing adults?

    Science.gov (United States)

    Bayard, Clémence; Colin, Cécile; Leybaert, Jacqueline

    2014-01-01

    Speech perception for both hearing and deaf people involves an integrative process between auditory and lip-reading information. In order to disambiguate information from lips, manual cues from Cued Speech may be added. Cued Speech (CS) is a system of manual aids developed to help deaf people to clearly and completely understand speech visually (Cornett, 1967). Within this system, both labial and manual information, as lone input sources, remain ambiguous. Perceivers, therefore, have to combine both types of information in order to get one coherent percept. In this study, we examined how audio-visual (AV) integration is affected by the presence of manual cues and on which form of information (auditory, labial or manual) the CS receptors primarily rely. To address this issue, we designed a unique experiment that implemented the use of AV McGurk stimuli (audio /pa/ and lip-reading /ka/) which were produced with or without manual cues. The manual cue was congruent with either auditory information, lip information or the expected fusion. Participants were asked to repeat the perceived syllable aloud. Their responses were then classified into four categories: audio (when the response was /pa/), lip-reading (when the response was /ka/), fusion (when the response was /ta/) and other (when the response was something other than /pa/, /ka/ or /ta/). Data were collected from hearing impaired individuals who were experts in CS (all of which had either cochlear implants or binaural hearing aids; N = 8), hearing-individuals who were experts in CS (N = 14) and hearing-individuals who were completely naïve of CS (N = 15). Results confirmed that, like hearing-people, deaf people can merge auditory and lip-reading information into a single unified percept. Without manual cues, McGurk stimuli induced the same percentage of fusion responses in both groups. Results also suggest that manual cues can modify the AV integration and that their impact differs between hearing and deaf people.

  16. The Big Australian Speech Corpus (The Big ASC)

    NARCIS (Netherlands)

    Wagner, M.; Tran, D.; Togneri, R.; Rose, P.; Powers, D.M.; Onslow, M.; Loakes, D.E.; Lewis, T.W.; Kuratate, T.; Kinoshita, Y.; Kemp, N.; Ishihara, S.; Ingram, J.C.; Hajek, J.T.; Grayden, D.B.; Goecke, R.; Fletcher, J.M.; Estival, D.; Epps, J.R.; Dale, R.; Cutler, A.; Cox, F.M.; Chetty, G.; Cassidy, S.; Butcher, A.R.; Burnham, D.; Bird, S.; Best, C.T.; Bennamoun, M.; Arciuli, J.; Ambikairajah, E.

    2011-01-01

    Under an ARC Linkage Infrastructure, Equipment and Facilities (LIEF) grant, speech science and technology experts from across Australia have joined forces to organise the recording of audio-visual (AV) speech data from representative speakers of Australian English in all capital cities and some regi

  17. Visual-Auditory Integration during Speech Imitation in Autism

    Science.gov (United States)

    Williams, Justin H. G.; Massaro, Dominic W.; Peel, Natalie J.; Bosseler, Alexis; Suddendorf, Thomas

    2004-01-01

    Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional "mirror neuron" systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a "virtual" head (Baldi), delivered speech stimuli for…

  18. Effectiveness of Two Topical Anaesthetic Agents used along with Audio Visual Aids in Paediatric Dental Patients

    Science.gov (United States)

    Dhawan, Jayata; Kumar, Dipanshu; Anand, Ashish; Tangri, Karan

    2017-01-01

    Introduction: Topical anaesthetic agents enable pain-free intraoral procedures and provide symptomatic relief for toothache, superficial mucosal lesions and post-extraction pain. The most common anxiety-provoking and fearful experience for children in the dental operatory is the administration of local anaesthesia, because children usually become uncooperative on seeing the needle. A recent trend in behaviour management is the use of non-aversive techniques, among which audio-visual distraction has emerged as a very successful technique for managing children in dental settings. Audio-visual distraction can decrease procedure-related anxiety in patients undergoing dental treatment and can be very relaxing for highly anxious patients. Aim: The aim of the present study was to compare the efficacy of the topical anaesthetics EMLA (Eutectic Mixture of Local Anaesthetics) cream and benzocaine (20%) gel in reducing pain during needle insertion with and without the use of Audio Visual (AV) aids. Materials and Methods: The study was conducted on 120 children aged 3-14 years attending the outpatient department for treatment. EMLA and benzocaine gel (20%) were assessed for their effectiveness in reducing pain on needle insertion during local anaesthesia administration. Based on the inclusion and exclusion criteria, children requiring local anaesthesia for dental treatment were randomly divided into four equal groups of 30 children according to whether AV aids were used. AV aids were delivered using a Sony Vaio laptop with earphones, playing nursery rhymes and cartoon movies from DVD. Pain assessment was done using the Visual Analogue Scale (VAS), and the physiological responses of pulse rate and oxygen saturation were measured with a pulse oximeter. Results: There was a statistically significant difference in mean pain score, pulse rate and mean oxygen saturation when compared between the four

  19. Training changes processing of speech cues in older adults with hearing loss

    Directory of Open Access Journals (Sweden)

    Samira eAnderson

    2013-11-01

    Full Text Available Aging results in a loss of sensory function, and the effects of hearing impairment can be especially devastating due to reduced communication ability. Older adults with hearing loss report that speech, especially in noisy backgrounds, is uncomfortably loud yet unclear. Hearing loss results in an unbalanced neural representation of speech: the slowly-varying envelope is enhanced, dominating representation in the auditory pathway and perceptual salience at the cost of the rapidly-varying fine structure. We hypothesized that older adults with hearing loss can be trained to compensate for these changes in central auditory processing through directed attention to behaviorally-relevant speech sounds. To that end, we evaluated the effects of auditory-cognitive training in older adults (ages 55-79) with normal hearing and hearing loss. After training, the auditory training group with hearing loss experienced a reduction in the neural representation of the speech envelope presented in noise, approaching levels observed in normal hearing older adults. No changes were noted in the control group. Importantly, changes in speech processing were accompanied by improvements in speech perception. Thus, central processing deficits associated with hearing loss may be partially remediated with training, resulting in real-life benefits for everyday communication.
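    The envelope/fine-structure distinction drawn above can be made concrete with a standard analytic-signal decomposition. The sketch below (Python with NumPy/SciPy) applies a Hilbert transform to a synthetic amplitude-modulated tone; it is an illustrative assumption of how such a decomposition is commonly computed, not the training study's own pipeline.

        # Illustrative envelope / fine-structure decomposition of a signal,
        # using the analytic signal from a Hilbert transform. This standard
        # decomposition is assumed to match the envelope/fine-structure
        # distinction discussed in the abstract; it is not the authors' pipeline.
        import numpy as np
        from scipy.signal import hilbert

        fs = 16000                      # sampling rate (Hz)
        t = np.arange(0, 0.5, 1 / fs)   # 500 ms of synthetic "speech-like" signal
        carrier = np.sin(2 * np.pi * 1000 * t)              # fast fine structure
        modulator = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))   # slow 4 Hz envelope
        x = modulator * carrier

        analytic = hilbert(x)
        envelope = np.abs(analytic)                   # slowly varying envelope
        fine_structure = np.cos(np.angle(analytic))   # rapidly varying fine structure

        print(envelope.max(), fine_structure[:5])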

  20. Online dissection audio-visual resources for human anatomy: Undergraduate medical students' usage and learning outcomes.

    Science.gov (United States)

    Choi-Lundberg, Derek L; Cuellar, William A; Williams, Anne-Marie M

    2016-11-01

    In an attempt to improve undergraduate medical student preparation for and learning from dissection sessions, dissection audio-visual resources (DAVR) were developed. Data from e-learning management systems indicated DAVR were accessed by 28% ± 10 (mean ± SD for nine DAVR across three years) of students prior to the corresponding dissection sessions, representing at most 58% ± 20 of assigned dissectors. Approximately 50% of students accessed all available DAVR by the end of semester, while 10% accessed none. Ninety percent of survey respondents (response rate 58%) generally agreed that DAVR improved their preparation for and learning from dissection when used. Of several learning resources, only DAVR usage had a significant positive correlation (P = 0.002) with feeling prepared for dissection. Results on cadaveric anatomy practical examination questions in year 2 (Y2) and year 3 (Y3) cohorts were 3.9% (P Educ 9: 545-554. © 2016 American Association of Anatomists.

  1. Bilingualism and Children's Use of Paralinguistic Cues to Interpret Emotion in Speech

    Science.gov (United States)

    Yow, W. Quin; Markman, Ellen M.

    2011-01-01

    Preschoolers tend to rely on what speakers say rather than how they sound when interpreting a speaker's emotion while adults rely instead on tone of voice. However, children who have a greater need to attend to speakers' communicative requirements, such as bilingual children, may be more adept in using paralinguistic cues (e.g. tone of voice) when…

  2. Synchronized audio-visual transients drive efficient visual search for motion-in-depth.

    Directory of Open Access Journals (Sweden)

    Marina Zannoli

    Full Text Available In natural audio-visual environments, a change in depth is usually correlated with a change in loudness. In the present study, we investigated whether correlating changes in disparity and loudness would provide a functional advantage in binding disparity and sound amplitude in a visual search paradigm. To test this hypothesis, we used a method similar to that used by van der Burg et al. to show that non-spatial transient (square-wave) modulations of loudness can drastically improve spatial visual search for a correlated luminance modulation. We used dynamic random-dot stereogram displays to produce pure disparity modulations. Target and distractors were small disparity-defined squares (either 6 or 10 in total). Each square moved back and forth in depth in front of the background plane at different phases. The target's depth modulation was synchronized with an amplitude-modulated auditory tone. Visual and auditory modulations were always congruent (both sine-wave or square-wave). In a speeded search task, five observers were asked to identify the target as quickly as possible. Results show a significant improvement in visual search times in the square-wave condition compared to the sine condition, suggesting that transient auditory information can efficiently drive visual search in the disparity domain. In a second experiment, participants performed the same task in the absence of sound and showed a clear set-size effect in both modulation conditions. In a third experiment, we correlated the sound with a distractor instead of the target. This produced longer search times, indicating that the correlation is not easily ignored.
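    A minimal sketch of how congruent target modulations of the kind described above could be generated is shown below (Python with NumPy/SciPy). The modulation frequency, disparity depth and loudness values are placeholder assumptions, not the experiment's parameters.

        # Sketch of congruent disparity and loudness modulations for the target
        # (sine-wave or square-wave), as described in the abstract. Modulation
        # frequency and depths are assumed placeholder values.
        import numpy as np
        from scipy.signal import square

        def modulation(t, f_mod, waveform="sine"):
            """Unit-amplitude modulator shared by the visual and auditory signals."""
            phase = 2 * np.pi * f_mod * t
            return np.sin(phase) if waveform == "sine" else square(phase)

        t = np.linspace(0, 2.0, 2000)          # 2 s trial
        m = modulation(t, f_mod=1.0, waveform="square")

        disparity = 0.1 * m                    # target depth modulation (deg, assumed)
        loudness = 60 + 10 * m                 # correlated amplitude modulation (dB, assumed)

        # Congruence: both signals are driven by the same modulator, so their
        # correlation is (numerically) 1.
        print(np.corrcoef(disparity, loudness)[0, 1])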

  3. Audio-Visual and Autogenic Relaxation Alter Amplitude of Alpha EEG Band, Causing Improvements in Mental Work Performance in Athletes.

    Science.gov (United States)

    Mikicin, Mirosław; Kowalczyk, Marek

    2015-09-01

    The aim of the present study was to investigate the effect of regular audio-visual relaxation combined with Schultz's autogenic training on: (1) the results of behavioral tests that evaluate work performance during burdensome cognitive tasks (Kraepelin test), and (2) changes in the classical EEG alpha frequency band (7-12 Hz) across neocortical regions (frontal, temporal, occipital, parietal) and hemispheres (left, right) in the relaxation condition. Both the experimental group (EG) and the age- and skill-matched control group (CG) consisted of eighteen athletes (ten males and eight females). After 7 months of training, the EG demonstrated changes in the amplitude of mean EEG alpha-band activity at rest and a significant improvement in almost all components of the Kraepelin test. The same variables in the CG were unchanged following the period without the intervention. In summary, combining audio-visual relaxation with autogenic training significantly improves athletes' ability to sustain prolonged mental effort. These changes are accompanied by greater alpha-band amplitude in the relaxed state. The results suggest that relaxation techniques are useful during the performance of mentally demanding sports tasks (sports based on speed and stamina, sports games, combat sports) and during athletes' relaxation.

  4. Speech entrainment enables patients with Broca's aphasia to produce fluent speech.

    Science.gov (United States)

    Fridriksson, Julius; Hubbard, H Isabel; Hudspeth, Sarah Grace; Holland, Audrey L; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

    2012-12-01

    A distinguishing feature of Broca's aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect 'speech entrainment' and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca's aphasia. In Experiment 1, 13 patients with Broca's aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca's area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. Behavioural and

  5. Impact of Audio-Visual Asynchrony on Lip-Reading Effects -Neuromagnetic and Psychophysical Study.

    Science.gov (United States)

    Kawase, Tetsuaki; Yahata, Izumi; Kanno, Akitake; Sakamoto, Shuichi; Takanashi, Yoshitaka; Takata, Shiho; Nakasato, Nobukazu; Kawashima, Ryuta; Katori, Yukio

    2016-01-01

    The effects of asynchrony between audio and visual (A/V) stimuli on the N100m responses of magnetoencephalography in the left hemisphere were compared with those on the psychophysical responses in 11 participants. The latency and amplitude of N100m were significantly shortened and reduced in the left hemisphere by the presentation of visual speech as long as the temporal asynchrony between A/V stimuli was within 100 ms, but were not significantly affected with audio lags of -500 and +500 ms. However, some small effects were still preserved on average with audio lags of 500 ms, suggesting similar asymmetry of the temporal window to that observed in psychophysical measurements, which tended to be more robust (wider) for audio lags; i.e., the pattern of visual-speech effects as a function of A/V lag observed in the N100m in the left hemisphere grossly resembled that in psychophysical measurements on average, although the individual responses were somewhat varied. The present results suggest that the basic configuration of the temporal window of visual effects on auditory-speech perception could be observed from the early auditory processing stage.

  6. Media audio-visual English course design

    Institute of Scientific and Technical Information of China (English)

    陈赏

    2014-01-01

    The media audio-visual English course is a new course whose teaching content consists of video materials selected from mainstream Western media programmes and film clips, chosen for linguistic contexts close to the students' own. Drawing on the author's own teaching practice, the paper puts forward practical curriculum design proposals and suggestions for this emerging course.

  7. Brain responses and looking behaviour during audiovisual speech integration in infants predict auditory speech comprehension in the second year of life.

    Directory of Open Access Journals (Sweden)

    Elena V Kushnerenko

    2013-07-01

    Full Text Available The use of visual cues during the processing of audiovisual speech is known to be less efficient in children and adults with language difficulties, and such difficulties are known to be more prevalent in children from low-income populations. In the present study, we followed an economically diverse group of thirty-seven infants longitudinally from 6-9 months to 14-16 months of age. We used eye-tracking to examine whether individual differences in visual attention during audiovisual processing of speech in 6 to 9 month old infants, particularly when processing congruent and incongruent auditory and visual speech cues, might be indicative of their later language development. Twenty-two of these 6-9 month old infants also participated in an event-related potential (ERP) audiovisual task within the same experimental session. Language development was then followed up at the age of 14-16 months, using two measures of language development, the Preschool Language Scale (PLS) and the Oxford Communicative Development Inventory (CDI). The results show that those infants who were less efficient in auditory speech processing at the age of 6-9 months had lower receptive language scores at 14-16 months. A correlational analysis revealed that the pattern of face scanning and ERP responses to audio-visually incongruent stimuli at 6-9 months were both significantly associated with language development at 14-16 months. These findings add to the understanding of individual differences in neural signatures of audiovisual processing and associated looking behaviour in infants.

  8. Relationship between Audio-Visual Materials and Environmental Factors on Students Academic Performance in Senior Secondary Schools in Borno State: Implications for Counselling

    Science.gov (United States)

    Bello, S.; Goni, Umar

    2016-01-01

    This is a survey study, designed to determine the relationship between audio-visual materials and environmental factors on students' academic performance in Senior Secondary Schools in Borno State: Implications for Counselling. The study set two research objectives, and tested two research hypotheses. The population of this study is 1,987 students…

  9. IST BENOGO (IST – 2001-39184) Deliverable I-AAU-05-01: Role of sound in VR and Audio Visual Preferences

    DEFF Research Database (Denmark)

    Nordahl, Rolf

    This Periodic Progress Report (PPR) document reports on the studies carried out at Aalborg University in December 2004 concerning the role of sound in VR, audio-visual correlations and attention triggering. The report contains a description and evaluation of the experiments run, together with the analysis...

  10. Twenty-Fifth Annual Audio-Visual Aids Conference, Wednesday 9th to Friday 11th July 1975, Whitelands College, Putney SW15. Conference Preprints.

    Science.gov (United States)

    National Committee for Audio-Visual Aids in Education, London (England).

    Preprints of papers to be presented at the 25th annual Audio-Visual Aids Conference are collected along with the conference program. Papers include official messages, a review of the conference's history, and presentations on photography in education, using school broadcasts, flexibility in the use of television, the "communications generation,"…

  11. Attitude of medical students towards the use of audio visual aids during didactic lectures in pharmacology in a medical college of central India

    Directory of Open Access Journals (Sweden)

    Mehul Agrawal

    2016-04-01

    Conclusions: In our study we found that students preferred a mixture of audio-visual aids over other teaching methods. Teachers should consider the suggestions given by the students while preparing their lectures. [Int J Basic Clin Pharmacol 2016; 5(2): 416-422]

  12. Clever Use of Audio-visual Media to Promote the Teaching of History

    Institute of Scientific and Technical Information of China (English)

    刘艳丽

    2012-01-01

    Introducing audio-visual media into classroom teaching is a relatively new form of teaching innovation, particularly in the practice of history teaching. Using audio-visual media in the history classroom not only enhances students' concrete perception of past history, but also strengthens their thinking through objective descriptions of historical facts. This article describes the characteristics of audio-visual media teaching in detail and offers the author's reflections on how to use audio-visual media to promote the teaching of history.

  13. Audiovisual Speech Perception in Infancy: The Influence of Vowel Identity and Infants' Productive Abilities on Sensitivity to (Mis)Matches between Auditory and Visual Speech Cues

    Science.gov (United States)

    Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias

    2016-01-01

    Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds…

  14. Design Foundations for Content-Rich Acoustic Interfaces: Investigating Audemes as Referential Non-Speech Audio Cues

    Science.gov (United States)

    Ferati, Mexhid Adem

    2012-01-01

    To access interactive systems, blind and visually impaired users can leverage their auditory senses by using non-speech sounds. The current structure of non-speech sounds, however, is geared toward conveying user interface operations (e.g., opening a file) rather than large theme-based information (e.g., a history passage) and, thus, is ill-suited…

  15. The Application of the Toastmasters Learning Model in Business English Audio-Visual Courses

    Institute of Scientific and Technical Information of China (English)

    付爱玲

    2014-01-01

    Toastmasters International, a public-speaking club, has achieved worldwide success thanks to its distinctive model of English learning. The club has a clear assignment of roles and tasks, which is of great significance for improving members' English expression, self-confidence, communication skills and leadership. Business English Audio-Visual is a highly practical ESP course; borrowing the Toastmasters model in its classroom can fully mobilize students' enthusiasm and initiative and improve their oral expression and their ability to handle business matters in English.

  16. THE EFFECT OF USING AUDIO-VISUAL AIDS VERSUS PICTURES ON FOREIGN LANGUAGE VOCABULARY LEARNING OF INDIVIDUALS WITH MILD INTELLECTUAL DISABILITY

    Directory of Open Access Journals (Sweden)

    Zahra Sadat NOORI

    2016-04-01

    Full Text Available This study aimed to examine the effect of using audio-visual aids and pictures on foreign language vocabulary learning of individuals with mild intellectual disability. Method: To this end, a comparison group quasi-experimental study was conducted along with a pre-test and a post-test. The participants were 16 individuals with mild intellectual disability living in a center for mentally disabled individuals in Dezfoul, Iran. They were all male individuals with the age range of 20 to 30. Their mother tongue was Persian, and they did not have any English background. In order to ensure that all participants were within the same IQ level, a standard IQ test, i.e. Colored Progressive Matrices test, was run. Afterwards, the participants were randomly assigned to two experimental groups; one group received the instruction through audio-visual aids, while the other group was taught through pictures. The treatment lasted for four weeks, 20 sessions on aggregate. A total number of 60 English words selected from the English package named 'The Smart Child' were taught. After the treatment, the participants took the posttest in which the researchers randomly selected 40 words from among the 60 target words. Results: The results of Mann-Whitney U-test indicated that using audio-visual aids was more effective than pictures in foreign language vocabulary learning of individuals with mild intellectual disability. Conclusions: It can be concluded that the use of audio-visual aids can be more effective than pictures in foreign language vocabulary learning of individuals with mild intellectual disability.
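    For reference, a between-group comparison of post-test vocabulary scores with a Mann-Whitney U-test, as reported above, can be run along the following lines in Python with SciPy. The scores below are invented placeholders; only the test procedure mirrors the abstract.

        # Illustrative Mann-Whitney U-test comparing post-test vocabulary scores of
        # the audio-visual-aids group and the pictures group. Scores are invented
        # placeholders; only the test procedure mirrors the abstract.
        from scipy.stats import mannwhitneyu

        audio_visual_group = [28, 31, 25, 30, 27, 33, 29, 26]   # hypothetical scores /40
        pictures_group     = [22, 24, 20, 26, 23, 21, 25, 19]

        u_stat, p_value = mannwhitneyu(audio_visual_group, pictures_group,
                                       alternative="two-sided")
        print(f"U = {u_stat:.1f}, p = {p_value:.3f}")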

  17. N1 enhancement in synesthesia during visual and audio-visual perception in semantic cross-modal conflict situations: an ERP study

    Directory of Open Access Journals (Sweden)

    Christopher eSinke

    2014-01-01

    Full Text Available Synesthesia entails a special kind of sensory perception, where stimulation in one sensory modality leads to an internally generated perceptual experience of another, non-stimulated sensory modality. This phenomenon can be viewed as an abnormal multisensory integration process, as here the synesthetic percept is aberrantly fused with the stimulated modality. Indeed, recent synesthesia research has focused on multimodal processing even outside of the specific synesthesia-inducing context and has revealed changed multimodal integration, thus suggesting perceptual alterations at a global level. Here, we focused on audio-visual processing in synesthesia using a semantic classification task in combination with visually or auditory-visually presented animate and inanimate objects in an audio-visual congruent and incongruent manner. Fourteen subjects with auditory-visual and/or grapheme-color synesthesia and 14 control subjects participated in the experiment. During presentation of the stimuli, event-related potentials were recorded from 32 electrodes. The analysis of reaction times and error rates revealed no group differences, with best performance for audio-visually congruent stimulation indicating the well-known multimodal facilitation effect. We found an enhanced amplitude of the N1 component over occipital electrode sites for synesthetes compared to controls. The differences occurred irrespective of the experimental condition and therefore suggest a global influence on early sensory processing in synesthetes.

  18. Research on Content Metadata of Audio-Visual New Media

    Institute of Scientific and Technical Information of China (English)

    刘俊宇

    2014-01-01

    The rapid rise of audio-visual new media services makes the way content metadata is labelled especially important. Appropriate methods for labelling audio-visual new media content directly affect the exchange, storage, positioning, retrieval, management and other related applications of new media content, and strongly influence the efficiency and sustainability of audio-visual new media.

  19. UNDERSTANDING PROSE THROUGH TASK ORIENTED AUDIO-VISUAL ACTIVITY: AN AMERICAN MODERN PROSE COURSE AT THE FACULTY OF LETTERS, PETRA CHRISTIAN UNIVERSITY

    Directory of Open Access Journals (Sweden)

    Sarah Prasasti

    2001-01-01

    Full Text Available The method presented here provides the basis for a course in American prose for EFL students. Understanding and appreciating American prose is a difficult task for the students because they come into contact with works that are full of cultural baggage and far removed from their own world. The audio-visual aid is one alternative for sensitizing students to the topic and the cultural background. Instead of providing ready-made audio-visual aids, teachers can involve students actively in a more task-oriented audio-visual project. Here, the teachers encourage their students to create their own audio-visual aids using colors, pictures, sound and gestures as a point of initiation for further discussion. The students can use color, which has become a strong element of fiction, to help them call up a forceful visual representation. Pictures can also stimulate the students to build their mental image. Sound and silence, which are part of the fabric of literature, may also help them to increase the emotional impact.

  20. New Thinking on Multimedia Network-Based College English Audio-Visual Teaching

    Institute of Scientific and Technical Information of China (English)

    陈亚斐; 丰建泉

    2011-01-01

    This paper analyses the current situation of audio-visual teaching in college English, interprets its characteristics, and puts forward new ideas for audio-visual English teaching in a multimedia network environment. If multimedia network technology is integrated with audio-visual English teaching, a new teaching philosophy of "students at the centre, teachers as guides" will emerge, and students' comprehensive ability, especially their audio-visual English ability, will be greatly improved.

  1. Children's Judgments of Emotion from Conflicting Cues in Speech: Why 6-Year-Olds Are So Inflexible

    Science.gov (United States)

    Waxer, Matthew; Morton, J. Bruce

    2011-01-01

    Six-year-old children can judge a speaker's feelings either from content or paralanguage but have difficulty switching the basis of their judgments when these cues conflict. This inflexibility may relate to a lexical bias in 6-year-olds' judgments. Two experiments tested this claim. In Experiment 1, 6-year-olds (n = 40) were as inflexible when…

  2. Visual-tactile integration in speech perception: Evidence for modality neutral speech primitives.

    Science.gov (United States)

    Bicevskis, Katie; Derrick, Donald; Gick, Bryan

    2016-11-01

    Audio-visual [McGurk and MacDonald (1976). Nature 264, 746-748] and audio-tactile [Gick and Derrick (2009). Nature 462(7272), 502-504] speech stimuli enhance speech perception over audio stimuli alone. In addition, multimodal speech stimuli form an asymmetric window of integration that is consistent with the relative speeds of the various signals [Munhall, Gribble, Sacco, and Ward (1996). Percept. Psychophys. 58(3), 351-362; Gick, Ikegami, and Derrick (2010). J. Acoust. Soc. Am. 128(5), EL342-EL346]. In this experiment, participants were presented video of faces producing /pa/ and /ba/ syllables, both alone and with air puffs occurring synchronously and at different timings up to 300 ms before and after the stop release. Perceivers were asked to identify the syllable they perceived, and were more likely to respond that they perceived /pa/ when air puffs were present, with asymmetrical preference for puffs following the video signal-consistent with the relative speeds of visual and air puff signals. The results demonstrate that visual-tactile integration of speech perception occurs much as it does with audio-visual and audio-tactile stimuli. This finding contributes to the understanding of multimodal speech perception, lending support to the idea that speech is not perceived as an audio signal that is supplemented by information from other modes, but rather that primitives of speech perception are, in principle, modality neutral.
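    The asymmetric window of integration mentioned above can be illustrated with a trivial check of whether a given lag falls inside asymmetric bounds. The bounds in the sketch below are placeholder values chosen for illustration, not the estimates from this study.

        # Toy model of an asymmetric multimodal integration window: two signals are
        # treated as integrated if the lag of the second signal relative to the
        # first falls within asymmetric bounds. The bounds are placeholder values.
        def integrated(lag_ms: float, lead_bound_ms: float = -50.0,
                       lag_bound_ms: float = 200.0) -> bool:
            """True if a lag (positive = puff/audio after video) is inside the window."""
            return lead_bound_ms <= lag_ms <= lag_bound_ms

        for lag in (-100, -25, 0, 150, 300):
            print(lag, integrated(lag))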

  3. Audio Visual Center

    Data.gov (United States)

    Federal Laboratory Consortium — The Audiovisual Services Center provides still photographic documentation with laboratory support, video documentation, video editing, video duplication, photo/video...

  4. Multisensory and modality specific processing of visual speech in different regions of the premotor cortex.

    Science.gov (United States)

    Callan, Daniel E; Jones, Jeffery A; Callan, Akiko

    2014-01-01

    Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex (PMC) has been shown to be active during both observation and execution of action ("Mirror System" properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker's articulating face and heard her voice), visual only (only saw the speaker's articulating face), and audio only (only heard the speaker's voice) conditions with varying audio signal-to-noise ratios in order to determine the regions of the PMC involved with multisensory and modality specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli to control for task difficulty and differences in intelligibility. The results of the functional magnetic resonance imaging (fMRI) analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and PMC. The left ventral inferior premotor cortex (PMvi) showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex (PMvs/PMd) did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas, more superior and dorsal regions of the PMC are involved with mapping unimodal (in this case visual) sensory features of the speech signal with

  5. Multisensory and Modality Specific Processing of Visual Speech in Different Regions of the Premotor Cortex

    Directory of Open Access Journals (Sweden)

    Daniel eCallan

    2014-05-01

    Full Text Available Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex has been shown to be active during both observation and execution of action ("Mirror System" properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker's articulating face and heard her voice), visual only (only saw the speaker's articulating face), and audio only (only heard the speaker's voice) conditions with varying audio signal-to-noise ratios in order to determine the regions of the premotor cortex involved with multisensory and modality specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli to control for task difficulty and differences in intelligibility. The results of the fMRI analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and premotor cortex. The left ventral inferior premotor cortex showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas, more superior and dorsal regions of the premotor cortex are involved with mapping unimodal (in this case visual) sensory features of the speech signal with

  6. Maternal depression and the learning-promoting effects of infant-directed speech: Roles of maternal sensitivity, depression diagnosis, and speech acoustic cues.

    Science.gov (United States)

    Kaplan, Peter S; Danko, Christina M; Cejka, Anna M; Everhart, Kevin D

    2015-11-01

    The hypothesis that the associative learning-promoting effects of infant-directed speech (IDS) depend on infants' social experience was tested in a conditioned-attention paradigm with a cumulative sample of 4- to 14-month-old infants. Following six forward pairings of a brief IDS segment and a photographic slide of a smiling female face, infants of clinically depressed mothers exhibited evidence of having acquired significantly weaker voice-face associations than infants of non-depressed mothers. Regression analyses revealed that maternal depression was significantly related to infant learning even after demographic correlates of depression, antidepressant medication use, and extent of pitch modulation in maternal IDS had been taken into account. However, after maternal depression had been accounted for, maternal emotional availability, coded by blind raters from separate play interactions, accounted for significant further increments in the proportion of variance accounted for in infant learning scores. Both maternal depression and maternal insensitivity negatively, and additively, predicted poor learning.

  7. Practice and Thinking on Audio-Visual Teaching Material Construction

    Institute of Scientific and Technical Information of China (English)

    杨宝强; 刘守东; 王莹

    2013-01-01

    Audio-visual teaching materials have become an important means of improving teaching quality and implementing quality-oriented education, and an important way to cultivate students' innovative spirit and practical ability. This article systematically analyses the theory and practice of audio-visual teaching material construction at Air Force Engineering University and puts forward the following construction strategies: focusing on training compound-type talents and improving the construction system; drawing on the advantages of talent and technology to form joint construction strength; carrying out fine-grained whole-process management to ensure construction quality; and promoting co-construction and sharing of digital resources to enhance usage benefits. The study offers a useful reference for exploring information-based talent training and building top-quality audio-visual teaching materials in institutions of higher education.

  8. The influence of previous environmental history on audio-visual binding occurs during visual-weighted but not auditory-weighted environments.

    Science.gov (United States)

    Wilbiks, Jonathan M P; Dyson, Benjamin J

    2013-01-01

    Although there is substantial evidence for the adjustment of audio-visual binding as a function of the distribution of audio-visual lag, it is not currently clear whether adjustment can take place as a function of task demands. To address this, participants took part in competitive binding paradigms whereby a temporally roving auditory stimulus was assigned to one of two visual anchors (visual-weighted; VAV), or, a temporally roving visual stimulus was assigned to one of two auditory anchors (auditory-weighted; AVA). Using a blocked design it was possible to assess the malleability of audiovisual binding as a function of both the repetition and change of paradigm. VAV performance showed sensitivity to preceding contexts, echoing previous 'repulsive' effects shown in recalibration literature. AVA performance showed no sensitivity to preceding contexts. Despite the use of identical equi-probable temporal distributions in both paradigms, data support the contention that visual contexts may be more sensitive than auditory contexts in being influenced by previous environmental history of temporal events.

  9. Audio-visual speechreading in a group of hearing aid users. The effects of onset age, handicap age, and degree of hearing loss.

    Science.gov (United States)

    Tillberg, I; Rönnberg, J; Svärd, I; Ahlner, B

    1996-01-01

    Speechreading ability was investigated among hearing aid users with different times of onset and different degrees of hearing loss. Audio-visual and visual-only performance were assessed. One group of subjects had been hearing-impaired for a large part of their lives, and the impairments appeared early in life. The other group of subjects had been impaired for fewer years, and the impairments appeared later in life. Differences between the groups were obtained. There was no significant difference on the audio-visual test between the groups, in spite of the fact that the early-onset group scored very poorly auditorily. However, the early-onset group performed significantly better on the visual test. It was concluded that visual information constituted the dominant coding strategy for the early-onset group. An interpretation chiefly in terms of early onset may be the most appropriate, since dB loss variations as such are not related to speechreading skill.

  10. Chinese Audio-Visual Teaching Material Design Based on an Internet Platform

    Institute of Scientific and Technical Information of China (English)

    徐文婷

    2012-01-01

    Over the past twenty years or so, teaching Chinese as a foreign language in China has developed vigorously, and the international promotion of Chinese has become one of the country's important strategies for peaceful development in the 21st century. Research on the theory and practice of teaching Chinese as a foreign language has produced rich results; by contrast, the study of the educational and teaching ideas behind Chinese audio-visual teaching materials lags far behind, which is regrettable. This paper therefore discusses the design principles of web-based Chinese audio-visual teaching materials, in the hope of attracting further attention and research from scholars in the field.

  11. On the Function of Multimedia Networks in English Audio-Visual Teaching

    Institute of Scientific and Technical Information of China (English)

    陈亚斐; 丰建泉

    2011-01-01

    This paper discusses the important roles of the multimedia network in audio-visual English teaching: it promotes the shift towards a student-centred, initiative-driven mode of study, helps optimize audio-visual learning, and improves students' comprehensive ability, especially their audio-visual English ability.

  12. The Impact of Audio-Visual Context on Incidental Vocabulary Acquisition

    Institute of Scientific and Technical Information of China (English)

    王毅

    2012-01-01

    Research on incidental vocabulary acquisition in second language learning has been growing, yet little of it has been conducted in audio-visual contexts. This paper reports a study of incidental vocabulary acquisition by 42 English-major students through watching an English-language film.

  13. Thoughts on Teaching the Basic-Stage Spanish Audio-Visual Course

    Institute of Scientific and Technical Information of China (English)

    杨洁

    2012-01-01

    As a compulsory course in the Spanish major, the basic-stage Spanish audio-visual course supplements and extends the intensive reading course. By having students listen to recordings and news broadcasts and watch DVDs, video and other materials, the course exposes them to the different pronunciations and intonations of many Spanish-speaking countries and gives them a better understanding of those countries' social and cultural backgrounds and current development. It plays an important role not only in helping students broaden their horizons but also in improving their theoretical knowledge. However, because foreign-language materials are limited, some problems remain in Spanish audio-visual teaching. The author briefly discusses her thoughts on the course based on her own teaching experience over recent years.

  14. Exploration of Teaching Reform in the Animation Audio-Visual Language Course

    Institute of Scientific and Technical Information of China (English)

    殷俊; 张慧

    2015-01-01

    Audio-visual language is a foundational course for animation majors, but traditional, purely theoretical teaching can no longer meet today's needs. Addressing in turn how to improve the professionalism of audio-visual language teaching materials, how to change students' simplistic understanding of the course, and how to enrich traditional theory teaching with practical exercises, this paper explores teaching practices aimed at improving teaching quality and helping students fully master audio-visual language.

  15. Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language: Computational techniques are presented to analyze and model expressed and perceived human behavior-variedly characterized as typical, atypical, distressed, and disordered-from speech and language cues and their applications in health, commerce, education, and beyond.

    Science.gov (United States)

    Narayanan, Shrikanth; Georgiou, Panayiotis G

    2013-02-01

    The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion.
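    As a toy illustration of the cue-extraction component described above, the sketch below computes two simple frame-level speech cues (short-time energy and zero-crossing rate) in Python with NumPy. It is a didactic example of a low-level front end, not the authors' BSP toolchain.

        # Toy example of low-level speech cue extraction (short-time energy and
        # zero-crossing rate) of the kind a behavioral signal processing front end
        # might compute. This is a didactic sketch, not the authors' BSP toolchain.
        import numpy as np

        def frame_cues(signal, fs, frame_ms=25, hop_ms=10):
            """Return per-frame energy and zero-crossing rate for a mono signal."""
            frame, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
            cues = []
            for start in range(0, len(signal) - frame, hop):
                x = signal[start:start + frame]
                energy = float(np.mean(x ** 2))
                zcr = float(np.mean(np.abs(np.diff(np.sign(x)))) / 2)
                cues.append((energy, zcr))
            return np.array(cues)

        fs = 16000
        t = np.arange(0, 1.0, 1 / fs)
        demo = np.sin(2 * np.pi * 150 * t) * (t < 0.5)   # "voiced" half, then silence
        print(frame_cues(demo, fs)[:3])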

  16. Audio-Visual Biofeedback Does Not Improve the Reliability of Target Delineation Using Maximum Intensity Projection in 4-Dimensional Computed Tomography Radiation Therapy Planning

    Energy Technology Data Exchange (ETDEWEB)

    Lu, Wei, E-mail: wlu@umm.edu [Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, Maryland (United States); Neuner, Geoffrey A.; George, Rohini; Wang, Zhendong; Sasor, Sarah [Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, Maryland (United States); Huang, Xuan [Research and Development, Care Management Department, Johns Hopkins HealthCare LLC, Glen Burnie, Maryland (United States); Regine, William F.; Feigenberg, Steven J.; D' Souza, Warren D. [Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, Maryland (United States)

    2014-01-01

    Purpose: To investigate whether coaching patients' breathing would improve the match between ITV_MIP (internal target volume generated by contouring in the maximum intensity projection scan) and ITV_10 (generated by combining the gross tumor volumes contoured in 10 phases of a 4-dimensional CT [4DCT] scan). Methods and Materials: Eight patients with a thoracic tumor and 5 patients with an abdominal tumor were included in an institutional review board-approved prospective study. Patients underwent 3 4DCT scans with: (1) free breathing (FB); (2) coaching using audio-visual (AV) biofeedback via the Real-Time Position Management system; and (3) coaching via a spirometer system (Active Breathing Coordinator or ABC). One physician contoured all scans to generate the ITV_10 and ITV_MIP. The match between ITV_MIP and ITV_10 was quantitatively assessed with volume ratio, centroid distance, root mean squared distance, and overlap/Dice coefficient. We investigated whether coaching (AV or ABC) or uniform expansions (1, 2, 3, or 5 mm) of ITV_MIP improved the match. Results: Although both AV and ABC coaching techniques improved frequency reproducibility and ABC improved displacement regularity, neither improved the match between ITV_MIP and ITV_10 over FB. On average, ITV_MIP underestimated ITV_10 by 19%, 19%, and 21%, with centroid distance of 1.9, 2.3, and 1.7 mm and Dice coefficient of 0.87, 0.86, and 0.88 for FB, AV, and ABC, respectively. Separate analyses indicated a better match for lung cancers or tumors not adjacent to high-intensity tissues. Uniform expansions of ITV_MIP did not correct for the mismatch between ITV_MIP and ITV_10. Conclusions: In this pilot study, audio-visual biofeedback did not improve the match between ITV_MIP and ITV_10. In general, ITV_MIP should be limited to lung cancers, and modification of ITV_MIP in each phase of the 4DCT data set is recommended.
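    The match metrics quoted above (volume ratio, centroid distance, Dice/overlap coefficient) can be computed directly from two binary target masks. The sketch below (Python with NumPy) uses toy masks and an assumed 1 mm isotropic voxel size for illustration; it is not the study's analysis code.

        # Illustrative computation of volume ratio, centroid distance and Dice
        # coefficient between two binary target volumes (e.g. ITV_MIP vs ITV_10).
        # The toy masks and 1 mm isotropic voxel size are assumptions.
        import numpy as np

        def match_metrics(mask_a, mask_b, voxel_mm=(1.0, 1.0, 1.0)):
            va, vb = mask_a.sum(), mask_b.sum()
            volume_ratio = va / vb
            centroid_a = np.array(np.nonzero(mask_a)).mean(axis=1) * voxel_mm
            centroid_b = np.array(np.nonzero(mask_b)).mean(axis=1) * voxel_mm
            centroid_dist = np.linalg.norm(centroid_a - centroid_b)
            dice = 2 * np.logical_and(mask_a, mask_b).sum() / (va + vb)
            return volume_ratio, centroid_dist, dice

        # Toy volumes: a cube and a slightly shifted cube
        a = np.zeros((40, 40, 40), bool); a[10:30, 10:30, 10:30] = True
        b = np.zeros((40, 40, 40), bool); b[12:32, 10:30, 10:30] = True
        print(match_metrics(a, b))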

  17. The challenge of reducing scientific complexity for different target groups (without losing the essence) - experiences from interdisciplinary audio-visual media production

    Science.gov (United States)

    Hezel, Bernd; Broschkowski, Ephraim; Kropp, Jürgen

    2013-04-01

    The Climate Media Factory originates from an interdisciplinary media lab run by the Film and Television University "Konrad Wolf" Potsdam-Babelsberg (HFF) and the Potsdam Institute for Climate Impact Research (PIK). Climate scientists, authors, producers and media scholars work together to develop media products on climate change and sustainability. We strive towards communicating scientific content via different media platforms, reconciling the communication needs of scientists and the audience's need to understand the complexity of topics that are relevant in their everyday life. By presenting four audio-visual examples that have been designed for very different target groups, we show (i) the interdisciplinary challenges during the production process and the lessons learnt and (ii) possibilities for reaching the required degree of simplification without dumbing down the content. "We know enough about climate change" is a short animated film produced for the German Agency for International Cooperation (GIZ) for training programs and conferences on adaptation in the target countries, including Indonesia, Tunisia and Mexico. "Earthbook" is a short animation produced for "The Year of Science" to raise awareness of sustainability among digital natives. "What is Climate Engineering?", produced for the Institute for Advanced Sustainability Studies (IASS), is meant for an informed and interested public. "Wimmelwelt Energie!" is a prototype of an iPad application for children from 4-6 years of age to help them learn about different forms of energy and related greenhouse gas emissions.

  18. Using a three-dimension head mounted displayer in audio-visual sexual stimulation aids in differential diagnosis of psychogenic from organic erectile dysfunction.

    Science.gov (United States)

    Moon, K-H; Song, P-H; Park, T-C

    2005-01-01

    We designed this study to compare the efficacy of using a three-dimension head mounted displayer (3-D HMD) and a conventional monitor in audio-visual sexual stimulation (AVSS) for the differential diagnosis of psychogenic from organic erectile dysfunction (ED). Three groups of subjects (psychogenic ED, organic ED, and healthy controls) received the evaluation. The change in penile tumescence during AVSS was monitored with Nocturnal Electrobioimpedance Volumetric Assessment, and sexual arousal after AVSS was rated by a simple question as good, fair, or poor. Both the healthy control and psychogenic ED groups demonstrated a significantly higher rate of normal penile tumescence responses (P<0.05) and a significantly higher level of sexual arousal (P<0.05) when stimulated with the 3-D HMD than with the conventional monitor. In the organic ED group, even using the 3-D HMD in AVSS did not produce a better response in either assessment. Therefore, we conclude that using a 3-D HMD in AVSS helps more than a conventional monitor to differentiate psychogenic from organic ED.

  19. The Application of the Flipped Classroom in English Audio-Visual Courses at Art Colleges

    Institute of Scientific and Technical Information of China (English)

    王莹莹; 孟庆娟

    2016-01-01

    This paper discusses how to apply the flipped classroom effectively in English audio-visual courses at art colleges. The author analyzes the flipped classroom concept and the characteristics of art majors, and uses examples from college English audio-visual courses to show in detail how the flipped classroom can be applied for art students.

  20. The Present Situation and Countermeasures of Teaching the Japanese Audio-Visual-Oral Course

    Institute of Scientific and Technical Information of China (English)

    糜玲

    2012-01-01

    The Japanese audio-visual-oral course aims to improve students' listening comprehension, spoken Japanese and cross-cultural communicative competence. This paper introduces the present situation of teaching in the course, points out the problems that exist, and discusses how to improve it.

  1. Using Audio-Visual Aids to Optimize Sports Aerobics Teaching

    Institute of Scientific and Technical Information of China (English)

    赵静

    2011-01-01

    This paper discusses how the use of audio-visual aids in sports aerobics teaching can optimize teaching methods, the teaching process, course content, teaching objectives and teaching outcomes, and it provides a scientific basis for the appropriate use of audio-visual aids in sports aerobics instruction.

  2. Cable TV Industry Chain and Competition Pattern Analysis in the Audio-Visual New Media Era

    Institute of Scientific and Technical Information of China (English)

    肖叶飞

    2015-01-01

    With the development of digital and network technology, mobile phones, tablets, computers and other intelligent video terminals are emerging constantly, along with mobile TV, internet TV, IPTV, mobile multimedia TV and other forms of audio-visual new media. These new media are characterized by interaction, video on demand, time-shifting and search, and they are changing how television is consumed and distributed. In the audio-visual new media era, cable TV faces opportunities and challenges in terms of services, terminals and networks: it needs multi-screen interaction and multi-network links, must transform from a private network to a public network and from single-screen to multi-screen services, and must build an integrated industry chain for the audio-visual new media era.

  3. Preservation of Archived Audio-Visual Materials and Digital Backup Transfer

    Institute of Scientific and Technical Information of China (English)

    李浚

    2011-01-01

    Archived audio-visual materials are classified by carrier form and by the technical characteristics of their storage media, and corresponding preservation methods are proposed for each type. Given that the lifespan of audio-visual carriers cannot be extended indefinitely, and that many early playback devices are about to be phased out, leaving many valuable recordings at risk of becoming unusable, the paper argues that digitization of audio-visual materials is urgently needed. Finally, it provides detailed methods for the digital transfer of audio-visual materials.

  4. BILINGUAL MULTIMODAL SYSTEM FOR TEXT-TO-AUDIOVISUAL SPEECH AND SIGN LANGUAGE SYNTHESIS

    Directory of Open Access Journals (Sweden)

    A. A. Karpov

    2014-09-01

    Full Text Available We present a conceptual model, architecture and software of a multimodal system for audio-visual speech and sign language synthesis by the input text. The main components of the developed multimodal synthesis system (signing avatar are: automatic text processor for input text analysis; simulation 3D model of human's head; computer text-to-speech synthesizer; a system for audio-visual speech synthesis; simulation 3D model of human’s hands and upper body; multimodal user interface integrating all the components for generation of audio, visual and signed speech. The proposed system performs automatic translation of input textual information into speech (audio information and gestures (video information, information fusion and its output in the form of multimedia information. A user can input any grammatically correct text in Russian or Czech languages to the system; it is analyzed by the text processor to detect sentences, words and characters. Then this textual information is converted into symbols of the sign language notation. We apply international «Hamburg Notation System» - HamNoSys, which describes the main differential features of each manual sign: hand shape, hand orientation, place and type of movement. On their basis the 3D signing avatar displays the elements of the sign language. The virtual 3D model of human’s head and upper body has been created using VRML virtual reality modeling language, and it is controlled by the software based on OpenGL graphical library. The developed multimodal synthesis system is a universal one since it is oriented for both regular users and disabled people (in particular, for the hard-of-hearing and visually impaired, and it serves for multimedia output (by audio and visual modalities of input textual information.
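
    The record above describes a pipeline from input text, through sign-language notation, to avatar animation. A minimal sketch of the lexical lookup step is given below; it assumes a hypothetical gloss lexicon and a fingerspelling fallback (names such as SIGN_LEXICON are illustrative and are not taken from the paper).

        from typing import List

        SIGN_LEXICON = {            # hypothetical lexicon: word -> sign gloss/notation id
            "hello": "HAMNOSYS_HELLO",
            "thank": "HAMNOSYS_THANK",
            "you": "HAMNOSYS_YOU",
        }

        def text_to_sign_sequence(text: str) -> List[str]:
            """Convert input text into a sequence of sign notation symbols."""
            signs = []
            for word in text.lower().split():
                word = word.strip(".,!?")
                if word in SIGN_LEXICON:
                    signs.append(SIGN_LEXICON[word])
                else:
                    # fingerspell out-of-vocabulary words letter by letter
                    signs.extend(f"FINGERSPELL_{ch.upper()}" for ch in word)
            return signs

        print(text_to_sign_sequence("Hello, thank you"))
        # ['HAMNOSYS_HELLO', 'HAMNOSYS_THANK', 'HAMNOSYS_YOU']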

  5. Audio-visual perception of 3D cinematography: an fMRI study using condition-based and computation-based analyses.

    Science.gov (United States)

    Ogawa, Akitoshi; Bordier, Cecile; Macaluso, Emiliano

    2013-01-01

    The use of naturalistic stimuli to probe sensory functions in the human brain is gaining increasing interest. Previous imaging studies examined brain activity associated with the processing of cinematographic material using both standard "condition-based" designs, as well as "computational" methods based on the extraction of time-varying features of the stimuli (e.g. motion). Here, we exploited both approaches to investigate the neural correlates of complex visual and auditory spatial signals in cinematography. In the first experiment, the participants watched a piece of a commercial movie presented in four blocked conditions: 3D vision with surround sounds (3D-Surround), 3D with monaural sound (3D-Mono), 2D-Surround, and 2D-Mono. In the second experiment, they watched two different segments of the movie both presented continuously in 3D-Surround. The blocked presentation served for standard condition-based analyses, while all datasets were submitted to computation-based analyses. The latter assessed where activity co-varied with visual disparity signals and the complexity of auditory multi-sources signals. The blocked analyses associated 3D viewing with the activation of the dorsal and lateral occipital cortex and superior parietal lobule, while the surround sounds activated the superior and middle temporal gyri (S/MTG). The computation-based analyses revealed the effects of absolute disparity in dorsal occipital and posterior parietal cortices and of disparity gradients in the posterior middle temporal gyrus plus the inferior frontal gyrus. The complexity of the surround sounds was associated with activity in specific sub-regions of S/MTG, even after accounting for changes of sound intensity. These results demonstrate that the processing of naturalistic audio-visual signals entails an extensive set of visual and auditory areas, and that computation-based analyses can track the contribution of complex spatial aspects characterizing such life-like stimuli.
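
    As a rough illustration of the "computation-based" approach mentioned above, one can convolve a time-varying stimulus feature (for example, frame-by-frame disparity) with a hemodynamic response function and then ask where voxel activity co-varies with the result. The sketch below uses synthetic data and a simplified single-gamma HRF, so it only shows the shape of the analysis, not the authors' pipeline.

        import numpy as np
        from scipy.stats import gamma

        TR = 2.0                                   # seconds per fMRI volume
        n_vols = 300
        feature = np.random.rand(n_vols)           # synthetic time-varying feature (e.g., disparity)

        # simplified canonical HRF (single gamma), sampled at the TR
        t = np.arange(0, 30, TR)
        hrf = gamma.pdf(t, a=6)                    # peak around 6 s
        hrf /= hrf.sum()

        regressor = np.convolve(feature, hrf)[:n_vols]           # predicted BOLD modulation
        voxel = 0.5 * regressor + np.random.randn(n_vols) * 0.1  # synthetic voxel time series

        r = np.corrcoef(regressor, voxel)[0, 1]    # where r is high, activity co-varies with the feature
        print(f"feature-BOLD correlation: {r:.2f}")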

  6. Audio-visual perception of 3D cinematography: an fMRI study using condition-based and computation-based analyses.

    Directory of Open Access Journals (Sweden)

    Akitoshi Ogawa

    Full Text Available The use of naturalistic stimuli to probe sensory functions in the human brain is gaining increasing interest. Previous imaging studies examined brain activity associated with the processing of cinematographic material using both standard "condition-based" designs, as well as "computational" methods based on the extraction of time-varying features of the stimuli (e.g. motion). Here, we exploited both approaches to investigate the neural correlates of complex visual and auditory spatial signals in cinematography. In the first experiment, the participants watched a piece of a commercial movie presented in four blocked conditions: 3D vision with surround sounds (3D-Surround), 3D with monaural sound (3D-Mono), 2D-Surround, and 2D-Mono. In the second experiment, they watched two different segments of the movie both presented continuously in 3D-Surround. The blocked presentation served for standard condition-based analyses, while all datasets were submitted to computation-based analyses. The latter assessed where activity co-varied with visual disparity signals and the complexity of auditory multi-sources signals. The blocked analyses associated 3D viewing with the activation of the dorsal and lateral occipital cortex and superior parietal lobule, while the surround sounds activated the superior and middle temporal gyri (S/MTG). The computation-based analyses revealed the effects of absolute disparity in dorsal occipital and posterior parietal cortices and of disparity gradients in the posterior middle temporal gyrus plus the inferior frontal gyrus. The complexity of the surround sounds was associated with activity in specific sub-regions of S/MTG, even after accounting for changes of sound intensity. These results demonstrate that the processing of naturalistic audio-visual signals entails an extensive set of visual and auditory areas, and that computation-based analyses can track the contribution of complex spatial aspects characterizing such life-like stimuli.

  7. Effectiveness of respiratory-gated radiotherapy with audio-visual biofeedback for synchrotron-based scanned heavy-ion beam delivery

    Science.gov (United States)

    He, Pengbo; Li, Qiang; Zhao, Ting; Liu, Xinguo; Dai, Zhongying; Ma, Yuanyuan

    2016-12-01

    A synchrotron-based heavy-ion accelerator operates in pulse mode at a low repetition rate that is comparable to a patient’s breathing rate. To overcome inefficiencies and interplay effects between the residual motion of the target and the scanned heavy-ion beam delivery process for conventional free breathing (FB)-based gating therapy, a novel respiratory guidance method was developed to help patients synchronize their breathing patterns with the synchrotron excitation patterns by performing short breath holds with the aid of personalized audio-visual biofeedback (BFB) system. The purpose of this study was to evaluate the treatment precision, efficiency and reproducibility of the respiratory guidance method in scanned heavy-ion beam delivery mode. Using 96 breathing traces from eight healthy volunteers who were asked to breathe freely and guided to perform short breath holds with the aid of BFB, a series of dedicated four-dimensional dose calculations (4DDC) were performed on a geometric model which was developed assuming a linear relationship between external surrogate and internal tumor motions. The outcome of the 4DDCs was quantified in terms of the treatment time, dose-volume histograms (DVH) and dose homogeneity index. Our results show that with the respiratory guidance method the treatment efficiency increased by a factor of 2.23-3.94 compared with FB gating, depending on the duty cycle settings. The magnitude of dose inhomogeneity for the respiratory guidance methods was 7.5 times less than that of the non-gated irradiation, and good reproducibility of breathing guidance among different fractions was achieved. Thus, our study indicates that the respiratory guidance method not only improved the overall treatment efficiency of respiratory-gated scanned heavy-ion beam delivery, but also had the advantages of lower dose uncertainty and better reproducibility among fractions.

  8. Audio-Visual Regulation: Arguments For and Against

    Directory of Open Access Journals (Sweden)

    Jordi Sopena Palomar

    2008-03-01

    Full Text Available The article analyzes the effectiveness of audio-visual regulation and weighs the arguments for and against the existence of broadcasting councils at the state level. The debate over whether Spain needs such a body is still active. Most European countries have created competent authorities of this kind, such as OFCOM in the United Kingdom and the CSA in France. In Spain, broadcasting regulation is limited to regional bodies such as the Consejo Audiovisual de Navarra, the Consejo Audiovisual de Andalucía, and the Consell de l’Audiovisual de Catalunya (CAC), whose model is also examined in this article.

  9. Audio-Visual Media: A Source of Vitality for the Language Classroom

    Institute of Scientific and Technical Information of China (English)

    史坤萍

    2012-01-01

    With the rapid development of information technology, multimedia is gradually entering the classroom and has injected new vitality into primary school language teaching. Skillful use of audio-visual media can stimulate students' interest in learning, help overcome key and difficult points, support self-regulation and strengthen memory, develop observation and imagination, cultivate expressive reading, and broaden students' horizons. In short, bringing multimedia into language teaching effectively motivates both teachers and students, fully reflects the students' central position, and encourages their initiative.

  10. The Audio-Visual Man.

    Science.gov (United States)

    Babin, Pierre, Ed.

    A series of twelve essays discuss the use of audiovisuals in religious education. The essays are divided into three sections: one which draws on the ideas of Marshall McLuhan and other educators to explore the newest ideas about audiovisual language and faith, one that describes how to learn and use the new language of audio and visual images, and…

  11. Interactions between distal speech rate, linguistic knowledge, and speech environment.

    Science.gov (United States)

    Morrill, Tuuli; Baese-Berk, Melissa; Heffner, Christopher; Dilley, Laura

    2015-10-01

    During lexical access, listeners use both signal-based and knowledge-based cues, and information from the linguistic context can affect the perception of acoustic speech information. Recent findings suggest that the various cues used in lexical access are implemented with flexibility and may be affected by information from the larger speech context. We conducted 2 experiments to examine effects of a signal-based cue (distal speech rate) and a knowledge-based cue (linguistic structure) on lexical perception. In Experiment 1, we manipulated distal speech rate in utterances where an acoustically ambiguous critical word was either obligatory for the utterance to be syntactically well formed (e.g., Conner knew that bread and butter (are) both in the pantry) or optional (e.g., Don must see the harbor (or) boats). In Experiment 2, we examined identical target utterances as in Experiment 1 but changed the distribution of linguistic structures in the fillers. The results of the 2 experiments demonstrate that speech rate and linguistic knowledge about critical word obligatoriness can both influence speech perception. In addition, it is possible to alter the strength of a signal-based cue by changing information in the speech environment. These results provide support for models of word segmentation that include flexible weighting of signal-based and knowledge-based cues.

  12. Respiratory motion management using audio-visual biofeedback for respiratory-gated radiotherapy of synchrotron-based pulsed heavy-ion beam delivery

    Energy Technology Data Exchange (ETDEWEB)

    He, Pengbo; Ma, Yuanyuan; Huang, Qiyan; Yan, Yuanlin [Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou 730000 (China); Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, Lanzhou 730000 (China); School of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049 (China); Li, Qiang, E-mail: liqiang@impcas.ac.cn; Liu, Xinguo; Dai, Zhongying; Zhao, Ting; Fu, Tingyan; Shen, Guosheng [Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou 730000 (China); Key Laboratory of Heavy Ion Radiation Biology and Medicine of Chinese Academy of Sciences, Lanzhou 730000 (China)

    2014-11-01

    Purpose: To efficiently deliver respiratory-gated radiation during synchrotron-based pulsed heavy-ion radiotherapy, a novel respiratory guidance method combining a personalized audio-visual biofeedback (BFB) system, breath hold (BH), and synchrotron-based gating was designed to help patients synchronize their respiratory patterns with synchrotron pulses and to overcome typical limitations such as low efficiency, residual motion, and discomfort. Methods: In-house software was developed to acquire body surface marker positions and display BFB, gating signals, and real-time beam profiles on a LED screen. Patients were prompted to perform short BHs or short deep breath holds (SDBH) with the aid of BFB following a personalized standard BH/SDBH (stBH/stSDBH) guiding curve or their own representative BH/SDBH (reBH/reSDBH) guiding curve. A practical simulation was performed for a group of 15 volunteers to evaluate the feasibility and effectiveness of this method. Effective dose rates (EDRs), mean absolute errors between the guiding curves and the measured curves, and mean absolute deviations of the measured curves were obtained within 10%–50% duty cycles (DCs) that were synchronized with the synchrotron’s flat-top phase. Results: All maneuvers for an individual volunteer took approximately half an hour, and no one experienced discomfort during the maneuvers. Using the respiratory guidance methods, the magnitude of residual motion was almost ten times less than during nongated irradiation, and increases in the average effective dose rate by factors of 2.39–4.65, 2.39–4.59, 1.73–3.50, and 1.73–3.55 for the stBH, reBH, stSDBH, and reSDBH guiding maneuvers, respectively, were observed in contrast with conventional free breathing-based gated irradiation, depending on the respiratory-gated duty cycle settings. Conclusions: The proposed respiratory guidance method with personalized BFB was confirmed to be feasible in a group of volunteers. Increased effective dose

  13. The Selection and Expansion of Convergence Development Paths for Traditional TV and New Audio-Visual Media

    Institute of Scientific and Technical Information of China (English)

    王长潇

    2011-01-01

    New audio-visual media have diverted audiences, advertising, and personnel away from traditional TV. To remain dominant in the future media landscape, traditional TV should integrate new audio-visual media and foster the related industries. Starting from the logic of network technology development and the laws of media evolution, this paper reveals the inevitable trend of traditional TV's self-improvement and proposes a development path of vertical extension and horizontal integration for the convergence of traditional TV and new audio-visual media.

  14. On the Role of Audio-Visual Teaching in the Inheritance and Development of Traditional Teaching

    Institute of Scientific and Technical Information of China (English)

    汤卓凡

    2011-01-01

    With the rapid development of information technology and the deepening of education reform, it has become an inevitable trend to drive the modernization of China's education through informationization and to achieve leapfrog development. As a tool for transmitting information and knowledge, audio-visual teaching offers interactivity, responsiveness, and controllability, which suit modern learners' needs for active exploration and lifelong learning. However, audio-visual teaching is still essentially a tool, and as an auxiliary means it has its limitations. Only when audio-visual teaching is integrated with traditional teaching so that the two complement each other can the best teaching results be achieved.

  15. A Study of the Application of the Interactive Approach in College English Audio-Visual and Speaking Teaching

    Institute of Scientific and Technical Information of China (English)

    李之松

    2012-01-01

    The interactive approach is a concrete embodiment of the student-centered teaching philosophy in higher education; it emphasizes interaction between teachers and students, among students, and between both and the teaching materials. The college English audio-visual and speaking course focuses on students' English viewing, listening, and oral expression abilities. By exploring three modes of applying the interactive approach in audio-visual and speaking classroom teaching, and by discussing its advantages and the points that deserve attention, this paper, taking students' actual situations into account, seeks ways to improve classroom teaching efficiency and thereby further develop students' English viewing, listening, and speaking abilities.

  16. Polarizing cues.

    Science.gov (United States)

    Nicholson, Stephen P

    2012-01-01

    People categorize themselves and others, creating ingroup and outgroup distinctions. In American politics, parties constitute the in- and outgroups, and party leaders hold sway in articulating party positions. A party leader's endorsement of a policy can be persuasive, inducing co-partisans to take the same position. In contrast, a party leader's endorsement may polarize opinion, inducing out-party identifiers to take a contrary position. Using survey experiments from the 2008 presidential election, I examine whether in- and out-party candidate cues—John McCain and Barack Obama—affected partisan opinion. The results indicate that in-party leader cues do not persuade but that out-party leader cues polarize. This finding holds in an experiment featuring President Bush in which his endorsement did not persuade Republicans but it polarized Democrats. Lastly, I compare the effect of party leader cues to party label cues. The results suggest that politicians, not parties, function as polarizing cues.

  17. Application of PBL in Audio-Visual-Oral Medical English Teaching in a Network Environment

    Institute of Scientific and Technical Information of China (English)

    杨琳; 李广伟; 朱莉莉

    2014-01-01

    Applying PBL to audio-visual-oral medical English teaching in a network environment can overcome the existing problems of the course. In teaching practice, this model compensates for the shortage of class hours, creates authentic learning situations, and supports a dynamic formative assessment system. The medical students trained in this way not only acquire a solid foundation in English but also develop strong communication skills and team awareness, and can thus truly meet the needs of society.

  18. The Schema Features and Aesthetic Functions of Foreign Language Teaching with Electric Audio-Visual Aids

    Institute of Scientific and Technical Information of China (English)

    齐欣

    2015-01-01

    While foreign language teaching with electric audio-visual aids challenges the traditional teaching model, it also faces many challenges of its own and needs more research on its theoretical basis and functions. Drawing on schema theory and aesthetic education, this paper offers a fresh examination of the schema features of such teaching and of its implicit, emotional, and personalized aesthetic functions, thereby enriching its theoretical foundation and underscoring the need to realize its aesthetic functions.

  19. Applied Research on Using Network Resources to Support English Audio-Visual Teaching in Higher Vocational Colleges

    Institute of Scientific and Technical Information of China (English)

    孙敏

    2016-01-01

    This study explores using network resources to support English audio-visual teaching in higher vocational education. By integrating network resources organically into the teaching process, reforming and improving the teaching mode, breaking through the time and space limits of classroom activities, emphasizing students' interest and autonomous learning ability, and attending to the coordinated development of their English viewing, listening, and speaking skills, teaching quality and efficiency can be improved.

  20. Strategies for Japanese Audio-Visual Lessons Based on a Network Teaching Platform

    Institute of Scientific and Technical Information of China (English)

    梁暹

    2014-01-01

    The development of modern science and technology has brought revolutionary changes to language teaching. The rapid development and constant upgrading of computers have led to their use in modern language instruction, and the advent and rapid growth of the Internet have transformed teaching even further. Teaching strategies for advanced Japanese audio-visual lessons must adapt to these changes. This article explores how, in a networked environment, a network teaching platform can be used to change the teaching strategies of the Japanese audio-visual course.

  1. The Application of Audio-visual Media in Junior High School English Teaching

    Institute of Scientific and Technical Information of China (English)

    江介香

    2012-01-01

    In junior high school English teaching, audio-visual media can stimulate students' interest in learning English. As a teaching aid, their use in the English classroom supplements and extends classroom teaching, helps improve overall teaching effectiveness, and is of great significance for cultivating students' comprehensive ability to use English.

  2. Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus

    Directory of Open Access Journals (Sweden)

    Patterson Eric K

    2002-01-01

    Full Text Available Strides in computer technology and the search for deeper, more powerful techniques in signal processing have brought multimodal research to the forefront in recent years. Audio-visual speech processing has become an important part of this research because it holds great potential for overcoming certain problems of traditional audio-only methods. Difficulties, due to background noise and multiple speakers in an application environment, are significantly reduced by the additional information provided by visual features. This paper presents information on a new audio-visual database, a feature study on moving speakers, and on baseline results for the whole speaker group. Although a few databases have been collected in this area, none has emerged as a standard for comparison. Also, efforts to date have often been limited, focusing on cropped video or stationary speakers. This paper seeks to introduce a challenging audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on one DVD. The Clemson University Audio-Visual Experiments (CUAVE database is a speaker-independent corpus of both connected and continuous digit strings totaling over 7000 utterances. It contains a wide variety of speakers and is designed to meet several goals discussed in this paper. One of these goals is to allow testing of adverse conditions such as moving talkers and speaker pairs. A feature study of connected digit strings is also discussed. It compares stationary and moving talkers in a speaker-independent grouping. An image-processing-based contour technique, an image transform method, and a deformable template scheme are used in this comparison to obtain visual features. This paper also presents methods and results in an attempt to make these techniques more robust to speaker movement. Finally, initial baseline speaker-independent results are included using all speakers, and conclusions as well as suggested areas of research are

  3. Estimating the relative weights of visual and auditory tau versus heuristic-based cues for time-to-contact judgments in realistic, familiar scenes by older and younger adults.

    Science.gov (United States)

    Keshavarz, Behrang; Campos, Jennifer L; DeLucia, Patricia R; Oberfeld, Daniel

    2017-04-01

    Estimating time to contact (TTC) involves multiple sensory systems, including vision and audition. Previous findings suggested that the ratio of an object's instantaneous optical size/sound intensity to its instantaneous rate of change in optical size/sound intensity (τ) drives TTC judgments. Other evidence has shown that heuristic-based cues are used, including final optical size or final sound pressure level. Most previous studies have used decontextualized and unfamiliar stimuli (e.g., geometric shapes on a blank background). Here we evaluated TTC estimates by using a traffic scene with an approaching vehicle to evaluate the weights of visual and auditory TTC cues under more realistic conditions. Younger (18-39 years) and older (65+ years) participants made TTC estimates in three sensory conditions: visual-only, auditory-only, and audio-visual. Stimuli were presented within an immersive virtual-reality environment, and cue weights were calculated for both visual cues (e.g., visual τ, final optical size) and auditory cues (e.g., auditory τ, final sound pressure level). The results demonstrated the use of visual τ as well as heuristic cues in the visual-only condition. TTC estimates in the auditory-only condition, however, were primarily based on an auditory heuristic cue (final sound pressure level), rather than on auditory τ. In the audio-visual condition, the visual cues dominated overall, with the highest weight being assigned to visual τ by younger adults, and a more equal weighting of visual τ and heuristic cues in older adults. Overall, better characterizing the effects of combined sensory inputs, stimulus characteristics, and age on the cues used to estimate TTC will provide important insights into how these factors may affect everyday behavior.
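
    For reference, the visual τ variable discussed above is simply the ratio of an object's instantaneous optical size to its rate of expansion, which approximates time to contact under a constant closing speed. The sketch below illustrates that relationship with made-up numbers; it is a textbook illustration, not the authors' analysis code.

        def visual_tau(optical_size: float, expansion_rate: float) -> float:
            """Approximate time to contact (s): instantaneous optical size
            (e.g., degrees of visual angle) divided by its rate of change (deg/s)."""
            if expansion_rate <= 0:
                raise ValueError("object must be expanding (approaching) for tau to be defined")
            return optical_size / expansion_rate

        # A vehicle subtending 2.0 deg and expanding at 0.5 deg/s yields tau = 4 s,
        # i.e., roughly 4 seconds to contact if the closing speed stays constant.
        print(visual_tau(2.0, 0.5))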

  4. The Effect of Audio-Visual Isolation on Sleep Disorders in ICU Patients

    Institute of Scientific and Technical Information of China (English)

    解军丽; 刁井地; 马昭君; 冯伟龙; 冯伟生

    2011-01-01

    Objective: To investigate the effect of audio-visual isolation, a simple nursing method, on the quantity, quality, and structure of sleep in ICU patients. Methods: 75 eligible patients were randomly assigned to an audio-visual isolation group or a control group, and the two groups were compared using muscle-tension observation, the Pittsburgh Sleep Quality Index, and EEG monitoring, with the results analyzed statistically. Results: Sleep time and sleep quality differed significantly between the two groups (P<0.01). The isolation group maintained a normal sleep-stage structure, whereas the control group showed significant reductions in NREM stages 3-4 and REM sleep, a significant difference from the isolation group. Conclusion: Physical isolation is effective in protecting the quality and normal structure of sleep in ICU patients and has considerable practical value.

  5. On the Use of English Films and TV Programs in the English Audio-Visual Class

    Institute of Scientific and Technical Information of China (English)

    邓丽娟

    2011-01-01

    Using original English films and TV programs in the English audio-visual class is a popular and effective method in English language teaching in China. This paper discusses the advantages of this method for audio-visual and speaking instruction and offers suggestions on selecting films and designing teaching activities around them.

  6. An Inquiry into the Model of English Audio-Visual Classroom Teaching in Independent Colleges

    Institute of Scientific and Technical Information of China (English)

    刘艳明; 张新坤

    2011-01-01

    Based on the characteristics of English majors in independent colleges and the current state of the English audio-visual course, this paper, guided by humanism and constructivism, proposes a diversified, personalized, and collaborative classroom teaching model and applies it to concrete lesson design.

  7. Word segmentation with universal prosodic cues.

    Science.gov (United States)

    Endress, Ansgar D; Hauser, Marc D

    2010-09-01

    When listening to speech from one's native language, words seem to be well separated from one another, like beads on a string. When listening to a foreign language, in contrast, words seem almost impossible to extract, as if there was only one bead on the same string. This contrast reveals that there are language-specific cues to segmentation. The puzzle, however, is that infants must be endowed with a language-independent mechanism for segmentation, as they ultimately solve the segmentation problem for any native language. Here, we approach the acquisition problem by asking whether there are language-independent cues to segmentation that might be available to even adult learners who have already acquired a native language. We show that adult learners recognize words in connected speech when only prosodic cues to word-boundaries are given from languages unfamiliar to the participants. In both artificial and natural speech, adult English speakers, with no prior exposure to the test languages, readily recognized words in natural languages with critically different prosodic patterns, including French, Turkish and Hungarian. We suggest that, even though languages differ in their sound structures, they carry universal prosodic characteristics. Further, these language-invariant prosodic cues provide a universally accessible mechanism for finding words in connected speech. These cues may enable infants to start acquiring words in any language even before they are fine-tuned to the sound structure of their native language.

  8. An Applied Study of Formative Assessment in the College English Audio-Visual-Speaking Course

    Institute of Scientific and Technical Information of China (English)

    赵超; 宋二春; 周小春

    2013-01-01

      As the reform of college English teaching deepens, the reform of assessment methods is attracting increasing attention from scholars and teachers. Compared with traditional summative assessment, formative assessment emphasizes the learning process and the development of ability, and it attends to individual learners. Given the nature of the audio-visual-speaking course, summative assessment alone cannot objectively reflect the enthusiasm students invest in learning, the strategies they adopt, or the practical language ability they develop, so introducing a formative assessment system is of great significance for the construction and development of the course.

  9. The Role Positioning and Cultivation of Student Audio-Visual Education Assistants in Middle School

    Institute of Scientific and Technical Information of China (English)

    刘玉泉

    2014-01-01

    Student audio-visual education assistants now play an increasingly important role in middle school classroom teaching, and because of the special people, times, and settings involved, they can affect the overall teaching result. Drawing on many years of practice and starting from the role positioning of these student assistants, the author explores their selection, training, and day-to-day work, so as to strengthen their cultivation and enable them to play a greater role in school education and teaching.

  10. On the Roles of Teachers in Web-Based College English Audio-Visual and Speaking Teaching

    Institute of Scientific and Technical Information of China (English)

    王辰晖; 杨贤玉

    2012-01-01

    The 2007 version of the "College English Curriculum Requirements" issued by the Ministry of Education highlighted the importance of listening and speaking skills and required all universities and colleges to adopt a computer- and web-based teaching mode. For most college English teachers, the web-based audio-visual and speaking course embodies an emerging teaching philosophy, and they need to understand their roles in it correctly. At the start of the college English teaching reform, many teachers misunderstood these roles: some clung to their traditional roles, while others denied that teaching had any function in the new mode. In web-based audio-visual and speaking teaching, the teacher's roles are more functional and contemporary: administrator of the teaching network, designer of teaching content, collaborator in students' learning, participant in assessment, and trainer of language skills. These new roles also place higher demands on the qualities of future college English teachers. A correct understanding of them will better serve teaching activities and improve teaching outcomes in a practical way.

  11. Research on the Audio-Visual-Oral Cognitive Paradigm from the Perspective of Network Connectivism

    Institute of Scientific and Technical Information of China (English)

    石小娟

    2011-01-01

    Connectivism offers a new theoretical perspective and a constructive framework for studying the audio-visual-oral cognitive model. Starting from the principle that nodes and connections are the key factors in forming a knowledge network, this paper explores the construction of an interactive input-output information flow and a multi-dimensional network of static and dynamic node resources. The study shows that connectivism can effectively help learners connect internal cognitive nodes with external resources to form an integrated cognitive network, and that web technologies provide solid support for multi-faceted assessment of the learning process. An audio-visual-oral cognitive approach under connectivism thus offers an applicable paradigm for language cognition in the information age, one in which language learning is autonomous, interactive, connected, and communicative.

  12. Temporal visual cues aid speech recognition

    DEFF Research Database (Denmark)

    Zhou, Xiang; Ross, Lars; Lehn-Schiøler, Tue;

    2006-01-01

    that it is the temporal synchronicity of the visual input that aids parsing of the auditory stream. More specifically, we expected that purely temporal information, which does not convey information such as place of articulation, may facilitate word recognition. METHODS: To test this prediction we used temporal features...... of audio to generate an artificial talking-face video and measured word recognition performance on simple monosyllabic words. RESULTS: When presenting words together with the artificial video we find that word recognition is improved over purely auditory presentation. The effect is significant (p...

  13. Promoting smoke-free homes: a novel behavioral intervention using real-time audio-visual feedback on airborne particle levels.

    Directory of Open Access Journals (Sweden)

    Neil E Klepeis

    Full Text Available Interventions are needed to protect the health of children who live with smokers. We pilot-tested a real-time intervention for promoting behavior change in homes that reduces second hand tobacco smoke (SHS) levels. The intervention uses a monitor and feedback system to provide immediate auditory and visual signals triggered at defined thresholds of fine particle concentration. Dynamic graphs of real-time particle levels are also shown on a computer screen. We experimentally evaluated the system, field-tested it in homes with smokers, and conducted focus groups to obtain general opinions. Laboratory tests of the monitor demonstrated SHS sensitivity, stability, precision equivalent to at least 1 µg/m³, and low noise. A linear relationship (R² = 0.98) was observed between the monitor and average SHS mass concentrations up to 150 µg/m³. Focus groups and interviews with intervention participants showed in-home use to be acceptable and feasible. The intervention was evaluated in 3 homes with combined baseline and intervention periods lasting 9 to 15 full days. Two families modified their behavior by opening windows or doors, smoking outdoors, or smoking less. We observed evidence of lower SHS levels in these homes. The remaining household voiced reluctance to changing their smoking activity and did not exhibit lower SHS levels in main smoking areas or clear behavior change; however, family members expressed receptivity to smoking outdoors. This study established the feasibility of the real-time intervention, laying the groundwork for controlled trials with larger sample sizes. Visual and auditory cues may prompt family members to take immediate action to reduce SHS levels. Dynamic graphs of SHS levels may help families make decisions about specific mitigation approaches.
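
    The feedback logic described above (auditory and visual signals triggered at defined particle-concentration thresholds) can be pictured as a small polling loop. The sketch below is only illustrative; the read_particle_concentration function and the threshold values are hypothetical stand-ins, not the study's monitor interface.

        import random
        import time

        # Hypothetical thresholds (µg/m^3) at which the feedback escalates.
        WARNING_LEVEL = 15.0
        ALERT_LEVEL = 35.0

        def read_particle_concentration() -> float:
            """Stand-in for a real-time fine-particle monitor; returns a simulated reading."""
            return random.uniform(0.0, 60.0)

        def feedback_step() -> None:
            """Take one reading and emit the corresponding audio-visual cue."""
            level = read_particle_concentration()
            if level >= ALERT_LEVEL:
                print(f"{level:5.1f} µg/m^3  ALERT: sound alarm, flash red indicator")
            elif level >= WARNING_LEVEL:
                print(f"{level:5.1f} µg/m^3  warning: show yellow indicator")
            else:
                print(f"{level:5.1f} µg/m^3  ok")

        for _ in range(5):          # in the real system this would poll continuously
            feedback_step()
            time.sleep(0.1)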

  14. Automatic discrimination between laughter and speech

    NARCIS (Netherlands)

    Truong, K.; Leeuwen, D. van

    2007-01-01

    Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues that can consist of laughter, a trembling voice, coughs, changes in the intonation contour etc., information about the speaker’s state and emotion can be revealed. This paper describes the dev

  15. Real-time speech-driven animation of expressive talking faces

    Science.gov (United States)

    Liu, Jia; You, Mingyu; Chen, Chun; Song, Mingli

    2011-05-01

    In this paper, we present a real-time facial animation system in which speech drives mouth movements and facial expressions synchronously. Considering five basic emotions, a hierarchical structure with an upper layer of emotion classification is established. Based on the recognized emotion label, the under-layer classification at sub-phonemic level has been modelled on the relationship between acoustic features of frames and audio labels in phonemes. Using certain constraint, the predicted emotion labels of speech are adjusted to gain the facial expression labels which are combined with sub-phonemic labels. The combinations are mapped into facial action units (FAUs), and audio-visual synchronized animation with mouth movements and facial expressions is generated by morphing between FAUs. The experimental results demonstrate that the two-layer structure succeeds in both emotion and sub-phonemic classifications, and the synthesized facial sequences reach a comparative convincing quality.
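
    As a rough illustration of the hierarchical mapping described above (an emotion label combined with a sub-phonemic label to select facial action units), a lookup-table sketch might look like the following; the labels and FAU numbers here are illustrative examples, not the paper's actual model.

        from typing import Dict, List, Tuple

        # Illustrative mapping from (emotion, sub-phonemic viseme) to facial action units (FAUs).
        FAU_TABLE: Dict[Tuple[str, str], List[int]] = {
            ("happy", "open_vowel"):   [6, 12, 25, 26],   # cheek raiser, lip corner puller, jaw open
            ("happy", "bilabial"):     [6, 12, 24],       # lips pressed for /p b m/
            ("neutral", "open_vowel"): [25, 26],
            ("sad", "open_vowel"):     [1, 4, 15, 26],    # brow raiser/lowerer, lip corner depressor
        }

        def frame_to_faus(emotion: str, viseme: str) -> List[int]:
            """Return the FAUs to activate for one audio frame's predicted labels."""
            return FAU_TABLE.get((emotion, viseme), FAU_TABLE[("neutral", "open_vowel")])

        print(frame_to_faus("happy", "open_vowel"))   # [6, 12, 25, 26]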

  16. Research on Enhancing Students' English Listening and Speaking Abilities via a Web-Based Audio-Visual-Speaking System

    Institute of Scientific and Technical Information of China (English)

    戴圣虹

    2012-01-01

    At present, students are eager to develop their all-round ability to use English, especially in listening and speaking, and these skills can be trained through the audio-visual-speaking learning system. Based on questionnaires, listening and oral tests, and online study records of non-English majors at Hefei University, this paper finds that such training can improve students' listening and speaking abilities and also develops their ability to learn autonomously.

  17. The Application and Significance of Audio-Visual Teaching Methods in Party School Education

    Institute of Scientific and Technical Information of China (English)

    钱丽萍

    2009-01-01

    Audio-visual teaching uses the achievements of modern science and technology to develop media that can store and transmit audio and video educational information, applies advanced teaching methods, and controls the information of the teaching process in order to obtain the best teaching results. Given the particular audience that Party school education serves, and its contemporary and practical character, audio-visual teaching methods have their own special applications and significance there.

  18. Empirical Research on Cultural Teaching in the Context of College English Audio-Visual and Speaking Course Reform

    Institute of Scientific and Technical Information of China (English)

    王一鸣

    2014-01-01

    As a course aimed at developing students' oral communication skills, the audio-visual and speaking course plays an important role in cultivating both practical language use and cross-cultural communicative competence. This study examines the effect of cultural teaching within the reformed audio-visual and speaking course: compared with the traditional approach without cultural instruction, the method that integrates cultural factors has a greater positive effect on students' spoken English, mainly reflected in the fluency, accuracy, and idiomaticity of their oral communication.

  19. Research and Practice on Constructing Teaching Materials for the Higher Vocational English Audio-Visual-Oral Course

    Institute of Scientific and Technical Information of China (English)

    毕春意

    2014-01-01

    The effectiveness of teaching depends first of all on what is taught, that is, on the teaching materials. Textbook construction is an important part of the basic development of higher vocational education, and high-quality materials are the foundation for continuously raising the teaching level and safeguarding teaching quality. At present, many audio-visual-oral textbooks on the market are unsuitable for higher vocational students, which hinders teaching, so truly appropriate materials must be developed in a targeted way in order to effectively improve these students' applied and communicative competence in English.

  20. Discussion on the Necessity of Applying Audio-Visual Teaching to Physical Education

    Institute of Scientific and Technical Information of China (English)

    黄建成

    2013-01-01

    The comprehensive implementation of quality-oriented education provides an excellent opportunity for the development of school physical education. Physical education is now regarded not only as a major component of quality-oriented education but also as one of its important means. The reasonable use of audio-visual teaching methods in physical education classes helps achieve the teaching objectives of the subject.

  1. Exploration of a MOOC-Based Flipped Classroom Model in Higher Vocational College English Teaching: The Case of English Audio-Visual and Speaking Instruction

    Institute of Scientific and Technical Information of China (English)

    李传瑞

    2015-01-01

    Drawing on the practice of higher vocational English audio-visual and speaking teaching, this paper explores the application of a MOOC-based flipped classroom model in order to improve the way higher vocational students learn English, raise their interest in learning, and increase the efficiency of audio-visual and speaking classroom teaching.

  2. Visual face-movement sensitive cortex is relevant for auditory-only speech recognition.

    Science.gov (United States)

    Riedel, Philipp; Ragert, Patrick; Schelinski, Stefanie; Kiebel, Stefan J; von Kriegstein, Katharina

    2015-07-01

    It is commonly assumed that the recruitment of visual areas during audition is not relevant for performing auditory tasks ('auditory-only view'). According to an alternative view, however, the recruitment of visual cortices is thought to optimize auditory-only task performance ('auditory-visual view'). This alternative view is based on functional magnetic resonance imaging (fMRI) studies. These studies have shown, for example, that even if there is only auditory input available, face-movement sensitive areas within the posterior superior temporal sulcus (pSTS) are involved in understanding what is said (auditory-only speech recognition). This is particularly the case when speakers are known audio-visually, that is, after brief voice-face learning. Here we tested whether the left pSTS involvement is causally related to performance in auditory-only speech recognition when speakers are known by face. To test this hypothesis, we applied cathodal transcranial direct current stimulation (tDCS) to the pSTS during (i) visual-only speech recognition of a speaker known only visually to participants and (ii) auditory-only speech recognition of speakers they learned by voice and face. We defined the cathode as active electrode to down-regulate cortical excitability by hyperpolarization of neurons. tDCS to the pSTS interfered with visual-only speech recognition performance compared to a control group without pSTS stimulation (tDCS to BA6/44 or sham). Critically, compared to controls, pSTS stimulation additionally decreased auditory-only speech recognition performance selectively for voice-face learned speakers. These results are important in two ways. First, they provide direct evidence that the pSTS is causally involved in visual-only speech recognition; this confirms a long-standing prediction of current face-processing models. Secondly, they show that visual face-sensitive pSTS is causally involved in optimizing auditory-only speech recognition. These results are in line

  3. Speech Problems

    Science.gov (United States)

    ... of your treatment plan may include seeing a speech therapist, a person who is trained to treat speech disorders. How often you have to see the speech therapist will vary — you'll probably start out seeing ...

  4. A Brief Discussion on How to Effectively Carry out Oral English Activities in the Public English Audio-Visual-Oral Classroom in Higher Vocational Colleges

    Institute of Scientific and Technical Information of China (English)

    郑筱筠

    2015-01-01

    The public English audio-visual-oral course in higher vocational colleges aims to improve students' ability to communicate in English. Given the characteristics and needs of higher vocational students, a key research question for audio-visual-oral teaching is how to use the limited classroom time to carry out practical, effective, and workable oral English activities. Drawing on constructivist learning theory, this paper explores the principles that should be followed and the steps involved in designing oral English activities for this classroom.

  5. Exploring the role of brain oscillations in speech perception in noise: Intelligibility of isochronously retimed speech

    Directory of Open Access Journals (Sweden)

    Vincent Aubanel

    2016-08-01

    Full Text Available A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximise processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioural experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise.
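
    To make the retiming manipulation described above concrete, the sketch below maps a set of syllable-onset anchor points onto a strictly periodic (isochronous) grid at a chosen rate and returns the per-anchor time shifts that a time-warping routine would then apply. The anchor times are invented, and the actual stimulus processing in the study was of course more involved.

        import numpy as np

        def isochronous_targets(anchors_s: np.ndarray, rate_hz: float) -> np.ndarray:
            """Map anchor times (s) onto a periodic grid at rate_hz, starting at the first anchor."""
            period = 1.0 / rate_hz
            return anchors_s[0] + period * np.arange(len(anchors_s))

        # invented syllable-onset anchors (seconds) for a short utterance
        anchors = np.array([0.12, 0.43, 0.91, 1.30, 1.62])

        targets = isochronous_targets(anchors, rate_hz=2.5)   # slow time scale (~2.5 Hz)
        shifts = targets - anchors                            # how far each anchor must move
        print(np.round(targets, 2))   # [0.12 0.52 0.92 1.32 1.72]
        print(np.round(shifts, 3))    # per-anchor time shifts to pass to a time-warping routine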

  6. On New English Audio-Visual Teaching Methods Based on Network Information Resources

    Institute of Scientific and Technical Information of China (English)

    田丽

    2012-01-01

    The present era is a networked information age of rapidly developing Internet technology and ever-growing information flows. The spread of technology and the globalization of information in this new era present unprecedented opportunities and challenges to every country and industry. As pioneers in cultivating talent, college educators should respond to the demands of the times and prepare to take part in training high-quality, internationally competitive graduates. This article explores effective ways of combining English audio-visual teaching with the advantages of network information resources; doing so helps raise teachers' and students' awareness of information technology in teaching and contributes to building information-based teacher education in universities.

  7. New Requirements of the Online CET-4 for the English Audio-Visual-Oral Course

    Institute of Scientific and Technical Information of China (English)

    王红艳

    2014-01-01

    In line with the "Teaching Requirements" and the "Reform Plan for CET-4 and CET-6 (Trial)" promulgated by the Ministry of Education, the author, starting from the college English audio-visual-oral course, adjusts teaching tasks and reforms the course to suit the online CET-4, with the goal of cultivating well-rounded graduates with strong oral expression skills. Newly emerging network and multimedia teaching platforms can be used to strengthen students' online autonomous learning, combining constructivism with computer and information technology to build a new model of autonomous audio-visual-oral learning.

  8. Sequential Organization and Room Reverberation for Speech Segregation

    Science.gov (United States)

    2012-02-28

    voiced portions account for about 75-80% of spoken English. Voiced speech is characterized by periodicity (or harmonicity), which has been used as a...onset and offset cues to extract unvoiced speech segments. Acoustic-phonetic features are then used to separate unvoiced speech from nonspeech...estimate is relatively accurate due to weak voiced speech at these frequencies. Based on this analysis and acoustic-phonetic characteristics of

  9. Audio-Visual Classification of Sports Types

    DEFF Research Database (Denmark)

    Gade, Rikke; Abou-Zleikha, Mohamed; Christensen, Mads Græsbøll

    2015-01-01

    In this work we propose a method for classification of sports types from combined audio and visual features extracted from thermal video. From the audio, Mel Frequency Cepstral Coefficients (MFCC) are extracted, and PCA is applied to reduce the feature space to 10 dimensions. From the visual modality ... short trajectories are constructed to represent the motion of players. From these, four motion features are extracted and combined directly with audio features for classification. A k-nearest neighbour classifier is applied for classification of 180 1-minute video sequences from three sports types...
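
    A minimal sketch of the audio half of such a pipeline (MFCC extraction, PCA to 10 dimensions, direct combination with motion features, k-NN classification) is given below; the clips, labels and motion features are random placeholders rather than the authors' thermal-video data, and taking the mean MFCC per clip is an assumption made here for brevity.

    import numpy as np
    import librosa
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    def clip_mfcc(y, sr, n_mfcc=20):
        """Mean MFCC vector as a clip-level audio descriptor."""
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

    sr = 22050
    rng = np.random.default_rng(0)
    clips = [rng.standard_normal(2 * sr) for _ in range(30)]   # placeholder "recordings"
    labels = rng.integers(0, 3, size=30)                       # three sports types

    audio_feats = PCA(n_components=10).fit_transform(
        np.vstack([clip_mfcc(y, sr) for y in clips]))
    motion_feats = rng.standard_normal((30, 4))                # placeholder motion features
    X = np.hstack([audio_feats, motion_feats])                 # direct feature combination

    clf = KNeighborsClassifier(n_neighbors=3).fit(X[:20], labels[:20])
    print(clf.score(X[20:], labels[20:]))                      # chance-level on random data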

  10. Audio-visual integration in schizophrenia

    NARCIS (Netherlands)

    Gelder, B.L.M.F. de; Vroomen, J.; Annen, L.; Masthoff, E.D.M.; Hodiamont, P.P.G.

    2003-01-01

    Integration of information provided simultaneously by audition and vision was studied in a group of 18 schizophrenic patients. They were compared to a control group, consisting of 12 normal adults of comparable age and education. By administering two tasks, each focusing on one aspect of audio-visual …

  11. Audio-visual integration in schizophrenia.

    NARCIS (Netherlands)

    Gelder, B. de; Vroomen, J.; Annen, L.; Masthof, E.; Hodiamont, P.P.G.

    2003-01-01

    Integration of information provided simultaneously by audition and vision was studied in a group of 18 schizophrenic patients. They were compared to a control group, consisting of 12 normal adults of comparable age and education. By administering two tasks, each focusing on one aspect of audio-visual …

  12. P300 audio-visual speller

    Science.gov (United States)

    Belitski, A.; Farquhar, J.; Desain, P.

    2011-04-01

    The Farwell and Donchin matrix speller is well known as one of the highest performing brain-computer interfaces (BCIs) currently available. However, its use of visual stimulation limits its applicability to users with normal eyesight. Alternative BCI spelling systems which rely on non-visual stimulation, e.g. auditory or tactile, tend to perform much more poorly and/or can be very difficult to use. In this paper we present a novel extension of the matrix speller, based on flipping the letter matrix, which allows us to use the same interface for visual, auditory or simultaneous visual and auditory stimuli. In this way we aim to allow users to utilize the best available input modality for their situation, that is use visual + auditory for best performance and move smoothly to purely auditory when necessary, e.g. when disease causes the user's eyesight to deteriorate. Our results on seven healthy subjects demonstrate the effectiveness of this approach, with our modified visual + auditory stimulation slightly out-performing the classic matrix speller. The purely auditory system performance was lower than for visual stimulation, but comparable to other auditory BCI systems.
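
    For context, the classic visual matrix speller that this work extends flashes each row and column of a 6 x 6 letter grid in random order and selects the letter at the intersection of the row and column whose flashes evoke the strongest P300-like response. The toy sketch below illustrates only that selection logic with made-up classifier scores; the audio-visual "flipped matrix" extension described in the record is not reproduced here.

    import random
    import numpy as np

    MATRIX = np.array(list("ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890")).reshape(6, 6)

    def flash_sequence():
        """One stimulation sequence: the 6 rows and 6 columns each flash once, in random order."""
        events = [("row", i) for i in range(6)] + [("col", j) for j in range(6)]
        random.shuffle(events)
        return events

    def decode(row_scores, col_scores):
        """Select the letter at the intersection of the best-scoring row and column."""
        return MATRIX[int(np.argmax(row_scores)), int(np.argmax(col_scores))]

    print(flash_sequence())
    # Hypothetical per-row / per-column evidence (e.g. averaged P300 classifier outputs)
    print(decode(row_scores=[0.1, 0.9, 0.2, 0.0, 0.3, 0.1],
                 col_scores=[0.2, 0.1, 0.1, 0.8, 0.0, 0.3]))   # row 1, column 3 -> "J"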

  13. Acoustic Markers of Prosodic Boundaries in Spanish Spontaneous Alaryngeal Speech

    Science.gov (United States)

    Cuenca, M. H.; Barrio, M. M.

    2010-01-01

    Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, being as a result noisy…

  14. Can you hear my age? Influences of speech rate and speech spontaneity on estimation of speaker age.

    Science.gov (United States)

    Skoog Waller, Sara; Eriksson, Mårten; Sörqvist, Patrik

    2015-01-01

    Cognitive hearing science is mainly about the study of how cognitive factors contribute to speech comprehension, but cognitive factors also partake in speech processing to infer non-linguistic information from speech signals, such as the intentions of the talker and the speaker's age. Here, we report two experiments on age estimation by "naïve" listeners. The aim was to study how speech rate influences estimation of speaker age by comparing the speakers' natural speech rate with increased or decreased speech rate. In Experiment 1, listeners were presented with audio samples of read speech from three different speaker age groups (young, middle aged, and old adults). They estimated the speakers as younger when speech rate was faster than normal and as older when speech rate was slower than normal. This speech rate effect was slightly greater in magnitude for older (60-65 years) speakers in comparison with younger (20-25 years) speakers, suggesting that speech rate may gain greater importance as a perceptual age cue with increased speaker age. This pattern was more pronounced in Experiment 2, in which listeners estimated age from spontaneous speech. Faster speech rate was associated with lower age estimates, but only for older and middle aged (40-45 years) speakers. Taken together, speakers of all age groups were estimated as older when speech rate decreased, except for the youngest speakers in Experiment 2. The absence of a linear speech rate effect in estimates of younger speakers, for spontaneous speech, implies that listeners use different age estimation strategies or cues (possibly vocabulary) depending on the age of the speaker and the spontaneity of the speech. Potential implications for forensic investigations and other applied domains are discussed.
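
    The rate manipulations described here require changing speech tempo without shifting pitch. One common off-the-shelf way to do this (not necessarily the procedure used by the authors) is phase-vocoder time stretching, for example:

    import numpy as np
    import librosa

    sr = 22050
    y = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)     # placeholder 1-second signal

    faster = librosa.effects.time_stretch(y, rate=1.2)   # ~20% faster, pitch unchanged
    slower = librosa.effects.time_stretch(y, rate=0.8)   # ~20% slower, pitch unchanged
    print(len(y), len(faster), len(slower))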

  15. Can you hear my age? Influences of speech rate and speech spontaneity on estimation of speaker age

    Directory of Open Access Journals (Sweden)

    Sara eWaller Skoog

    2015-07-01

    Full Text Available Cognitive hearing science is mainly about the study of how cognitive factors contribute to speech comprehension, but cognitive factors also partake in speech processing to infer non-linguistic information from speech signals, such as the intentions of the talker and the speaker’s age. Here, we report two experiments on age estimation by naïve listeners. The aim was to study how speech rate influences estimation of speaker age by comparing the speakers’ natural speech rate with increased or decreased speech rate. In Experiment 1, listeners were presented with audio samples of read speech from three different speaker age groups (young, middle aged and old adults). They estimated the speakers as younger when speech rate was faster than normal and as older when speech rate was slower than normal. This speech rate effect was slightly greater in magnitude for older (60-65 years) speakers in comparison with younger (20-25 years) speakers, suggesting that speech rate may gain greater importance as a perceptual age cue with increased speaker age. This pattern was more pronounced in Experiment 2, in which listeners estimated age from spontaneous speech. Faster speech rate was associated with lower age estimates, but only for older and middle aged (40-45 years) speakers. Taken together, speakers of all age groups were estimated as older when speech rate decreased, except for the youngest speakers in Experiment 2. The absence of a linear speech rate effect in estimates of younger speakers, for spontaneous speech, implies that listeners use different age estimation strategies or cues (possibly vocabulary) depending on the age of the speaker and the spontaneity of the speech. Potential implications for forensic investigations and other applied domains are discussed.

  16. Can you hear my age? Influences of speech rate and speech spontaneity on estimation of speaker age

    Science.gov (United States)

    Skoog Waller, Sara; Eriksson, Mårten; Sörqvist, Patrik

    2015-01-01

    Cognitive hearing science is mainly about the study of how cognitive factors contribute to speech comprehension, but cognitive factors also partake in speech processing to infer non-linguistic information from speech signals, such as the intentions of the talker and the speaker’s age. Here, we report two experiments on age estimation by “naïve” listeners. The aim was to study how speech rate influences estimation of speaker age by comparing the speakers’ natural speech rate with increased or decreased speech rate. In Experiment 1, listeners were presented with audio samples of read speech from three different speaker age groups (young, middle aged, and old adults). They estimated the speakers as younger when speech rate was faster than normal and as older when speech rate was slower than normal. This speech rate effect was slightly greater in magnitude for older (60–65 years) speakers in comparison with younger (20–25 years) speakers, suggesting that speech rate may gain greater importance as a perceptual age cue with increased speaker age. This pattern was more pronounced in Experiment 2, in which listeners estimated age from spontaneous speech. Faster speech rate was associated with lower age estimates, but only for older and middle aged (40–45 years) speakers. Taken together, speakers of all age groups were estimated as older when speech rate decreased, except for the youngest speakers in Experiment 2. The absence of a linear speech rate effect in estimates of younger speakers, for spontaneous speech, implies that listeners use different age estimation strategies or cues (possibly vocabulary) depending on the age of the speaker and the spontaneity of the speech. Potential implications for forensic investigations and other applied domains are discussed. PMID:26236259

  17. On how the brain decodes vocal cues about speaker confidence.

    Science.gov (United States)

    Jiang, Xiaoming; Pell, Marc D

    2015-05-01

    In speech communication, listeners must accurately decode vocal cues that refer to the speaker's mental state, such as their confidence or 'feeling of knowing'. However, the time course and neural mechanisms associated with online inferences about speaker confidence are unclear. Here, we used event-related potentials (ERPs) to examine the temporal neural dynamics underlying a listener's ability to infer speaker confidence from vocal cues during speech processing. We recorded listeners' real-time brain responses while they evaluated statements wherein the speaker's tone of voice conveyed one of three levels of confidence (confident, close-to-confident, unconfident) or were spoken in a neutral manner. Neural responses time-locked to event onset show that the perceived level of speaker confidence could be differentiated at distinct time points during speech processing: unconfident expressions elicited a weaker P2 than all other expressions of confidence (or neutral-intending utterances), whereas close-to-confident expressions elicited a reduced negative response in the 330-500 msec and 550-740 msec time window. Neutral-intending expressions, which were also perceived as relatively confident, elicited a more delayed, larger sustained positivity than all other expressions in the 980-1270 msec window for this task. These findings provide the first piece of evidence of how quickly the brain responds to vocal cues signifying the extent of a speaker's confidence during online speech comprehension; first, a rough dissociation between unconfident and confident voices occurs as early as 200 msec after speech onset. At a later stage, further differentiation of the exact level of speaker confidence (i.e., close-to-confident, very confident) is evaluated via an inferential system to determine the speaker's meaning under current task settings. These findings extend three-stage models of how vocal emotion cues are processed in speech comprehension (e.g., Schirmer & Kotz, 2006) by

  18. The inhibition of stuttering via the presentation of natural speech and sinusoidal speech analogs.

    Science.gov (United States)

    Saltuklaroglu, Tim; Kalinowski, Joseph

    2006-08-14

    Sensory signals containing speech or gestural (articulatory) information (e.g., choral speech) have repeatedly been found to be highly effective inhibitors of stuttering. Sine wave analogs of speech consist of a trio of changing pure tones representative of formant frequencies. They are otherwise devoid of traditional speech cues, yet have proven to evoke consistent linguistic percepts in listeners. Thus, we investigated the potency of sinusoidal speech for inhibiting stuttering. Ten adults who stutter read while listening to (a) forward-flowing natural speech; (b) forward-flowing sinusoid analogs of natural speech; (c) reversed natural speech; (d) reversed sinusoid analogs of natural speech; and (e) a continuous 1000 Hz pure tone. The levels of stuttering inhibition achieved using the sinusoidal stimuli were potent and not significantly different from those achieved using natural speech (approximately 50% in forward conditions and approximately 25% in the reversed conditions), suggesting that the patterns of undulating pure tones are sufficient to endow sinusoidal sentences with 'quasi-gestural' qualities. These data highlight the sensitivity of a specialized 'phonetic module' for extracting gestural information from sensory stimuli. Stuttering inhibition is thought to occur when perceived gestural information facilitates fluent productions via the engagement of mirror neurons (e.g., in Broca's area), which appear to play a crucial role in our ability to perceive and produce speech.
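
    Sine-wave analogs of this kind can be synthesized by replacing each formant track with a single frequency-modulated sinusoid. The sketch below shows the core idea with made-up, smoothly varying formant trajectories rather than tracks measured from real speech.

    import numpy as np

    def sine_wave_speech(formant_tracks, amps, sr=16000):
        """Sum one sinusoid per formant track (frequencies in Hz, one value per sample)."""
        out = np.zeros(formant_tracks.shape[1])
        for f, a in zip(formant_tracks, amps):
            phase = 2 * np.pi * np.cumsum(f) / sr        # integrate frequency to get phase
            out += a * np.sin(phase)
        return out / np.max(np.abs(out))

    sr, n = 16000, 16000                                 # one second of signal
    f1 = np.linspace(700, 300, n)                        # hypothetical F1 glide
    f2 = np.linspace(1200, 2200, n)                      # hypothetical F2 glide
    f3 = np.full(n, 2800.0)                              # hypothetical flat F3
    sws = sine_wave_speech(np.vstack([f1, f2, f3]), amps=[1.0, 0.6, 0.3], sr=sr)
    print(sws.shape)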

  19. Plowing Speech

    OpenAIRE

    Zla ba sgrol ma

    2009-01-01

    This file contains a plowing speech and a discussion about the speech. This collection presents forty-nine audio files including several folk song genres, folktales, and local history from the Sman shad Valley of Sde dge County. World Oral Literature Project.

  20. Speech Indexing

    NARCIS (Netherlands)

    Ordelman, R.J.F.; Jong, de F.M.G.; Leeuwen, van D.A.; Blanken, H.M.; de Vries, A.P.; Blok, H.E.; Feng, L.

    2007-01-01

    This chapter will focus on the automatic extraction of information from the speech in multimedia documents. This approach is often referred to as speech indexing and it can be regarded as a subfield of audio indexing that also incorporates for example the analysis of music and sounds. If the objecti

  1. Speech coding

    Energy Technology Data Exchange (ETDEWEB)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. Digital transmission, on the other hand, is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely on the basis of a binary decision. Hence the end-to-end performance of the digital link becomes essentially independent of the length and operating frequency bands of the link, and from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques, and that is often used interchangeably with speech coding, is voice coding. This term is more generic in the sense that the
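
    As one concrete example of the waveform-coding family mentioned above, G.711-style mu-law companding compresses each sample before uniform 8-bit quantization and expands it again on decoding; the sketch below is a minimal illustration with mu = 255, not a full codec.

    import numpy as np

    MU = 255.0

    def mu_law_encode(x):
        """Compand samples in [-1, 1] and quantize to 8 bits."""
        y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
        return np.round((y + 1) / 2 * 255).astype(np.uint8)

    def mu_law_decode(codes):
        """Invert the companding curve back to samples in [-1, 1]."""
        y = codes.astype(np.float64) / 255 * 2 - 1
        return np.sign(y) * (np.power(1 + MU, np.abs(y)) - 1) / MU

    sr = 8000
    x = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # placeholder "speech"
    codes = mu_law_encode(x)                                  # 8 bits/sample at 8 kHz = 64 kbit/s
    err = np.max(np.abs(x - mu_law_decode(codes)))
    print(codes.dtype, round(float(err), 4))                  # small reconstruction error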

  2. Multiperson visual focus of attention from head pose and meeting contextual cues.

    Science.gov (United States)

    Ba, Sileye O; Odobez, Jean-Marc

    2011-01-01

    This paper introduces a novel contextual model for the recognition of people's visual focus of attention (VFOA) in meetings from audio-visual perceptual cues. More specifically, instead of independently recognizing the VFOA of each meeting participant from his own head pose, we propose to jointly recognize the participants' visual attention in order to introduce context-dependent interaction models that relate to group activity and the social dynamics of communication. Meeting contextual information is represented by the location of people, conversational events identifying floor holding patterns, and a presentation activity variable. By modeling the interactions between the different contexts and their combined and sometimes contradictory impact on the gazing behavior, our model allows us to handle VFOA recognition in difficult task-based meetings involving artifacts, presentations, and moving people. We validated our model through rigorous evaluation on a publicly available and challenging data set of 12 real meetings (5 hours of data). The results demonstrated that the integration of the presentation and conversation dynamical context using our model can lead to significant performance improvements.

  3. Cue conflicts in context

    DEFF Research Database (Denmark)

    Boeg Thomsen, Ditte; Poulsen, Mads

    2015-01-01

    When learning their first language, children develop strategies for assigning semantic roles to sentence structures, depending on morphosyntactic cues such as case and word order. Traditionally, comprehension experiments have presented transitive clauses in isolation, and crosslinguistically ... in discourse-pragmatically felicitous contexts. Our results extend previous findings of preschoolers’ sensitivity to discourse-contextual cues in sentence comprehension (Hurewitz, 2001; Song & Fisher, 2005) to the basic task of assigning agent and patient roles...

  4. A Virtual Therapist for Speech and Language Therapy.

    Science.gov (United States)

    van Vuuren, Sarel; Cherney, Leora R

    2014-01-01

    A virtual therapist (VT) capable of modeling visible speech and directing speech and language therapy is presented. Three perspectives of practical and clinical use are described. The first is a description of treatment and typical roles that the VT performs in directing participation, practice and performance. The second is a description of techniques for modeling visible speech and implementing tele-rehabilitation. The third is an analysis of performance of a system (AphasiaRx™) for delivering speech and language therapy to people with aphasia, with results presented from a randomized controlled cross-over study in which the VT provided two levels of cuing. Compared to low cue treatment, high cue treatment resulted in 2.3 times faster learning. The paper concludes with a discussion of the benefits of speech and language therapy delivered by the VT.

  5. Infant directed speech and the development of speech perception: enhancing development or an unintended consequence?

    Science.gov (United States)

    McMurray, Bob; Kovack-Lesh, Kristine A; Goodwin, Dresden; McEchron, William

    2013-11-01

    Infant directed speech (IDS) is a speech register characterized by simpler sentences, a slower rate, and more variable prosody. Recent work has implicated it in more subtle aspects of language development. Kuhl et al. (1997) demonstrated that segmental cues for vowels are affected by IDS in a way that may enhance development: the average locations of the extreme "point" vowels (/a/, /i/ and /u/) are further apart in acoustic space. If infants learn speech categories, in part, from the statistical distributions of such cues, these changes may specifically enhance speech category learning. We revisited this by asking (1) if these findings extend to a new cue (Voice Onset Time, a cue for voicing); (2) whether they extend to the interior vowels which are much harder to learn and/or discriminate; and (3) whether these changes may be an unintended phonetic consequence of factors like speaking rate or prosodic changes associated with IDS. Eighteen caregivers were recorded reading a picture book including minimal pairs for voicing (e.g., beach/peach) and a variety of vowels to either an adult or their infant. Acoustic measurements suggested that VOT was different in IDS, but not in a way that necessarily supports better development, and that these changes are almost entirely due to slower rate of speech of IDS. Measurements of the vowel suggested that in addition to changes in the mean, there was also an increase in variance, and statistical modeling suggests that this may counteract the benefit of any expansion of the vowel space. As a whole this suggests that changes in segmental cues associated with IDS may be an unintended by-product of the slower rate of speech and different prosodic structure, and do not necessarily derive from a motivation to enhance development.
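
    The "further apart in acoustic space" observation is often quantified as the area of the triangle spanned by the mean (F1, F2) locations of the point vowels. A small sketch with illustrative, made-up formant values (not the study's measurements):

    def triangle_area(points):
        """Shoelace formula for the /i/-/a/-/u/ triangle in F1-F2 space."""
        (x1, y1), (x2, y2), (x3, y3) = points
        return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2

    # Hypothetical mean (F1, F2) values in Hz for adult-directed vs infant-directed speech
    ads = {"i": (300, 2300), "a": (750, 1300), "u": (350, 900)}
    ids = {"i": (280, 2500), "a": (800, 1250), "u": (330, 820)}

    for name, vowels in (("ADS", ads), ("IDS", ids)):
        area = triangle_area([vowels[v] for v in "iau"])
        print(name, f"{area:.0f} Hz^2")          # an expanded vowel space gives a larger area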

  6. Distinctive-feature analyses of the speech of deaf children.

    Science.gov (United States)

    Mencke, E O; Ochsner, G J; Testut, E W

    1985-07-01

    Twenty-two children aged 8.5 through 15.5 years with hearing threshold levels of 90 dB or greater in the better ear spoke a carrier phrase before each of 41 monosyllables, each containing an initial and a final consonant (23 consonants were represented). Each subject repeated the 41-word list 10 times. Speech samples were recorded simultaneously but independently in audio-only and in audio-visual modes, and transcribed by 3 judges using each mode separately. The speaker-subjects' utterances of target consonants in initial and in final word positions were scored (percent correct) for the presence or absence of distinctive features according to the systems of Chomsky and Halle (1968) and of Fisher and Logemann (1971). Consistently higher correct feature usage was noted for target consonants in the initial rather than in the final word position for both systems. Further, higher scores were obtained when transcribers could see as well as hear the speaker, but correct usage of a feature was not uniformly a function of the visibility of that feature. Finally, there was no significant increase in correct feature usage as a function of speaker age.

  7. Application and design of audio-visual aids in teaching cariology, endodontology and operative dentistry to non-stomatology students

    Institute of Scientific and Technical Information of China (English)

    倪雪岩; 吕亚林; 曹莹; 臧滔; 董坚; 丁芳; 李若萱

    2014-01-01

    Objective: To evaluate the effects of audio-visual aids on the teaching of cariology, endodontology and operative dentistry to non-stomatology students. Methods: Seventy-seven students from the 2010 and 2011 matriculating classes of the Preventive Medicine Department of Capital Medical University were selected. Diversified audio-visual aids were used comprehensively in teaching. A theory examination and a follow-up survey were carried out and analyzed to obtain feedback on the combined teaching methods. Results: The students had a good grasp of the theoretical knowledge of endodontics, with a mean score of 24.2 ± 1.1; the questionnaire survey showed that 89.6% (69/77) of the students had a positive attitude towards the improved teaching method, and 90.9% (70/77) of the students reported that the audio-visual aids improved their learning ability. Conclusions: The application of audio-visual aids in stomatology teaching increases interest in learning and improves the teaching effect. However, their integration should be carefully prepared, in combination with cross-teaching methods and elicitation pedagogy, in order to achieve the best teaching results.

  8. Contribution of envelope periodicity to release from speech-on-speech masking

    DEFF Research Database (Denmark)

    Christiansen, Claus; MacDonald, Ewen; Dau, Torsten

    2013-01-01

    Masking release (MR) is the improvement in speech intelligibility for a fluctuating interferer compared to stationary noise. Reduction in MR due to vocoder processing is usually linked to distortions in the temporal fine structure of the stimuli and a corresponding reduction in the fundamental frequency (F0) cues. However, it is unclear if envelope periodicity related to F0, produced by the interaction between unresolved harmonics, contributes to MR. In the present study, MR was determined from speech reception thresholds measured in the presence of stationary speech-shaped noise and a competing...

  9. Emotional speech processing at the intersection of prosody and semantics.

    Directory of Open Access Journals (Sweden)

    Rachel Schwartz

    Full Text Available The ability to accurately perceive emotions is crucial for effective social interaction. Many questions remain regarding how different sources of emotional cues in speech (e.g., prosody, semantic information) are processed during emotional communication. Using a cross-modal emotional priming paradigm (Facial affect decision task), we compared the relative contributions of processing utterances with single-channel (prosody-only) versus multi-channel (prosody and semantic) cues on the perception of happy, sad, and angry emotional expressions. Our data show that emotional speech cues produce robust congruency effects on decisions about an emotionally related face target, although no processing advantage occurred when prime stimuli contained multi-channel as opposed to single-channel speech cues. Our data suggest that utterances with prosodic cues alone and utterances with combined prosody and semantic cues both activate knowledge that leads to emotional congruency (priming) effects, but that the convergence of these two information sources does not always heighten access to this knowledge during emotional speech processing.

  10. Emotional speech processing at the intersection of prosody and semantics.

    Science.gov (United States)

    Schwartz, Rachel; Pell, Marc D

    2012-01-01

    The ability to accurately perceive emotions is crucial for effective social interaction. Many questions remain regarding how different sources of emotional cues in speech (e.g., prosody, semantic information) are processed during emotional communication. Using a cross-modal emotional priming paradigm (Facial affect decision task), we compared the relative contributions of processing utterances with single-channel (prosody-only) versus multi-channel (prosody and semantic) cues on the perception of happy, sad, and angry emotional expressions. Our data show that emotional speech cues produce robust congruency effects on decisions about an emotionally related face target, although no processing advantage occurred when prime stimuli contained multi-channel as opposed to single-channel speech cues. Our data suggest that utterances with prosodic cues alone and utterances with combined prosody and semantic cues both activate knowledge that leads to emotional congruency (priming) effects, but that the convergence of these two information sources does not always heighten access to this knowledge during emotional speech processing.

  11. Reactivity to nicotine cues over repeated cue reactivity sessions.

    Science.gov (United States)

    LaRowe, Steven D; Saladin, Michael E; Carpenter, Matthew J; Upadhyaya, Himanshu P

    2007-12-01

    The present study investigated whether reactivity to nicotine-related cues would attenuate across four experimental sessions held 1 week apart. Participants were nineteen non-treatment seeking, nicotine-dependent males. Cue reactivity sessions were performed in an outpatient research center using in vivo cues consisting of standardized smoking-related paraphernalia (e.g., cigarettes) and neutral comparison paraphernalia (e.g., pencils). Craving ratings were collected before and after both cue presentations while physiological measures (heart rate, skin conductance) were collected before and during the cue presentations. Although craving levels decreased across sessions, smoking-related cues consistently evoked significantly greater increases in craving relative to neutral cues over all four experimental sessions. Skin conductance was higher in response to smoking cues, though this effect was not as robust as that observed for craving. Results suggest that, under the described experimental parameters, craving can be reliably elicited over repeated cue reactivity sessions.

  12. Imparting and (Re-)Confirming Order to the World: Authoritative Speech Traditions and Socio-political Assemblies in Spiti, Upper Kinnaur, and Purang in the Past and Present

    Directory of Open Access Journals (Sweden)

    Christian Jahoda

    2016-10-01

    Full Text Available This study focuses on speech traditions and socio-political assemblies in the Tibetan-speaking area of the Spiti Valley in the Northwest Indian state of Himachal Pradesh. Important comparative material is drawn from field research in the adjacent areas of Upper Kinnaur in Himachal Pradesh and from Purang County in the Ngari Prefecture of the Tibet Autonomous Region of China. In accordance with the structural setting, contexts, and functions of these assemblies, where performances of authoritative speeches usually take place, three categories of formal or authoritative speech tradition are identified: those with a primarily state-related political function that occurred in ancient periods, mainly in royal dynastic contexts; the context of community politics, associated mainly with local village contexts in modern times; and, finally, occasions in which mythological and religious functions are foregrounded—settings that may concern either village or monastic Buddhist contexts. Based on the use of written and oral sources (audio-visual recordings made in the field), and the application of social-anthropological and historical methods, selected historical and contemporary examples of such authoritative speech traditions are discussed and analyzed. These include, for example, authoritative speech (molla; mol ba in written Tibetan) at a wedding ceremony, and an oracular soliloquy made by the trance-medium of a local protective goddess in Tabo Village in Spiti Valley.

  13. Cues and expressions

    Directory of Open Access Journals (Sweden)

    Thorbjörg Hróarsdóttir

    2005-02-01

    Full Text Available A number of European languages have undergone a change from object-verb to verb-object order. We focus on the change in English and Icelandic, showing that while the structural change was the same, it took place at different times and in different ways in the two languages, triggered by different E-language changes. As seen from the English viewpoint, low-level facts of inflectional morphology may express the relevant cue for parameters, and so the loss of inflection may lead to a grammar change. This analysis does not carry over to Icelandic, as the loss of OV there took place despite rich case morphology. We aim to show how this can be explained within a cue-style approach, arguing for a universal set of cues. However, the relevant cue may be expressed differently among languages: while it may have been expressed through morphology in English, it was expressed through information structure in Icelandic. In both cases, external effects led to fewer expressions of the relevant (universal) cue and a grammar change took place.

  14. Analytic study of the Tadoma method: effects of hand position on segmental speech perception.

    Science.gov (United States)

    Reed, C M; Durlach, N I; Braida, L D; Schultz, M C

    1989-12-01

    In the Tadoma method of communication, deaf-blind individuals receive speech by placing a hand on the face and neck of the talker and monitoring actions associated with speech production. Previous research has documented the speech perception, speech production, and linguistic abilities of highly experienced users of the Tadoma method. The current study was performed to gain further insight into the cues involved in the perception of speech segments through Tadoma. Small-set segmental identification experiments were conducted in which the subjects' access to various types of articulatory information was systematically varied by imposing limitations on the contact of the hand with the face. Results obtained on 3 deaf-blind, highly experienced users of Tadoma were examined in terms of percent-correct scores, information transfer, and reception of speech features for each of sixteen experimental conditions. The results were generally consistent with expectations based on the speech cues assumed to be available in the various hand positions.

  15. Amharic Speech Recognition for Speech Translation

    OpenAIRE

    Melese, Michael; Besacier, Laurent; Meshesha, Million

    2016-01-01

    State-of-the-art speech translation can be seen as a cascade of automatic speech recognition, statistical machine translation and text-to-speech synthesis. In this study an attempt is made to experiment with Amharic speech recognition for Amharic-English speech translation in the tourism domain. Since there is no Amharic speech corpus, we developed a read-speech corpus of 7.43 hours in the tourism domain. The Amharic speech corpus has been recorded after translating standard Bas...

  16. Hate speech

    Directory of Open Access Journals (Sweden)

    Anne Birgitta Nilsen

    2014-12-01

    Full Text Available The manifesto of the Norwegian terrorist Anders Behring Breivik is based on the “Eurabia” conspiracy theory. This theory is a key starting point for hate speech amongst many right-wing extremists in Europe, but also has ramifications beyond these environments. In brief, proponents of the Eurabia theory claim that Muslims are occupying Europe and destroying Western culture, with the assistance of the EU and European governments. By contrast, members of Al-Qaeda and other extreme Islamists promote the conspiracy theory “the Crusade” in their hate speech directed against the West. Proponents of the latter theory argue that the West is leading a crusade to eradicate Islam and Muslims, a crusade that is similarly facilitated by their governments. This article presents analyses of texts written by right-wing extremists and Muslim extremists in an effort to shed light on how hate speech promulgates conspiracy theories in order to spread hatred and intolerance. The aim of the article is to contribute to a more thorough understanding of hate speech’s nature by applying rhetorical analysis. Rhetorical analysis is chosen because it offers a means of understanding the persuasive power of speech. It is thus a suitable tool to describe how hate speech works to convince and persuade. The concepts from rhetorical theory used in this article are ethos, logos and pathos. The concept of ethos is used to pinpoint factors that contributed to Osama bin Laden's impact, namely factors that lent credibility to his promotion of the conspiracy theory of the Crusade. In particular, Bin Laden projected common sense, good morals and good will towards his audience. He seemed to have coherent and relevant arguments; he appeared to possess moral credibility; and his use of language demonstrated that he wanted the best for his audience. The concept of pathos is used to define hate speech, since hate speech targets its audience's emotions. In hate speech it is the

  17. Audio-visual Feature Fusion Person Identification Based on SVM and Score Normalization

    Institute of Scientific and Technical Information of China (English)

    丁辉; 安今朝

    2012-01-01

    To address the low recognition rates of face recognition and speaker recognition under severe noise conditions, this paper builds on feature-level fusion theory and combines normalization techniques with SVM theory to propose a multi-biometric identification model that fuses face and speech features. Face features are extracted with the discrete cosine transform and a locality preserving projection algorithm, and speech features are extracted with an SVM-based method; the two sets of features are then fused at the feature level. The distance between the test identity and each template is computed, and the matching distance is normalized in order to reduce computation and improve recognition performance; the normalized matching distance is finally fed into an SVM to obtain the recognition result. Simulation results show that in noisy environments, as the signal-to-noise ratio decreases, the recognition rate of the fusion system is clearly higher than that of either single-modality system, achieving the goal of identity recognition.
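
    A hedged sketch of the feature-level fusion and classification stage is shown below; random placeholder vectors stand in for the DCT/LPP face features and the speech features described in the record, and scikit-learn's scaler and SVM stand in for the normalization and classifier details.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n_people, n_samples = 5, 20

    face = rng.standard_normal((n_people * n_samples, 40))    # placeholder face features
    voice = rng.standard_normal((n_people * n_samples, 24))   # placeholder speech features
    labels = np.repeat(np.arange(n_people), n_samples)

    X = np.hstack([face, voice])                              # feature-level fusion
    model = make_pipeline(MinMaxScaler(), SVC(kernel="rbf", C=10.0))

    train = np.arange(len(X)) % n_samples < 15                # 15 samples per person for training
    model.fit(X[train], labels[train])
    print(model.score(X[~train], labels[~train]))             # chance-level on random features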

  18. Speech enhancement

    CERN Document Server

    Benesty, Jacob; Chen, Jingdong

    2006-01-01

    We live in a noisy world! In all applications (telecommunications, hands-free communications, recording, human-machine interfaces, etc.) that require at least one microphone, the signal of interest is usually contaminated by noise and reverberation. As a result, the microphone signal has to be "cleaned" with digital signal processing tools before it is played out, transmitted, or stored. This book is about speech enhancement. Different well-known and state-of-the-art methods for noise reduction, with one or multiple microphones, are discussed. By speech enhancement, we mean not only noise red

  19. Performance evaluation of a motor-imagery-based EEG-Brain computer interface using a combined cue with heterogeneous training data in BCI-Naive subjects

    Directory of Open Access Journals (Sweden)

    Lee Youngbum

    2011-10-01

    Full Text Available Abstract Background: Subjects in an EEG brain-computer interface (BCI) system experience difficulties when attempting to obtain consistent performance of the actual movement by motor imagery alone. It is necessary to find the optimal conditions and stimulus combinations that affect the performance factors of the EEG-BCI system, so as to guarantee equipment safety and trust through performance evaluation using motor imagery characteristics that can be utilized in the EEG-BCI testing environment. Methods: The experiment was carried out with 10 experienced subjects and 32 naive subjects on an EEG-BCI system. There were 3 experiments: the experienced homogeneous experiment, the naive homogeneous experiment and the naive heterogeneous experiment. Each experiment was compared in terms of the six audio-visual cue combinations and consisted of 50 trials. For the naive subjects, the EEG data were classified using a least-squares linear classifier after common spatial pattern filtering. The accuracy was calculated using the training and test data sets, and the p-value of the accuracy was obtained through a statistical significance test. Results: When a naive subject was trained with a heterogeneous combined cue and tested with a visual cue, the result was not only the highest accuracy (p … Conclusions: We propose this measurement methodology, using a heterogeneous combined cue for training data and a visual cue for test data with a typical EEG-BCI algorithm, to achieve effectiveness in terms of consistency, stability, cost, time, and resource management without the need for a trial-and-error process.
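
    The common spatial pattern (CSP) filtering plus linear classification mentioned in the Methods can be sketched as a generalized eigenvalue problem on the two class covariance matrices; the epochs below are synthetic placeholders rather than BCI recordings, and logistic regression stands in for the least-squares linear classifier named in the record.

    import numpy as np
    from scipy.linalg import eigh
    from sklearn.linear_model import LogisticRegression

    def csp_filters(X, y, n_pairs=2):
        """CSP spatial filters from epochs X (trials, channels, samples) and labels y in {0, 1}."""
        covs = []
        for c in (0, 1):
            trials = X[y == c]
            covs.append(np.mean([t @ t.T / np.trace(t @ t.T) for t in trials], axis=0))
        # Generalized eigenproblem: covs[0] w = lambda (covs[0] + covs[1]) w
        _, W = eigh(covs[0], covs[0] + covs[1])
        return np.hstack([W[:, :n_pairs], W[:, -n_pairs:]])   # most discriminative filters

    def log_var_features(X, W):
        Z = np.einsum("ck,tcs->tks", W, X)                    # spatially filtered epochs
        return np.log(Z.var(axis=2))

    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 8, 250))                     # 60 epochs, 8 channels, 1 s at 250 Hz
    y = rng.integers(0, 2, size=60)

    W = csp_filters(X[:40], y[:40])
    clf = LogisticRegression().fit(log_var_features(X[:40], W), y[:40])
    print(clf.score(log_var_features(X[40:], W), y[40:]))     # near chance on random data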

  20. Composition: Cue Wheel

    DEFF Research Database (Denmark)

    Bergstrøm-Nielsen, Carl

    2014-01-01

    Cue Rondo is an open composition to be realised by improvising musicians. See more about my composition practise in the entry "Composition - General Introduction". This work is licensed under a Creative Commons "by-nc" License. You may for non-commercial purposes use and distribute it, performance...

  1. Emotional Speech Processing at the Intersection of Prosody and Semantics

    OpenAIRE

    Rachel Schwartz; Pell, Marc D.

    2012-01-01

    The ability to accurately perceive emotions is crucial for effective social interaction. Many questions remain regarding how different sources of emotional cues in speech (e.g., prosody, semantic information) are processed during emotional communication. Using a cross-modal emotional priming paradigm (Facial affect decision task), we compared the relative contributions of processing utterances with single-channel (prosody-only) versus multi-channel (prosody and semantic) cues on the perceptio...

  2. Speech dynamics

    NARCIS (Netherlands)

    Pols, L.C.W.

    2011-01-01

    In order for speech to be informative and communicative, segmental and suprasegmental variation is mandatory. Only this leads to meaningful words and sentences. The building blocks are not stable entities put next to each other (like beads on a string or like printed text), but there are gradual tran

  3. Speech Intelligibility

    Science.gov (United States)

    Brand, Thomas

    Speech intelligibility (SI) is important for different fields of research, engineering and diagnostics in order to quantify very different phenomena like the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, the benefit of using hearing aids, or combinations of these things.

  4. An Interaction Between Prosody and Statistics in the Segmentation of Fluent Speech

    Science.gov (United States)

    Shukla, Mohinish; Nespor, Marina; Mehler, Jacques

    2007-01-01

    Sensitivity to prosodic cues might be used to constrain lexical search. Indeed, the prosodic organization of speech is such that words are invariably aligned with phrasal prosodic edges, providing a cue to segmentation. In this paper we devise an experimental paradigm that allows us to investigate the interaction between statistical and prosodic…

  5. A Computational Model of Word Segmentation from Continuous Speech Using Transitional Probabilities of Atomic Acoustic Events

    Science.gov (United States)

    Rasanen, Okko

    2011-01-01

    Word segmentation from continuous speech is a difficult task that is faced by human infants when they start to learn their native language. Several studies indicate that infants might use several different cues to solve this problem, including intonation, linguistic stress, and transitional probabilities between subsequent speech sounds. In this…
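
    The transitional-probability cue referred to here is typically computed as TP(A -> B) = freq(AB) / freq(A) over the syllable stream, with word boundaries posited where TP dips to a local minimum. A toy sketch with a made-up three-word artificial language:

    from collections import Counter

    def transitional_probs(syllables):
        """TP(a -> b) = count(ab) / count(a) over a syllable stream."""
        unigrams = Counter(syllables[:-1])
        bigrams = Counter(zip(syllables[:-1], syllables[1:]))
        return {pair: n / unigrams[pair[0]] for pair, n in bigrams.items()}

    def boundaries_at_tp_minima(syllables, tps):
        """Posit a word boundary wherever the TP dips below both of its neighbours."""
        seq = [tps[(a, b)] for a, b in zip(syllables[:-1], syllables[1:])]
        return [i + 1 for i in range(1, len(seq) - 1)
                if seq[i] < seq[i - 1] and seq[i] < seq[i + 1]]

    words = [["tu", "pi", "ro"], ["go", "la", "bu"], ["da", "ko", "mi"]]
    order = [0, 1, 2, 1, 0, 2, 0, 2, 1]                   # hypothetical word sequence
    stream = [s for w in order for s in words[w]]

    tps = transitional_probs(stream)
    print(boundaries_at_tp_minima(stream, tps))           # indices where a new word starts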

  6. The Perception of "Sine-Wave Speech" by Adults with Developmental Dyslexia.

    Science.gov (United States)

    Rosner, Burton S.; Talcott, Joel B.; Witton, Caroline; Hogg, James D.; Richardson, Alexandra J.; Hansen, Peter C.; Stein, John F.

    2003-01-01

    "Sine-wave speech" sentences contain only four frequency-modulated sine waves, lacking many acoustic cues present in natural speech. Adults with (n=19) and without (n=14) dyslexia were asked to reproduce orally sine-wave utterances in successive trials. Results suggest comprehension of sine-wave sentences is impaired in some adults with…

  7. Can Prosody Be Used to Discover Hierarchical Structure in Continuous Speech?

    Science.gov (United States)

    Langus, Alan; Marchetto, Erika; Bion, Ricardo Augusto Hoffmann; Nespor, Marina

    2012-01-01

    We tested whether adult listeners can simultaneously keep track of variations in pitch and syllable duration in order to segment continuous speech into phrases and group these phrases into sentences. The speech stream was constructed so that prosodic cues signaled hierarchical structures (i.e., phrases embedded within sentences) and non-adjacent…

  8. Toward a Natural Speech Understanding System

    Science.gov (United States)

    1989-10-01

    Bosshardt, H., & Horman, H. (1982). The influence of suprasegmental information on speech perception of 4 to 6 year old children. Archiv für Psychologie ... 134, 81-104. ... hypothesized that children of this stage are not able to use suprasegmental information to integrate a sentence into a single unit. Brokx ... suprasegmentals provide cues to linguistic factors such as word stress, syntactic structure, and semantic interpretation. Cosmides, L. (1983). Invariances

  9. Speech communications in noise

    Science.gov (United States)

    1984-07-01

    The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

  10. The "Wolf Totem" in "Wolf Totem": An Interpretation of the Audio-Visual Shaping of the "Wolf" in the Film "Wolf Totem"

    Institute of Scientific and Technical Information of China (English)

    孙乾蕙

    2015-01-01

    The film "Wolf Totem" makes unusual demands on its director, chiefly because of the cinematic portrayal of its key character, the "wolf": the expression of the wolf's spirit, the wolf's fusion with nature, and the wolf's significance as a totem are the focus of the film. Through an analysis of the film's audio-visual language and of the passages in which the "wolf" appears, this paper further elaborates how the film shapes the image of the "wolf".

  11. 78 FR 49717 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    ... COMMISSION 47 CFR Part 64 Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With... Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications...

  12. How each prosodic boundary cue matters: evidence from german infants.

    Science.gov (United States)

    Wellmann, Caroline; Holzgrefe, Julia; Truckenbrodt, Hubert; Wartenburger, Isabell; Höhle, Barbara

    2012-01-01

    Previous studies have revealed that infants aged 6-10 months are able to use the acoustic correlates of major prosodic boundaries, that is, pitch change, preboundary lengthening, and pause, for the segmentation of the continuous speech signal. Moreover, investigations with American-English- and Dutch-learning infants suggest that processing prosodic boundary markings involves a weighting of these cues. This weighting seems to develop with increasing exposure to the native language and to underlie crosslinguistic variation. In the following, we report the results of four experiments using the headturn preference procedure to explore the perception of prosodic boundary cues in German infants. We presented 8-month-old infants with a sequence of names in two different prosodic groupings, with or without boundary markers. Infants discriminated both sequences when the boundary was marked by all three cues (Experiment 1) and when it was marked by a pitch change and preboundary lengthening in combination (Experiment 2). The presence of a pitch change (Experiment 3) or preboundary lengthening (Experiment 4) as single cues did not lead to a successful discrimination. Our results indicate that pause is not a necessary cue for German infants. Pitch change and preboundary lengthening in combination, but not as single cues, are sufficient. Hence, by 8 months infants only rely on a convergence of boundary markers. Comparisons with adults' performance on the same stimulus materials suggest that the pattern observed with the 8-month-olds is already consistent with that of adults. We discuss our findings with respect to crosslinguistic variation and the development of a language-specific prosodic cue weighting.

  13. How each prosodic boundary cue matters: Evidence from German infants

    Directory of Open Access Journals (Sweden)

    Caroline eWellmann

    2012-12-01

    Full Text Available Previous studies have revealed that infants aged six to ten months are able to use the acoustic correlates of major prosodic boundaries, that is, pitch change, preboundary lengthening, and pause, for the segmentation of the continuous speech signal. Moreover, investigations with American-English- and Dutch-learning infants suggest that processing prosodic boundary markings involves a weighting of these cues. This weighting seems to develop with increasing exposure to the native language and to underlie crosslinguistic variation. In the following, we report the results of four experiments using the headturn preference procedure to explore the perception of prosodic boundary cues in German infants. We presented eight-month-old infants with a sequence of names in two different prosodic groupings, with or without boundary markers. Infants discriminated both sequences when the boundary was marked by all three cues (Experiment 1) and when it was marked by a pitch change and preboundary lengthening in combination (Experiment 2). The presence of a pitch change (Experiment 3) or preboundary lengthening (Experiment 4) as single cues did not lead to a successful discrimination. Our results indicate that pause is not a necessary cue for German infants. Pitch and preboundary lengthening in combination, but not as single cues, are sufficient. Hence, by eight months infants only rely on a convergence of boundary markers. Comparisons with adults’ performance on the same stimulus materials suggest that the pattern observed with the eight-month-olds is already consistent with that of adults. We discuss our findings with respect to crosslinguistic variation and the development of a language-specific prosodic cue weighting.

  14. Going to a Speech Therapist

    Science.gov (United States)

    ... therapists (also called speech-language pathologists). What Do Speech Therapists Help With? Speech therapists help people of all ...

  15. Weighting of Acoustic Cues to a Manner Distinction by Children with and without Hearing Loss

    Science.gov (United States)

    Nittrouer, Susan; Lowenstein, Joanna H.

    2015-01-01

    Purpose: Children must develop optimal perceptual weighting strategies for processing speech in their first language. Hearing loss can interfere with that development, especially if cochlear implants are required. The three goals of this study were to measure, for children with and without hearing loss: (a) cue weighting for a manner distinction,…

  16. Cross-Linguistic Differences in Prosodic Cues to Syntactic Disambiguation in German and English

    Science.gov (United States)

    O'Brien, Mary Grantham; Jackson, Carrie N.; Gardner, Christine E.

    2014-01-01

    This study examined whether late-learning English-German second language (L2) learners and late-learning German-English L2 learners use prosodic cues to disambiguate temporarily ambiguous first language and L2 sentences during speech production. Experiments 1a and 1b showed that English-German L2 learners and German-English L2 learners used a…

  17. Influences of Semantic and Prosodic Cues on Word Repetition and Categorization in Autism

    Science.gov (United States)

    Singh, Leher; Harrow, MariLouise S.

    2014-01-01

    Purpose: To investigate sensitivity to prosodic and semantic cues to emotion in individuals with high-functioning autism (HFA). Method: Emotional prosody and semantics were independently manipulated to assess the relative influence of prosody versus semantics on speech processing. A sample of 10-year-old typically developing children (n = 10) and…

  18. Visual Speech Perception in Children with Language Learning Impairments

    Science.gov (United States)

    Knowland, Victoria C. P.; Evans, Sam; Snell, Caroline; Rosen, Stuart

    2016-01-01

    Purpose: The purpose of the study was to assess the ability of children with developmental language learning impairments (LLIs) to use visual speech cues from the talking face. Method: In this cross-sectional study, 41 typically developing children (mean age: 8 years 0 months, range: 4 years 5 months to 11 years 10 months) and 27 children with…

  19. Spoken Word Recognition of Chinese Words in Continuous Speech

    Science.gov (United States)

    Yip, Michael C. W.

    2015-01-01

    The present study examined the role that the positional probability of syllables plays in the recognition of spoken words in continuous Cantonese speech. Because some sounds occur more frequently at the beginning or ending position of Cantonese syllables than others, this kind of probabilistic information about syllables may cue the locations…

  20. Speech research

    Science.gov (United States)

    1992-06-01

    Phonology is traditionally seen as the discipline that concerns itself with the building blocks of linguistic messages. It is the study of the structure of sound inventories of languages and of the participation of sounds in rules or processes. Phonetics, in contrast, concerns speech sounds as produced and perceived. Two extreme positions on the relationship between phonological messages and phonetic realizations are represented in the literature. One holds that the primary home for linguistic symbols, including phonological ones, is the human mind, itself housed in the human brain. The second holds that their primary home is the human vocal tract.

  1. Sound frequency affects speech emotion perception: Results from congenital amusia

    Directory of Open Access Journals (Sweden)

    Sydney eLolli

    2015-09-01

    Full Text Available Congenital amusics, or tone-deaf individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying band-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody (MBEP) were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task and an emotion identification task under band-pass and unfiltered speech conditions. Results showed a significant correlation between pitch discrimination threshold and emotion identification accuracy for band-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold > 16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between band-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation.
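
    The band-pass manipulation used in such a study can be approximated with a standard digital filter; the sketch below uses a zero-phase Butterworth band-pass with placeholder cutoff frequencies, since the exact filter settings of the original study are not reproduced here.

    import numpy as np
    from scipy.signal import butter, filtfilt

    def bandpass(x, sr, low_hz, high_hz, order=4):
        """Zero-phase Butterworth band-pass filter."""
        b, a = butter(order, [low_hz, high_hz], btype="bandpass", fs=sr)
        return filtfilt(b, a, x)

    sr = 16000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 3000 * t)   # placeholder signal
    y = bandpass(x, sr, low_hz=100, high_hz=1000)   # keeps the 200 Hz component, removes 3000 Hz
    print(x.shape, y.shape)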

  2. Secure access to patient's health records using SpeechXRays a mutli-channel biometrics platform for user authentication.

    Science.gov (United States)

    Spanakis, Emmanouil G; Spanakis, Marios; Karantanas, Apostolos; Marias, Kostas

    2016-08-01

    The most commonly used method for user authentication in ICT services or systems is the application of identification tools such as passwords or personal identification numbers (PINs). The rapid development of ICT technology for smart devices (laptops, tablets and smartphones) has also allowed the advance of hardware components that capture several biometric traits, such as fingerprints and voice. These components aim, among other things, to overcome the weaknesses and flaws of password usage with a view to improved user authentication offering a higher level of security, privacy and usability. In this respect, the potential application of biometrics for secure user authentication for access to systems with sensitive data (i.e. patients' data from electronic health records) shows great potential. SpeechXRays aims to provide a user recognition platform based on biometrics of voice acoustics analysis and audio-visual identity verification. Among others, the platform aims to be applied as an authentication tool for medical personnel in order to gain specific access to patients' electronic health records. In this work, a short description of the SpeechXRays implementation for eHealth is provided and analyzed. This study explores security and privacy issues, and offers a comprehensive overview of biometrics technology applications in addressing the e-Health security challenges. We present and describe the necessary requirements for an eHealth platform concerning biometric security.

  3. Cortical tracking of hierarchical linguistic structures in connected speech.

    Science.gov (United States)

    Ding, Nai; Melloni, Lucia; Zhang, Hang; Tian, Xing; Poeppel, David

    2016-01-01

    The most critical attribute of human language is its unbounded combinatorial nature: smaller elements can be combined into larger structures on the basis of a grammatical system, resulting in a hierarchy of linguistic units, such as words, phrases and sentences. Mentally parsing and representing such structures, however, poses challenges for speech comprehension. In speech, hierarchical linguistic structures do not have boundaries that are clearly defined by acoustic cues and must therefore be internally and incrementally constructed during comprehension. We found that, during listening to connected speech, cortical activity of different timescales concurrently tracked the time course of abstract linguistic structures at different hierarchical levels, such as words, phrases and sentences. Notably, the neural tracking of hierarchical linguistic structures was dissociated from the encoding of acoustic cues and from the predictability of incoming words. Our results indicate that a hierarchy of neural processing timescales underlies grammar-based internal construction of hierarchical linguistic structure.

  4. Speech production, Psychology of

    NARCIS (Netherlands)

    Schriefers, H.J.; Vigliocco, G.

    2015-01-01

    Research on speech production investigates the cognitive processes involved in transforming thoughts into speech. This article starts with a discussion of the methodological issues inherent to research in speech production that illustrates how empirical approaches to speech production must differ fr

  5. Perception of aircraft Deviation Cues

    Science.gov (United States)

    Martin, Lynne; Azuma, Ronald; Fox, Jason; Verma, Savita; Lozito, Sandra

    2005-01-01

    To begin to address the need for new displays required by a future airspace concept to support new roles that will be assigned to flight crews, a study of potentially informative display cues was undertaken. Two cues were tested on a simple plan display - aircraft trajectory and flight corridor. Of particular interest was the speed and accuracy with which participants could detect an aircraft deviating outside its flight corridor. Presence of the trajectory cue significantly reduced participant reaction time to a deviation while the flight corridor cue did not. Although non-significant, the flight corridor cue appeared to be related to the accuracy of participants' judgments rather than their speed. As this is the second of a series of studies, these issues will be addressed further in future studies.

  6. Dog-directed speech: why do we use it and do dogs pay attention to it?

    Science.gov (United States)

    Ben-Aderet, Tobey; Gallego-Abenza, Mario; Reby, David; Mathevon, Nicolas

    2017-01-11

    Pet-directed speech is strikingly similar to infant-directed speech, a peculiar speaking pattern with higher pitch and slower tempo known to engage infants' attention and promote language learning. Here, we report the first investigation of potential factors modulating the use of dog-directed speech, as well as its immediate impact on dogs' behaviour. We recorded adult participants speaking in front of pictures of puppies, adult and old dogs, and analysed the quality of their speech. We then performed playback experiments to assess dogs' reaction to dog-directed speech compared with normal speech. We found that human speakers used dog-directed speech with dogs of all ages and that the acoustic structure of dog-directed speech was mostly independent of dog age, except for sound pitch, which was relatively higher when communicating with puppies. Playback demonstrated that, in the absence of other non-auditory cues, puppies were highly reactive to dog-directed speech, and that the pitch was a key factor modulating their behaviour, suggesting that this specific speech register has a functional value in young dogs. Conversely, older dogs did not react differentially to dog-directed speech compared with normal speech. The fact that speakers continue to use dog-directed speech with older dogs therefore suggests that this speech pattern may mainly be a spontaneous attempt to facilitate interactions with non-verbal listeners.

  7. Speech Enhancement

    DEFF Research Database (Denmark)

    Benesty, Jacob; Jensen, Jesper Rindom; Christensen, Mads Græsbøll;

    Speech enhancement is a classical problem in signal processing, yet still largely unsolved. Two of the conventional approaches for solving this problem are linear filtering, like the classical Wiener filter, and subspace methods. These approaches have traditionally been treated as different classes of methods and have been introduced in somewhat different contexts. Linear filtering methods originate in stochastic processes, while subspace methods have largely been based on developments in numerical linear algebra and matrix approximation theory. This book bridges the gap between these two classes of methods by showing how the ideas behind subspace methods can be incorporated into traditional linear filtering. In the context of subspace methods, the enhancement problem can then be seen as a classical linear filter design problem. This means that various solutions can more easily be compared...
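    To make the linear-filtering idea concrete, here is a minimal frequency-domain Wiener-style enhancer. It is a sketch under simplifying assumptions (noise power estimated from the first few frames, a crude a priori SNR estimate), not the method developed in the book.

```python
import numpy as np
from scipy.signal import stft, istft

def wiener_enhance(noisy, fs, noise_frames=10, frame_len=512):
    """Simple Wiener-style enhancement: estimate the noise power from the
    first few frames, then apply the gain H = SNR / (SNR + 1) per bin."""
    f, t, X = stft(noisy, fs=fs, nperseg=frame_len)
    noise_psd = np.mean(np.abs(X[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    snr = np.maximum(np.abs(X) ** 2 / (noise_psd + 1e-12) - 1.0, 0.0)  # crude a priori SNR
    gain = snr / (snr + 1.0)
    _, enhanced = istft(gain * X, fs=fs, nperseg=frame_len)
    return enhanced
```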

  8. Sound of mind : electrophysiological and behavioural evidence for the role of context, variation and informativity in human speech processing

    NARCIS (Netherlands)

    Nixon, Jessie Sophia

    2014-01-01

    Spoken communication involves transmission of a message which takes physical form in acoustic waves. Within any given language, acoustic cues pattern in language-specific ways along language-specific acoustic dimensions to create speech sound contrasts. These cues are utilized by listeners to discri

  9. The school-based speech-language therapist: choosing multicultural texts.

    Science.gov (United States)

    Moodley, Saloshni; Chetty, Sandhya; Pahl, Jenny

    2005-01-01

    School-based speech-language therapists have a pivotal role in the transformation of education as directed by current education policy. The Revised National Curriculum Statement, for example, foregrounds a multicultural perspective in education, which impacts on the choice of Learning and Teaching Support Materials. Inappropriate support materials could create barriers to learning. Folktales were selected as an example of multicultural Learning and Teaching Support Materials. The responses of 10-year-old mainstream learners to five folktales reflecting a diversity of cultures were explored. Five girls and five boys in Grade 5 participated in the study, which was conducted in three phases. A questionnaire, a focus group interview, and audio-visual recordings were used to gather data. The qualitative method of constant comparison was used to analyse emerging themes. Five main themes were identified. Findings revealed that some participants responded most positively when folktales reflected their culture, gender, or physical characteristics. Participants' views on less familiar cultures were influenced by the mass media. The results highlighted the importance of the text as 'mirror' and as 'window'. The potential of folktales as multicultural Learning and Teaching Support Materials, the powerful influence of the educator on learners' responses, and the need for an anti-bias approach within education are discussed. Implications for future research and practice are highlighted.

  10. Speech therapy with obturator.

    Science.gov (United States)

    Shyammohan, A; Sreenivasulu, D

    2010-12-01

    Rehabilitation of speech is tantamount to closure of defect in cases with velopharyngeal insufficiency. Often the importance of speech therapy is sidelined during the fabrication of obturators. Usually the speech part is taken up only at a later stage and is relegated entirely to a speech therapist without the active involvement of the prosthodontist. The article suggests a protocol for speech therapy in such cases to be done in unison with a prosthodontist.

  11. Markers of Deception in Italian Speech

    Directory of Open Access Journals (Sweden)

    Katelyn Spence

    2012-10-01

    Full Text Available Lying is a universal activity and the detection of lying a universal concern. Presently, there is great interest in determining objective measures of deception. The examination of speech, in particular, holds promise in this regard; yet, most of what we know about the relationship between speech and lying is based on the assessment of English-speaking participants. Few studies have examined indicators of deception in languages other than English. The world’s languages differ in significant ways, and cross-linguistic studies of deceptive communications are a research imperative. Here we review some of these differences amongst the world’s languages, and provide an overview of a number of recent studies demonstrating that cross-linguistic research is a worthwhile endeavour. In addition, we report the results of an empirical investigation of pitch, response latency, and speech rate as cues to deception in Italian speech. True and false opinions were elicited in an audio-taped interview. A within subjects analysis revealed no significant difference between the average pitch of the two conditions; however, speech rate was significantly slower, while response latency was longer, during deception compared with truth-telling. We explore the implications of these findings and propose directions for future research, with the aim of expanding the cross-linguistic branch of research on markers of deception.

  12. Music and speech prosody: a common rhythm.

    Science.gov (United States)

    Hausen, Maija; Torppa, Ritva; Salmela, Viljami R; Vainio, Martti; Särkämö, Teppo

    2013-01-01

    Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).

  13. Music and speech prosody: A common rhythm

    Directory of Open Access Journals (Sweden)

    Maija Hausen

    2013-09-01

    Full Text Available Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress).

  14. Speech Repairs, Intonational Boundaries and Discourse Markers Modeling Speakers' Utterances in Spoken Dialog

    CERN Document Server

    Heeman, P A

    1999-01-01

    In this thesis, we present a statistical language model for resolving speech repairs, intonational boundaries and discourse markers. Rather than finding the best word interpretation for an acoustic signal, we redefine the speech recognition problem so that it also identifies the POS tags, discourse markers, speech repairs and intonational phrase endings (a major cue in determining utterance units). Adding these extra elements to the speech recognition problem actually allows it to better predict the words involved, since we are able to make use of the predictions of boundary tones, discourse markers and speech repairs to better account for what word will occur next. Furthermore, we can take advantage of acoustic information, such as silence information, which tends to co-occur with speech repairs and intonational phrase endings, that current language models can only regard as noise in the acoustic signal. The output of this language model is a much fuller account of the speaker's turn, with part-of-speech ...

  15. Neural oscillations carry speech rhythm through to comprehension

    Directory of Open Access Journals (Sweden)

    Jonathan E Peelle

    2012-09-01

    Full Text Available A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners' processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging, particularly electroencephalography (EEG) and magnetoencephalography (MEG), point to phase locking by ongoing cortical oscillations to low-frequency information (~4-8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in additional recruitment of left hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain.
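    The low-frequency envelope information discussed in this review can be extracted in a few lines. The sketch below is an assumed, simplified pipeline (broadband Hilbert envelope followed by a 4-8 Hz band-pass), not the analysis used in any particular study cited.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def theta_band_envelope(speech, fs, low_hz=4.0, high_hz=8.0):
    """Return the ~4-8 Hz component of the speech amplitude envelope,
    the modulation band that cortical oscillations are reported to track."""
    envelope = np.abs(hilbert(speech))  # broadband amplitude envelope
    sos = butter(2, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, envelope)
```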

  16. Research on a multi-dimensional, modular teaching model for the English majors' audio-visual-oral course

    Institute of Scientific and Technical Information of China (English)

    汤琳

    2016-01-01

    Based on a multi-dimensional, modular design, the teaching model for the English majors' audio-visual-oral course breaks with the traditional video-course teaching approach. It divides students' learning time into two modules, in class and out of class, and, guided by controlled and semi-controlled learning tasks and group-based autonomous learning, extends the traditional listening-and-viewing training into practice of the four skills of listening, speaking, reading and writing. The group learning strategy improves classroom efficiency and broadens the channels of language input while new learning content is introduced. Under this model, the teacher's role and the course evaluation methods are also changed.

  17. On the teaching mode of a project-integrated English audio-visual course in a comprehensive city university

    Institute of Scientific and Technical Information of China (English)

    李萍

    2012-01-01

    Through qualitative and quantitative research, the project team has put forward a project-integrated teaching mode for the English audio-visual course in a comprehensive city university, one that matches the university's goal of cultivating applied urban talent. The new mode shifts the traditional knowledge-focused, teacher-centred approach to a knowledge-and-skill-focused, student-centred one, emphasising teacher-guided self-experience, self-discovery, self-reflection and self-construction by the students. Through learning by doing, participating in course-integrated projects, cultivating practical abilities and giving priority to international communication, students are expected to gain the communication competence in English needed to serve urban economic and cultural development as well as international exchange.

  18. Orienting asymmetries in dogs' responses to different communicatory components of human speech.

    Science.gov (United States)

    Ratcliffe, Victoria F; Reby, David

    2014-12-15

    It is well established that in human speech perception the left hemisphere (LH) of the brain is specialized for processing intelligible phonemic (segmental) content (e.g., [1-3]), whereas the right hemisphere (RH) is more sensitive to prosodic (suprasegmental) cues. Despite evidence that a range of mammal species show LH specialization when processing conspecific vocalizations, the presence of hemispheric biases in domesticated animals' responses to the communicative components of human speech has never been investigated. Human speech is familiar and relevant to domestic dogs (Canis familiaris), who are known to perceive both segmental phonemic cues and suprasegmental speaker-related and emotional prosodic cues. Using the head-orienting paradigm, we presented dogs with manipulated speech and tones differing in segmental or suprasegmental content and recorded their orienting responses. We found that dogs showed a significant LH bias when presented with a familiar spoken command in which the salience of meaningful phonemic (segmental) cues was artificially increased but a significant RH bias in response to commands in which the salience of intonational or speaker-related (suprasegmental) vocal cues was increased. Our results provide insights into mechanisms of interspecific vocal perception in a domesticated mammal and suggest that dogs may share ancestral or convergent hemispheric specializations for processing the different functional communicative components of speech with human listeners.

  19. Delayed Speech or Language Development

    Science.gov (United States)


  20. Evaluation of multimodal ground cues

    DEFF Research Database (Denmark)

    Nordahl, Rolf; Lecuyer, Anatole; Serafin, Stefania

    2012-01-01

    This chapter presents an array of results on the perception of ground surfaces via multiple sensory modalities, with special attention to non-visual perceptual cues, notably those arising from audition and haptics, as well as interactions between them. It also reviews approaches to combining synthetic multimodal cues, from vision, haptics, and audition, in order to realize virtual experiences of walking on simulated ground surfaces or other features.

  1. Mistaking minds and machines: How speech affects dehumanization and anthropomorphism.

    Science.gov (United States)

    Schroeder, Juliana; Epley, Nicholas

    2016-11-01

    Treating a human mind like a machine is an essential component of dehumanization, whereas attributing a humanlike mind to a machine is an essential component of anthropomorphism. Here we tested how a cue closely connected to a person's actual mental experience-a humanlike voice-affects the likelihood of mistaking a person for a machine, or a machine for a person. We predicted that paralinguistic cues in speech are particularly likely to convey the presence of a humanlike mind, such that removing voice from communication (leaving only text) would increase the likelihood of mistaking the text's creator for a machine. Conversely, adding voice to a computer-generated script (resulting in speech) would increase the likelihood of mistaking the text's creator for a human. Four experiments confirmed these hypotheses, demonstrating that people are more likely to infer a human (vs. computer) creator when they hear a voice expressing thoughts than when they read the same thoughts in text. Adding human visual cues to text (i.e., seeing a person perform a script in a subtitled video clip) did not increase the likelihood of inferring a human creator compared with only reading text, suggesting that defining features of personhood may be conveyed more clearly in speech (Experiments 1 and 2). Removing the naturalistic paralinguistic cues that convey humanlike capacity for thinking and feeling, such as varied pace and intonation, eliminates the humanizing effect of speech (Experiment 4). We discuss implications for dehumanizing others through text-based media, and for anthropomorphizing machines through speech-based media. (PsycINFO Database Record)

  2. 78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...

    Science.gov (United States)

    2013-08-15

    ... COMMISSION 47 CFR Part 64 Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With... this document, the Commission amends telecommunications relay services (TRS) mandatory...

  3. Audio-Visual Equipment Depreciation. RDU-75-07.

    Science.gov (United States)

    Drake, Miriam A.; Baker, Martha

    A study was conducted at Purdue University to gather operational and budgetary planning data for the Libraries and Audiovisual Center. The objectives were: (1) to complete a current inventory of equipment including year of purchase, costs, and salvage value; (2) to determine useful life data for general classes of equipment; and (3) to determine…

  4. A Joint Audio-Visual Approach to Audio Localization

    DEFF Research Database (Denmark)

    Jensen, Jesper Rindom; Christensen, Mads Græsbøll

    2015-01-01

    Localization of audio sources is an important research problem, e.g., to facilitate noise reduction. In the recent years, the problem has been tackled using distributed microphone arrays (DMA). A common approach is to apply direction-of-arrival (DOA) estimation on each array (denoted as nodes...
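    As a rough illustration of the kind of per-node processing involved in microphone-array localization (not the authors' algorithm), the sketch below estimates the time difference of arrival between two microphones with GCC-PHAT; given the array geometry, such a delay can be converted into a direction-of-arrival estimate.

```python
import numpy as np

def gcc_phat_tdoa(sig_a, sig_b, fs):
    """Estimate the time difference of arrival (in seconds) between two
    microphone signals using generalized cross-correlation with PHAT weighting."""
    n = len(sig_a) + len(sig_b)
    A = np.fft.rfft(sig_a, n=n)
    B = np.fft.rfft(sig_b, n=n)
    cross = A * np.conj(B)
    cross /= np.abs(cross) + 1e-12          # phase transform weighting
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    delay_samples = np.argmax(np.abs(cc)) - max_shift
    return delay_samples / fs
```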

  5. Audio-visual content structuring for automatic summarization

    OpenAIRE

    Rouvier, Mickael

    2011-01-01

    In recent years, with the advent of sites such as Youtube, Dailymotion or Blip TV, the number of videos available on the Internet has increased considerably. The size of these collections and their lack of structure limit access to their content. Summarization is one way to produce snippets that extract the essential content and present it as concisely as possible. In this work, we focus on extraction methods for video summaries based on audio analysis. We treat various scientific problems ...

  6. Preattentive processing of audio-visual emotional signals

    DEFF Research Database (Denmark)

    Föcker, J.; Gondan, Matthias; Röder, B.

    2011-01-01

    Previous research has shown that redundant information in faces and voices leads to faster emotional categorization compared to incongruent emotional information even when attending to only one modality. The aim of the present study was to test whether these crossmodal effects are predominantly d...

  7. Audio-visual interactions in product sound design

    NARCIS (Netherlands)

    Özcan, E.; Van Egmond, R.

    2010-01-01

    Consistent product experience requires congruity between product properties such as visual appearance and sound. Therefore, for designing appropriate product sounds by manipulating their spectral-temporal structure, product sounds should preferably not be considered in isolation but as an integral p

  8. Utilization of audio-visual aids by family welfare workers.

    Science.gov (United States)

    Naik, V R; Jain, P K; Sharma, B B

    1977-01-01

    Communication efforts have been an important component of the Indian Family Planning Welfare Program since its inception. However, its chief interests in its early years were clinical, until the adoption of the extension approach in 1963. Educational materials were developed, especially in the period 1965-8, to fit mass, group meeting and home visit approaches. Audiovisual aids were developed for use by extension workers, who had previously relied entirely on verbal approaches. This paper examines their use. A questionnaire was designed for workers in motivational programs at 3 levels: Village Level (Family Planning Health Assistant, Auxilliary Nurse-Midwife, Dias), Block Level (Public Health Nurse, Lady Health Visitor, Block Extension Educator), and District (District Extension Educator, District Mass Education and Information Officer). 3 Districts were selected from each State on the basis of overall family planning performance during 1970-2 (good, average, or poor). Units of other agencies were also included on the same basis. Findings: 1) Workers in all 3 categories preferred individual contacts over group meetings or mass approach. 2) 56-64% said they used audiovisual aids "sometimes" (when available). 25% said they used them "many times" and only 15.9% said "rarely." 3) More than 1/2 of workers in each category said they were not properly oriented toward the use of audiovisual aids. Nonavailability of the aids in the market was also cited. About 1/3 of village level and 1/2 of other workers said that the materials were heavy and liable to be damaged. Complexity, inaccuracy and confusion in use were not widely cited (less than 30%).

  9. Audio-visual Training for Lip–reading

    DEFF Research Database (Denmark)

    Gebert, Hermann; Bothe, Hans-Heinrich

    2011-01-01

    This new edited book aims to bring together researchers and developers from various related areas to share their knowledge and experience, to describe the current state of the art in mobile and wireless-based adaptive e-learning and to present innovative techniques and solutions that support a personalized learning process involving media rich content delivered via wireless networks to mobile devices. The main goal of this book is to provide innovative and creative ideas for improving the quality of learning and to explore all new learning-oriented technologies, devices and networks. The topics of this book cover useful areas of general knowledge including Technologies for Adaptive Mobile Learning, Integrated Learning and Educational Environments, Pedagogically exploitable Guiding Principles and Practices for Web-based Learning Environments, Adaptive E-learning and Intelligent Tutoring Systems...

  10. Real-Time Audio-Visual Analysis for Multiperson Videoconferencing

    Directory of Open Access Journals (Sweden)

    Petr Motlicek

    2013-01-01

    Full Text Available We describe the design of a system consisting of several state-of-the-art real-time audio and video processing components enabling multimodal stream manipulation (e.g., automatic online editing) for multiparty videoconferencing applications in open, unconstrained environments. The underlying algorithms are designed to allow multiple people to enter, interact, and leave the observable scene with no constraints. They comprise continuous localisation of audio objects and its application for spatial audio object coding, detection and tracking of faces, estimation of head poses and visual focus of attention, detection and localisation of verbal and paralinguistic events, and the association and fusion of these different events. Combined together, they represent multimodal streams with audio objects and semantic video objects and provide semantic information for stream manipulation systems (like a virtual director). Various experiments have been performed to evaluate the performance of the system. The obtained results demonstrate the effectiveness of the proposed design, the various algorithms, and the benefit of fusing different modalities in this scenario.

  11. Classification of stop place in consonant-vowel contexts using feature extrapolation of acoustic-phonetic features in telephone speech.

    Science.gov (United States)

    Lee, Jung-Won; Choi, Jeung-Yoon; Kang, Hong-Goo

    2012-02-01

    Knowledge-based speech recognition systems extract acoustic cues from the signal to identify speech characteristics. For channel-deteriorated telephone speech, acoustic cues, especially those for stop consonant place, are expected to be degraded or absent. To investigate the use of knowledge-based methods in degraded environments, feature extrapolation of acoustic-phonetic features based on Gaussian mixture models is examined. This process is applied to a stop place detection module that uses burst release and vowel onset cues for consonant-vowel tokens of English. Results show that classification performance is enhanced in telephone channel-degraded speech, with extrapolated acoustic-phonetic features reaching or exceeding performance using estimated Mel-frequency cepstral coefficients (MFCCs). Results also show acoustic-phonetic features may be combined with MFCCs for best performance, suggesting these features provide information complementary to MFCCs.
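    The extrapolation idea can be illustrated with the conditional-mean formula for jointly Gaussian features; a full GMM mixes several such components. Everything below (dimensions, covariances, which cue is treated as degraded) is a toy assumption, not data or code from the study.

```python
import numpy as np

def conditional_mean(mu, cov, observed_idx, missing_idx, x_obs):
    """E[x_missing | x_observed] for a jointly Gaussian feature vector.

    mu  : (d,) mean of the full feature vector
    cov : (d, d) covariance of the full feature vector
    """
    mu_o, mu_m = mu[observed_idx], mu[missing_idx]
    S_oo = cov[np.ix_(observed_idx, observed_idx)]
    S_mo = cov[np.ix_(missing_idx, observed_idx)]
    return mu_m + S_mo @ np.linalg.solve(S_oo, x_obs - mu_o)

# Toy example: extrapolate a degraded burst cue from two intact cues.
mu = np.array([0.0, 0.0, 0.0])
cov = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.4],
                [0.3, 0.4, 1.0]])
print(conditional_mean(mu, cov, observed_idx=[0, 1], missing_idx=[2],
                       x_obs=np.array([1.2, -0.5])))
```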

  12. Speech 7 through 12.

    Science.gov (United States)

    Nederland Independent School District, TX.

    GRADES OR AGES: Grades 7 through 12. SUBJECT MATTER: Speech. ORGANIZATION AND PHYSICAL APPEARANCE: Following the foreword, philosophy and objectives, this guide presents a speech curriculum. The curriculum covers junior high and Speech I, II, III (senior high). Thirteen units of study are presented for junior high; each unit is divided into…

  13. The effects of noise vocoding on speech quality perception.

    Science.gov (United States)

    Anderson, Melinda C; Arehart, Kathryn H; Kates, James M

    2014-03-01

    Speech perception depends on access to spectral and temporal acoustic cues. Temporal cues include slowly varying amplitude changes (i.e. temporal envelope, TE) and quickly varying amplitude changes associated with the center frequency of the auditory filter (i.e. temporal fine structure, TFS). This study quantifies the effects of TFS randomization through noise vocoding on the perception of speech quality by parametrically varying the amount of original TFS available above 1500 Hz. The two research aims were: 1) to establish the role of TFS in quality perception, and 2) to determine if the role of TFS in quality perception differs between subjects with normal hearing and subjects with sensorineural hearing loss. Ratings were obtained from 20 subjects (10 with normal hearing and 10 with hearing loss) using an 11-point quality scale. Stimuli were processed in three different ways: 1) a 32-channel noise-excited vocoder with random envelope fluctuations in the noise carrier, 2) a 32-channel noise-excited vocoder with the noise-carrier envelope smoothed, and 3) removal of high-frequency bands. Stimuli were presented in quiet and in babble noise at 18 dB and 12 dB signal-to-noise ratios. TFS randomization had a measurable detrimental effect on quality ratings for speech in quiet and a smaller effect for speech in background babble. Subjects with normal hearing and subjects with sensorineural hearing loss provided similar quality ratings for noise-vocoded speech.
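    A minimal noise-excited channel vocoder of the general kind described here might look like the sketch below. Channel count, band edges, and the envelope cutoff are assumptions for illustration; the study's 32-channel processor and its envelope-smoothing variants are more elaborate.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, n_channels=8, f_lo=100.0, f_hi=7000.0, env_cut=50.0):
    """Noise-excited vocoder: split speech into log-spaced bands, extract each
    band's temporal envelope (TE), and use it to modulate band-limited noise.
    The original temporal fine structure (TFS) is discarded in every band."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    env_sos = butter(2, env_cut, btype="lowpass", fs=fs, output="sos")
    out = np.zeros(len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, speech)
        envelope = sosfiltfilt(env_sos, np.abs(hilbert(band)))          # temporal envelope
        carrier = sosfiltfilt(band_sos, np.random.randn(len(speech)))   # noise carrier
        out += np.clip(envelope, 0.0, None) * carrier
    return out
```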

  14. Perception of speech in noise: neural correlates.

    Science.gov (United States)

    Song, Judy H; Skoe, Erika; Banai, Karen; Kraus, Nina

    2011-09-01

    The presence of irrelevant auditory information (other talkers, environmental noises) presents a major challenge to listening to speech. The fundamental frequency (F(0)) of the target speaker is thought to provide an important cue for the extraction of the speaker's voice from background noise, but little is known about the relationship between speech-in-noise (SIN) perceptual ability and neural encoding of the F(0). Motivated by recent findings that music and language experience enhance brainstem representation of sound, we examined the hypothesis that brainstem encoding of the F(0) is diminished to a greater degree by background noise in people with poorer perceptual abilities in noise. To this end, we measured speech-evoked auditory brainstem responses to /da/ in quiet and two multitalker babble conditions (two-talker and six-talker) in native English-speaking young adults who ranged in their ability to perceive and recall SIN. Listeners who were poorer performers on a standardized SIN measure demonstrated greater susceptibility to the degradative effects of noise on the neural encoding of the F(0). Particularly diminished was their phase-locked activity to the fundamental frequency in the portion of the syllable known to be most vulnerable to perceptual disruption (i.e., the formant transition period). Our findings suggest that the subcortical representation of the F(0) in noise contributes to the perception of speech in noisy conditions.
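    F0 extraction of the sort that underlies such analyses can be sketched with a simple autocorrelation pitch estimator. The window length, search range, and test tone below are assumptions, not the procedure used in the study.

```python
import numpy as np

def estimate_f0(frame, fs, f0_min=80.0, f0_max=400.0):
    """Crude autocorrelation-based F0 estimate for a short voiced frame."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return fs / lag

# Example: a 150 Hz synthetic vowel-like tone.
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
print(estimate_f0(np.sin(2 * np.pi * 150 * t), fs))  # approximately 150 Hz
```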

  15. Analysis of the "Stream of Consciousness" Audio-Visual Language in Animated Film: The Case of "Millennium Actress"

    Institute of Scientific and Technical Information of China (English)

    李可文

    2015-01-01

    Animation films are filled with illusion and imagination, which makes them well suited to representing the rich and changeable inner worlds of characters. Artists have always tried to express stream-of-consciousness thought through various art forms, whether in fiction, painting or film. Early live-action films already attempted to show characters' stream of consciousness, and the technique was later integrated into animation. Satoshi Kon, a famous animation film director, demonstrated these techniques thoroughly in "Millennium Actress", vividly revealing the mental activity and inner world of its characters. Taking "Millennium Actress" as an example, this paper analyses how audio-visual language techniques are combined with the art of animation to present the rich "stream of consciousness" world of the characters.

  16. The timing and effort of lexical access in natural and degraded speech

    Directory of Open Access Journals (Sweden)

    Anita Eva Wagner

    2016-03-01

    Full Text Available Understanding speech is effortless in ideal situations, and although adverse conditions, such as those caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It is as yet unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech. This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners' ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected due to their substantial role in listeners' quick limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased effort in processing natural stimuli, in particular in the presence of mismatching cues. Signal degradation reduced listeners' ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because it had its sources at the pre-lexical level this effect can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing, and different sources of increased mental effort. We argue that well-timed progress of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why

  17. Comparison of different speech tasks among adults who stutter and adults who do not stutter

    Directory of Open Access Journals (Sweden)

    Ana Paula Ritto

    2016-03-01

    Full Text Available OBJECTIVES: In this study, we compared the performance of both fluent speakers and people who stutter in three different speaking situations: monologue speech, oral reading and choral reading. This study follows the assumption that the neuromotor control of speech can be influenced by external auditory stimuli in both speakers who stutter and speakers who do not stutter. METHOD: Seventeen adults who stutter and seventeen adults who do not stutter were assessed in three speaking tasks: monologue, oral reading (solo reading aloud and choral reading (reading in unison with the evaluator. Speech fluency and rate were measured for each task. RESULTS: The participants who stuttered had a lower frequency of stuttering during choral reading than during monologue and oral reading. CONCLUSIONS: According to the dual premotor system model, choral speech enhanced fluency by providing external cues for the timing of each syllable compensating for deficient internal cues.

  18. Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients.

    Science.gov (United States)

    Su, Qiaotong; Galvin, John J; Zhang, Guoping; Li, Yongxin; Fu, Qian-Jie

    2016-06-30

    Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users.

  19. Speech in spinocerebellar ataxia.

    Science.gov (United States)

    Schalling, Ellika; Hartelius, Lena

    2013-12-01

    Spinocerebellar ataxias (SCAs) are a heterogeneous group of autosomal dominant cerebellar ataxias clinically characterized by progressive ataxia, dysarthria and a range of other concomitant neurological symptoms. Only a few studies include detailed characterization of speech symptoms in SCA. Speech symptoms in SCA resemble ataxic dysarthria but symptoms related to phonation may be more prominent. One study to date has shown an association between differences in speech and voice symptoms related to genotype. More studies of speech and voice phenotypes are motivated, to possibly aid in clinical diagnosis. In addition, instrumental speech analysis has been demonstrated to be a reliable measure that may be used to monitor disease progression or therapy outcomes in possible future pharmacological treatments. Intervention by speech and language pathologists should go beyond assessment. Clinical guidelines for management of speech, communication and swallowing need to be developed for individuals with progressive cerebellar ataxia.

  20. Using auditory-visual speech to probe the basis of noise-impaired consonant-vowel perception in dyslexia and auditory neuropathy

    Science.gov (United States)

    Ramirez, Joshua; Mann, Virginia

    2005-08-01

    Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.
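    The masking manipulation (speech in speech-shaped noise at a chosen signal-to-noise ratio) can be sketched as follows. Generating the noise by randomising the phase of the speech spectrum is an assumed shortcut, not necessarily how the stimuli in the study were built.

```python
import numpy as np

def speech_shaped_noise(speech):
    """Noise with the same long-term spectrum as the speech, obtained by
    randomising the phase of the speech spectrum."""
    spectrum = np.fft.rfft(speech)
    phases = np.exp(1j * np.random.uniform(0, 2 * np.pi, len(spectrum)))
    return np.fft.irfft(np.abs(spectrum) * phases, n=len(speech))

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that the speech-to-noise power ratio equals snr_db."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise
```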

  1. Digital speech processing using Matlab

    CERN Document Server

    Gopi, E S

    2014-01-01

    Digital Speech Processing Using Matlab deals with digital speech pattern recognition, speech production model, speech feature extraction, and speech compression. The book is written in a manner that is suitable for beginners pursuing basic research in digital speech processing. Matlab illustrations are provided for most topics to enable better understanding of concepts. This book also deals with the basic pattern recognition techniques (illustrated with speech signals using Matlab) such as PCA, LDA, ICA, SVM, HMM, GMM, BPN, and KSOM.

  2. Switching of auditory attention in "cocktail-party" listening: ERP evidence of cueing effects in younger and older adults.

    Science.gov (United States)

    Getzmann, Stephan; Jasny, Julian; Falkenstein, Michael

    2017-02-01

    Verbal communication in a "cocktail-party situation" is a major challenge for the auditory system. In particular, changes in target speaker usually result in declined speech perception. Here, we investigated whether speech cues indicating a subsequent change in target speaker reduce the costs of switching in younger and older adults. We employed event-related potential (ERP) measures and a speech perception task, in which sequences of short words were simultaneously presented by four speakers. Changes in target speaker were either unpredictable or semantically cued by a word within the target stream. Cued changes resulted in a less decreased performance than uncued changes in both age groups. The ERP analysis revealed shorter latencies in the change-related N400 and late positive complex (LPC) after cued changes, suggesting an acceleration in context updating and attention switching. Thus, both younger and older listeners used semantic cues to prepare changes in speaker setting.

  3. Behavioral Cues of Interpersonal Warmth

    Science.gov (United States)

    Bayes, Marjorie A.

    1972-01-01

    The results of this study suggest, first, that interpersonal warmth does seem to be a personality dimension which can be reliably judged and, second, that it was possible to define and demonstrate the relevance of a number of behavioral cues for warmth. (Author)

  4. Optimal cue integration in ants.

    Science.gov (United States)

    Wystrach, Antoine; Mangan, Michael; Webb, Barbara

    2015-10-07

    In situations with redundant or competing sensory information, humans have been shown to perform cue integration, weighting different cues according to their certainty in a quantifiably optimal manner. Ants have been shown to merge the directional information available from their path integration (PI) and visual memory, but as yet it is not clear that they do so in a way that reflects the relative certainty of the cues. In this study, we manipulate the variance of the PI home vector by allowing ants (Cataglyphis velox) to run different distances and testing their directional choice when the PI vector direction is put in competition with visual memory. Ants show progressively stronger weighting of their PI direction as PI length increases. The weighting is quantitatively predicted by modelling the expected directional variance of home vectors of different lengths and assuming optimal cue integration. However, a subsequent experiment suggests ants may not actually compute an internal estimate of the PI certainty, but are using the PI home vector length as a proxy.
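    The optimal-integration prediction amounts to weighting each cue by its reliability (inverse variance). A toy sketch, with made-up headings and variances, combining a path-integration estimate with a visual-memory estimate:

```python
import numpy as np

def integrate_direction_cues(headings_deg, variances):
    """Combine directional cues weighted by reliability (1 / variance), as
    predicted by optimal cue integration. Headings are combined as weighted
    unit vectors to respect circularity."""
    headings = np.radians(np.asarray(headings_deg, dtype=float))
    weights = 1.0 / np.asarray(variances, dtype=float)
    weights /= weights.sum()
    x = np.sum(weights * np.cos(headings))
    y = np.sum(weights * np.sin(headings))
    return np.degrees(np.arctan2(y, x)) % 360.0

# Example: path integration (less certain after a long run) vs. visual memory.
print(integrate_direction_cues([350.0, 20.0], variances=[400.0, 100.0]))  # ~14 degrees
```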

  5. Optimal assessment of multiple cues

    NARCIS (Netherlands)

    Fawcett, TW; Johnstone, RA

    2003-01-01

    In a wide range of contexts from mate choice to foraging, animals are required to discriminate between alternative options on the basis of multiple cues. How should they best assess such complex multicomponent stimuli? Here, we construct a model to investigate this problem, focusing on a simple case

  6. Electrical brain imaging evidences left auditory cortex involvement in speech and non-speech discrimination based on temporal features

    Directory of Open Access Journals (Sweden)

    Jancke Lutz

    2007-12-01

    Full Text Available Abstract Background Speech perception is based on a variety of spectral and temporal acoustic features available in the acoustic signal. Voice-onset time (VOT is considered an important cue that is cardinal for phonetic perception. Methods In the present study, we recorded and compared scalp auditory evoked potentials (AEP in response to consonant-vowel-syllables (CV with varying voice-onset-times (VOT and non-speech analogues with varying noise-onset-time (NOT. In particular, we aimed to investigate the spatio-temporal pattern of acoustic feature processing underlying elemental speech perception and relate this temporal processing mechanism to specific activations of the auditory cortex. Results Results show that the characteristic AEP waveform in response to consonant-vowel-syllables is on a par with those of non-speech sounds with analogue temporal characteristics. The amplitude of the N1a and N1b component of the auditory evoked potentials significantly correlated with the duration of the VOT in CV and likewise, with the duration of the NOT in non-speech sounds. Furthermore, current density maps indicate overlapping supratemporal networks involved in the perception of both speech and non-speech sounds with a bilateral activation pattern during the N1a time window and leftward asymmetry during the N1b time window. Elaborate regional statistical analysis of the activation over the middle and posterior portion of the supratemporal plane (STP revealed strong left lateralized responses over the middle STP for both the N1a and N1b component, and a functional leftward asymmetry over the posterior STP for the N1b component. Conclusion The present data demonstrate overlapping spatio-temporal brain responses during the perception of temporal acoustic cues in both speech and non-speech sounds. Source estimation evidences a preponderant role of the left middle and posterior auditory cortex in speech and non-speech discrimination based on temporal

  7. Exploration of Speech Planning and Producing by Speech Error Analysis

    Institute of Scientific and Technical Information of China (English)

    冷卉

    2012-01-01

    Speech error analysis is an indirect way to discover speech planning and producing processes. From some speech errors made by people in their daily life, linguists and learners can reveal the planning and producing processes more easily and clearly.

  8. Indirect Speech Acts

    Institute of Scientific and Technical Information of China (English)

    李威

    2001-01-01

    Indirect speech acts are frequently used in verbal communication, and interpreting them correctly is important for developing students' communicative competence. This paper therefore presents Searle's account of indirect speech acts and explores how indirect speech acts are interpreted in accordance with two influential theories. It consists of four parts. Part one gives a general introduction to the notion of speech act theory. Part two elaborates on the conception of indirect speech acts proposed by Searle and his supplement to and development of the theory of illocutionary acts. Part three deals with the interpretation of indirect speech acts. Part four draws implications from the previous study and also serves as the conclusion of the dissertation.

  9. Phonetic categorisation and cue weighting in adolescents with Specific Language Impairment (SLI).

    Science.gov (United States)

    Tuomainen, Outi; Stuart, Nichola J; van der Lely, Heather K J

    2015-07-01

    This study investigates phonetic categorisation and cue weighting in adolescents and young adults with Specific Language Impairment (SLI). We manipulated two acoustic cues, vowel duration and F1 offset frequency, that signal word-final stop consonant voicing ([t] and [d]) in English. Ten individuals with SLI (14.0-21.4 years), 10 age-matched controls (CA; 14.6-21.9 years) and 10 non-matched adult controls (23.3-36.0 years) labelled synthetic CVC non-words in an identification task. The results showed that the adolescents and young adults with SLI were less consistent than controls in the identification of the good category representatives. The group with SLI also assigned less weight to vowel duration than the adult controls. However, no direct relationship between phonetic categorisation, cue weighting and language skills was found. These findings indicate that some individuals with SLI have speech perception deficits but they are not necessarily associated with oral language skills.
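    Cue weights of this kind are commonly estimated by fitting a logistic model to listeners' voicing responses, with coefficients on standardised cues read as relative weights. The sketch below uses simulated responses and arbitrary coefficients; nothing in it comes from the study's data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400
vowel_dur_ms = rng.uniform(100, 300, n)   # longer vowels cue voiced /d/
f1_offset_hz = rng.uniform(300, 800, n)   # lower F1 offsets cue voiced /d/
p_d = 1 / (1 + np.exp(-(0.03 * (vowel_dur_ms - 200) - 0.01 * (f1_offset_hz - 550))))
responses = rng.random(n) < p_d           # simulated "d" responses

X = StandardScaler().fit_transform(np.column_stack([vowel_dur_ms, f1_offset_hz]))
model = LogisticRegression().fit(X, responses)
print("relative cue weights (duration, F1 offset):", model.coef_[0])
```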

  10. Zebra finches can use positional and transitional cues to distinguish vocal element strings.

    Science.gov (United States)

    Chen, Jiani; Ten Cate, Carel

    2015-08-01

    Learning sequences is of great importance to humans and non-human animals. Many motor and mental actions, such as singing in birds and speech processing in humans, rely on sequential learning. At least two mechanisms are considered to be involved in such learning. The chaining theory proposes that learning of sequences relies on memorizing the transitions between adjacent items, while the positional theory suggests that learners encode the items according to their ordinal position in the sequence. Positional learning is assumed to dominate sequential learning. However, human infants exposed to a string of speech sounds can learn transitional (chaining) cues. So far, it is not clear whether birds, an increasingly important model for examining vocal processing, can do this. In this study we use a Go-Nogo design to examine whether zebra finches can use transitional cues to distinguish artificially constructed strings of song elements. Zebra finches were trained with sequences differing in transitional and positional information and next tested with novel strings sharing positional and transitional similarities with the training strings. The results show that they can attend to both transitional and positional cues and that their sequential coding strategies can be biased toward transitional cues depending on the learning context. This article is part of a Special Issue entitled: In Honor of Jerry Hogan.

  11. Charisma in business speeches

    DEFF Research Database (Denmark)

    Niebuhr, Oliver; Brem, Alexander; Novák-Tót, Eszter

    2016-01-01

    of the acoustic-prosodic signal, secondly, focuses on business speeches like product presentations, and, thirdly, in doing so, advances the still fairly fragmentary evidence on the prosodic correlates of charismatic speech. We show that the prosodic features of charisma in political speeches also apply to business speeches. Consistent with public opinion, our findings are indicative of Steve Jobs being a more charismatic speaker than Mark Zuckerberg. Beyond previous studies, our data suggest that rhythm and emphatic accentuation are also involved in conveying charisma. Furthermore, the differences...

  12. Principles of speech coding

    CERN Document Server

    Ogunfunmi, Tokunbo

    2010-01-01

    It is becoming increasingly apparent that all forms of communication, including voice, will be transmitted through packet-switched networks based on the Internet Protocol (IP). Therefore, the design of modern devices that rely on speech interfaces, such as cell phones and PDAs, requires a complete and up-to-date understanding of the basics of speech coding. The book outlines key signal processing algorithms used to mitigate impairments to speech quality in VoIP networks. Offering a detailed yet easily accessible introduction to the field, Principles of Speech Coding provides an in-depth examination of the

  13. Advances in Speech Recognition

    CERN Document Server

    Neustein, Amy

    2010-01-01

    This volume is comprised of contributions from eminent leaders in the speech industry, and presents a comprehensive and in depth analysis of the progress of speech technology in the topical areas of mobile settings, healthcare and call centers. The material addresses the technical aspects of voice technology within the framework of societal needs, such as the use of speech recognition software to produce up-to-date electronic health records, not withstanding patients making changes to health plans and physicians. Included will be discussion of speech engineering, linguistics, human factors ana

  14. When Symbolic Spatial Cues Go before Numbers

    Science.gov (United States)

    Herrera, Amparo; Macizo, Pedro

    2011-01-01

    This work explores the effect of spatial cueing on number processing. Participants performed a parity judgment task. However, shortly before the target number, a cue (arrow pointing to left, arrow pointing to right or a cross) was centrally presented. In Experiment 1, in which responses were lateralized, the cue direction modulated the interaction…

  15. Cue salience influences the use of height cues in reorientation in pigeons (Columba livia).

    Science.gov (United States)

    Du, Yu; Mahdi, Nuha; Paul, Breanne; Spetch, Marcia L

    2016-07-01

    Although orienting ability has been examined with numerous types of cues, most research has focused only on cues from the horizontal plane. The current study investigated pigeons' use of wall height, a vertical cue, in an open-field task and compared it with their use of horizontal cues. Pigeons were trained to locate food in 2 diagonal corners of a rectangular enclosure with 2 opposite high walls as height cues. Before each trial, pigeons were rotated to disorient them. In training, pigeons could use either the horizontal cues from the rectangular enclosure or the height information from the walls to locate the food. In testing, the apparatus was modified to provide (a) horizontal cues only, (b) height cues only, and (c) both height and horizontal cues in conflict. In Experiment 1 the lower and high walls, respectively, were 40 and 80 cm, whereas in Experiment 2 they were made more perceptually salient by shortening them to 20 and 40 cm. Pigeons accurately located the goal corners with horizontal cues alone in both experiments, but they searched accurately with height cues alone only in Experiment 2. When the height cues conflicted with horizontal cues, pigeons preferred the horizontal cues over the height cues in Experiment 1 but not in Experiment 2, suggesting that perceptual salience influences the relative weighting of cues. (PsycINFO Database Record

  16. Application and design of audio-visual aids stomatology teaching in orthodontic non-stomatology students

    Institute of Scientific and Technical Information of China (English)

    李若萱; 吕亚林; 王晓庚

    2012-01-01

    Objective: To assess the effects of audio-visual aids in stomatology teaching during a two-credit-hour undergraduate orthodontic course for students majoring in preventive medicine. Methods: We selected 85 students from the 2007 and 2008 matriculating classes of the preventive medicine department of Capital Medical University. Using the eight-year orthodontic textbook as the reference, theory was taught via multimedia in the first class hour, and situational role-play was used in the practicum hour. After the course, a theory test and a questionnaire survey were used to evaluate the teaching effects and to obtain students' feedback on the combined teaching method. Results: Students mastered the orthodontic theory well, and the majority understood the goal of the method and reported that their interest in learning orthodontics was significantly enhanced; within the limited study time they formed a strong impression of orthodontics. Conclusions: Integrating direct (audio-visual) teaching with situational teaching is of great assistance to orthodontic training; however, the integration must be carefully prepared to ensure student participation, maximize its benefits, and improve the course through direct feedback.

  17. Speech recognition in natural background noise.

    Directory of Open Access Journals (Sweden)

    Julien Meyer

    Full Text Available In the real world, human speech recognition nearly always involves listening in background noise. The impact of such noise on speech signals and on intelligibility performance increases with the separation of the listener from the speaker. The present behavioral experiment provides an overview of the effects of such acoustic disturbances on speech perception in conditions approaching ecologically valid contexts. We analysed the intelligibility loss in spoken word lists with increasing listener-to-speaker distance in a typical low-level natural background noise. The noise was combined with the simple spherical amplitude attenuation due to distance, basically changing the signal-to-noise ratio (SNR). Therefore, our study draws attention to some of the most basic environmental constraints that have pervaded spoken communication throughout human history. We evaluated the ability of native French participants to recognize French monosyllabic words (spoken at 65.3 dB(A), reference at 1 meter) at distances between 11 and 33 meters, which corresponded to the SNRs most revealing of the progressive effect of the selected natural noise (-8.8 dB to -18.4 dB). Our results showed that in such conditions, identity of vowels is mostly preserved, with the striking peculiarity of the absence of confusion in vowels. The results also confirmed the functional role of consonants during lexical identification. The extensive analysis of recognition scores, confusion patterns and associated acoustic cues revealed that sonorant, sibilant and burst properties were the most important parameters influencing phoneme recognition. Altogether these analyses allowed us to extract a resistance scale from consonant recognition scores. We also identified specific perceptual consonant confusion groups depending on the place in the words (onset vs. coda). Finally our data suggested that listeners may access some acoustic cues of the CV transition, opening interesting perspectives for
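
    As a rough illustration of the distance manipulation described above, the sketch below combines spherical spreading (a 6 dB level drop per doubling of distance) with a fixed background-noise floor. The ~53.3 dB(A) noise level is back-calculated here from the reported reference level and SNRs; it is an assumption for the example, not a figure stated in the record.

```python
import math

def speech_level_at_distance(level_at_ref_db, distance_m, ref_distance_m=1.0):
    """Speech level after spherical spreading: -6 dB per doubling of distance."""
    return level_at_ref_db - 20.0 * math.log10(distance_m / ref_distance_m)

def snr_at_distance(level_at_ref_db, noise_level_db, distance_m):
    """SNR when a fixed noise floor is combined with distance attenuation of the speech."""
    return speech_level_at_distance(level_at_ref_db, distance_m) - noise_level_db

# Reproduce the SNR range reported in the record (assumed ~53.3 dB(A) noise floor).
for d in (11, 33):
    print(d, round(snr_at_distance(65.3, 53.3, d), 1))   # roughly -8.8 and -18.4 dB
```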

  18. Discriminability and Perceptual Saliency of Temporal and Spectral Cues for Final Fricative Consonant Voicing in Simulated Cochlear-Implant and Bimodal Hearing

    Directory of Open Access Journals (Sweden)

    Ying-Yee Kong

    2016-05-01

    Full Text Available Multiple redundant acoustic cues can contribute to the perception of a single phonemic contrast. This study investigated the effect of spectral degradation on the discriminability and perceptual saliency of acoustic cues for identification of word-final fricative voicing in “loss” versus “laws”, and possible changes that occurred when low-frequency acoustic cues were restored. Three acoustic cues that contribute to the word-final /s/-/z/ contrast (first formant frequency [F1] offset, vowel–consonant duration ratio, and consonant voicing duration) were systematically varied in synthesized words. A discrimination task measured listeners’ ability to discriminate differences among stimuli within a single cue dimension. A categorization task examined the extent to which listeners make use of a given cue to label a syllable as “loss” versus “laws” when multiple cues are available. Normal-hearing listeners were presented with stimuli that were either unprocessed, processed with an eight-channel noise-band vocoder to approximate spectral degradation in cochlear implants, or low-pass filtered. Listeners were tested in four listening conditions: unprocessed, vocoder, low-pass, and a combined vocoder + low-pass condition that simulated bimodal hearing. Results showed a negative impact of spectral degradation on F1 cue discrimination and a trading relation between spectral and temporal cues in which listeners relied more heavily on the temporal cues for “loss-laws” identification when spectral cues were degraded. Furthermore, the addition of low-frequency fine-structure cues in simulated bimodal hearing increased the perceptual saliency of the F1 cue for “loss-laws” identification compared with vocoded speech. Findings suggest an interplay between the quality of sensory input and cue importance.
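
    The eight-channel noise-band vocoder mentioned above is a common cochlear-implant simulation. The sketch below shows the general technique only; the band edges, filter orders, and 160 Hz envelope cutoff are illustrative assumptions, not the authors' exact parameters.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def noise_vocoder(signal, fs, n_channels=8, f_lo=80.0, f_hi=7000.0, env_cutoff=160.0):
    """Minimal noise-band vocoder: split the input into log-spaced bands, extract each
    band's Hilbert envelope, modulate band-limited noise with it, and sum the channels."""
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
    b_env, a_env = butter(2, env_cutoff / (fs / 2), btype="low")
    rng = np.random.default_rng(0)
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = filtfilt(b, a, signal)
        env = np.clip(filtfilt(b_env, a_env, np.abs(hilbert(band))), 0.0, None)
        carrier = filtfilt(b, a, rng.standard_normal(len(signal)))  # band-limited noise
        out += env * carrier
    return out / (np.max(np.abs(out)) + 1e-12)
```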

  19. Perceptual restoration of masked speech in human cortex

    Science.gov (United States)

    Leonard, Matthew K.; Baud, Maxime O.; Sjerps, Matthias J.; Chang, Edward F.

    2016-01-01

    Humans are adept at understanding speech despite the fact that our natural listening environment is often filled with interference. An example of this capacity is phoneme restoration, in which part of a word is completely replaced by noise, yet listeners report hearing the whole word. The neurological basis for this unconscious fill-in phenomenon is unknown, despite being a fundamental characteristic of human hearing. Here, using direct cortical recordings in humans, we demonstrate that missing speech is restored at the acoustic-phonetic level in bilateral auditory cortex, in real-time. This restoration is preceded by specific neural activity patterns in a separate language area, left frontal cortex, which predicts the word that participants later report hearing. These results demonstrate that during speech perception, missing acoustic content is synthesized online from the integration of incoming sensory cues and the internal neural dynamics that bias word-level expectation and prediction. PMID:27996973

  20. Speech-Language Therapy (For Parents)

    Science.gov (United States)

    KidsHealth (For Parents) overview of speech-language therapy for children with speech and/or language disorders, covering speech disorders, language disorders, and feeding disorders.

  1. Speech Compression for Noise-Corrupted Thai Expressive Speech

    Directory of Open Access Journals (Sweden)

    Suphattharachai Chomphan

    2011-01-01

    Full Text Available Problem statement: In speech communication, speech coding aims to preserve speech quality at a lower coding bitrate. In real communication environments, various types of noise deteriorate speech quality, and expressive speech with different speaking styles may yield different quality under the same coding method. Approach: This research studied speech compression of noise-corrupted Thai expressive speech using two coding methods, CS-ACELP and MP-CELP. The speech material comprised one hundred male and one hundred female utterances, covering four speaking styles (enjoyable, sad, angry, and reading) and five Thai sentences. Three types of noise were included (train, car, and air conditioner), each at five levels ranging from 0 to 20 dB. Speech quality was evaluated with a subjective mean opinion score test. Results: CS-ACELP gave better speech quality than MP-CELP at all three bitrates of 6000, 8600, and 12600 bps. Regarding noise level, 20-dB noise gave the best speech quality and 0-dB noise the worst. Female speech gave better results than male speech. Regarding noise type, air-conditioner noise gave the best speech quality and train noise the worst. Conclusion: Coding method, noise type, noise level, and speaker gender all influence coded speech quality.
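
    Varying the noise from 0 to 20 dB SNR amounts to scaling the noise recording before mixing it with the clean utterance. A minimal sketch of that step (not the authors' exact pipeline):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale a noise signal so that it sits at the requested SNR (in dB) relative to
    the speech, then return the noisy mixture."""
    noise = np.resize(noise, speech.shape)          # repeat/trim noise to match length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + gain * noise
```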

  2. Children's Responses to Computer-Synthesized Speech in Educational Media: Gender Consistency and Gender Similarity Effects

    Science.gov (United States)

    Lee, Kwan Min; Liao, Katharine; Ryu, Seoungho

    2007-01-01

    This study examines children's social responses to gender cues in synthesized speech in a computer-based instruction setting. Eighty 5th-grade elementary school children were randomly assigned to one of the conditions in a full-factorial 2 (participant gender) x 2 (voice gender) x 2 (content gender) experiment. Results show that children apply…

  3. Durational Patterning at Syntactic and Discourse Boundaries in Mandarin Spontaneous Speech

    Science.gov (United States)

    Fon, Janice; Johnson, Keith; Chen, Sally

    2011-01-01

    This study focused on durational cues (i.e., syllable duration, pause duration, and syllable onset intervals (SOIs)) at discourse boundaries in two dialects of Mandarin, Taiwan and Mainland varieties. Speech was elicited by having 18 participants describe events in "The Pear Story" film. Recorded data were transcribed, labeled, and segmented into…

  4. Free Speech Yearbook 1976.

    Science.gov (United States)

    Phifer, Gregg, Ed.

    The articles collected in this annual address several aspects of First Amendment Law. The following titles are included: "Freedom of Speech As an Academic Discipline" (Franklyn S. Haiman), "Free Speech and Foreign-Policy Decision Making" (Douglas N. Freeman), "The Supreme Court and the First Amendment: 1975-1976"…

  5. Private Speech in Ballet

    Science.gov (United States)

    Johnston, Dale

    2006-01-01

    Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…

  6. Tracking Speech Sound Acquisition

    Science.gov (United States)

    Powell, Thomas W.

    2011-01-01

    This article describes a procedure to aid in the clinical appraisal of child speech. The approach, based on the work by Dinnsen, Chin, Elbert, and Powell (1990; Some constraints on functionally disordered phonologies: Phonetic inventories and phonotactics. "Journal of Speech and Hearing Research", 33, 28-37), uses a railway idiom to track gains in…

  7. Preschool Connected Speech Inventory.

    Science.gov (United States)

    DiJohnson, Albert; And Others

    This speech inventory developed for a study of aurally handicapped preschool children (see TM 001 129) provides information on intonation patterns in connected speech. The inventory consists of a list of phrases and simple sentences accompanied by pictorial clues. The test is individually administered by a teacher-examiner who presents the spoken…

  8. Free Speech. No. 38.

    Science.gov (United States)

    Kane, Peter E., Ed.

    This issue of "Free Speech" contains the following articles: "Daniel Schoor Relieved of Reporting Duties" by Laurence Stern, "The Sellout at CBS" by Michael Harrington, "Defending Dan Schorr" by Tome Wicker, "Speech to the Washington Press Club, February 25, 1976" by Daniel Schorr, "Funds Voted For Schorr Inquiry" by Richard Lyons, "Erosion of the…

  9. Advertising and Free Speech.

    Science.gov (United States)

    Hyman, Allen, Ed.; Johnson, M. Bruce, Ed.

    The articles collected in this book originated at a conference at which legal and economic scholars discussed the issue of First Amendment protection for commercial speech. The first article, in arguing for freedom for commercial speech, finds inconsistent and untenable the arguments of those who advocate freedom from regulation for political…

  10. Representation of speech in human auditory cortex: is it special?

    Science.gov (United States)

    Steinschneider, Mitchell; Nourski, Kirill V; Fishman, Yonatan I

    2013-11-01

    Successful categorization of phonemes in speech requires that the brain analyze the acoustic signal along both spectral and temporal dimensions. Neural encoding of the stimulus amplitude envelope is critical for parsing the speech stream into syllabic units. Encoding of voice onset time (VOT) and place of articulation (POA), cues necessary for determining phonemic identity, occurs within shorter time frames. An unresolved question is whether the neural representation of speech is based on processing mechanisms that are unique to humans and shaped by learning and experience, or is based on rules governing general auditory processing that are also present in non-human animals. This question was examined by comparing the neural activity elicited by speech and other complex vocalizations in primary auditory cortex of macaques, who are limited vocal learners, with that in Heschl's gyrus, the putative location of primary auditory cortex in humans. Entrainment to the amplitude envelope is neither specific to humans nor to human speech. VOT is represented by responses time-locked to consonant release and voicing onset in both humans and monkeys. Temporal representation of VOT is observed both for isolated syllables and for syllables embedded in the more naturalistic context of running speech. The fundamental frequency of male speakers is represented by more rapid neural activity phase-locked to the glottal pulsation rate in both humans and monkeys. In both species, the differential representation of stop consonants varying in their POA can be predicted by the relationship between the frequency selectivity of neurons and the onset spectra of the speech sounds. These findings indicate that the neurophysiology of primary auditory cortex is similar in monkeys and humans despite their vastly different experience with human speech, and that Heschl's gyrus is engaged in general auditory, and not language-specific, processing. This article is part of a Special Issue entitled

  11. The effect of filtered speech feedback on the frequency of stuttering

    Science.gov (United States)

    Rami, Manish Krishnakant

    2000-10-01

    whispered speech conditions all decreased the frequency of stuttering while the approximate glottal source did not. It is suggested that articulatory events, chiefly the encoded speech output of vocal tract origin, afford effective cues and induce fluent speech in people who stutter.

  12. Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

    DEFF Research Database (Denmark)

    Jørgensen, Søren; Dau, Torsten

    2013-01-01

    The speech-based envelope power spectrum model (sEPSM; [1]) was proposed in order to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-tonoise ratio in the envelope domain (SNRenv), which was demonstrated...... to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating...... from computational auditory scene analysis and further support the hypothesis that the SNRenv is a powerful metric for speech intelligibility prediction....

  13. The Relative Weight of Temporal Envelope Cues in Different Frequency Regions for Mandarin Sentence Recognition

    Directory of Open Access Journals (Sweden)

    Yang Guo

    2017-01-01

    Full Text Available Acoustic temporal envelope (E) cues containing speech information are distributed across the frequency spectrum. To investigate the relative weight of E cues in different frequency regions for Mandarin sentence recognition, E information was extracted from 30 contiguous bands across the range of 80–7,562 Hz using Hilbert decomposition and then allocated to five frequency regions. Recognition scores were obtained with acoustic E cues from 1 or 2 random regions from 40 normal-hearing listeners. While the recognition scores ranged from 8.2% to 16.3% when E information from only one region was available, the scores ranged from 57.9% to 87.7% when E information from two frequency regions was presented, suggesting a synergistic effect among the temporal E cues in different frequency regions. Next, the relative contributions of the E information from the five frequency regions to sentence perception were computed using a least-squares approach. The results demonstrated that, for Mandarin Chinese, a tonal language, the temporal E cues of Frequency Region 1 (80–502 Hz) and Region 3 (1,022–1,913 Hz) contributed more to the intelligibility of sentence recognition than other regions, particularly the region of 80–502 Hz, which contained fundamental frequency (F0) information.
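
    For the least-squares weighting step, a minimal sketch under assumed data is shown below: each listening condition is coded by which frequency regions carried envelope cues, and region weights are estimated by regressing recognition scores on that coding. The scores and condition set are placeholders, not the study's data.

```python
import numpy as np

# Rows: which of the 5 frequency regions carried envelope cues in a condition (1 = present).
presence = np.array([
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
], dtype=float)
scores = np.array([0.16, 0.14, 0.08, 0.85, 0.60, 0.70])  # hypothetical proportions correct

# Least-squares estimate of each region's contribution, normalized to relative weights.
weights, *_ = np.linalg.lstsq(presence, scores, rcond=None)
relative_weights = weights / weights.sum()
print(np.round(relative_weights, 2))
```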

  14. Environmental Contamination of Normal Speech.

    Science.gov (United States)

    Harley, Trevor A.

    1990-01-01

    Environmentally contaminated speech errors (irrelevant words or phrases derived from the speaker's environment and erroneously incorporated into speech) are hypothesized to occur at a high level of speech processing, but with a relatively late insertion point. The data indicate that speech production processes are not independent of other…

  15. Speech processing in mobile environments

    CERN Document Server

    Rao, K Sreenivasa

    2014-01-01

    This book focuses on speech processing in the presence of low-bit-rate coding and varying background environments. The methods presented in the book exploit speech events that are robust in noisy environments; accurate estimation of these crucial events is useful for carrying out various speech tasks such as speech recognition, speaker recognition, and speech rate modification in mobile environments. The authors provide insights into designing and developing robust methods to process speech in mobile environments, covering temporal and spectral enhancement methods that minimize the effect of noise and examining methods and models for speech and speaker recognition applications in mobile environments.

  16. An Exploration of Rhythmic Grouping of Speech Sequences by French- and German-Learning Infants.

    Science.gov (United States)

    Abboub, Nawal; Boll-Avetisyan, Natalie; Bhatara, Anjali; Höhle, Barbara; Nazzi, Thierry

    2016-01-01

    Rhythm in music and speech can be characterized by a constellation of several acoustic cues. Individually, these cues have different effects on rhythmic perception: sequences of sounds alternating in duration are perceived as short-long pairs (weak-strong/iambic pattern), whereas sequences of sounds alternating in intensity or pitch are perceived as loud-soft, or high-low pairs (strong-weak/trochaic pattern). This perceptual bias, called the Iambic-Trochaic Law (ITL), has been claimed to be a universal property of the auditory system, applying in both the music and the language domains. Recent studies have shown that language experience can modulate the effects of the ITL on rhythmic perception of both speech and non-speech sequences in adults, and of non-speech sequences in 7.5-month-old infants. The goal of the present study was to explore whether language experience also modulates infants' grouping of speech. To do so, we presented sequences of syllables to monolingual French- and German-learning 7.5-month-olds. Using the Headturn Preference Procedure (HPP), we examined whether they were able to perceive a rhythmic structure in sequences of syllables that alternated in duration, pitch, or intensity. Our findings show that both French- and German-learning infants perceived a rhythmic structure when it was cued by duration or pitch but not intensity. Our findings also show differences in how these infants use duration and pitch cues to group syllable sequences, suggesting that pitch cues were the easier ones to use. Moreover, performance did not differ across languages, failing to reveal early language effects on rhythmic perception. These results contribute to our understanding of the origin of rhythmic perception and perceptual mechanisms shared across music and speech, which may bootstrap language acquisition.

  17. Prosodic cues to word order: what level of representation?

    Directory of Open Access Journals (Sweden)

    Carline eBernard

    2012-10-01

    Full Text Available Within language, systematic correlations exist between syntactic structure and prosody. Prosodic prominence, for instance, falls on the complement and not the head of syntactic phrases, and its realization depends on the phrasal position of the prominent element. Thus, in Japanese, a functor-final language, prominence is phrase-initial and realized as increased pitch (^Tōkyō ni ‘Tokyo to’), whereas in French, English or Italian, functor-initial languages, it manifests itself as phrase-final lengthening (to Rome). Prosody is readily available in the linguistic signal even to the youngest infants. It has, therefore, been proposed that young learners might be able to exploit its correlations with syntax to bootstrap language structure. In this study, we tested this hypothesis, investigating how 8-month-old monolingual French infants processed an artificial grammar manipulating the relative position of prosodic prominence and word frequency. In Condition 1, we created a speech stream in which the two cues, prosody and frequency, were aligned, frequent words being prosodically non-prominent and infrequent ones being prominent, as is the case in natural language (functors are prosodically minimal compared to content words). In Condition 2, the two cues were misaligned, with frequent words carrying prosodic prominence, unlike in natural language. After familiarization with the aligned or the misaligned stream in a headturn preference procedure, we tested infants’ preference for test items having a frequent word initial or a frequent word final word order. We found that infants familiarized with the aligned stream showed the expected preference for the frequent word initial test items, mimicking the functor-initial word order of French. Infants in the misaligned condition showed no preference. These results suggest that infants are able to use word frequency and prosody as early cues to word order and they integrate them into a coherent

  18. Global Freedom of Speech

    DEFF Research Database (Denmark)

    Binderup, Lars Grassme

    2007-01-01

    , as opposed to a legal norm, that curbs exercises of the right to free speech that offend the feelings or beliefs of members from other cultural groups. The paper rejects the suggestion that acceptance of such a norm is in line with liberal egalitarian thinking. Following a review of the classical liberal...... egalitarian reasons for free speech - reasons from overall welfare, from autonomy and from respect for the equality of citizens - it is argued that these reasons outweigh the proposed reasons for curbing culturally offensive speech. Currently controversial cases such as that of the Danish Cartoon Controversy...

  19. Surface Flow from Visual Cues

    OpenAIRE

    Petit, Benjamin,; Letouzey, Antoine; Boyer, Edmond; Franco, Jean-Sébastien

    2011-01-01

    International audience; In this paper we study the estimation of dense, instantaneous 3D motion fields over a non-rigidly moving surface observed by multi-camera systems. The motivation arises from multi-camera applications that require motion information, for arbitrary subjects, in order to perform tasks such as surface tracking or segmentation. To this aim, we present a novel framework that allows us to efficiently compute dense 3D displacement fields using low level visual cues and geometric con...

  20. The Rhetoric in English Speech

    Institute of Scientific and Technical Information of China (English)

    马鑫

    2014-01-01

    English speech-making has a very long history and has always been highly valued. People give speeches in economic activities, political forums, and academic settings to express their opinions and to inform or persuade others. English speeches also play an important role in English literature, and much of a speech's distinctive impact is owed to its rhetoric. This paper discusses parallelism, repetition, and the rhetorical question in English speeches, aiming to help readers better appreciate their charm.

  1. Audiovisual integration for speech during mid-childhood: electrophysiological evidence.

    Science.gov (United States)

    Kaganovich, Natalya; Schumaker, Jennifer

    2014-12-01

    Previous studies have demonstrated that the presence of visual speech cues reduces the amplitude and latency of the N1 and P2 event-related potential (ERP) components elicited by speech stimuli. However, the developmental trajectory of this effect is not yet fully mapped. We examined ERP responses to auditory, visual, and audiovisual speech in two groups of school-age children (7-8-year-olds and 10-11-year-olds) and in adults. Audiovisual speech led to the attenuation of the N1 and P2 components in all groups of participants, suggesting that the neural mechanisms underlying these effects are functional by early school years. Additionally, while the reduction in N1 was largest over the right scalp, the P2 attenuation was largest over the left and midline scalp. The difference in the hemispheric distribution of the N1 and P2 attenuation supports the idea that these components index at least somewhat disparate neural processes within the context of audiovisual speech perception.

  2. Speech intelligibility in hospitals.

    Science.gov (United States)

    Ryherd, Erica E; Moeller, Michael; Hsu, Timothy

    2013-07-01

    Effective communication between staff members is key to patient safety in hospitals. A variety of patient care activities including admittance, evaluation, and treatment rely on oral communication. Surprisingly, published information on speech intelligibility in hospitals is extremely limited. In this study, speech intelligibility measurements and occupant evaluations were conducted in 20 units of five different U.S. hospitals. A variety of unit types and locations were studied. Results show that, overall, no unit had "good" intelligibility based on the speech intelligibility index (SII > 0.75), and several locations were found to have "poor" intelligibility. The study documents speech intelligibility across a variety of hospitals and unit types, offers some evidence of the positive impact of absorption on intelligibility, and identifies areas for future research.
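
    For reference, the core of the speech intelligibility index maps each band's SNR to an audibility factor and takes an importance-weighted sum. The sketch below is a simplified illustration with placeholder band-importance weights; the ANSI S3.5 procedure adds level-distortion and masking corrections.

```python
import numpy as np

# Placeholder band-importance weights (must sum to 1); the standard tabulates these
# per octave or third-octave band.
BAND_IMPORTANCE = np.array([0.10, 0.15, 0.20, 0.25, 0.20, 0.10])

def speech_intelligibility_index(snr_db_per_band):
    """Simplified SII: clip each band SNR to -15..+15 dB, map it to an audibility
    factor between 0 and 1, and return the importance-weighted sum."""
    snr = np.asarray(snr_db_per_band, dtype=float)
    audibility = np.clip((snr + 15.0) / 30.0, 0.0, 1.0)
    return float(np.sum(BAND_IMPORTANCE * audibility))

# SII > 0.75 is commonly read as "good" intelligibility, as in the record above.
print(round(speech_intelligibility_index([10, 5, 0, -5, -10, 3]), 2))
```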

  3. Anxiety and ritualized speech

    Science.gov (United States)

    Lalljee, Mansur; Cook, Mark

    1975-01-01

    The experiment examines the effects of anxiety on the use of a number of words that seem irrelevant to semantic communication. The Units of Ritualized Speech (URSs) considered are: 'I mean', 'in fact', 'really', 'sort of', 'well' and 'you know'. (Editor)

  4. Speech disorders - children

    Science.gov (United States)

    MedlinePlus medical encyclopedia entry: //medlineplus.gov/ency/article/001430.htm

  5. Speech impairment (adult)

    Science.gov (United States)

    MedlinePlus medical encyclopedia entry: //medlineplus.gov/ency/article/003204.htm

  6. Cueing the Virtual Storyteller: Analysis of cue phrase usage in fairy tales

    NARCIS (Netherlands)

    Penning, Manon; Theune, Mariët; Busemann, S.

    2007-01-01

    An existing taxonomy of Dutch cue phrases, designed for use in story generation, was validated by analysing cue phrase usage in a corpus of classical fairy tales. The analysis led to some adaptations of the original taxonomy.

  7. Speech Compression and Synthesis

    Science.gov (United States)

    1980-10-01

    phonological rules combined with diphone improved the algorithms used by the phonetic synthesis program for gain normalization and time... phonetic vocoder, spectral template. This report covers our work for the past two years on speech compression and synthesis. Since there was an... from Block 19: speech recognition, phoneme recognition. Initial design for a phonetic recognition program. We also recorded and partially labeled a

  8. Recognizing GSM Digital Speech

    OpenAIRE

    2005-01-01

    The Global System for Mobile (GSM) environment encompasses three main problems for automatic speech recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The first one has already received much attention; however, source coding distortion and transmission errors must be explicitly addressed. In this paper, we propose an alternative front-end for speech recognition over GSM networks. This front-end is specially conceived to be effective against source c...

  9. The Practice and Reflection of "Project Teaching": Taking the Electronic Audio-Visual Technology Major of Anhui Broadcasting Movie and Television College as an Example

    Institute of Scientific and Technical Information of China (English)

    孙博文

    2011-01-01

    Since 2009, when the Electronic Audio-Visual Technology Major of Anhui Broadcasting Movie and Television College began its workflow-based project teaching reform, the program has undertaken extensive exploration and practice in both teaching content and teaching methods: the personnel training scheme was revised, the proportion of practice classes was increased, and all core courses were required to prepare workflow-based project teaching syllabuses. The reform has produced achievements but also revealed some problems. In response, the teaching and research section has carried out substantial teaching research and proposed some feasible solutions.

  10. Cue-switch costs in task-switching: cue priming or control processes?

    Science.gov (United States)

    Grange, James A; Houghton, George

    2010-09-01

    In the explicitly cued task-switching paradigm, two cues per task allow separation of costs associated with switching cues from costs of switching tasks. Whilst task-switch costs have become controversial, cue-switch costs are robust. The processes that contribute to cue-switch costs are under-specified in the literature: they could reflect perceptual priming of cue properties, or priming of control processes that form relevant working memory (WM) representations of task demands. Across two experiments we manipulated cue-transparency in an attention-switching design to test the contrasting hypotheses of cue-switch costs, and show that such costs emerge from control processes of establishing relevant WM representations, rather than perceptual priming of the cue itself. When the cues were maximally transparent, cue-switch costs were eradicated. We discuss the results in terms of recent theories of cue encoding, and provide a formal definition of cue-transparency in switching designs and its relation to WM representations that guide task performance.

  11. Effects of Adaptation Rate and Noise Suppression on the Intelligibility of Compressed-Envelope Based Speech.

    Directory of Open Access Journals (Sweden)

    Ying-Hui Lai

    Full Text Available Temporal envelope is the primary acoustic cue used in most cochlear implant (CI) speech processors to elicit speech perception for patients fitted with CI devices. Envelope compression narrows down envelope dynamic range and accordingly degrades speech understanding abilities of CI users, especially under challenging listening conditions (e.g., in noise). A new adaptive envelope compression (AEC) strategy was proposed recently, which in contrast to the traditional static envelope compression, is effective at enhancing the modulation depth of envelope waveform by making best use of its dynamic range and thus improving the intelligibility of envelope-based speech. The present study further explored the effect of adaptation rate in envelope compression on the intelligibility of compressed-envelope based speech. Moreover, since noise reduction is another essential unit in modern CI systems, the compatibility of AEC and noise reduction was also investigated. In this study, listening experiments were carried out by presenting vocoded sentences to normal hearing listeners for recognition. Experimental results demonstrated that the adaptation rate in envelope compression had a notable effect on the speech intelligibility performance of the AEC strategy. By specifying a suitable adaptation rate, speech intelligibility could be enhanced significantly in noise compared to when using static envelope compression. Moreover, results confirmed that the AEC strategy was suitable for combining with noise reduction to improve the intelligibility of envelope-based speech in noise.

  12. Effects of Adaptation Rate and Noise Suppression on the Intelligibility of Compressed-Envelope Based Speech.

    Science.gov (United States)

    Lai, Ying-Hui; Tsao, Yu; Chen, Fei

    2015-01-01

    Temporal envelope is the primary acoustic cue used in most cochlear implant (CI) speech processors to elicit speech perception for patients fitted with CI devices. Envelope compression narrows down envelope dynamic range and accordingly degrades speech understanding abilities of CI users, especially under challenging listening conditions (e.g., in noise). A new adaptive envelope compression (AEC) strategy was proposed recently, which in contrast to the traditional static envelope compression, is effective at enhancing the modulation depth of envelope waveform by making best use of its dynamic range and thus improving the intelligibility of envelope-based speech. The present study further explored the effect of adaptation rate in envelope compression on the intelligibility of compressed-envelope based speech. Moreover, since noise reduction is another essential unit in modern CI systems, the compatibility of AEC and noise reduction was also investigated. In this study, listening experiments were carried out by presenting vocoded sentences to normal hearing listeners for recognition. Experimental results demonstrated that the adaptation rate in envelope compression had a notable effect on the speech intelligibility performance of the AEC strategy. By specifying a suitable adaptation rate, speech intelligibility could be enhanced significantly in noise compared to when using static envelope compression. Moreover, results confirmed that the AEC strategy was suitable for combining with noise reduction to improve the intelligibility of envelope-based speech in noise.
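
    To make the envelope-compression idea concrete, here is a toy sketch contrasting static power-law compression with a variant whose compression strength adapts to the local envelope dynamic range. The frame length plays the role of an adaptation rate; this is only an illustration of the concept, not the AEC algorithm evaluated in these records.

```python
import numpy as np

def compress_envelope_static(env, exponent=0.3):
    """Static power-law envelope compression (an exponent < 1 narrows the dynamic range)."""
    return np.power(np.maximum(np.asarray(env, dtype=float), 1e-12), exponent)

def compress_envelope_adaptive(env, fs, win_s=0.2, lo=0.2, hi=0.6):
    """Toy adaptive variant: choose the exponent frame by frame from the local envelope
    dynamic range, so frames with little modulation are compressed less."""
    env = np.asarray(env, dtype=float)
    out = np.empty_like(env)
    win = max(1, int(win_s * fs))
    for start in range(0, len(env), win):
        frame = env[start:start + win]
        range_db = 20.0 * np.log10((frame.max() + 1e-12) / (frame.min() + 1e-12))
        exponent = hi - (hi - lo) * min(range_db / 60.0, 1.0)  # wider range -> stronger compression
        out[start:start + win] = compress_envelope_static(frame, exponent)
    return out
```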

  13. SPEECH DISORDERS ENCOUNTERED DURING SPEECH THERAPY AND THERAPY TECHNIQUES

    Directory of Open Access Journals (Sweden)

    İlhan ERDEM

    2013-06-01

    Full Text Available Speech, which is both a physical and a mental process, uses agreed signs and sounds to convey a message formed in the mind. To identify the sounds of speech it is essential to know the structure and function of the various organs that make conversation possible. Because speech is a physical and mental process, many factors can lead to speech disorders; these may relate to language acquisition or be caused by a variety of medical and psychological conditions. Speaking is the collective work of many organs, like an orchestra, and because speech is a very complex skill it must be determined which of these obstacles inhibits conversation. A speech disorder is a defect in speech flow, rhythm, pitch, stress, composition, or vocalization. This study examines speech disorders such as articulation disorders, stuttering, aphasia, dysarthria, local dialect speech, tongue and lip laziness, and overly rapid speech as speech defects in terms of language skills; the causes of these speech disorders were investigated and suggestions for their remediation are discussed.

  14. Practical speech user interface design

    CERN Document Server

    Lewis, James R

    2010-01-01

    Although speech is the most natural form of communication between humans, most people find using speech to communicate with machines anything but natural. Drawing from psychology, human-computer interaction, linguistics, and communication theory, Practical Speech User Interface Design provides a comprehensive yet concise survey of practical speech user interface (SUI) design. It offers practice-based and research-based guidance on how to design effective, efficient, and pleasant speech applications that people can really use. Focusing on the design of speech user interfaces for IVR application

  15. Guiding Attention by Cooperative Cues

    Institute of Scientific and Technical Information of China (English)

    KangWoo Lee

    2008-01-01

    A common assumption in visual attention is based on the rationale of "limited capacity of information processing". From this viewpoint there is little consideration of how different information channels or modules cooperate, because cells in processing stages are forced to compete for the limited resource. To examine the mechanism behind the cooperative behavior of information channels, a computational model of selective attention is implemented based on two hypotheses. Unlike the traditional view of visual attention, the cooperative behavior is assumed to be a dynamic integration process between bottom-up and top-down information. Furthermore, top-down information is assumed to provide a contextual cue during the selection process and to guide the attentional allocation among many bottom-up candidates. The results from a series of simulations with still and video images showed some interesting properties that could not be explained by the competitive aspect of selective attention alone.

  16. How rats combine temporal cues.

    Science.gov (United States)

    Guilhardi, Paulo; Keen, Richard; MacInnis, Mika L M; Church, Russell M

    2005-05-31

    The procedures for classical and operant conditioning, and for many timing procedures, involve the delivery of reinforcers that may be related to the time of previous reinforcers and responses, and to the time of onsets and terminations of stimuli. The behavior resulting from such procedures can be described as bouts of responding that occur in some pattern at some rate. A packet theory of timing and conditioning is described that accounts for such behavior under a wide range of procedures. Applications include the food searching by rats in Skinner boxes under conditions of fixed and random reinforcement, brief and sustained stimuli, and several response-food contingencies. The approach is used to describe how multiple cues from reinforcers and stimuli combine to determine the rate and pattern of response bouts.

  17. Speech-Language Therapy (For Parents)

    Science.gov (United States)

    ... Speech-language pathologists (SLPs), often informally known as speech therapists, are professionals educated in the study of human ...

  18. Cross-Cultural Nonverbal Cue Immersive Training

    Science.gov (United States)

    2008-12-01

    ...technologies incorporating mixed reality training may be used to promote social cooperative learning. ...communicated either consciously or unconsciously through various forms of nonverbal cues such as body posture and facial expressions. Nonverbal cues

  19. Dylan Pritchett, Storyteller. Cue Sheet for Students.

    Science.gov (United States)

    Evans, Karen L. B.

    Designed to be used before and after attending a storytelling performance by Dylan Pritchett, this cue sheet presents information about the performance and suggests activities that can be done with classmates, friends, or family members. The cue sheet discusses where and why people tell stories, what makes a story good for telling, what makes a…

  20. Children's recognition of emotions from vocal cues

    NARCIS (Netherlands)

    Sauter, D.A.; Panattoni, C.; Happé, F.

    2013-01-01

    Emotional cues contain important information about the intentions and feelings of others. Despite a wealth of research into children's understanding of facial signals of emotions, little research has investigated the developmental trajectory of interpreting affective cues in the voice. In this study

  1. The minor third communicates sadness in speech, mirroring its use in music.

    Science.gov (United States)

    Curtis, Meagan E; Bharucha, Jamshed J

    2010-06-01

    There is a long history of attempts to explain why music is perceived as expressing emotion. The relationship between pitches serves as an important cue for conveying emotion in music. The musical interval referred to as the minor third is generally thought to convey sadness. We reveal that the minor third also occurs in the pitch contour of speech conveying sadness. Bisyllabic speech samples conveying four emotions were recorded by 9 actresses. Acoustic analyses revealed that the relationship between the 2 salient pitches of the sad speech samples tended to approximate a minor third. Participants rated the speech samples for perceived emotion, and the use of numerous acoustic parameters as cues for emotional identification was modeled using regression analysis. The minor third was the most reliable cue for identifying sadness. Additional participants rated musical intervals for emotion, and their ratings verified the historical association between the musical minor third and sadness. These findings support the theory that human vocal expressions and music share an acoustic code for communicating sadness.
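
    The interval between two pitches can be expressed in equal-tempered semitones as 12·log2(f2/f1); a minor third is 3 semitones (a frequency ratio of about 1.19). A small sketch with hypothetical F0 values:

```python
import math

def interval_in_semitones(f_low_hz, f_high_hz):
    """Size of the pitch interval between two frequencies, in equal-tempered semitones."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)

# Hypothetical F0 values for the two salient pitches of a sad utterance:
print(round(interval_in_semitones(185.0, 220.0), 2))   # about 3.0 semitones, a minor third
```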

  2. Explicit authenticity and stimulus features interact to modulate BOLD response induced by emotional speech.

    Science.gov (United States)

    Drolet, Matthis; Schubotz, Ricarda I; Fischer, Julia

    2013-06-01

    Context has been found to have a profound effect on the recognition of social stimuli and correlated brain activation. The present study was designed to determine whether knowledge about emotional authenticity influences emotion recognition expressed through speech intonation. Participants classified emotionally expressive speech in an fMRI experimental design as sad, happy, angry, or fearful. For some trials, stimuli were cued as either authentic or play-acted in order to manipulate participant top-down belief about authenticity, and these labels were presented both congruently and incongruently to the emotional authenticity of the stimulus. Contrasting authentic versus play-acted stimuli during uncued trials indicated that play-acted stimuli spontaneously up-regulate activity in the auditory cortex and regions associated with emotional speech processing. In addition, a clear interaction effect of cue and stimulus authenticity showed up-regulation in the posterior superior temporal sulcus and the anterior cingulate cortex, indicating that cueing had an impact on the perception of authenticity. In particular, when a cue indicating an authentic stimulus was followed by a play-acted stimulus, additional activation occurred in the temporoparietal junction, probably pointing to increased load on perspective taking in such trials. While actual authenticity has a significant impact on brain activation, individual belief about stimulus authenticity can additionally modulate the brain response to differences in emotionally expressive speech.

  3. Automatic speech recognition An evaluation of Google Speech

    OpenAIRE

    Stenman, Magnus

    2015-01-01

    The use of speech recognition is increasing rapidly and is now available in smart TVs, desktop computers, every new smart phone, etc., allowing us to talk to computers naturally. With its use in home appliances, education, and even surgical procedures, accuracy and speed become very important. This thesis aims to give an introduction to speech recognition and discuss its use in robotics. An evaluation of Google Speech, using Google’s speech API, in regard to word error rate and translation ...
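
    Word error rate, the metric referred to above, is conventionally computed from a Levenshtein alignment between the reference and hypothesis word sequences. A minimal sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with standard Levenshtein alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("turn on the lights", "turn the light on"))   # 0.75
```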

  4. Seeing the talker’s face supports executive processing of speech in steady state noise

    Directory of Open Access Journals (Sweden)

    Sushmit eMishra

    2013-11-01

    Full Text Available Listening to speech in noise depletes cognitive resources, affecting speech processing. The present study investigated how remaining resources, or cognitive spare capacity (CSC), can be deployed by young adults with normal hearing. We administered a test of CSC (CSCT; Mishra et al., 2013) along with a battery of established cognitive tests to 20 participants with normal hearing. In the CSCT, lists of two-digit numbers were presented with and without visual cues in quiet, as well as in steady-state and speech-like noise at a high intelligibility level. In low load conditions, two numbers were recalled according to instructions inducing executive processing (updating, inhibition), and in high load conditions the participants were additionally instructed to recall one extra number, which was always the first item in the list. In line with previous findings, results showed that CSC was sensitive to memory load and executive function but generally not related to working memory capacity. Furthermore, CSCT scores in quiet were lowered by visual cues, probably due to distraction. In steady-state noise, the presence of visual cues improved CSCT scores, probably by enabling better encoding. Contrary to our expectation, CSCT performance was disrupted more in steady-state than speech-like noise, although only without visual cues, possibly because selective attention could be used to ignore the speech-like background and provide an enriched representation of target items in working memory similar to that obtained in quiet. This interpretation is supported by a consistent association between CSCT scores and updating skills.

  5. Cross-linguistic differences in prosodic cues to syntactic disambiguation in German and English.

    Science.gov (United States)

    O'Brien, Mary Grantham; Jackson, Carrie N; Gardner, Christine E

    2014-01-01

    This study examined whether late-learning English-German L2 learners and late-learning German-English L2 learners use prosodic cues to disambiguate temporarily ambiguous L1 and L2 sentences during speech production. Experiments 1a and 1b showed that English-German L2 learners and German-English L2 learners used a pitch rise and pitch accent to disambiguate prepositional phrase-attachment sentences in German. However, the same participants, as well as monolingual English speakers, only used pitch accent to disambiguate similar English sentences. Taken together, these results indicate the L2 learners used prosody to disambiguate sentences in both of their languages and did not fully transfer cues to disambiguation from their L1 to their L2. The results have implications for the acquisition of L2 prosody and the interaction between prosody and meaning in L2 production.

  6. Tackling the complexity in speech

    DEFF Research Database (Denmark)

    section includes four carefully selected chapters. They deal with facets of speech production, speech acoustics, and/or speech perception or recognition, place them in an integrated phonetic-phonological perspective, and relate them in more or less explicit ways to aspects of speech technology. Therefore......, we hope that this volume can help speech scientists with traditional training in phonetics and phonology to keep up with the latest developments in speech technology. In the opposite direction, speech researchers starting from a technological perspective will hopefully get inspired by reading about...... the questions, phenomena, and communicative functions that are currently addressed in phonetics and phonology. Either way, the future of speech research lies in international, interdisciplinary collaborations, and our volume is meant to reflect and facilitate such collaborations...

  7. Denial Denied: Freedom of Speech

    Directory of Open Access Journals (Sweden)

    Glen Newey

    2009-12-01

    Full Text Available Free speech is a widely held principle. This is in some ways surprising, since formal and informal censorship of speech is widespread, and rather different issues seem to arise depending on whether the censorship concerns who speaks, what content is spoken or how it is spoken. I argue that despite these facts, free speech can indeed be seen as a unitary principle. On my analysis, the core of the free speech principle is the denial of the denial of speech, whether to a speaker, to a proposition, or to a mode of expression. Underlying free speech is the principle of freedom of association, according to which speech is both a precondition of future association (e.g. as a medium for negotiation) and a mode of association in its own right. I conclude by applying this account briefly to two contentious issues: hate speech and pornography.

  8. Synchronization by the hand: The sight of gestures modulates low-frequency activity in brain responses to continuous speech

    Directory of Open Access Journals (Sweden)

    Emmanuel eBiau

    2015-09-01

    Full Text Available During social interactions, speakers often produce spontaneous gestures to accompany their speech. These coordinated body movements convey communicative intentions, and modulate how listeners perceive the message in a subtle, but important way. In the present perspective, we put the focus on the role that congruent non-verbal information from beat gestures may play in the neural responses to speech. Whilst delta-theta oscillatory brain responses reflect the time-frequency structure of the speech signal, we argue that beat gestures promote phase resetting at relevant word onsets. This mechanism may facilitate the anticipation of associated acoustic cues relevant for prosodic/syllabic-based segmentation in speech perception. We report recently published data supporting this hypothesis, and discuss the potential of beats (and gestures in general) for further studies investigating continuous AV speech processing through low-frequency oscillations.

  9. A Letter to the Parent(s) of a Child with Developmental Apraxia of Speech. Part IV: Treatment of DAS.

    Science.gov (United States)

    Hall, Penelope K.

    2000-01-01

    One of a series of letters to parents of children with developmental apraxia of speech (DAS), this letter discusses the treatment of DAS including linguistic approaches, motor-programming approaches, a combination of linguistic and motor-programming approaches, and treatment approaches that include specific sensory and gestural cueing techniques.…

  10. RECOGNISING SPEECH ACTS

    Directory of Open Access Journals (Sweden)

    Phyllis Kaburise

    2012-09-01

    Full Text Available Speech Act Theory (SAT), a theory in pragmatics, is an attempt to describe what happens during linguistic interactions. Inherent within SAT is the idea that language forms and intentions are relatively formulaic and that there is a direct correspondence between sentence forms (for example, in terms of structure and lexicon) and the function or meaning of an utterance. The contention offered in this paper is that when such a correspondence does not exist, as in indirect speech utterances, this creates challenges for English second language speakers and may result in miscommunication. This arises because indirect speech acts allow speakers to employ various pragmatic devices such as inference, implicature, presuppositions and context clues to transmit their messages. Such devices, operating within the non-literal level of language competence, may pose challenges for ESL learners.

  11. Speech spectrogram expert

    Energy Technology Data Exchange (ETDEWEB)

    Johannsen, J.; Macallister, J.; Michalek, T.; Ross, S.

    1983-01-01

    Various authors have pointed out that humans can become quite adept at deriving phonetic transcriptions from speech spectrograms (as good as 90 percent accuracy at the phoneme level). The authors describe an expert system which attempts to simulate this performance. The speech spectrogram expert (spex) is actually a society made up of three experts: a 2-dimensional vision expert, an acoustic-phonetic expert, and a phonetics expert. The visual reasoning expert finds important visual features of the spectrogram. The acoustic-phonetic expert reasons about how visual features relate to phonemes, and about how phonemes change visually in different contexts. The phonetics expert reasons about allowable phoneme sequences and transformations, and deduces an English spelling for phoneme strings. The speech spectrogram expert is highly interactive, allowing users to investigate hypotheses and edit rules. 10 references.

  12. Protection limits on free speech

    Institute of Scientific and Technical Information of China (English)

    李敏

    2014-01-01

    Freedom of speech is one of the basic rights of citizens and should receive broad protection, but in the actual context of China, which kinds of speech can be protected and which are restricted, and how to draw the line between state power and free speech, are questions worth considering. People tend to overlook freedom of speech and its function, so that some arguments cannot be aired in open debate.

  13. A measure for assessing the effects of audiovisual speech integration.

    Science.gov (United States)

    Altieri, Nicholas; Townsend, James T; Wenger, Michael J

    2014-06-01

    We propose a measure of audiovisual speech integration that takes into account accuracy and response times. This measure should prove beneficial for researchers investigating multisensory speech recognition, since it applies to both normal-hearing and aging populations. As an example, age-related sensory decline influences both the rate at which one processes information and the ability to utilize cues from different sensory modalities. Our function assesses integration when both auditory and visual information are available, by comparing performance on these audiovisual trials with theoretical predictions for performance under the assumptions of parallel, independent self-terminating processing of single-modality inputs. We provide example data from an audiovisual identification experiment and discuss applications for measuring audiovisual integration skills across the life span.
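    As a rough illustration of the kind of baseline the abstract describes, the sketch below builds an independent-race prediction from unimodal response-time distributions and contrasts it with audiovisual performance. All data are hypothetical, and the functions are generic stand-ins rather than the authors' published measure.

```python
# Sketch: compare an observed audiovisual RT distribution with a parallel,
# independent, self-terminating (race) prediction built from unimodal RTs.
# All response times below are hypothetical.
import numpy as np

def ecdf(rts, t_grid):
    """Empirical cumulative distribution of response times on a time grid."""
    rts = np.sort(np.asarray(rts))
    return np.searchsorted(rts, t_grid, side="right") / len(rts)

def race_prediction(f_a, f_v):
    """Independent-race CDF: P(min(Ta, Tv) <= t) = 1 - (1 - Fa)(1 - Fv)."""
    return 1.0 - (1.0 - f_a) * (1.0 - f_v)

# Hypothetical correct-trial RTs (ms) for one listener.
rt_auditory = np.random.default_rng(0).normal(650, 80, 200)
rt_visual = np.random.default_rng(1).normal(700, 90, 200)
rt_audiovisual = np.random.default_rng(2).normal(600, 70, 200)

t = np.linspace(300, 1000, 200)
f_av_observed = ecdf(rt_audiovisual, t)
f_av_predicted = race_prediction(ecdf(rt_auditory, t), ecdf(rt_visual, t))

# Positive values indicate faster AV responding than independent parallel
# processing of the unimodal inputs would allow (evidence of integration).
gain = f_av_observed - f_av_predicted
print(f"max violation of the independence prediction: {gain.max():.3f}")
```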

  14. Visually guided auditory attention in a dynamic "cocktail-party" speech perception task: ERP evidence for age-related differences.

    Science.gov (United States)

    Getzmann, Stephan; Wascher, Edmund

    2017-02-01

    Speech understanding in the presence of concurrent sound is a major challenge, especially for older persons. In particular, conversational turn-taking usually results in switch costs, as indicated by poorer speech perception after changes in the relevant target talker. Here, we investigated whether visual cues indicating the future position of a target talker may reduce the costs of switching in younger and older adults. We employed a speech perception task, in which sequences of short words were simultaneously presented by three talkers, and analysed behavioural measures and event-related potentials (ERPs). Informative cues resulted in increased performance after a spatial change in target talker compared to uninformative cues that did not indicate the future target position. The older participants in particular benefited from knowing the future target position in advance, as indicated by reduced response times after informative cues. The ERP analysis revealed an overall reduced N2, and a reduced P3b to changes in the target talker location in older participants, suggesting reduced inhibitory control and context updating. On the other hand, a pronounced frontal late positive complex (f-LPC) to the informative cues indicated increased allocation of attentional resources to changes in target talker in the older group, in line with the decline-compensation hypothesis. Thus, knowing where to listen has the potential to compensate for age-related decline in attentional switching in a highly variable cocktail-party environment.

  15. Action experience changes attention to kinematic cues

    Directory of Open Access Journals (Sweden)

    Courtney eFilippi

    2016-02-01

    Full Text Available The current study used remote corneal reflection eye-tracking to examine the relationship between motor experience and action anticipation in 13-month-old infants. To measure online anticipation of actions, infants watched videos where the actor's hand provided kinematic information (in its orientation) about the type of object that the actor was going to reach for. The actor's hand orientation either matched the orientation of a rod (congruent cue) or did not match the orientation of the rod (incongruent cue). To examine relations between motor experience and action anticipation, we used a 2 (reach first vs. observe first) x 2 (congruent kinematic cue vs. incongruent kinematic cue) between-subjects design. We show that 13-month-old infants in the observe first condition spontaneously generate rapid online visual predictions to congruent hand orientation cues and do not visually anticipate when presented incongruent cues. We further demonstrate that the speed with which these infants generate predictions to congruent motor cues is correlated with their own ability to pre-shape their hands. Finally, we demonstrate that following reaching experience, infants generate rapid predictions to both congruent and incongruent hand shape cues, suggesting that short-term experience changes attention to kinematics.

  16. Designing speech for a recipient

    DEFF Research Database (Denmark)

    Fischer, Kerstin

    is investigated on three candidates for so-called ‘simplified registers’: speech to children (also called motherese or baby talk), speech to foreigners (also called foreigner talk) and speech to robots. The volume integrates research from various disciplines, such as psychology, sociolinguistics...

  17. Abortion and compelled physician speech.

    Science.gov (United States)

    Orentlicher, David

    2015-01-01

    Informed consent mandates for abortion providers may infringe the First Amendment's freedom of speech. On the other hand, they may reinforce the physician's duty to obtain informed consent. Courts can promote both doctrines by ensuring that compelled physician speech pertains to medical facts about abortion rather than abortion ideology and that compelled speech is truthful and not misleading.

  18. Awareness of rhythm patterns in speech and music in children with specific language impairments

    Directory of Open Access Journals (Sweden)

    Ruth eCumming

    2015-12-01

    Full Text Available Children with specific language impairments (SLIs) show impaired perception and production of language, and also show impairments in perceiving auditory cues to rhythm (amplitude rise time [ART] and sound duration) and in tapping to a rhythmic beat. Here we explore potential links between language development and rhythm perception in 45 children with SLI and 50 age-matched controls. We administered three rhythmic tasks: a musical beat detection task, a tapping-to-music task, and a novel music/speech task, which varied rhythm and pitch cues independently or together in both speech and music. Via low-pass filtering, the music sounded as though it was played from a low-quality radio and the speech sounded as though it was muffled (heard behind the door). We report data for all of the SLI children (N = 45, IQ varying), as well as for two independent subgroupings with intact IQ. One subgroup, Pure SLI, had intact phonology and reading (N = 16); the other, SLI PPR (N = 15), had impaired phonology and reading. When IQ varied (all SLI children), we found significant group differences in all the rhythmic tasks. For the Pure SLI group, there were rhythmic impairments in the tapping task only. For children with SLI and poor phonology (SLI PPR), group differences were found in all of the filtered speech/music AXB tasks. We conclude that difficulties with rhythmic cues in both speech and music are present in children with SLIs, but that some rhythmic measures are more sensitive than others. The data are interpreted within a ‘prosodic phrasing’ hypothesis, and we discuss the potential utility of rhythmic and musical interventions in remediating speech and language difficulties in children.
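    The kind of low-pass filtering used to create the muffled speech and radio-like music can be approximated with a standard Butterworth filter, as in the sketch below; the 500 Hz cutoff and the synthetic signal are illustrative assumptions, not the study's actual stimulus parameters.

```python
# Sketch: low-pass filter an audio stimulus so that speech sounds muffled.
# The cutoff frequency is illustrative only.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def low_pass(signal, sample_rate, cutoff_hz=500.0, order=4):
    """Zero-phase Butterworth low-pass filter."""
    sos = butter(order, cutoff_hz, btype="low", fs=sample_rate, output="sos")
    return sosfiltfilt(sos, signal)

# Example with a synthetic two-tone signal: the 2 kHz component is attenuated.
fs = 16000
t = np.arange(fs) / fs
stimulus = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 2000 * t)
muffled = low_pass(stimulus, fs)
print(muffled.shape)
```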

  19. SPEECH DISORDERS ENCOUNTERED DURING SPEECH THERAPY AND THERAPY TECHNIQUES

    OpenAIRE

    2013-01-01

    Speech is a physical and mental process in which agreed-upon signs and sounds are used to convey a message. To identify the sounds of speech, it is essential to know the structure and function of the various organs that make conversation possible. Because speech is a physical and mental process, many factors can lead to speech disorders. A speech disorder can be related to language acquisition, and it can also be caused by many medical and psychological factors. Disordered sp...

  20. Cues of maternal condition influence offspring selfishness.

    Directory of Open Access Journals (Sweden)

    Janine W Y Wong

    Full Text Available The evolution of parent-offspring communication was mostly studied from the perspective of parents responding to begging signals conveying information about offspring condition. Parents should respond to begging because of the differential fitness returns obtained from their investment in offspring that differ in condition. For analogous reasons, offspring should adjust their behavior to cues/signals of parental condition: parents that differ in condition pay differential costs of care and, hence, should provide different amounts of food. In this study, we experimentally tested in the European earwig (Forficula auricularia) if cues of maternal condition affect offspring behavior in terms of sibling cannibalism. We experimentally manipulated female condition by providing them with different amounts of food, kept nymph condition constant, allowed for nymph exposure to chemical maternal cues over extended time, quantified nymph survival (deaths being due to cannibalism) and extracted and analyzed the females' cuticular hydrocarbons (CHC). Nymph survival was significantly affected by chemical cues of maternal condition, and this effect depended on the timing of breeding. Cues of poor maternal condition enhanced nymph survival in early broods, but reduced nymph survival in late broods, and vice versa for cues of good condition. Furthermore, female condition affected the quantitative composition of their CHC profile which in turn predicted nymph survival patterns. Thus, earwig offspring are sensitive to chemical cues of maternal condition and nymphs from early and late broods show opposite reactions to the same chemical cues. Together with former evidence on maternal sensitivities to condition-dependent nymph chemical cues, our study shows context-dependent reciprocal information exchange about condition between earwig mothers and their offspring, potentially mediated by cuticular hydrocarbons.

  1. Hemispheric Asymmetry of Endogenous Neural Oscillations in Young Children: Implications for Hearing Speech In Noise.

    Science.gov (United States)

    Thompson, Elaine C; Woodruff Carr, Kali; White-Schwoch, Travis; Tierney, Adam; Nicol, Trent; Kraus, Nina

    2016-01-25

    Speech signals contain information in hierarchical time scales, ranging from short-duration (e.g., phonemes) to long-duration cues (e.g., syllables, prosody). A theoretical framework to understand how the brain processes this hierarchy suggests that hemispheric lateralization enables specialized tracking of acoustic cues at different time scales, with the left and right hemispheres sampling at short (25 ms; 40 Hz) and long (200 ms; 5 Hz) periods, respectively. In adults, both speech-evoked and endogenous cortical rhythms are asymmetrical: low-frequency rhythms predominate in right auditory cortex, and high-frequency rhythms in left auditory cortex. It is unknown, however, whether endogenous resting state oscillations are similarly lateralized in children. We investigated cortical oscillations in children (3-5 years; N = 65) at rest and tested our hypotheses that this temporal asymmetry is evident early in life and facilitates recognition of speech in noise. We found a systematic pattern of increasing leftward asymmetry for higher frequency oscillations; this pattern was more pronounced in children who better perceived words in noise. The observed connection between left-biased cortical oscillations in phoneme-relevant frequencies and speech-in-noise perception suggests hemispheric specialization of endogenous oscillatory activity may support speech processing in challenging listening environments, and that this infrastructure is present during early childhood.

  2. Cue-Specific Reactivity in Experienced Gamblers

    OpenAIRE

    2009-01-01

    To examine whether gambling cue reactivity is cue-specific, 47 scratch-off lottery players and 47 horse race gamblers were presented with video clips of their preferred and non-preferred modes of gambling, and two control stimuli including an exciting car race and a mental stressor task while heart rates, excitement, and urge to gamble were being measured. Heart rates for both groups of gamblers were highest to the mental stressor and did not differ in response to the other three cues. Excite...

  3. Intelligibility of time-compressed speech: the effect of uniform versus non-uniform time-compression algorithms.

    Science.gov (United States)

    Schlueter, Anne; Lemke, Ulrike; Kollmeier, Birger; Holube, Inga

    2014-03-01

    For assessing hearing aid algorithms, a method is sought to shift the threshold of a speech-in-noise test to (mostly positive) signal-to-noise ratios (SNRs) that allow discrimination across algorithmic settings and are most relevant for hearing-impaired listeners in daily life. Hence, time-compressed speech with higher speech rates was evaluated to parametrically increase the difficulty of the test while preserving most of the relevant acoustical speech cues. A uniform and a non-uniform algorithm were used to compress the sentences of the German Oldenburg Sentence Test at different speech rates. In comparison, the non-uniform algorithm exhibited greater deviations from the targeted time compression, as well as greater changes of the phoneme duration, spectra, and modulation spectra. Speech intelligibility for fast Oldenburg sentences in background noise at different SNRs was determined with 48 normal-hearing listeners. The results confirmed decreasing intelligibility with increasing speech rate. Speech had to be compressed to more than 30% of its original length to reach 50% intelligibility at positive SNRs. Characteristics influencing the discrimination ability of the test for assessing effective SNR changes were investigated. Subjective and objective measures indicated a clear advantage of the uniform algorithm in comparison to the non-uniform algorithm for the application in speech-in-noise tests.
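    Uniform time compression of a sentence can be approximated with an off-the-shelf phase-vocoder routine, as in the sketch below; this is a generic stand-in for the algorithms compared in the study, and the file name and compression factor are assumptions.

```python
# Sketch: uniform time compression of a sentence to 30% of its original
# duration (speech rate increased, pitch nominally preserved).
# Generic phase-vocoder stand-in, not the study's specific algorithms.
import librosa

signal, fs = librosa.load("sentence.wav", sr=None)   # hypothetical file
compressed = librosa.effects.time_stretch(signal, rate=1.0 / 0.3)
print(len(compressed) / len(signal))  # roughly 0.3
```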

  4. Speech transmission index from running speech: A neural network approach

    Science.gov (United States)

    Li, F. F.; Cox, T. J.

    2003-04-01

    Speech transmission index (STI) is an important objective parameter concerning speech intelligibility for sound transmission channels. It is normally measured with specific test signals to ensure high accuracy and good repeatability. Measurement with running speech was previously proposed, but accuracy is compromised and hence applications limited. A new approach that uses artificial neural networks to accurately extract the STI from received running speech is developed in this paper. Neural networks are trained on a large set of transmitted speech examples with prior knowledge of the transmission channels' STIs. The networks perform complicated nonlinear function mappings and spectral feature memorization to enable accurate objective parameter extraction from transmitted speech. Validations via simulations demonstrate the feasibility of this new method on a one-net-one-speech extract basis. In this case, accuracy is comparable with normal measurement methods. This provides an alternative to standard measurement techniques, and it is intended that the neural network method can facilitate occupied room acoustic measurements.
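    A toy sketch of the described mapping is given below: a small regression network is trained on feature vectors extracted from transmitted speech excerpts whose channel STIs are known, and then predicts the STI of a new channel from a received excerpt. The feature set, network size, and data are placeholders, not the paper's actual configuration.

```python
# Sketch: regress STI from features of received running speech.
# Placeholder features and targets; the paper's feature extraction and
# network architecture are not reproduced here.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Hypothetical training set: one spectral/modulation feature vector per
# transmitted speech excerpt, with the channel's measured STI as the target.
features = rng.normal(size=(500, 64))
target_sti = rng.uniform(0.2, 1.0, size=500)

model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
model.fit(features, target_sti)

# Predict the STI of a new transmission channel from one received excerpt.
new_excerpt_features = rng.normal(size=(1, 64))
print(model.predict(new_excerpt_features))
```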

  5. Social orienting of children with autism to facial expressions and speech: a study with a wearable eye-tracker in naturalistic settings

    OpenAIRE

    2013-01-01

    This study investigates attention orienting to social stimuli in children with Autism Spectrum Conditions (ASC) during dyadic social interactions taking place in real-life settings. We study the effect of social cues that differ in complexity and distinguish between social cues produced by facial expressions of emotion and those produced during speech. We record the children's gazes using a head-mounted eye-tracking device and report on a detailed and quantitative analysis of the motion...

  6. Speech After Banquet

    Science.gov (United States)

    Yang, Chen Ning

    2013-05-01

    I am usually not so short of words, but the previous speeches have rendered me really speechless. I have known and admired the eloquence of Freeman Dyson, but I did not know that there is a hidden eloquence in my colleague George Sterman...

  7. Speech and Hearing Therapy.

    Science.gov (United States)

    Sakata, Reiko; Sakata, Robert

    1978-01-01

    In the public school, the speech and hearing therapist attempts to foster child growth and development through the provision of services basic to awareness of self and others, management of personal and social interactions, and development of strategies for coping with the handicap. (MM)

  8. The Commercial Speech Doctrine.

    Science.gov (United States)

    Luebke, Barbara F.

    In its 1942 ruling in the "Valentine vs. Christensen" case, the Supreme Court established the doctrine that commercial speech is not protected by the First Amendment. In 1975, in the "Bigelow vs. Virginia" case, the Supreme Court took a decisive step toward abrogating that doctrine, by ruling that advertising is not stripped of…

  9. Perception of Emotion in Conversational Speech by Younger and Older Listeners

    Directory of Open Access Journals (Sweden)

    Juliane eSchmidt

    2016-05-01

    Full Text Available This study investigated whether age and/or differences in hearing sensitivity influence the perception of the emotion dimensions arousal (calm vs. aroused) and valence (positive vs. negative attitude) in conversational speech. To that end, this study specifically focused on the relationship between participants’ ratings of short affective utterances and the utterances’ acoustic parameters (pitch, intensity, and articulation rate) known to be associated with the emotion dimensions arousal and valence. Stimuli consisted of short utterances taken from a corpus of conversational speech. In two rating tasks, younger and older adults either rated arousal or valence using a 5-point scale. Mean intensity was found to be the main cue participants used in the arousal task (i.e., higher mean intensity cueing higher levels of arousal), while mean F0 was the main cue in the valence task (i.e., higher mean F0 being interpreted as more negative). Even though there were no overall age group differences in arousal or valence ratings, compared to younger adults, older adults responded less strongly to mean intensity differences cueing arousal and responded more strongly to differences in mean F0 cueing valence. Individual hearing sensitivity among the older adults did not modify the use of mean intensity as an arousal cue. However, individual hearing sensitivity generally affected valence ratings and modified the use of mean F0. We conclude that age differences in the interpretation of mean F0 as a cue for valence are likely due to age-related hearing loss, whereas age differences in rating arousal do not seem to be driven by hearing sensitivity differences between age groups (as measured by pure-tone audiometry).

  10. Kin-informative recognition cues in ants

    DEFF Research Database (Denmark)

    Nehring, Volker; Evison, Sophie E F; Santorelli, Lorenzo A

    2011-01-01

    Although social groups are characterized by cooperation, they are also often the scene of conflict. In non-clonal systems, the reproductive interests of group members will differ and individuals may benefit by exploiting the cooperative efforts of other group members. However, such selfish behaviour is thought to be rare in one of the classic examples of cooperation--social insect colonies--because the colony-level costs of individual selfishness select against cues that would allow workers to recognize their closest relatives. In accord with this, previous studies of wasps and ants have found little or no kin information in recognition cues. Here, we test the hypothesis that social insects do not have kin-informative recognition cues by investigating the recognition cues and relatedness of workers from four colonies of the ant Acromyrmex octospinosus. Contrary to the theoretical…

  11. Metaheuristic applications to speech enhancement

    CERN Document Server

    Kunche, Prajna

    2016-01-01

    This book serves as a basic reference for those interested in the application of metaheuristics to speech enhancement. The major goal of the book is to explain the basic concepts of optimization methods and their use in heuristic optimization in speech enhancement to scientists, practicing engineers, and academic researchers in speech processing. The authors discuss why it has been a challenging problem for researchers to develop new enhancement algorithms that aid in the quality and intelligibility of degraded speech. They present powerful optimization methods to speech enhancement that can help to solve the noise reduction problems. Readers will be able to understand the fundamentals of speech processing as well as the optimization techniques, how the speech enhancement algorithms are implemented by utilizing optimization methods, and will be given the tools to develop new algorithms. The authors also provide a comprehensive literature survey regarding the topic.

  12. Keeping eyes peeled: guppies exposed to chemical alarm cue are more responsive to ambiguous visual cues

    OpenAIRE

    Stephenson, Jessica Frances

    2016-01-01

    Information received from the visual and chemical senses is qualitatively different. For prey species in aquatic environments, visual cues are spatially and temporally reliable but risky as the prey and predator must often be in close proximity. Chemical cues, by contrast, can be distorted by currents or linger and thus provide less reliable spatial and temporal information, but can be detected from a safe distance. Chemical cues are therefore often the first detected and may provide...

  13. A Mobile Phone based Speech Therapist

    OpenAIRE

    Pandey, Vinod K.; Pande, Arun; Kopparapu, Sunil Kumar

    2016-01-01

    Patients with articulatory disorders often have difficulty in speaking. These patients need several speech therapy sessions to enable them to speak normally. These therapy sessions are conducted by a specialized speech therapist. The goal of speech therapy is to develop good speech habits as well as to teach how to articulate sounds the right way. Speech therapy is critical for continuous improvement to regain normal speech. Speech therapy sessions require a patient to travel to a hospital or a ...

  14. Gender differences in craving and cue reactivity to smoking and negative affect/stress cues.

    Science.gov (United States)

    Saladin, Michael E; Gray, Kevin M; Carpenter, Matthew J; LaRowe, Steven D; DeSantis, Stacia M; Upadhyaya, Himanshu P

    2012-01-01

    There is evidence that women may be less successful when attempting to quit smoking than men. One potential contributory cause of this gender difference is differential craving and stress reactivity to smoking- and negative affect/stress-related cues. The present human laboratory study investigated the effects of gender on reactivity to smoking and negative affect/stress cues by exposing nicotine dependent women (n = 37) and men (n = 53) smokers to two active cue types, each with an associated control cue: (1) in vivo smoking cues and in vivo neutral control cues, and (2) imagery-based negative affect/stress script and a neutral/relaxing control script. Both before and after each cue/script, participants provided subjective reports of smoking-related craving and affective reactions. Heart rate (HR) and skin conductance (SC) responses were also measured. Results indicated that participants reported greater craving and SC in response to smoking versus neutral cues and greater subjective stress in response to the negative affect/stress versus neutral/relaxing script. With respect to gender differences, women evidenced greater craving, stress and arousal ratings and lower valence ratings (greater negative emotion) in response to the negative affect/stressful script. While there were no gender differences in responses to smoking cues, women trended towards higher arousal ratings. Implications of the findings for treatment and tobacco-related morbidity and mortality are discussed.

  15. Nipping cue reactivity in the bud: baclofen prevents limbic activation elicited by subliminal drug cues.

    Science.gov (United States)

    Young, Kimberly A; Franklin, Teresa R; Roberts, David C S; Jagannathan, Kanchana; Suh, Jesse J; Wetherill, Reagan R; Wang, Ze; Kampman, Kyle M; O'Brien, Charles P; Childress, Anna Rose

    2014-04-02

    Relapse is a widely recognized and difficult-to-treat feature of the addictions. Substantial evidence implicates cue-triggered activation of the mesolimbic dopamine system as an important contributing factor. Even drug cues presented outside of conscious awareness (i.e., subliminally) produce robust activation within this circuitry, indicating the sensitivity and vulnerability of the brain to potentially problematic reward signals. Because pharmacological agents that prevent these early cue-induced responses could play an important role in relapse prevention, we examined whether baclofen-a GABAB receptor agonist that reduces mesolimbic dopamine release and conditioned drug responses in laboratory animals-could inhibit mesolimbic activation elicited by subliminal cocaine cues in cocaine-dependent individuals. Twenty cocaine-dependent participants were randomized to receive baclofen (60 mg/d; 20 mg t.i.d.) or placebo. Event-related BOLD fMRI and a backward-masking paradigm were used to examine the effects of baclofen on subliminal cocaine (vs neutral) cues. Sexual and aversive cues were included to examine specificity. We observed that baclofen-treated participants displayed significantly less activation in response to subliminal cocaine (vs neutral) cues, but not sexual or aversive (vs neutral) cues, than placebo-treated participants in a large interconnected bilateral cluster spanning the ventral striatum, ventral pallidum, amygdala, midbrain, and orbitofrontal cortex (voxel threshold p < …). These results suggest that baclofen may inhibit the earliest type of drug cue-induced motivational processing, that which occurs outside of awareness, before it evolves into a less manageable state.

  16. Learning foreign sounds in an alien world: videogame training improves non-native speech categorization.

    Science.gov (United States)

    Lim, Sung-joo; Holt, Lori L

    2011-01-01

    Although speech categories are defined by multiple acoustic dimensions, some are perceptually weighted more than others and there are residual effects of native-language weightings in non-native speech perception. Recent research on nonlinguistic sound category learning suggests that the distribution characteristics of experienced sounds influence perceptual cue weights: Increasing variability across a dimension leads listeners to rely upon it less in subsequent category learning (Holt & Lotto, 2006). The present experiment investigated the implications of this among native Japanese learning English /r/-/l/ categories. Training was accomplished using a videogame paradigm that emphasizes associations among sound categories, visual information, and players' responses to videogame characters rather than overt categorization or explicit feedback. Subjects who played the game for 2.5h across 5 days exhibited improvements in /r/-/l/ perception on par with 2-4 weeks of explicit categorization training in previous research and exhibited a shift toward more native-like perceptual cue weights.

  17. Speech Motor Control in Fluent and Dysfluent Speech Production of an Individual with Apraxia of Speech and Broca's Aphasia

    Science.gov (United States)

    van Lieshout, Pascal H. H. M.; Bose, Arpita; Square, Paula A.; Steele, Catriona M.

    2007-01-01

    Apraxia of speech (AOS) is typically described as a motor-speech disorder with clinically well-defined symptoms, but without a clear understanding of the underlying problems in motor control. A number of studies have compared the speech of subjects with AOS to the fluent speech of controls, but only a few have included speech movement data and if…

  18. Direct and Indirect Cues to Knowledge States during Word Learning

    Science.gov (United States)

    Saylor, Megan M.; Carroll, C. Brooke

    2009-01-01

    The present study investigated three-year-olds' sensitivity to direct and indirect cues to others' knowledge states for word learning purposes. Children were given either direct, physical cues to knowledge or indirect, verbal cues to knowledge. Preschoolers revealed a better ability to learn words from a speaker following direct, physical cues to…

  19. The relative importance of temporal envelope information for intelligibility prediction: a study on cochlear-implant vocoded speech.

    Science.gov (United States)

    Chen, Fei

    2011-10-01

    Vocoder simulation has long been applied as an effective tool to assess factors influencing the intelligibility of cochlear implant listeners. Considering that the temporal envelope information contained in contiguous bands of vocoded speech is correlated and redundant, this study examined the hypothesis that an intelligibility measure evaluating the distortions of only a small number of selected envelope cues is sufficient to predict the intelligibility scores well. Speech intelligibility data from 80 conditions were collected from vocoder simulation experiments involving 22 normal-hearing listeners. The relative importance of temporal envelope information in cochlear-implant vocoded speech was modeled by correlating its speech-transmission indices (STIs) with the intelligibility scores. The relative importance pattern was subsequently utilized to determine a binary weight vector for the STIs of all envelopes to compute the index predicting the speech intelligibility. A high correlation (r=0.95) was obtained when selecting a small number (e.g., 4 out of 20) of temporal envelope cues from disjoint bands to predict the intelligibility of cochlear-implant vocoded speech.
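    The index construction described above can be sketched as follows: a binary weight vector selects a few envelope-band STIs, their average forms the predictor, and the predictor is correlated with listener scores. The data and the particular bands selected below are hypothetical.

```python
# Sketch: predict intelligibility from a few selected temporal-envelope bands.
# band_stis holds per-condition STI values for 20 envelope bands (hypothetical).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_conditions, n_bands = 80, 20
band_stis = rng.uniform(0.0, 1.0, size=(n_conditions, n_bands))
# Hypothetical listener scores loosely driven by four of the bands.
intelligibility = band_stis[:, [2, 7, 12, 17]].mean(axis=1) + rng.normal(0, 0.05, n_conditions)

# Binary weight vector selecting 4 of the 20 bands (illustrative choice).
weights = np.zeros(n_bands)
weights[[2, 7, 12, 17]] = 1.0
index = band_stis @ weights / weights.sum()

r, _ = pearsonr(index, intelligibility)
print(f"correlation between band-limited index and intelligibility: r = {r:.2f}")
```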

  20. Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals

    Directory of Open Access Journals (Sweden)

    Shahram Moradi

    2016-06-01

    Full Text Available The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for the EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context.

  1. Deficits in audiovisual speech perception in normal aging emerge at the level of whole-word recognition.

    Science.gov (United States)

    Stevenson, Ryan A; Nelms, Caitlin E; Baum, Sarah H; Zurkovsky, Lilia; Barense, Morgan D; Newhouse, Paul A; Wallace, Mark T

    2015-01-01

    Over the next 2 decades, a dramatic shift in the demographics of society will take place, with a rapid growth in the population of older adults. One of the most common complaints with healthy aging is a decreased ability to successfully perceive speech, particularly in noisy environments. In such noisy environments, the presence of visual speech cues (i.e., lip movements) provide striking benefits for speech perception and comprehension, but previous research suggests that older adults gain less from such audiovisual integration than their younger peers. To determine at what processing level these behavioral differences arise in healthy-aging populations, we administered a speech-in-noise task to younger and older adults. We compared the perceptual benefits of having speech information available in both the auditory and visual modalities and examined both phoneme and whole-word recognition across varying levels of signal-to-noise ratio. For whole-word recognition, older adults relative to younger adults showed greater multisensory gains at intermediate SNRs but reduced benefit at low SNRs. By contrast, at the phoneme level both younger and older adults showed approximately equivalent increases in multisensory gain as signal-to-noise ratio decreased. Collectively, the results provide important insights into both the similarities and differences in how older and younger adults integrate auditory and visual speech cues in noisy environments and help explain some of the conflicting findings in previous studies of multisensory speech perception in healthy aging. These novel findings suggest that audiovisual processing is intact at more elementary levels of speech perception in healthy-aging populations and that deficits begin to emerge only at the more complex word-recognition level of speech signals.

  2. Deficits in audiovisual speech perception in normal aging emerge at the level of whole-word recognition

    Science.gov (United States)

    Stevenson, Ryan A.; Nelms, Caitlin; Baum, Sarah H.; Zurkovsky, Lilia; Barense, Morgan D.; Newhouse, Paul A.; Wallace, Mark T.

    2014-01-01

    Over the next two decades, a dramatic shift in the demographics of society will take place, with a rapid growth in the population of older adults. One of the most common complaints with healthy aging is a decreased ability to successfully perceive speech, particularly in noisy environments. In such noisy environments, the presence of visual speech cues (i.e., lip movements) provide striking benefits for speech perception and comprehension, but previous research suggests that older adults gain less from such audiovisual integration than their younger peers. To determine at what processing level these behavioral differences arise in healthy-aging populations, we administered a speech-in-noise task to younger and older adults. We compared the perceptual benefits of having speech information available in both the auditory and visual modalities and examined both phoneme and whole-word recognition across varying levels of signal-to-noise ratio (SNR). For whole-word recognition, older relative to younger adults showed greater multisensory gains at intermediate SNRs, but reduced benefit at low SNRs. By contrast, at the phoneme level both younger and older adults showed approximately equivalent increases in multisensory gain as SNR decreased. Collectively, the results provide important insights into both the similarities and differences in how older and younger adults integrate auditory and visual speech cues in noisy environments, and help explain some of the conflicting findings in previous studies of multisensory speech perception in healthy aging. These novel findings suggest that audiovisual processing is intact at more elementary levels of speech perception in healthy aging populations, and that deficits begin to emerge only at the more complex, word-recognition level of speech signals. PMID:25282337

  3. Sensorimotor Interactions in Speech Learning

    Directory of Open Access Journals (Sweden)

    Douglas M Shiller

    2011-10-01

    Full Text Available Auditory input is essential for normal speech development and plays a key role in speech production throughout the life span. In traditional models, auditory input plays two critical roles: (1) establishing the acoustic correlates of speech sounds that serve, in part, as the targets of speech production, and (2) as a source of feedback about a talker's own speech outcomes. This talk will focus on both of these roles, describing a series of studies that examine the capacity of children and adults to adapt to real-time manipulations of auditory feedback during speech production. In one study, we examined sensory and motor adaptation to a manipulation of auditory feedback during production of the fricative “s”. In contrast to prior accounts, adaptive changes were observed not only in speech motor output but also in subjects' perception of the sound. In a second study, speech adaptation was examined following a period of auditory–perceptual training targeting the perception of vowels. The perceptual training was found to systematically improve subjects' motor adaptation response to altered auditory feedback during speech production. The results of both studies support the idea that perceptual and motor processes are tightly coupled in speech production learning, and that the degree and nature of this coupling may change with development.

  4. An Eye Tracking Comparison of External Pointing Cues and Internal Continuous Cues in Learning with Complex Animations

    Science.gov (United States)

    Boucheix, Jean-Michel; Lowe, Richard K.

    2010-01-01

    Two experiments used eye tracking to investigate a novel cueing approach for directing learner attention to low salience, high relevance aspects of a complex animation. In the first experiment, comprehension of a piano mechanism animation containing spreading-colour cues was compared with comprehension obtained with arrow cues or no cues. Eye…

  5. Variation and Synthetic Speech

    CERN Document Server

    Miller, C; Massey, N; Miller, Corey; Karaali, Orhan; Massey, Noel

    1997-01-01

    We describe the approach to linguistic variation taken by the Motorola speech synthesizer. A pan-dialectal pronunciation dictionary is described, which serves as the training data for a neural network based letter-to-sound converter. Subsequent to dictionary retrieval or letter-to-sound generation, pronunciations are submitted to a neural network based postlexical module. The postlexical module has been trained on aligned dictionary pronunciations and hand-labeled narrow phonetic transcriptions. This architecture permits the learning of individual postlexical variation, and can be retrained for each speaker whose voice is being modeled for synthesis. Learning variation in this way can result in greater naturalness for the synthetic speech that is produced by the system.
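    A schematic sketch of the described pipeline (dictionary retrieval, letter-to-sound fallback, then a postlexical pass) is shown below; the tiny lexicon, fallback, and postlexical rule are hypothetical stand-ins for the trained neural networks.

```python
# Sketch of the described pronunciation pipeline: dictionary retrieval,
# letter-to-sound fallback, then a postlexical pass. The lexicon, fallback,
# and postlexical rule below are illustrative stand-ins for the neural modules.
LEXICON = {"water": "W AO1 T ER0"}          # hypothetical pan-dialectal entry

def letter_to_sound(word: str) -> str:
    """Stand-in for the neural letter-to-sound converter."""
    return " ".join(word.upper())           # deliberately naive placeholder

def postlexical(phones: str) -> str:
    """Stand-in for the postlexical module (e.g., speaker-specific flapping)."""
    return phones.replace("T ER0", "DX ER0")

def pronounce(word: str) -> str:
    phones = LEXICON.get(word.lower()) or letter_to_sound(word)
    return postlexical(phones)

print(pronounce("water"))   # 'W AO1 DX ER0'
```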

  6. Speech is Golden

    DEFF Research Database (Denmark)

    Juel Henrichsen, Peter

    2014-01-01

    Most of the Danish municipalities are ready to begin to adopt automatic speech recognition, but at the same time remain nervous following a long series of bad business cases in the recent past. Complaints are voiced over costly licences and low service levels, typical effects of a de facto monopoly on the supply side. The present article reports on a new public action strategy which has taken shape in the course of 2013-14. While Denmark is a small language area, our public sector is well organised and has considerable purchasing power. Across this past year, Danish local authorities have organised around … of the present article, in the role of economically neutral advisers. The aim of the initiative is to pave the way for the first profitable contract in the field - which we hope to see in 2014 - an event which would precisely break the present deadlock and open up a billion EUR market for speech technology...

  7. Relative saliency of pitch versus phonetic cues in infancy

    Science.gov (United States)

    Cardillo, Gina; Kuhl, Patricia; Sundara, Megha

    2005-09-01

    Infants in their first year are highly sensitive to different acoustic components of speech, including phonetic detail and pitch information. The present investigation examined whether relative sensitivity to these two dimensions changes during this period, as the infant acquires language-specific phonetic categories. If pitch and phonetic discrimination are hierarchical, then the relative salience of pitch and phonetic change may become reversed between 8 and 12 months of age. Thirty-two- and 47-week-old infants were tested using an auditory preference paradigm in which they first heard a recording of a person singing a 4-note song (i.e., "go-bi-la-tu") and were then presented with both the familiar and an unfamiliar, modified version of that song. Modifications were either a novel pitch order (keeping syllables constant) or a novel syllable order (keeping melody constant). Compared to the younger group, older infants were predicted to show greater relative sensitivity to syllable order than pitch order, in accordance with an increased tendency to attend to linguistically relevant information (phonetic patterns) as opposed to cues that are initially more salient (pitch patterns). Preliminary data show trends toward the predicted interaction, with preference patterns commensurate with previously reported data. [Work supported by the McDonnell Foundation and NIH.]

  8. [Improving speech comprehension using a new cochlear implant speech processor].

    Science.gov (United States)

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise. In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improvement in signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg

  9. Neurophysiology of speech differences in childhood apraxia of speech.

    Science.gov (United States)

    Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole

    2014-01-01

    Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes.

  10. Hiding Information under Speech

    Science.gov (United States)

    2005-12-12

    as it arrives in real time, and it disappears as fast as it arrives. Furthermore, our cognitive process for translating audio sounds to the meaning... steganography, whose goal is to make the embedded data completely undetectable. In addition, we must dismiss the idea of hiding data by using any... therefore, an image has more room to hide data; and (2) speech steganography has not led to many money-making commercial businesses. For these two

  11. Speech Quality Measurement

    Science.gov (United States)

    1977-06-10

    … noise test, t=2 for the low-pass filter test, and t=3 for the ADPCM coding test; s is the subject number … a separate speech quality laboratory and controlled by the NOVA 830 computer. Each of the stations has a CRT, 15 response buttons, and a … button

  12. Contribution of auditory working memory to speech understanding in mandarin-speaking cochlear implant users.

    Directory of Open Access Journals (Sweden)

    Duoduo Tao

    importance of voice pitch cues (albeit poorly coded by the CI) did not influence the relationship between working memory and speech perception.

  13. Investigation of objective measures for intelligibility prediction of noise-reduced speech for Chinese, Japanese, and English.

    Science.gov (United States)

    Li, Junfeng; Xia, Risheng; Ying, Dongwen; Yan, Yonghong; Akagi, Masato

    2014-12-01

    Many objective measures have been reported to predict speech intelligibility in noise, most of which were designed and evaluated with English speech corpora. Given the different perceptual cues used by native listeners of different languages, examining whether there is any language effect when the same objective measure is used to predict speech intelligibility in different languages is of great interest, particularly when non-linear noise-reduction processing is involved. In the present study, an extensive evaluation of objective measures for speech intelligibility prediction of noisy speech processed by noise-reduction algorithms is conducted in Chinese, Japanese, and English. Of all the objective measures tested, the short-time objective intelligibility (STOI) measure produced the most accurate results in speech intelligibility prediction for Chinese, while the normalized covariance metric (NCM) and middle-level coherence speech intelligibility index (CSIIm) incorporating the signal-dependent band-importance functions (BIFs) produced the most accurate results for Japanese and English, respectively. The objective measures that performed best in predicting the effect of non-linear noise-reduction processing on speech intelligibility were found to be the BIF-modified NCM measure for Chinese, the STOI measure for Japanese, and the BIF-modified CSIIm measure for English. Most of the objective measures examined performed differently even under the same conditions for different languages.
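    Evaluations of this kind typically reduce to correlating each measure's output with measured intelligibility, per language, as in the hedged sketch below; the scores are placeholders, and the actual STOI/NCM/CSII computations are not reproduced.

```python
# Sketch: compare how well several objective measures track listener
# intelligibility for one language. All values below are hypothetical.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
conditions = 40
listener_scores = rng.uniform(0.0, 1.0, size=conditions)   # per-condition intelligibility

# Hypothetical outputs of each objective measure for the same conditions.
measure_outputs = {
    "STOI": listener_scores + rng.normal(0, 0.05, conditions),
    "NCM": listener_scores + rng.normal(0, 0.10, conditions),
    "CSIIm": listener_scores + rng.normal(0, 0.15, conditions),
}

for name, outputs in measure_outputs.items():
    r, _ = pearsonr(outputs, listener_scores)
    print(f"{name}: r = {r:.2f}")
```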

  14. Biophysical Cueing and Vascular Endothelial Cell Behavior

    Directory of Open Access Journals (Sweden)

    Joshua A. Wood

    2010-03-01

    Full Text Available Human vascular endothelial cells (VECs) line the vessels of the body and are critical for the maintenance of vessel integrity and trafficking of biochemical cues. They are fundamental structural elements and are central to the signaling environment. Alterations in the normal functioning of the VEC population are associated with a number of vascular disorders among which are some of the leading causes of death in both the United States and abroad. VECs attach to their underlying stromal elements through a specialization of the extracellular matrix, the basement membrane. The basement membrane provides signaling cues to the VEC through its chemical constituents, by serving as a reservoir for cytoactive factors and through its intrinsic biophysical properties. This specialized matrix is composed of a topographically rich 3D felt-like network of fibers and pores on the nano (1–100 nm) and submicron (100–1,000 nm) size scales. The basement membrane provides biophysical cues to the overlying VECs through its intrinsic topography as well as through its local compliance (relative stiffness). These biophysical cues modulate VEC adhesion, migration, proliferation, differentiation, and the cytoskeletal signaling network of the individual cells. This review focuses on the impact of biophysical cues on VEC behaviors and demonstrates the need for their consideration in future vascular studies and the design of improved prosthetics.

  15. Speech recognition in university classrooms

    OpenAIRE

    Wald, Mike; Bain, Keith; Basson, Sara H

    2002-01-01

    The LIBERATED LEARNING PROJECT (LLP) is an applied research project studying two core questions: 1) Can speech recognition (SR) technology successfully digitize lectures to display spoken words as text in university classrooms? 2) Can speech recognition technology be used successfully as an alternative to traditional classroom notetaking for persons with disabilities? This paper addresses these intriguing questions and explores the underlying complex relationship between speech recognition te...

  16. Speech Recognition on Mobile Devices

    DEFF Research Database (Denmark)

    Tan, Zheng-Hua; Lindberg, Børge

    2010-01-01

    The enthusiasm of deploying automatic speech recognition (ASR) on mobile devices is driven both by remarkable advances in ASR technology and by the demand for efficient user interfaces on such devices as mobile phones and personal digital assistants (PDAs). This chapter presents an overview of ASR in the mobile context covering motivations, challenges, fundamental techniques and applications. Three ASR architectures are introduced: embedded speech recognition, distributed speech recognition and network speech recognition. Their pros and cons and implementation issues are discussed. Applications within command and control, text entry and search are presented with an emphasis on mobile text entry.

  17. Effect of stimuli presentation method on perception of room size using only acoustic cues

    Science.gov (United States)

    Hunt, Jeffrey Barnabas

    People listen to music and speech in a large variety of rooms, and many room parameters, including the size of the room, can drastically affect how well the speech is understood or the music enjoyed. While multi-modal (typically hearing and sight) tests may be more realistic, in order to isolate what acoustic cues listeners use to determine the size of a room, a listening-only test is conducted here. Nearly all of the studies to date on the perception of room volume using acoustic cues have presented the stimuli only over headphones, and these studies have reported that, in most cases, the perceived room volume is more highly correlated with the perceived reverberation (reverberance) than with actual room volume. While reverberance may be a salient acoustic cue used for the determination of room size, the actual sound field in a room is not accurately reproduced when presented over headphones, and it is thought that some of the complexities of the sound field that relate to perception of geometric volume, specifically directional information of reflections, may be lost. It is possible that the importance of reverberance may be overemphasized when using only headphones to present stimuli, so a comparison of room-size perception is proposed where the sound field (from modeled and recorded impulse responses) is presented both over headphones and also over a surround system using higher order ambisonics to more accurately produce directional sound information. Major results are that, in this study, no difference could be seen between the two presentation methods and that reverberation time is highly correlated with room-size perception while real room size is not.

  18. Huntington's Disease: Speech, Language and Swallowing

    Science.gov (United States)

    ... Disease Society of America Huntington's Disease Youth Organization Movement Disorder Society National Institute of Neurological Disorders and Stroke Typical Speech and Language Development Learning More Than One Language Adult Speech and Language Child Speech and Language Swallowing ...

  19. A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

    Directory of Open Access Journals (Sweden)

    Dansereau Richard M

    2007-01-01

    Full Text Available We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signal's vocal-tract-related filters. Then, the mean vectors of PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on an underdetermined blind source separation. We compare our model with both an underdetermined blind source separation and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.
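    The maximum-likelihood step described above can be summarized, in general form, as estimating the vocal-tract-filter mean vectors that maximize the likelihood of the observed mixed-speech log-spectral vectors; the notation below is a sketch of that general formulation, not the authors' exact mixture model.

```latex
% Generic ML formulation implied by the abstract (not the authors' exact model):
% y_t  = log-spectral vector of the mixed speech at frame t,
% \mu_1, \mu_2 = mean vectors of the two vocal-tract-related filter PDFs.
(\hat{\mu}_1, \hat{\mu}_2) \;=\; \arg\max_{\mu_1,\,\mu_2} \;\sum_{t=1}^{T} \log p\!\left(\mathbf{y}_t \mid \mu_1, \mu_2\right)
```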

  20. Differential Gaze Patterns on Eyes and Mouth During Audiovisual Speech Segmentation.

    Science.gov (United States)

    Lusk, Laina G; Mitchel, Aaron D

    2016-01-01

    Speech is inextricably multisensory: both auditory and visual components provide critical information for all aspects of speech processing, including speech segmentation, the visual components of which have been the target of a growing number of studies. In particular, a recent study (Mitchel and Weiss, 2014) established that adults can utilize facial cues (i.e., visual prosody) to identify word boundaries in fluent speech. The current study expanded upon these results, using an eye tracker to identify highly attended facial features of the audiovisual display used in Mitchel and Weiss (2014). Subjects spent the most time watching the eyes and mouth. A significant trend in gaze durations was found with the longest gaze duration on the mouth, followed by the eyes and then the nose. In addition, eye-gaze patterns changed across familiarization as subjects learned the word boundaries, showing decreased attention to the mouth in later blocks while attention on other facial features remained consistent. These findings highlight the importance of the visual component of speech processing and suggest that the mouth may play a critical role in visual speech segmentation.

  1. A Maximum Likelihood Estimation of Vocal-Tract-Related Filter Characteristics for Single Channel Speech Separation

    Directory of Open Access Journals (Sweden)

    Mohammad H. Radfar

    2006-11-01

    Full Text Available We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and those techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate the components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signal's vocal-tract-related filters. Then, the mean vectors of PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters along with the extracted fundamental frequencies are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models using a new grouping technique based on an underdetermined blind source separation. We compare our model with both an underdetermined blind source separation and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.

  2. High visual resolution matters in audiovisual speech perception, but only for some.

    Science.gov (United States)

    Alsius, Agnès; Wayne, Rachel V; Paré, Martin; Munhall, Kevin G

    2016-07-01

    The basis for individual differences in the degree to which visual speech input enhances comprehension of acoustically degraded speech is largely unknown. Previous research indicates that fine facial detail is not critical for visual enhancement when auditory information is available; however, these studies did not examine individual differences in ability to make use of fine facial detail in relation to audiovisual speech perception ability. Here, we compare participants based on their ability to benefit from visual speech information in the presence of an auditory signal degraded with noise, modulating the resolution of the visual signal through low-pass spatial frequency filtering and monitoring gaze behavior. Participants who benefited most from the addition of visual information (high visual gain) were more adversely affected by the removal of high spatial frequency information, compared to participants with low visual gain, for materials with both poor and rich contextual cues (i.e., words and sentences, respectively). Differences as a function of gaze behavior between participants with the highest and lowest visual gains were observed only for words, with participants with the highest visual gain fixating longer on the mouth region. Our results indicate that the individual variance in audiovisual speech in noise performance can be accounted for, in part, by better use of fine facial detail information extracted from the visual signal and increased fixation on mouth regions for short stimuli. Thus, for some, audiovisual speech perception may suffer when the visual input (in addition to the auditory signal) is less than perfect.

  3. Behavioral Signal Processing: Deriving Human Behavioral Informatics From Speech and Language

    Science.gov (United States)

    Narayanan, Shrikanth; Georgiou, Panayiotis G.

    2013-01-01

    The expression and experience of human behavior are complex and multimodal and characterized by individual and contextual heterogeneity and variability. Speech and spoken language communication cues offer an important means for measuring and modeling human behavior. Observational research and practice across a variety of domains from commerce to healthcare rely on speech- and language-based informatics for crucial assessment and diagnostic information and for planning and tracking response to an intervention. In this paper, we describe some of the opportunities as well as emerging methodologies and applications of human behavioral signal processing (BSP) technology and algorithms for quantitatively understanding and modeling typical, atypical, and distressed human behavior with a specific focus on speech- and language-based communicative, affective, and social behavior. We describe the three important BSP components of acquiring behavioral data in an ecologically valid manner across laboratory to real-world settings, extracting and analyzing behavioral cues from measured data, and developing models offering predictive and decision-making support. We highlight both the foundational speech and language processing building blocks as well as the novel processing and modeling opportunities. Using examples drawn from specific real-world applications ranging from literacy assessment and autism diagnostics to psychotherapy for addiction and marital well being, we illustrate behavioral informatics applications of these signal processing techniques that contribute to quantifying higher level, often subjectively described, human behavior in a domain-sensitive fashion. PMID:24039277

  4. Hearing impairment and audiovisual speech integration ability: a case study report.

    Science.gov (United States)

    Altieri, Nicholas; Hudock, Daniel

    2014-01-01

    Research in audiovisual speech perception has demonstrated that sensory factors such as auditory and visual acuity are associated with a listener's ability to extract and combine auditory and visual speech cues. This case study report examined audiovisual integration using a newly developed measure of capacity in a sample of hearing-impaired listeners. Capacity assessments are unique because they examine the contribution of reaction-time (RT) as well as accuracy to determine the extent to which a listener efficiently combines auditory and visual speech cues relative to independent race model predictions. Multisensory speech integration ability was examined in two experiments: an open-set sentence recognition study and a closed-set speeded-word recognition study that measured capacity. Most germane to our approach, capacity illustrated speed-accuracy tradeoffs that may be predicted by audiometric configuration. Results revealed that some listeners benefit from increased accuracy, but fail to benefit in terms of speed on audiovisual relative to unisensory trials. Conversely, other listeners may not benefit in the accuracy domain but instead show an audiovisual processing time benefit.
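
    The capacity measure itself is not defined in the abstract. A widely used formalization of workload capacity against an independent race-model baseline (the family of measures this work builds on) compares integrated hazard functions; the reference formula below is given as background, not as this study's exact definition:

```latex
% Integrated hazard obtained from the survivor function S(t) of response times:
H(t) = -\log S(t)
% Capacity coefficient comparing audiovisual (AV) trials with
% auditory-only (A) and visual-only (V) trials:
C(t) = \frac{H_{AV}(t)}{H_{A}(t) + H_{V}(t)}
% C(t) = 1 matches the unlimited-capacity race-model baseline;
% C(t) > 1 indicates efficient integration, C(t) < 1 limited capacity.
```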

  5. Parameter masks for close talk speech segregation using deep neural networks

    Directory of Open Access Journals (Sweden)

    Jiang Yi

    2015-01-01

    Full Text Available A deep neural network (DNN) based close-talk speech segregation algorithm is introduced. One nearby microphone is used to collect the target speech, as "close talk" indicates, and another microphone is used to capture the noise in the environment. The time and energy difference between the two microphone signals is used as the segregation cue. A DNN estimator on each frequency channel is used to calculate the parameter masks. The parameter masks represent the target speech energy in each time-frequency (T-F) unit. Experimental results show the good performance of the proposed system. The signal-to-noise ratio (SNR) improvement is 8.1 dB in the 0 dB noise condition.
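
    As a rough, illustrative sketch only (the paper's DNN architecture and training targets are not given in the abstract), the code below forms per-T-F energy-difference features from a close-talk and a distant microphone and squashes them into a soft mask; in the actual system a trained DNN per frequency channel would replace the fixed sigmoid mapping. All signal names are placeholders.

```python
import numpy as np
from scipy.signal import stft, istft

def energy_difference_mask(close_mic, far_mic, fs=16000, nperseg=512, alpha=1.0):
    """Soft T-F mask from the per-unit energy difference between a close-talk
    and a distant microphone (illustrative stand-in for a learned mask)."""
    _, _, Zc = stft(close_mic, fs=fs, nperseg=nperseg)
    _, _, Zf = stft(far_mic, fs=fs, nperseg=nperseg)
    # Log-energy difference per time-frequency unit: positive where the
    # close microphone dominates (likely target speech).
    diff = np.log(np.abs(Zc) ** 2 + 1e-10) - np.log(np.abs(Zf) ** 2 + 1e-10)
    mask = 1.0 / (1.0 + np.exp(-alpha * diff))   # sigmoid squashing to [0, 1]
    _, enhanced = istft(mask * Zc, fs=fs, nperseg=nperseg)
    return mask, enhanced

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal(16000)          # placeholder "speech"
    noise = rng.standard_normal(16000)           # placeholder noise
    close = target + 0.2 * noise                 # close mic: mostly target
    far = 0.2 * target + noise                   # far mic: mostly noise
    mask, enhanced = energy_difference_mask(close, far)
    print(mask.shape, enhanced.shape)
```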

  6. An Improved Speech Enhancement Method based on Teager Energy Operator and Perceptual Wavelet Packet Decomposition

    Directory of Open Access Journals (Sweden)

    Huan Zhao

    2011-06-01

    Full Text Available According to the distribution characteristics of noise and clean speech signals in the frequency domain, a new speech enhancement method based on the Teager energy operator (TEO) and perceptual wavelet packet decomposition (PWPD) is proposed. Firstly, a modified mask construction method is introduced to protect the acoustic cues at low frequencies. Then a level-dependent parameter is introduced to further adjust the thresholds in light of the noise distribution. Finally, the sub-bands that have very little influence are set directly to 0 to improve the signal-to-noise ratio (SNR) and reduce the computational load. Simulation results show that, under different kinds of noise environments, this new method not only improves the signal-to-noise ratio (SNR) and perceptual evaluation of speech quality (PESQ) scores, but also reduces the computational load, which is very advantageous for real-time implementation.
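
    For reference, the discrete Teager energy operator underlying the method is straightforward to compute; the sketch below shows TEO plus an illustrative TEO-informed threshold, without reproducing the paper's specific mask construction or level-dependent parameters.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[x](n) = x(n)^2 - x(n-1)*x(n+1)."""
    x = np.asarray(x, dtype=float)
    teo = np.empty_like(x)
    teo[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    teo[0], teo[-1] = teo[1], teo[-2]   # simple edge handling
    return teo

def teo_threshold(coeffs, scale=1.0):
    """Illustrative TEO-informed hard threshold for sub-band coefficients:
    samples with low Teager energy (likely noise-dominated) are zeroed."""
    teo = teager_energy(coeffs)
    thr = scale * np.median(np.abs(teo))
    out = np.asarray(coeffs, dtype=float).copy()
    out[teo < thr] = 0.0
    return out

if __name__ == "__main__":
    t = np.linspace(0, 1, 8000, endpoint=False)
    noisy = np.sin(2 * np.pi * 200 * t) + 0.3 * np.random.default_rng(1).standard_normal(t.size)
    print(teager_energy(noisy)[:5])
    print(np.count_nonzero(teo_threshold(noisy)))
```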

  7. Coordinated sensor cueing for chemical plume detection

    Science.gov (United States)

    Abraham, Nathan J.; Jensenius, Andrea M.; Watkins, Adam S.; Hawthorne, R. Chad; Stepnitz, Brian J.

    2011-05-01

    This paper describes an organic data fusion and sensor cueing approach for Chemical, Biological, Radiological, and Nuclear (CBRN) sensors. The Joint Warning and Reporting Network (JWARN) uses a hardware component referred to as the JWARN Component Interface Device (JCID). The Edgewood Chemical and Biological Center has developed a small footprint and open architecture solution for the JCID capability called JCID-on-a-Chip (JoaC). The JoaC program aims to reduce the cost and complexity of the JCID by shrinking the necessary functionality down to a small single board computer. This effort focused on development of a fusion and cueing algorithm organic to the JoaC hardware. By embedding this capability in the JoaC, sensors have the ability to receive and process cues from other sensors without the use of a complex and costly centralized infrastructure. Additionally, the JoaC software is hardware agnostic, as evidenced by its drop-in inclusion in two different system-on-a-chip platforms including Windows CE and LINUX environments. In this effort, a partnership between JPM-CA, JHU/APL, and the Edgewood Chemical and Biological Center (ECBC), the authors implemented and demonstrated a new algorithm for cooperative detection and localization of a chemical agent plume. This experiment used a pair of mobile Joint Services Lightweight Standoff Chemical Agent Detector (JSLSCAD) units which were controlled by fusion and cueing algorithms hosted on a JoaC. The algorithms embedded in the JoaC enabled the two sensor systems to perform cross cueing and cooperatively form a higher fidelity estimate of chemical releases by combining sensor readings. Additionally, each JSLSCAD had the ability to focus its search on smaller regions than those required by a single sensor system by using the cross cue information from the other sensor.

  8. Suppression of the µ rhythm during speech and non-speech discrimination revealed by independent component analysis: implications for sensorimotor integration in speech processing.

    Directory of Open Access Journals (Sweden)

    Andrew Bowers

    Full Text Available BACKGROUND: Constructivist theories propose that articulatory hypotheses about incoming phonetic targets may function to enhance perception by limiting the possibilities for sensory analysis. To provide evidence for this proposal, it is necessary to map ongoing, high-temporal-resolution changes in sensorimotor activity (i.e., the sensorimotor μ rhythm) to accurate speech and non-speech discrimination performance (i.e., correct trials). METHODS: Sixteen participants (15 female and 1 male) were asked to passively listen to or actively identify speech and tone-sweeps in a two-alternative forced-choice discrimination task while the electroencephalograph (EEG) was recorded from 32 channels. The stimuli were presented at signal-to-noise ratios (SNRs) in which discrimination accuracy was high (i.e., 80-100%) and at low SNRs producing discrimination performance at chance. EEG data were decomposed using independent component analysis and clustered across participants using principal component methods in EEGLAB. RESULTS: ICA revealed left and right sensorimotor µ components for 14/16 and 13/16 participants, respectively, that were identified on the basis of scalp topography, spectral peaks, and localization to the precentral and postcentral gyri. Time-frequency analysis of left and right lateralized µ component clusters revealed significant (pFDR < .05) suppression in the traditional beta frequency range (13-30 Hz) prior to, during, and following syllable discrimination trials. No significant differences from baseline were found for passive tasks. Tone conditions produced right µ beta suppression following stimulus onset only. For the left µ, significant differences in the magnitude of beta suppression were found for correct speech discrimination trials relative to chance trials following stimulus offset. CONCLUSIONS: Findings are consistent with constructivist, internal model theories proposing that early forward motor models generate predictions about likely phonemic units

  9. Cognitive Cues are More Compelling than Facial Cues in Determining Adults' Reactions towards Young Children

    Directory of Open Access Journals (Sweden)

    Carlos Hernández Blasi

    2015-04-01

    Full Text Available Previous research has demonstrated the significant influence that both children's facial features (Lorenz, 1943) and children's cognitive expressions (Bjorklund, Hernández Blasi, and Periss, 2010) have on adults' perception of young children. However, until now, these two types of cues have been studied independently. The present study contrasted these two types of cues simultaneously in a group of college students. To this purpose, we designed five experimental conditions (Consistent, Inconsistent, Mature-Face, Immature-Face, and Faces-Only) in which we varied the presentation of a series of mature and immature vignettes (including two previously studied types of thinking: natural thinking and supernatural thinking) associated with a series of more mature and less mature children's faces. Performance in these conditions was contrasted with data from a Vignettes-Only condition taken from Bjorklund et al. (2010). Results indicated that cognitive cues were more powerful than facial cues in determining adults' perceptions of young children. From an evolutionary developmental perspective, we suggest that facial cues are more relevant to adults during infancy than during the preschool period, when, with the development of spoken language, the verbalized expressions of children's thoughts become the principal cues influencing adults' perceptions, with facial cues playing a more secondary role.

  10. The proactive bilingual brain: Using interlocutor identity to generate predictions for language processing.

    Science.gov (United States)

    Martin, Clara D; Molnar, Monika; Carreiras, Manuel

    2016-05-13

    The present study investigated the proactive nature of the human brain in language perception. Specifically, we examined whether early proficient bilinguals can use interlocutor identity as a cue for language prediction, using an event-related potentials (ERP) paradigm. Participants were first familiarized, through video segments, with six novel interlocutors who were either monolingual or bilingual. Then, the participants completed an audio-visual lexical decision task in which all the interlocutors uttered words and pseudo-words. Critically, the speech onset started about 350 ms after the beginning of the video. ERP waves between the onset of the visual presentation of the interlocutors and the onset of their speech significantly differed for trials where the language was not predictable (bilingual interlocutors) and trials where the language was predictable (monolingual interlocutors), revealing that visual interlocutor identity can in fact function as a cue for language prediction, even before the onset of the auditory-linguistic signal.

  11. Feeling backwards? How temporal order in speech affects the time course of vocal emotion recognition

    Directory of Open Access Journals (Sweden)

    Simon Rigoulot

    2013-06-01

    Full Text Available Recent studies suggest that the time course for recognizing vocal expressions of basic emotion in speech varies significantly by emotion type, implying that listeners uncover acoustic evidence about emotions at different rates in speech (e.g., fear is recognized most quickly whereas happiness and disgust are recognized relatively slowly; Pell and Kotz, 2011). To investigate whether vocal emotion recognition is largely dictated by the amount of time listeners are exposed to speech or the position of critical emotional cues in the utterance, 40 English participants judged the meaning of emotionally-inflected pseudo-utterances presented in a gating paradigm, where utterances were gated as a function of their syllable structure in segments of increasing duration from the end of the utterance (i.e., gated ‘backwards’). Accuracy for detecting six target emotions in each gate condition and the mean identification point for each emotion in milliseconds were analyzed and compared to results from Pell and Kotz (2011). We again found significant emotion-specific differences in the time needed to accurately recognize emotions from speech prosody, and new evidence that utterance-final syllables tended to facilitate listeners’ accuracy in many conditions when compared to utterance-initial syllables. The time needed to recognize fear, anger, sadness, and neutral from speech cues was not influenced by how utterances were gated, although happiness and disgust were recognized significantly faster when listeners heard the end of utterances first. Our data provide new clues about the relative time course for recognizing vocally-expressed emotions within the 400-1200 millisecond time window, while highlighting that emotion recognition from prosody can be shaped by the temporal properties of speech.

  12. Teaching Speech Acts

    Directory of Open Access Journals (Sweden)

    Teaching Speech Acts

    2007-01-01

    Full Text Available In this paper I argue that pragmatic ability must become part of what we teach in the classroom if we are to realize the goals of communicative competence for our students. I review the research on pragmatics, especially those articles that point to the effectiveness of teaching pragmatics in an explicit manner, and those that posit methods for teaching. I also note two areas of scholarship that address classroom needs—the use of authentic data and appropriate assessment tools. The essay concludes with a summary of my own experience teaching speech acts in an advanced-level Portuguese class.

  13. PESQ Based Speech Intelligibility Measurement

    NARCIS (Netherlands)

    Beerends, J.G.; Buuren, R.A. van; Vugt, J.M. van; Verhave, J.A.

    2009-01-01

    Several measurement techniques exist to quantify the intelligibility of a speech transmission chain. In the objective domain, the Articulation Index [1] and the Speech Transmission Index STI [2], [3], [4], [5] have been standardized for predicting intelligibility. The STI uses a signal that contains

  14. Separating Underdetermined Convolutive Speech Mixtures

    DEFF Research Database (Denmark)

    Pedersen, Michael Syskind; Wang, DeLiang; Larsen, Jan

    2006-01-01

    a method for underdetermined blind source separation of convolutive mixtures. The proposed framework is applicable for separation of instantaneous as well as convolutive speech mixtures. It is possible to iteratively extract each speech signal from the mixture by combining blind source separation...

  15. Perceptual Learning of Interrupted Speech

    NARCIS (Netherlands)

    Benard, Michel Ruben; Başkent, Deniz

    2013-01-01

    The intelligibility of periodically interrupted speech improves once the silent gaps are filled with noise bursts. This improvement has been attributed to phonemic restoration, a top-down repair mechanism that helps intelligibility of degraded speech in daily life. Two hypotheses were investigated u

  16. Disentangling attention from action in the emotional spatial cueing task.

    Science.gov (United States)

    Mulckhuyse, Manon; Crombez, Geert

    2014-01-01

    In the emotional spatial cueing task, a peripheral cue--either emotional or non-emotional--is presented before target onset. A stronger cue validity effect with an emotional relative to a non-emotional cue (i.e., more efficient responding to validly cued targets relative to invalidly cued targets) is taken as an indication of emotional modulation of attentional processes. However, results from previous emotional spatial cueing studies are not consistent. Some studies find an effect at the validly cued location (shorter reaction times compared to a non-emotional cue), whereas other studies find an effect at the invalidly cued location (longer reaction times compared to a non-emotional cue). In the current paper, we explore which parameters affect emotional modulation of the cue validity effect in the spatial cueing task. Results from five experiments in healthy volunteers led to the conclusion that a threatening spatial cue did not affect attentional processes but rather indicated that motor processes were affected. A possible mechanism might be that a strong aversive cue stimulus decreases reaction times by means of stronger action preparation. Consequently, in the case of a spatially congruent response with the peripheral cue, a stronger cue validity effect could be obtained due to stronger response priming. The implications for future research are discussed.

  17. Speech Compression Using Multecirculerletet Transform

    Directory of Open Access Journals (Sweden)

    Sulaiman Murtadha

    2012-01-01

    Full Text Available Compressing speech reduces data storage requirements and reduces the time needed to transmit digitized speech over long-haul links like the Internet. To obtain the best performance in speech compression, wavelet transforms require filters that combine a number of desirable properties, such as orthogonality and symmetry. The MCT basis functions are derived from the GHM basis functions using 2D linear convolution. The fast computation algorithms introduced here add desirable features to the current transform. We further assess the performance of the MCT in a speech compression application. This paper discusses the effect of using the DWT and the MCT (in one and two dimensions) on speech compression. DWT and MCT performance in terms of compression ratio (CR), mean square error (MSE) and peak signal-to-noise ratio (PSNR) is assessed. Computer simulation results indicate that the two-dimensional MCT offers a better compression ratio, MSE and PSNR than the DWT.
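
    Since the MCT is not available in standard toolkits, the hedged sketch below reproduces only the DWT baseline the paper compares against: decompose, keep the largest coefficients, reconstruct, and report CR, MSE and PSNR. The wavelet choice, decomposition level and kept fraction are placeholder values.

```python
import numpy as np
import pywt

def dwt_compress(signal, wavelet="db4", level=5, keep=0.10):
    """Keep the largest `keep` fraction of DWT coefficients (by magnitude),
    zero the rest, reconstruct, and return (reconstruction, CR, MSE, PSNR)."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    flat, slices = pywt.coeffs_to_array(coeffs)
    thr = np.quantile(np.abs(flat), 1.0 - keep)
    flat_kept = np.where(np.abs(flat) >= thr, flat, 0.0)
    recon = pywt.waverec(
        pywt.array_to_coeffs(flat_kept, slices, output_format="wavedec"),
        wavelet)[: len(signal)]
    cr = flat.size / max(np.count_nonzero(flat_kept), 1)   # compression ratio
    mse = float(np.mean((signal - recon) ** 2))
    psnr = 10 * np.log10(np.max(np.abs(signal)) ** 2 / (mse + 1e-12))
    return recon, cr, mse, psnr

if __name__ == "__main__":
    t = np.linspace(0, 1, 16000, endpoint=False)
    speechlike = np.sin(2 * np.pi * 120 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
    _, cr, mse, psnr = dwt_compress(speechlike)
    print(f"CR={cr:.1f}, MSE={mse:.2e}, PSNR={psnr:.1f} dB")
```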

  18. Cue-specific reactivity in experienced gamblers.

    Science.gov (United States)

    Wulfert, Edelgard; Maxson, Julie; Jardin, Bianca

    2009-12-01

    To examine whether gambling cue reactivity is cue-specific, 47 scratch-off lottery players and 47 horse race gamblers were presented with video clips of their preferred and nonpreferred modes of gambling, and two control stimuli including an exciting car race and a mental stressor task while heart rates, excitement, and urge to gamble were being measured. Heart rates for both groups of gamblers were highest to the mental stressor and did not differ in response to the other three cues. Excitement for both groups was highest in response to the action cues (horse race and car chase). Urge to gamble was significantly higher for each group to their preferred mode of gambling. A post hoc exploratory analysis comparing social gamblers (n = 54) and probable pathological gamblers (n = 40) revealed a similar pattern of responses. However, pathological gamblers reported overall significantly higher urges to gamble than social gamblers. As urges have been shown to play a pivotal role in addictive behaviors and relapse, the current findings may have implications for the development of gambling problems and relapse after successful treatment.

  19. Effects of similarity on environmental context cueing.

    Science.gov (United States)

    Smith, Steven M; Handy, Justin D; Angello, Genna; Manzano, Isabel

    2014-01-01

    Three experiments examined the prediction that context cues which are similar to study contexts can facilitate episodic recall, even if those cues are never seen before the recall test. Environmental context cueing effects have typically produced such small effect sizes that influences of moderating factors, such as the similarity between encoding and retrieval contexts, would be difficult to observe experimentally. Videos of environmental contexts, however, can be used to produce powerful context-dependent memory effects, particularly when only one memory target is associated with each video context, intentional item-context encoding is encouraged, and free recall tests are used. Experiment 1 showed that a not previously viewed video of the study context provided an effective recall cue, although it was not as effective as the originally viewed video context. Experiments 2 and 3 showed that videos of environments that were conceptually similar to encoding contexts (e.g., both were videos of ball field games) also cued recall, but not as well if the encoding contexts were given specific labels (e.g., "home run") incompatible with test contexts (e.g., a soccer scene). A fourth experiment that used incidental item-context encoding showed that video context reinstatement has a robust effect on paired associate memory, indicating that the video context reinstatement effect does not depend on interactive item-context encoding or free recall testing.

  20. Visual Cues and Listening Effort: Individual Variability

    Science.gov (United States)

    Picou, Erin M.; Ricketts, Todd A; Hornsby, Benjamin W. Y.

    2011-01-01

    Purpose: To investigate the effect of visual cues on listening effort as well as whether predictive variables such as working memory capacity (WMC) and lipreading ability affect the magnitude of listening effort. Method: Twenty participants with normal hearing were tested using a paired-associates recall task in 2 conditions (quiet and noise) and…

  1. Development of cue integration in human navigation.

    Science.gov (United States)

    Nardini, Marko; Jones, Peter; Bedford, Rachael; Braddick, Oliver

    2008-05-06

    Mammalian navigation depends both on visual landmarks and on self-generated (e.g., vestibular and proprioceptive) cues that signal the organism's own movement [1-5]. When these conflict, landmarks can either reset estimates of self-motion or be integrated with them [6-9]. We asked how humans combine these information sources and whether children, who use both from a young age [10-12], combine them as adults do. Participants attempted to return an object to its original place in an arena when given either visual landmarks only, nonvisual self-motion information only, or both. Adults, but not 4- to 5-year-olds or 7- to 8-year-olds, reduced their response variance when both information sources were available. In an additional "conflict" condition that measured relative reliance on landmarks and self-motion, we predicted behavior under two models: integration (weighted averaging) of the cues and alternation between them. Adults' behavior was predicted by integration, in which the cues were weighted nearly optimally to reduce variance, whereas children's behavior was predicted by alternation. These results suggest that development of individual spatial-representational systems precedes development of the capacity to combine these within a common reference frame. Humans can integrate spatial cues nearly optimally to navigate, but this ability depends on an extended developmental process.
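
    The "integration (weighted averaging)" model referred to here is standardly formalized as reliability-weighted averaging; assuming independent, Gaussian cue noise, the optimal combination and its predicted variance reduction are:

```latex
% Optimal (variance-minimizing) combination of a landmark estimate \hat{x}_L
% and a self-motion estimate \hat{x}_S with noise variances \sigma_L^2, \sigma_S^2:
\hat{x} = w_L\,\hat{x}_L + w_S\,\hat{x}_S,\qquad
w_i = \frac{1/\sigma_i^2}{1/\sigma_L^2 + 1/\sigma_S^2}
% Predicted variance of the combined estimate (never larger than either cue alone):
\sigma^2 = \left(\frac{1}{\sigma_L^2} + \frac{1}{\sigma_S^2}\right)^{-1}
```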

  2. Early syllabic segmentation of fluent speech by infants acquiring French.

    Directory of Open Access Journals (Sweden)

    Louise Goyet

    Full Text Available Word form segmentation abilities emerge during the first year of life, and it has been proposed that infants initially rely on two types of cues to extract words from fluent speech: Transitional Probabilities (TPs) and rhythmic units. The main goal of the present study was to use the behavioral method of the Headturn Preference Procedure (HPP) to investigate again rhythmic segmentation of syllabic units by French-learning infants at the onset of segmentation abilities (around 8 months), given repeated failure to find syllabic segmentation at such a young age. The second goal was to explore the interaction between the use of TPs and syllabic units for segmentation by French-learning infants. The rationale was that decreasing TP cues around target syllables embedded in bisyllabic words would block bisyllabic word segmentation and facilitate the observation of syllabic segmentation. In Experiments 1 and 2, infants were tested in a condition of moderate TP decrease; no evidence of either syllabic or bisyllabic word segmentation was found. In Experiment 3, infants were tested in a condition of more marked TP decrease, and a novelty syllabic segmentation effect was observed. Therefore, the present study first establishes early syllabic segmentation in French-learning infants, bringing support from a syllable-based language to the proposal that rhythmic units are used at the onset of segmentation abilities. Second, it confirms that French-learning infants are sensitive to TP cues. Third, it demonstrates that they are sensitive to the relative weight of TP and rhythmic cues, explaining why effects of syllabic segmentation are not observed in context of high TPs. These findings are discussed in relation to theories of word segmentation bootstrapping, and the larger debate about statistically- versus prosodically-based accounts of early language acquisition.

  3. Early syllabic segmentation of fluent speech by infants acquiring French.

    Science.gov (United States)

    Goyet, Louise; Nishibayashi, Léo-Lyuki; Nazzi, Thierry

    2013-01-01

    Word form segmentation abilities emerge during the first year of life, and it has been proposed that infants initially rely on two types of cues to extract words from fluent speech: Transitional Probabilities (TPs) and rhythmic units. The main goal of the present study was to use the behavioral method of the Headturn Preference Procedure (HPP) to investigate again rhythmic segmentation of syllabic units by French-learning infants at the onset of segmentation abilities (around 8 months) given repeated failure to find syllabic segmentation at such a young age. The second goal was to explore the interaction between the use of TPs and syllabic units for segmentation by French-learning infants. The rationale was that decreasing TP cues around target syllables embedded in bisyllabic words would block bisyllabic word segmentation and facilitate the observation of syllabic segmentation. In Experiments 1 and 2, infants were tested in a condition of moderate TP decrease; no evidence of either syllabic or bisyllabic word segmentation was found. In Experiment 3, infants were tested in a condition of more marked TP decrease, and a novelty syllabic segmentation effect was observed. Therefore, the present study first establishes early syllabic segmentation in French-learning infants, bringing support from a syllable-based language to the proposal that rhythmic units are used at the onset of segmentation abilities. Second, it confirms that French-learning infants are sensitive to TP cues. Third, it demonstrates that they are sensitive to the relative weight of TP and rhythmic cues, explaining why effects of syllabic segmentation are not observed in context of high TPs. These findings are discussed in relation to theories of word segmentation bootstrapping, and the larger debate about statistically- versus prosodically-based accounts of early language acquisition.

  4. PCA-Based Speech Enhancement for Distorted Speech Recognition

    Directory of Open Access Journals (Sweden)

    Tetsuya Takiguchi

    2007-09-01

    Full Text Available We investigated a robust speech feature extraction method using kernel PCA (Principal Component Analysis) for distorted speech recognition. Kernel PCA has been suggested for various image processing tasks requiring an image model, such as denoising, where a noise-free image is constructed from a noisy input image. Much research on robust speech feature extraction has been done, but it remains difficult to completely remove additive or convolution noise (distortion). The most commonly used noise-removal techniques are based on spectral-domain operations, and then for speech recognition, the MFCC (Mel Frequency Cepstral Coefficient) is computed, where the DCT (Discrete Cosine Transform) is applied to the mel-scale filter bank output. This paper describes a new PCA-based speech enhancement algorithm using kernel PCA instead of the DCT, where the main speech element is projected onto low-order features, while the noise or distortion element is projected onto high-order features. Its effectiveness is confirmed by word recognition experiments on distorted speech.
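
    A minimal sketch of the general idea, using scikit-learn's KernelPCA as a stand-in (the paper's kernel, feature set and training data are not specified here): fit kernel PCA on clean-speech feature vectors, then project noisy features onto the leading components and map back through the learned pre-image. All data below are synthetic placeholders.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)

# Placeholder "log mel filter-bank" feature vectors (frames x bands).
clean = rng.standard_normal((500, 24))
noisy = clean + 0.5 * rng.standard_normal(clean.shape)

# Fit kernel PCA on clean features; the low-order components are assumed
# to carry the speech structure, higher-order ones mostly noise/distortion.
kpca = KernelPCA(n_components=8, kernel="rbf", gamma=0.05,
                 fit_inverse_transform=True)
kpca.fit(clean)

# Denoise: project noisy frames onto the leading components, then map
# back to the feature space via the learned pre-image.
denoised = kpca.inverse_transform(kpca.transform(noisy))
print("residual before:", np.mean((noisy - clean) ** 2))
print("residual after: ", np.mean((denoised - clean) ** 2))
```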

  5. The (un)clear effects of invalid retro-cues.

    Directory of Open Access Journals (Sweden)

    Marcel Gressmann

    2016-03-01

    Full Text Available Studies with the retro-cue paradigm have shown that validly cueing objects in visual working memory long after encoding can still benefit performance on subsequent change detection tasks. With regard to the effects of invalid cues, the literature is less clear. Some studies reported costs, others did not. We here revisit two recent studies that made interesting suggestions concerning invalid retro-cues: One study suggested that costs only occur for larger set sizes, and another study suggested that inclusion of invalid retro-cues diminishes the retro-cue benefit. New data from one experiment and a reanalysis of published data are provided to address these conclusions. The new data clearly show costs (and benefits) that were independent of set size, and the reanalysis suggests no influence of the inclusion of invalid retro-cues on the retro-cue benefit. Thus, previous interpretations may be taken with some caution at present.

  6. Who wants to be a blabbermouth? Prosodic cues to correct answers in the WWTBAM quiz show scenario

    DEFF Research Database (Denmark)

    Niebuhr, Oliver

    2016-01-01

    Starting from previous research on the prosodic patterns of emotion, psychological stress and deceptive speech, the paper investigates whether quizmasters convey telltale cues to correct answers in the popular four alternatives (a/b/c/d) framework of "Who Wants to Be a Millionaire?" (WWTBAM). We...... telltale signs of correct answers. These telltale signs were consistent across all quizmasters, but complex insofar as they differed across question positions (a/b/c/d) could not be found in the introductory letters. Cues to correct answers involved timing and range of F0 and intensity patterns, speaking...... rate, and degree of final lengthening; pause durations between answers and introductory letters were irrelevant. The results are discussed with respect to their implications for real quizshows and the elicitation of emotions and stress in the lab....

  7. The ability of left- and right-hemisphere damaged individuals to produce prosodic cues to disambiguate Korean idiomatic sentences

    Directory of Open Access Journals (Sweden)

    Seung-Yun Yang

    2014-05-01

    Three speech-language pathologists with training in phonetics participated as raters for vocal qualities. Nasality was a significantly salient vocal quality of idiomatic utterances. Conclusion: The findings support that (1) LHD negatively affected the production of durational cues and RHD negatively affected the production of fundamental frequency cues in idiomatic-literal contrasts; (2) healthy listeners successfully identified idiomatic and literal versions of ambiguous sentences produced by healthy speakers but not by RHD speakers; (3) productions in brain-damaged participants approximated HC’s measures in the repetition tasks, but not in the elicitation tasks; (4) nasal voice quality was judged to be associated with idiomatic utterances in all groups of participants. Findings agree with previous studies indicating HC’s abilities to discriminate literal versus idiomatic meanings in ditropically ambiguous idioms, as well as deficient processing of pitch production and impaired pragmatic ability in RHD.

  8. The function of consciousness in multisensory integration.

    Science.gov (United States)

    Palmer, Terry D; Ramsey, Ashley K

    2012-12-01

    The function of consciousness was explored in two contexts of audio-visual speech, cross-modal visual attention guidance and McGurk cross-modal integration. Experiments 1, 2, and 3 utilized a novel cueing paradigm in which two different flash-suppressed lip-streams co-occurred with speech sounds matching one of these streams. A visual target was then presented at either the audio-visually congruent or incongruent location. Target recognition differed for the congruent versus incongruent trials, and the nature of this difference depended on the probabilities of a target appearing at these respective locations. Thus, even though the lip-streams were never consciously perceived, they were nevertheless meaningfully integrated with the consciously perceived sounds, and participants learned to guide their attention according to statistical regularities between targets and these unconsciously perceived cross-modal cues. In Experiment 4, McGurk stimuli were presented in which the lip-streams were either flash suppressed (4a) or unsuppressed (4b), and the McGurk effect was found to vanish under conditions of flash suppression. Overall, these results suggest a simple yet fundamental principle regarding the function of consciousness in multisensory integration - cross-modal effects can occur in the absence of consciousness, but the influencing modality must be consciously perceived for its information to cross modalities.

  9. Predicting the Attitude Flow in Dialogue Based on Multi-Modal Speech Cues

    DEFF Research Database (Denmark)

    Juel Henrichsen, Peter; Allwood, Jens

    2013-01-01

    We present our experiments on attitude detection based on annotated multi-modal dialogue data. Our long-term goal is to establish a computational model able to predict the attitudinal patterns in human-human dialogue. We believe such prediction algorithms are useful tools in the pursuit...

  10. Speech Presentation Cues Moderate Frontal EEG Asymmetry in Socially Withdrawn Young Adults

    Science.gov (United States)

    Cole, Claire; Zapp, Daniel J.; Nelson, S. Katherine; Perez-Edgar, Koraly

    2012-01-01

    Socially withdrawn individuals display solitary behavior across wide contexts with both unfamiliar and familiar peers. This tendency to withdraw may be driven by either past or anticipated negative social encounters. In addition, socially withdrawn individuals often exhibit right frontal electroencephalogram (EEG) asymmetry at baseline and when…

  11. Hate Speech/Free Speech: Using Feminist Perspectives To Foster On-Campus Dialogue.

    Science.gov (United States)

    Cornwell, Nancy; Orbe, Mark P.; Warren, Kiesha

    1999-01-01

    Explores the complex issues inherent in the tension between hate speech and free speech, focusing on the phenomenon of hate speech on college campuses. Describes the challenges to hate speech made by critical race theorists and explains how a feminist critique can reorient the parameters of hate speech. (SLD)

  12. The Stylistic Analysis of Public Speech

    Institute of Scientific and Technical Information of China (English)

    李龙

    2011-01-01

    Public speech is a very important part of our daily life. The ability to deliver a good public speech is something we need to learn and to have, especially in the service sector. This paper attempts to analyze the style of public speech, in the hope of providing inspiration for whenever we deliver such a speech.

  13. Linguistic Units and Speech Production Theory.

    Science.gov (United States)

    MacNeilage, Peter F.

    This paper examines the validity of the concept of linguistic units in a theory of speech production. Substantiating data are drawn from the study of the speech production process itself. Secondarily, an attempt is made to reconcile the postulation of linguistic units in speech production theory with their apparent absence in the speech signal.…

  14. Automated Speech Rate Measurement in Dysarthria

    Science.gov (United States)

    Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

    2015-01-01

    Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…

  15. Coevolution of Human Speech and Trade

    NARCIS (Netherlands)

    Horan, R.D.; Bulte, E.H.; Shogren, J.F.

    2008-01-01

    We propose a paleoeconomic coevolutionary explanation for the origin of speech in modern humans. The coevolutionary process, in which trade facilitates speech and speech facilitates trade, gives rise to multiple stable trajectories. While a 'trade-speech' equilibrium is not an inevitable outcome for

  16. Connected Speech Processes in Australian English.

    Science.gov (United States)

    Ingram, J. C. L.

    1989-01-01

    Explores the role of Connected Speech Processes (CSP) in accounting for sociolinguistically significant dimensions of speech variation, and presents initial findings on the distribution of CSPs in the speech of Australian adolescents. The data were gathered as part of a wider survey of speech of Brisbane school children. (Contains 26 references.)…

  17. Cortical encoding and neurophysiological tracking of intensity and pitch cues signaling English stress patterns in native and nonnative speakers.

    Science.gov (United States)

    Chung, Wei-Lun; Bidelman, Gavin M

    2016-01-01

    We examined cross-language differences in neural encoding and tracking of intensity and pitch cues signaling English stress patterns. Auditory mismatch negativities (MMNs) were recorded in English and Mandarin listeners in response to contrastive English pseudowords whose primary stress occurred either on the first or second syllable (i.e., "nocTICity" vs. "NOCticity"). The contrastive syllable stress elicited two consecutive MMNs in both language groups, but English speakers demonstrated larger responses to stress patterns than Mandarin speakers. Correlations between the amplitude of ERPs and continuous changes in the running intensity and pitch of speech assessed how well each language group's brain activity tracked these salient acoustic features of lexical stress. We found that English speakers' neural responses tracked intensity changes in speech more closely than Mandarin speakers (higher brain-acoustic correlation). Findings demonstrate more robust and precise processing of English stress (intensity) patterns in early auditory cortical responses of native relative to nonnative speakers.

  18. Configural Effect in Multiple-Cue Probability Learning

    Science.gov (United States)

    Edgell, Stephen E.; Castellan, N. John, Jr.

    1973-01-01

    In a nonmetric multiple-cue probability learning task involving 2 binary cue dimensions, it was found that Ss can learn to use configural or pattern information (a) when only the configural information is relevant, and (b) when, in addition to the configural information, one or both of the cue dimensions are relevant. (Author/RK)

  19. Effects of Typographical Cues on Reading and Recall of Text.

    Science.gov (United States)

    Lorch, Robert F., Jr.; And Others

    1995-01-01

    Effects of typographical cues on text memory were investigated in 2 experiments involving 204 college students. Findings demonstrated that effects of typographical cues on memory were mediated by effects on attention during reading. Typographical cues appeared to increase attention only to the signaled content, resulting in better memory. (SLD)

  20. Responsiveness of Nigerian Students to Pictorial Depth Cues.

    Science.gov (United States)

    Evans, G. S.; Seddon, G. M.

    1978-01-01

    Three groups of Nigerian high school and college students were tested for response to four pictorial depth cues. Students had more difficulty with cues concerning the relative size of objects and the foreshortening of straight lines than with cues involving overlap of lines and distortion of the angles between lines. (Author/JEG)

  1. ARMA Modelling for Whispered Speech

    Institute of Scientific and Technical Information of China (English)

    Xue-li LI; Wei-dong ZHOU

    2010-01-01

    The Autoregressive Moving Average (ARMA) model for whispered speech is proposed. Compared with normal speech, whispered speech has no fundamental frequency because of the glottis being semi-opened and turbulent flow being created, and formant shifting exists in the lower frequency region due to the narrowing of the tract in the false vocal fold regions and weak acoustic coupling with the subglottal system. Analysis shows that the effect of the subglottal system is to introduce additional pole-zero pairs into the vocal tract transfer function. Theoretically, the method based on an ARMA process is superior to that based on an AR process in the spectral analysis of the whispered speech. Two methods, the least squared modified Yule-Walker likelihood estimate (LSMY) algorithm and the Frequency-Domain Steiglitz-McBride (FDSM) algorithm, are applied to the ARMA model for the whispered speech. The performance evaluation shows that the ARMA model is much more appropriate for representing the whispered speech than the AR model, and the FDSM algorithm provides a more accurate estimation of the whispered speech spectral envelope than the LSMY algorithm, with higher computational complexity.
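
    As an illustration of the AR-versus-ARMA comparison only (not the LSMY or FDSM estimators themselves), the sketch below fits both model types to a short frame with statsmodels and compares fit quality by AIC; the frame and the model orders are placeholder values.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# Placeholder stand-in for a short whispered-speech frame (e.g., 20 ms at 16 kHz).
frame = rng.standard_normal(320)

# AR-only model (p AR terms, no MA terms) versus ARMA model (p AR + q MA terms).
# The MA terms add the zeros needed to capture extra pole-zero pairs attributed
# to subglottal coupling in whispered speech.
ar_fit = ARIMA(frame, order=(12, 0, 0)).fit()
arma_fit = ARIMA(frame, order=(12, 0, 2)).fit()

print("AR(12)     AIC:", round(ar_fit.aic, 1))
print("ARMA(12,2) AIC:", round(arma_fit.aic, 1))
```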

  2. Speech recognition from spectral dynamics

    Indian Academy of Sciences (India)

    Hynek Hermansky

    2011-10-01

    Information is carried in changes of a signal. The paper starts with revisiting Dudley’s concept of the carrier nature of speech. It points to its close connection to modulation spectra of speech and argues against short-term spectral envelopes as dominant carriers of the linguistic information in speech. The history of spectral representations of speech is briefly discussed. Some of the history of gradual infusion of the modulation spectrum concept into Automatic recognition of speech (ASR) comes next, pointing to the relationship of modulation spectrum processing to well-accepted ASR techniques such as dynamic speech features or RelAtive SpecTrAl (RASTA) filtering. Next, the frequency domain perceptual linear prediction technique for deriving autoregressive models of temporal trajectories of spectral power in individual frequency bands is reviewed. Finally, posterior-based features, which allow for straightforward application of modulation frequency domain information, are described. The paper is tutorial in nature, aims at a historical global overview of attempts for using spectral dynamics in machine recognition of speech, and does not always provide enough detail of the described techniques. However, extensive references to earlier work are provided to compensate for the lack of detail in the paper.

  3. Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference

    Science.gov (United States)

    Lee, Byeongwook; Cho, Kwang-Hyun

    2016-11-01

    Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment. Conventional segmentation techniques primarily segment speech using a fixed frame size for computational simplicity. However, this approach is insufficient for capturing the quasi-regular structure of speech, which causes substantial recognition failure in noisy environments. How does the brain handle quasi-regular structured speech and maintain high recognition performance under any circumstance? Recent neurophysiological studies have suggested that the phase of neuronal oscillations in the auditory cortex contributes to accurate speech recognition by guiding speech segmentation into smaller units at different timescales. A phase-locked relationship between neuronal oscillation and the speech envelope has recently been obtained, which suggests that the speech envelope provides a foundation for multi-timescale speech segmental information. In this study, we quantitatively investigated the role of the speech envelope as a potential temporal reference to segment speech using its instantaneous phase information. We evaluated the proposed approach by the achieved information gain and recognition performance in various noisy environments. The results indicate that the proposed segmentation scheme not only extracts more information from speech but also provides greater robustness in a recognition test.
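
    A hedged sketch of the core idea, envelope extraction followed by phase-based boundary candidates, is shown below; the filter settings and the exact boundary rule of the proposed scheme are not reproduced, and the test signal is a toy stand-in for speech.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def envelope_phase_boundaries(x, fs=16000, cutoff=8.0):
    """Return candidate segment boundaries (sample indices) where the
    instantaneous phase of the low-passed speech envelope wraps around,
    i.e., roughly once per slow envelope cycle."""
    envelope = np.abs(hilbert(x))                      # amplitude envelope
    b, a = butter(4, cutoff / (fs / 2), btype="low")   # keep slow modulations
    slow_env = filtfilt(b, a, envelope)
    phase = np.angle(hilbert(slow_env - slow_env.mean()))
    # A boundary candidate wherever the phase wraps from +pi to -pi.
    wraps = np.where(np.diff(phase) < -np.pi)[0]
    return wraps

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # Toy signal: a 150 Hz "voice" amplitude-modulated at ~4 Hz (syllable rate).
    x = np.sin(2 * np.pi * 150 * t) * (0.6 + 0.4 * np.sin(2 * np.pi * 4 * t))
    print(envelope_phase_boundaries(x, fs)[:10])
```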

  4. INTEGRATING MACHINE TRANSLATION AND SPEECH SYNTHESIS COMPONENT FOR ENGLISH TO DRAVIDIAN LANGUAGE SPEECH TO SPEECH TRANSLATION SYSTEM

    Directory of Open Access Journals (Sweden)

    J. SANGEETHA

    2015-02-01

    Full Text Available This paper provides an interface between the machine translation and speech synthesis systems for converting English speech to Tamil text in an English-to-Tamil speech-to-speech translation system. The speech translation system consists of three modules: automatic speech recognition, machine translation and text-to-speech synthesis. Many procedures for integrating speech recognition and machine translation have been proposed, but the speech synthesis component has not yet received the same attention. In this paper, we focus on the integration of machine translation and speech synthesis, and report a subjective evaluation to investigate the impact of speech synthesis, machine translation and the integration of machine translation and speech synthesis components. Here we implement a hybrid machine translation technique (a combination of rule-based and statistical machine translation) and a concatenative syllable-based speech synthesis technique. In order to retain the naturalness and intelligibility of the synthesized speech, Auto-Associative Neural Network (AANN) prosody prediction is used in this work. The results of this system investigation demonstrate that the naturalness and intelligibility of the synthesized speech are strongly influenced by the fluency and correctness of the translated text.

  5. A Survey on Speech Enhancement Methodologies

    Directory of Open Access Journals (Sweden)

    Ravi Kumar. K

    2016-12-01

    Full Text Available Speech enhancement is a technique which processes the noisy speech signal. The aim of speech enhancement is to improve the perceived quality of speech and/or to improve its intelligibility. Due to its vast applications in mobile telephony, VOIP, hearing aids, Skype and speaker recognition, the challenges in speech enhancement have grown over the years. It is especially challenging to suppress background noise that affects human communication in noisy environments like airports, road works, traffic, and cars. The objective of this survey paper is to outline the single-channel speech enhancement methodologies used for enhancing a speech signal corrupted with additive background noise and also to discuss the challenges and opportunities of single-channel speech enhancement. The paper mainly focuses on transform-domain techniques and supervised (NMF, HMM) speech enhancement techniques, and gives a framework for developments in speech enhancement methodologies.
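
    As a concrete pointer to the supervised (NMF) family the survey covers, the sketch below shows the generic textbook recipe with plain NumPy multiplicative updates: learn speech and noise basis spectra offline, then decompose a noisy magnitude spectrogram over the fixed, concatenated bases and keep only the speech part. This is not any specific method from the survey, and the spectrograms are random placeholders.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Basic NMF with Euclidean multiplicative updates: V (F x T) ~= W @ H."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + 1e-3
    H = rng.random((rank, T)) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

def nmf_activations(V, W, n_iter=200, seed=0):
    """Decompose V over a fixed basis W (update activations only)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    return H

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    speech_spec = np.abs(rng.standard_normal((257, 300)))   # placeholder magnitudes
    noise_spec = np.abs(rng.standard_normal((257, 300)))
    Ws, _ = nmf(speech_spec, rank=20)                        # speech bases
    Wn, _ = nmf(noise_spec, rank=10)                         # noise bases
    W = np.hstack([Ws, Wn])                                  # fixed joint dictionary
    mix = np.abs(rng.standard_normal((257, 100)))            # "noisy" spectrogram
    H = nmf_activations(mix, W)
    speech_estimate = Ws @ H[:20]                            # keep speech components
    print(speech_estimate.shape)
```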

  6. Speech enhancement theory and practice

    CERN Document Server

    Loizou, Philipos C

    2013-01-01

    With the proliferation of mobile devices and hearing devices, including hearing aids and cochlear implants, there is a growing and pressing need to design algorithms that can improve speech intelligibility without sacrificing quality. Responding to this need, Speech Enhancement: Theory and Practice, Second Edition introduces readers to the basic problems of speech enhancement and the various algorithms proposed to solve these problems. Updated and expanded, this second edition of the bestselling textbook broadens its scope to include evaluation measures and enhancement algorithms aimed at impr

  7. Computational neuroanatomy of speech production.

    Science.gov (United States)

    Hickok, Gregory

    2012-01-05

    Speech production has been studied predominantly from within two traditions, psycholinguistics and motor control. These traditions have rarely interacted, and the resulting chasm between these approaches seems to reflect a level of analysis difference: whereas motor control is concerned with lower-level articulatory control, psycholinguistics focuses on higher-level linguistic processing. However, closer examination of both approaches reveals a substantial convergence of ideas. The goal of this article is to integrate psycholinguistic and motor control approaches to speech production. The result of this synthesis is a neuroanatomically grounded, hierarchical state feedback control model of speech production.

  8. Steganalysis of recorded speech

    Science.gov (United States)

    Johnson, Micah K.; Lyu, Siwei; Farid, Hany

    2005-03-01

    Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.

  9. Speech recovery device

    Energy Technology Data Exchange (ETDEWEB)

    Frankle, Christen M.

    2004-04-20

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  10. Speech recovery device

    Energy Technology Data Exchange (ETDEWEB)

    Frankle, Christen M.

    2000-10-19

    There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

  11. Join Cost for Unit Selection Speech Synthesis

    OpenAIRE

    Vepa, Jithendra

    2004-01-01

    Undoubtedly, state-of-the-art unit selection-based concatenative speech systems produce very high quality synthetic speech. This is due to a large speech database containing many instances of each speech unit, with a varied and natural distribution of prosodic and spectral characteristics. The join cost, which measures how well two units can be joined together, is one of the main criteria for selecting appropriate units from this large speech database. The ideal join cost is one that measur...

  12. Preventive measures in speech and language therapy

    OpenAIRE

    Slokar, Polona

    2014-01-01

    Preventive care plays an important role in speech and language therapy. Through training, a speech and language therapist informs professionals and the general public about work in the fields of feeding and of speech and language development, as well as about the deficits that may appear in communication and feeding. A speech and language therapist is also responsible for the early detection of irregularities and of the factors that affect speech and language development. To a...

  13. Speech distortion measure based on auditory properties

    Institute of Scientific and Technical Information of China (English)

    CHEN Guo; HU Xiulin; ZHANG Yunyu; ZHU Yaoting

    2000-01-01

    The Perceptual Spectrum Distortion (PSD) measure, based on the auditory properties of human hearing, is presented for measuring speech distortion. The PSD measure calculates a speech distortion distance by simulating human auditory properties and converting the short-time speech power spectrum to an auditory perceptual spectrum. Preliminary simulation experiments comparing it with the Itakura measure have been carried out. The results show that the PSD is a preferable speech distortion measure and is more consistent with subjective assessments of speech quality.
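
    The exact PSD algorithm is not reproduced here, but the general idea of an auditorily motivated distortion measure can be sketched as follows: compute short-time power spectra, collapse them into bands whose widths grow with frequency (a crude stand-in for critical bands), take logs, and average the per-frame distance between the reference and the degraded signal. The frame length, band layout, and distance metric below are assumptions made for illustration only.

import numpy as np

def band_edges(n_bins, n_bands=20):
    # Band widths grow roughly geometrically, a crude stand-in for the
    # widening of auditory critical bands with frequency.
    return np.unique(np.geomspace(1, n_bins, n_bands + 1).astype(int))

def perceptual_spectrum(signal, frame_len=512, hop=256, n_bands=20):
    # Short-time power spectra collapsed into auditory-like bands (log scale).
    frames = np.lib.stride_tricks.sliding_window_view(signal, frame_len)[::hop]
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    edges = band_edges(power.shape[1], n_bands)
    bands = np.stack([power[:, a:b].mean(axis=1)
                      for a, b in zip(edges[:-1], edges[1:])], axis=1)
    return np.log(bands + 1e-12)

def distortion(reference, degraded):
    # Mean per-frame distance between the two auditory-like spectra.
    a, b = perceptual_spectrum(reference), perceptual_spectrum(degraded)
    n = min(len(a), len(b))
    return float(np.mean(np.linalg.norm(a[:n] - b[:n], axis=1)))

rng = np.random.default_rng(0)
clean = rng.normal(size=16000)
noisy = clean + 0.1 * rng.normal(size=16000)
print(distortion(clean, clean))   # 0.0 for identical signals
print(distortion(clean, noisy))   # larger values = more distortion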

  14. Speech of people with autism: Echolalia and echolalic speech

    OpenAIRE

    Błeszyński, Jacek Jarosław

    2013-01-01

    Speech of people with autism is recognised as one of the basic diagnostic, therapeutic and theoretical problems. One of the most common symptoms of autism in children is echolalia, described here as being of different types and severity. This paper presents the results of studies into different levels of echolalia, both in normally developing children and in children diagnosed with autism, discusses the differences between simple echolalia and echolalic speech - which can be considered to b...

  15. Introspective responses to cues and motivation to reduce cigarette smoking influence state and behavioral responses to cue exposure.

    Science.gov (United States)

    Veilleux, Jennifer C; Skinner, Kayla D

    2016-09-01

    In the current study, we aimed to extend smoking cue-reactivity research by evaluating delay discounting as an outcome of cigarette cue exposure. We also separated introspection in response to cues (e.g., self-reporting craving and affect) from cue exposure alone, to determine if introspection changes behavioral responses to cigarette cues. Finally, we included measures of quit motivation and resistance to smoking to assess motivational influences on cue exposure. Smokers were invited to participate in an online cue-reactivity study. Participants were randomly assigned to view smoking images or neutral images, and were randomized to respond to cues with either craving and affect questions (i.e., introspection) or filler questions. Following cue exposure, participants completed a delay discounting task and then reported state affect, craving, and resistance to smoking, as well as an assessment of quit motivation. We found that after controlling for trait impulsivity, participants who introspected on craving and affect showed higher delay discounting, irrespective of cue type, but we found no effect of response condition on subsequent craving (i.e., craving reactivity). We also found that motivation to quit interacted with experimental conditions to predict state craving and state resistance to smoking. Although asking about craving during cue exposure did not increase later craving, it resulted in greater discounting of delayed rewards. Overall, our findings suggest the need to further assess the implications of introspection and motivation on behavioral outcomes of cue exposure.
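
    The study's exact discounting analysis is not specified in the abstract, but delay discounting is commonly quantified by fitting the hyperbolic model V = A / (1 + kD) to indifference points, where a larger k means steeper discounting (a stronger preference for smaller, immediate rewards). The sketch below fits k to hypothetical indifference points and is illustrative only.

import numpy as np
from scipy.optimize import curve_fit

def hyperbolic(delay_days, k, amount=100.0):
    # Subjective value of a delayed reward of size `amount` at delay D.
    return amount / (1.0 + k * delay_days)

delays = np.array([1, 7, 30, 90, 365], dtype=float)          # days
indifference = np.array([95, 85, 60, 40, 20], dtype=float)   # hypothetical data

# Fit only the discounting rate k; the reward amount is held at 100.
(k_hat,), _ = curve_fit(hyperbolic, delays, indifference, p0=[0.01])
print(f"estimated discounting rate k = {k_hat:.4f}")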

  16. Effects of compatible versus competing rhythmic grouping on errors and timing variability in speech.

    Science.gov (United States)

    Katsika, Argyro; Shattuck-Hufnagel; Mooshammer, Christine; Tiede, Mark; Goldstein, Louis

    2014-12-01

    In typical speech, words are grouped into prosodic constituents. This study investigates how such grouping interacts with segmental sequencing patterns in the production of repetitive word sequences. We experimentally manipulated grouping behavior using a rhythmic repetition task to elicit speech for perceptual and acoustic analysis, to test the hypothesis that prosodic structure and patterns of segmental alternation can interact in the production planning process. Talkers produced alternating sequences of two words (top cop) and non-alternating controls (top top and cop cop), organized into six-word sequences. These sequences were further organized into prosodic groupings of three two-word pairs or two three-word triples by means of visual cues and audible metronome clicks. Results for six speakers showed more speech errors in triples, that is, when the pairwise word alternation was mismatched with the prosodic subgrouping into triples. This result suggests that the planning process for the segmental units of an utterance interacts with the planning process for the prosodic grouping of its words. It also highlights the importance of extending commonly used experimental speech elicitation methods to include more complex prosodic patterns, in order to evoke the kinds of interaction between prosodic structure and planning that occur in the production of lexical forms in continuous communicative speech.

  17. The mechanism of speech processing in congenital amusia: evidence from Mandarin speakers.

    Directory of Open Access Journals (Sweden)

    Fang Liu

    Full Text Available Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination in natural speech and their gliding tone analogs. They also performed worse than controls on discriminating gliding tone sequences derived from statements and questions, and showed elevated thresholds for pitch change detection and pitch direction discrimination. However, they performed as well as controls on word identification, and on statement-question identification and discrimination in natural speech. Overall, tasks that involved multiple acoustic cues to communicative meaning were not impacted by amusia. Only when the tasks relied mainly on pitch sensitivity did amusics show impaired performance compared to controls. These findings help explain why amusia only affects speech processing in subtle ways. Further studies on a larger sample of Mandarin amusics and on amusics of other language backgrounds are needed to consolidate these results.

  18. How Auditory Information Influences Volitional Control in Binocular Rivalry: Modulation of a Top-Down Attentional Effect

    Directory of Open Access Journals (Sweden)

    Manuel Vidal

    2011-10-01

    / although it could be equivalent to promoting lips uttering /ada/. Our findings suggest that at higher-level processing stages, auditory cues do interact with the perceptual decision and with the dominance mechanism involved during visual rivalry. These results are discussed in relation to individual differences in audio-visual integration for speech perception. We propose a descriptive model based on known characteristics of binocular rivalry, which accounts for most of these findings. In this model, the top-down attentional control (volition) is modulated by lower-level audio-visual matching.

  19. Cue-Based Feeding in the NICU.

    Science.gov (United States)

    Whetten, Cynthia H

    In NICU settings, caring for neonates born as early as 23 weeks gestation presents unique challenges for caregivers. Traditionally, preterm infants who are learning to feed orally take a predetermined volume of breast milk or formula at scheduled intervals, regardless of their individual ability to coordinate each feeding. Evidence suggests that this volume-driven feeding model should be replaced with a more individualized, developmentally appropriate practice. Evidence from the literature suggests that preterm infants fed via cue-based feeding reach full oral feeding status faster than their volume-fed counterparts and have shorter lengths of stay in the hospital. Changing practice to infant-driven or cue-based feedings in the hospital setting requires staff education, documentation, and team-based communication.

  20. Cleaning MEG artifacts using external cues.

    Science.gov (United States)

    Tal, I; Abeles, M

    2013-07-15

    Recordings of brain activity made with EEG, ECoG, MEG, and microelectrodes are prone to multiple artifacts. The main power line (mains line), video equipment, mechanical vibrations, and activity outside the brain are the most common sources of artifacts. MEG amplitudes are low, and even small artifacts distort recordings. In this study, we show how these artifacts can be efficiently removed by recording external cues during MEG recordings. These external cues are subsequently used to register the precise times or spectra of the artifacts. The results indicate that these procedures preserve both the spectra and the time-domain wave shapes of the neuromagnetic signal, while successfully reducing the contribution of the artifacts to the target signals without reducing the rank of the data.
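
    The authors' exact cleaning procedure is not reproduced here, but the core idea of exploiting a recorded external cue can be illustrated with a simple case: if a reference channel records the mains artifact directly, its coupling into a MEG channel can be estimated by least squares and subtracted, leaving the broadband signal (and hence its spectrum and time-domain wave shape) largely intact. The sampling rate, artifact frequency, and coupling coefficient below are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)
fs = 1000.0
t = np.arange(0, 10, 1 / fs)

brain = rng.normal(size=t.size)             # stand-in neuromagnetic signal
mains_ref = np.sin(2 * np.pi * 50 * t)      # external cue: mains reference
meg = brain + 0.8 * mains_ref               # contaminated MEG channel

# Estimate the artifact coupling coefficient from the reference and subtract.
coef = np.dot(meg, mains_ref) / np.dot(mains_ref, mains_ref)
cleaned = meg - coef * mains_ref

print("estimated coupling:", round(float(coef), 3))         # close to 0.8
print("residual correlation with reference:",
      round(float(np.corrcoef(cleaned, mains_ref)[0, 1]), 4))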